Is This Helping?

TLDR: A framework for making data science useful.

Data is the new oil. AI is the new electricity. Great!

Unrefined, oil is just a messy goo which suffocates all living things which come close. Electricity, if not harnessed properly, can provide some shocking surprises. Neither is all that helpful in and of itself.

Useful data science can be defined narrowly as the application of statistical and computer science methods to data to provide actionable insights. Without the word “actionable,” data science can be a lot like an oil spill or an electric shock. Plenty of mess and noise, very little value.

In the three-layer decision architecture, the action or decision is represented on the right side. This is the change that will be made based on the decision. It is the motivating purpose for the whole architecture, and should be the starting point for any project.

While the Decision Architecture flows left to right, planning should flow right to left.

Simon Sinek has carved a place for himself next to Dale Carnegie and Peter Drucker with a simple but far too often neglected concept called “Start With Why.” The advice that Sinek first shared in his 2009 TED talk applies to individuals, organizations, and projects. Without the core motivating “why,” it is easy to lose the cohesiveness of the corresponding outer layers. Once the “why” is well understood, the “how” and then the “what” come together.

Start with Why for personal leadership as well as for building useful data science

Nothing else really matters if the Why is not aligned to the stakeholders of the decision process. Refining oil, fueling a car, and starting the engine do nothing to help if you are lost without GPS or a map. Using your electricity to power a laptop for writing a blog post doesn’t do much if you don’t have anything to say. 🤘 Meta!

Solving for the “Why” of an analysis requires humility and patience. The role of product management in organizations has matured significantly over the past ten years to solve for “why,” and then “how” and “what,” in the development of technical products. Product managers have learned that progress is not about being the most brilliant technologist ushering in the next big thing from some skunkworks. Nor is it about taking dictation directly from consumers to build exactly what they ask for. The design process is a delicate balance between expertise and empathy.

“Some people say, ‘Give the customers what they want.’ But that’s not my approach. Our job is to figure out what they’re going to want before they do. I think Henry Ford once said, ‘If I’d asked customers what they wanted, they would have told me, “A faster horse!”’”

Steve Jobs – established Apple’s “Why”

It is useful to deploy the thought model of “problem space” and “solution space” from The Lean Product Playbook. This framework helps to align the “Why” for decision projects.

Problem space is the set of needs that stakeholders of the decision have, what kinds of impacts decisions can make, and what value the automation provides.

Solution space is all the ways that a decision process could make use of the available resources.

Not all possible solutions map to useful problems. Some problems can’t be solved by the project team implementing a new decision process. It is good to acknowledge these limitations up front to manage expectations and plan time for discovery. The more innovative the effort, the less well defined these spaces will be. Failure is to be expected and encouraged.

Most of decision architecture sits on the left, as it describes a solution. But if the output decisions and resulting actions aren’t in alignment with the problem space, the data science project will not be useful.

When learning about the solution and product space for the first time, projects should be a portfolio of small experiments. Each “sprint” is looking for possible connections in the alignment space. Many will fail. It is far better to fail fast, early, and inexpensively.

Product managers understand this. They seek to continuously learn more about the problem space, and they guide the selection of product features in the solution space to align with useful problems. When this works, great value can be unlocked for all stakeholders. When it does not work, it leads to results like the Ford Edsel, New Coke, and Google Wave.

The disciplines of actionable data science and product management are closely related. Innovating a new solution with data and technology depends on a thoughtful mapping of problem to solution. Choosing features, model architecture, or data structures requires domain expertise and hyper-sensitivity to the “why” of the problem being solved.

Exercises such as asking 5 whys, considering the analytics hierarchy of needs, and treating projects like a Lean Startup can help. Start with “why” and go forth with decision architecture for the right reasons.

Analytics’ Hierarchy of Needs

There’s a hierarchy of needs when it comes to delivering a product or service. Just as in Maslow’s hierarchy of human needs, the items at the top do not matter if those at the bottom are not provided first.

This mental model can apply in lots of variations. Here is an example using analytics and actionable data science:

1) Is the solution going to be accessible to the people who will need it?
2) Does it make things better and provide value in some way?
3) Does it work on every relevant variation of the problem?
4) Is it faster than the status quo or other options?
5) Is it easy to use?

Using this model, you can justify building something that is hard to use and slow, if it is still accessible, valuable, and relevant.

But if you can’t show the value or provide the outputs to the right stakeholders, it doesn’t matter how fast or easy the solution is.
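
One way to make the ordering above concrete is a small check that walks the five questions from the base of the hierarchy upward and reports the first one that is not met. This is only a sketch: the level names and the `first_unmet_need` helper are labels invented here for illustration, not part of any established framework.

```python
# Hypothetical short names for the five questions above, ordered from the
# base of the hierarchy (most fundamental) to the top.
HIERARCHY = ["accessible", "valuable", "relevant", "fast", "easy"]


def first_unmet_need(answers):
    """Return the lowest level of the hierarchy that is not satisfied.

    `answers` maps each level name to True or False. Levels missing from
    the mapping count as unmet. Returns None when every level is satisfied.
    """
    for level in HIERARCHY:
        if not answers.get(level, False):
            return level
    return None


# A slow, clunky tool that still reaches people and delivers value:
# its first gap sits near the top of the hierarchy.
slow_but_useful = {
    "accessible": True, "valuable": True, "relevant": True,
    "fast": False, "easy": False,
}

# A fast, polished tool whose value cannot be shown:
# its first gap sits near the base, so the polish does not matter yet.
fast_but_pointless = {
    "accessible": True, "valuable": False, "relevant": True,
    "fast": True, "easy": True,
}

print(first_unmet_need(slow_but_useful))     # fast
print(first_unmet_need(fast_but_pointless))  # valuable
```

The lower the first unmet level, the less the levels above it count, which is the same argument as the two paragraphs above.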

Tell Me Something New

TLDR: A few things to consider when adding new data to your decision process.

Everywhere one looks there are thought pieces about more data, big data, hot data, and fast data. More does not necessarily equal better. It is critically important to start with a thorough understanding of why the data is needed. In a decision process, what actions are being chosen between? What are the implications, stakeholders, success factors, and risks associated with bringing new data into the process?

Not all data contains something useful or unique. It is quite normal for most data to be unhelpful. This may be part of why “big data” solutions seem to be on the downward slope of the hype cycle. While the technology to store anything and everything has improved, the uses of all that data are not expanding at the same extreme rate. This has led many a “data lake” to turn into a “data swamp.”

When considering a new dataset for some decision process, it will be important to thoroughly examine it for a number of issues:

  • Is it from a trusted source?
  • Are there gaps in the data which must be ignored or treated in some way?
  • How many transformations has it undergone up until this point?
  • Could a more “pure” version be sourced further upstream?
  • Does it cover the full population, proportionately?
  • Will the information in this data be useful under all decision circumstances, or only in certain segments?
  • How does this information link to data already available to the process?
  • Are the values providing information that can’t already be derived? (E.g. If I know birthdate, I don’t need a data field for age)
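
Two of the questions above, gaps in the data and fields that can already be derived, are easy to sketch in code. The records, field names, and helper functions below are made up for illustration, and a real dataset would need checks tailored to its own structure.

```python
from datetime import date


def missing_share(records, field):
    """Fraction of records where `field` is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def age_on(birthdate, as_of):
    """Whole years between `birthdate` and `as_of`."""
    years = as_of.year - birthdate.year
    # Subtract one if the birthday has not yet happened in the as_of year.
    if (as_of.month, as_of.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def age_is_redundant(records, as_of):
    """True when every recorded age can be derived from the birthdate."""
    return all(
        r["age"] == age_on(r["birthdate"], as_of)
        for r in records
        if r.get("age") is not None and r.get("birthdate") is not None
    )


# Made-up records, for illustration only.
records = [
    {"birthdate": date(1980, 6, 15), "age": 43, "income": 52000},
    {"birthdate": date(1995, 1, 2), "age": 29, "income": None},
    {"birthdate": date(2000, 12, 31), "age": 23, "income": 38000},
    {"birthdate": date(1970, 3, 9), "age": 54, "income": None},
]
as_of = date(2024, 6, 1)

print(missing_share(records, "income"))  # 0.5
print(age_is_redundant(records, as_of))  # True
```

Here half of the income values are missing and must be ignored or treated, while the age field adds nothing that the birthdate does not already provide.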

Linking data can be a difficult task, particularly when sourced from different places or contexts. Transformations and assumptions are necessary to bring data together. Observations about individuals need to map to their associated accounts. Observations about events in time need to map against information that remains static. Locations can be expressed in a number of different ways, and may require third datasets to map together (e.g. latitude and longitude mappings into regional boundaries). Each change of data away from its native structure poses the risk of introducing new errors.
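
As a rough illustration of the location case, the sketch below maps latitude and longitude observations onto regions through a third lookup table. The region names and rectangular boundaries are invented for this example. Real regional boundaries are usually irregular polygons, so treat the bounding boxes as a simplifying assumption.

```python
# Hypothetical regions, each described by a bounding box:
# (min_lat, max_lat, min_lon, max_lon).
REGIONS = {
    "north": (40.0, 50.0, -10.0, 10.0),
    "south": (30.0, 40.0, -10.0, 10.0),
}


def region_for(lat, lon):
    """Return the name of the region containing the point, or None."""
    for name, (min_lat, max_lat, min_lon, max_lon) in REGIONS.items():
        if min_lat <= lat < max_lat and min_lon <= lon < max_lon:
            return name
    return None


def link_events(events):
    """Attach a region to each event, keeping unmatched events visible."""
    linked, unmatched = [], []
    for event in events:
        region = region_for(event["lat"], event["lon"])
        if region is None:
            unmatched.append(event)
        else:
            linked.append({**event, "region": region})
    return linked, unmatched


# Made-up observations, for illustration only.
events = [
    {"id": 1, "lat": 45.2, "lon": 3.1},
    {"id": 2, "lat": 35.0, "lon": -4.0},
    {"id": 3, "lat": 12.0, "lon": 80.0},
]

linked, unmatched = link_events(events)
print([(e["id"], e["region"]) for e in linked])  # [(1, 'north'), (2, 'south')]
print([e["id"] for e in unmatched])              # [3]
```

Returning the unmatched rows separately, instead of silently dropping them, is one way to keep the errors introduced by a mapping step in plain sight.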

Even without errors from mappings, data can be collected in specific contexts that are not relevant to the analysis at hand. Sensors can be calibrated differently. Forms can have pre-filled answers. Values can be aggregated or summarized in ways that obscure other useful details. Skepticism must always be deployed to detect suspicious patterns and ensure that data which may contain new information is included properly.

Once the data has been acquired and linked, it can still be of questionable value. For example, two fields which are perfectly correlated with each other provide no more information than one of them alone. Fields which vary randomly with respect to the target decision metric are also of little or no use. Exploratory data analysis should always be a part of the process, and techniques like principal component analysis can help clarify the value of the new fields.
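
A simple first pass at the correlation point is to compute pairwise Pearson correlations and flag pairs of fields that move in lockstep. The sketch below uses only the standard library. It is a simpler stand-in for a fuller technique such as principal component analysis, and the column names, values, and the 0.999 threshold are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant field. Returning 0.0 is
        # a choice made for this sketch, not a statistical convention.
        return 0.0
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.999):
    """Return pairs of column names whose absolute correlation is near 1."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs


# Made-up columns, for illustration only.
columns = {
    "temp_c": [10.0, 15.0, 20.0, 25.0, 30.0],
    "temp_f": [50.0, 59.0, 68.0, 77.0, 86.0],  # exact linear function of temp_c
    "visits": [3.0, 9.0, 4.0, 8.0, 5.0],
}

print(redundant_pairs(columns))  # [('temp_c', 'temp_f')]
```

Only one of `temp_c` and `temp_f` needs to enter the decision process, since each is fully determined by the other.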

After all of the above, it may be worth also considering if the new information is actually worth the effort. That worth can be evaluated by considering the motivating purpose for the analysis. While it is fun to create new data analysis, sometimes proven alternatives already exist. Sometimes a vendor can provide the dataset in a clean, consistent fashion better than your internal efforts. Data vendors have an economic incentive to do a good job with data collection where other parts of your organization may not.

As an example, FICO scores used in credit risk analytics are themselves the results of many transformations and calculations which factor in payment history, outstanding debt, length of credit history, and other credit risk factors. Lenders use these scores because they do not have the same breadth of information available to calibrate all people in a country and apply a rating. Lenders only have data about their own active customers and accounts, not about what happens at competitors or other kinds of financial institutions. A credit score is not a perfect measure, and the lender can improve upon it by using in-house data. But it is a very good baseline.

Decision Architecture

Announcing the start of a new series of posts relating to making data science actionable. I’ll go deeper into how data goes from a repository of “stuff” into a system and process that uses the best available technology to make an action happen. This group of related posts will be tagged #DecisionArchitecture.

We will explore practical applications of data and analytics and share how all the pieces fit together. It will start with people and with organizational strategy and culture. Topics will also touch on processes which direct the products, services, and technology deployed around taking action from data. Sometimes posts will be technical and sometimes anecdotal, but they should all tie to the theme of “Making Useful Data Science Happen.”

Topics to cover (an evolving table of contents):

  1. Define Decision Architecture
    • Framework for making Data Science useful
    • Rules-based and rule-free decisions
  2. Case studies
    • Examples of well-designed decision architectures
    • Real-time fraud detection
    • Credit decision models
    • Advertising examples
    • Examples of decision architecture failures
  3. Components
    • Databases
      • Event-driven architecture
      • High speed / in-memory databases
      • Slow speed / “big data” storage
    • Technologies
      • Hardware
      • Modular design
      • APIs
      • Containerization
      • Lambda / Kappa Architecture
    • Tools & Vendors
  4. Processes
    • Data governance
    • Model management
    • Product management
  5. People
    • Key skills
    • Roles and organizational structure
  6. History
    • Early decision models
    • Innovations
    • Key people and organizations
    • Thoughts on the future

This series will be a place to share notes and collect content around these related topics. Please share your thoughts and comments to help make #DecisionArchitecture useful, interesting, and entertaining!