AI and Data Strategy

Photo by Yancy Min on Unsplash

Artificial Intelligence is a hot topic, even outside of tech companies.

Against this backdrop, Dr. Hugo Bowne-Anderson hosts episode 1 of what promises to become a monthly series of webcasts, How AI Can Improve Your Data Strategy, for DataCamp, the online Python, SQL, and R coding school. DataCamp appears to be pivoting its focus from individual students and universities towards business enterprises.

Indeed, employees in many business roles today need a degree of competence in the emerging fields of Data Science, Machine Learning, and Artificial Intelligence to understand when to use the tools, when not to use them, and how to talk the talk with data professionals. As early as 1962, Dr. John Tukey, the mathematician who coined the term “bit,” called all of these roles “practitioners.” As of 2020, Data Science tools are central to the decision maker’s toolkit. Some business executives are particularly prone to demanding AI for almost everything. There is certainly opportunity in broadening these skill sets across the enterprise.


In the article When Translation Problems Arise Between Data Scientists and Business Stakeholders, Revisit Your Metrics, Katie Malone aptly remarks that

“the metrics which quantify outcomes are generally very different for data scientists and business stakeholders, making it likely that each side struggles to understand and speak in terms that are familiar to the other side.”

A lack of data acumen across the business is a central challenge faced by consumers of analytics. Undergraduate education across majors has not kept up in recent decades. Everyone, from the C-suite to new employees, should be getting comfortable with data, in the same way that everyone was expected to start using e-mail 25 years ago.

Recognizing the need, DataCamp offers Data Science for Business, a 4-hour interactive video-based series that is a good start. Get ahead of the curve and learn a few basics now. In addition, many companies are organizing so that a standalone digital AI unit reports to the CEO alongside the business units; the company can then matrix AI talent into different divisions to drive widespread education and cross-functional projects.

AI hubs


A Robust and Principled Data Strategy

“A real strategy involves a clear set of choices that define what the firm is going to do and what it’s not going to do.”

In late 2017, Harvard Business Review published a critique titled Many Strategies Fail Because They’re Not Actually Strategies. Indeed, many strategy processes fail because the firm does not have something novel and worth executing.

Consultants come in, do their work, and document the new strategy in a PowerPoint presentation and a weighty report. Town hall meetings are organized, employees are told to change their behavior, balanced scorecards are reformulated, and budgets are set aside to support initiatives that fit the new strategy. And then nothing happens.

Many Strategies Fail Because They’re Not Actually Strategies

Critics are quick to point out that strategy is not just a top-down process. Successful strategy execution is seldom a one-way trickle-down cascade of decisions.

Especially with data, a robust, principled approach will make change the default. People’s habits in organizations are notoriously sticky and persistent. Habits certainly don’t change by telling people in a town hall meeting that they should act differently. People are often not even aware that they are doing things in a particular way and that there might be different ways to run the same process.

For a successful strategy implementation process, put the default the other way around: Change it unless it is crystal clear that the old way is substantially better. Execution involves change. Embrace it.


Data Strategy is a Function of Business Strategy

Two insights from What’s Your Data Strategy?

More than 70% of employees have access to data they should not, and 80% of analysts’ time is spent simply discovering and preparing data.

On average, less than half of an organization’s structured data is actively used in making decisions—and less than 1% of its unstructured data is analyzed or used at all.

Having a Chief Digital Officer and a data-management function is a start, but neither can be fully effective in the absence of a coherent strategy for organizing, governing, analyzing, and deploying information assets throughout the enterprise. Though the “plumbing” work may not be as sexy as predictive models and dashboards, it is vital to achieving high-performance decision making.

The authors present frameworks of defensive and offensive strategy elements. For example, tight security and a single source of truth are defensive objectives. Determining an organization’s current and desired positions on the spectrum between defense and offense forces leaders to make trade-offs.

Data-Strategy Spectrum


What is Data?

Data is not a holy grail. The word data translates literally to “things given.” Working with it, however, is an active process of collection and action. Nothing illustrates this better than Monica Rogati’s AI Hierarchy of Needs.

AI Hierarchy of Needs

If the goal is AI functionality at the top, then at the bottom the organization needs a very robust collection and storage foundation. Transformation, aggregation, and labeling are then built on that. Growth with online experimentation and machine learning becomes possible, but deployment to customers really requires the disciplined orchestration of all of the layers of the hierarchy. The most impactful and effective AI strategies will stand on the shoulders of robust data science capabilities.


What Is AI?

  • Data science produces insights

  • Machine learning produces predictions

  • Artificial intelligence produces actions

Andrew Ng, in his superb AI Transformation Playbook, speaks of a Virtuous Cycle, where a better product yields more users, more users yield more data, and more data yields a more refined and better product.

Virtuous Cycle of AI

The term AI is often applied as a catch-all for the creation of systems capable of making intelligent decisions. There is a distinction to be made between Artificial General Intelligence (AGI) and narrow AI.

Narrow AI is built to achieve a specific task, like language translation. AGI is the intelligence anticipated by the Turing test, which requires that a human being be unable to distinguish the machine from another human being by using the replies to questions put to both. We are a long way off from AGI.


What is Data Science?

Making discoveries and creating insights from data and communicating these insights and discoveries to non-technical stakeholders

Christopher Berry writes in his blog Insights Revisited that to be an insight, it must always be:

  • New information

  • Executable

  • Causes action

  • Profitable

Succinctly, an insight is a piece of information that you didn’t know before; that can feasibly be executed, is culturally acceptable, and is of a scale relevant to the firm; that causes a decision to be made that wouldn’t have been made otherwise; and that results in profit or a sustainable competitive advantage.

The measure of success for data science is the creation of insight.


What Data Scientists Really Do, According to 35 Data Scientists

Hugo Bowne-Anderson interviewed prominent data scientists across a wide array of industries and academic disciplines and asked them what their jobs entail. They describe a wide range of work, including the massive online experimental frameworks for product development at Booking.com and Etsy, the methods BuzzFeed uses to implement a multi-armed bandit solution for headline optimization, and the impact machine learning has on business decisions at Airbnb.
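To make the multi-armed bandit idea concrete, here is a minimal sketch of one common bandit strategy, epsilon-greedy, applied to headline selection. It is an illustration only and not BuzzFeed's actual method, which the interviews do not spell out here. The click-through rates, the function name `epsilon_greedy`, and the 10% exploration rate are all assumptions made for the example.

```python
import random

def epsilon_greedy(true_ctrs, pulls, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy headline selection.

    true_ctrs are hypothetical click-through rates, hidden from the algorithm.
    Returns (shows, estimates): how often each headline was shown and the
    click-through rate estimated for it from the simulated clicks.
    """
    rng = random.Random(seed)
    shows = [0] * len(true_ctrs)
    clicks = [0] * len(true_ctrs)

    def estimate(i):
        return clicks[i] / shows[i] if shows[i] else 0.0

    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: show a randomly chosen headline.
            arm = rng.randrange(len(true_ctrs))
        else:
            # Exploit: show the headline with the best estimate so far.
            arm = max(range(len(true_ctrs)), key=estimate)
        shows[arm] += 1
        if rng.random() < true_ctrs[arm]:  # simulated reader click
            clicks[arm] += 1

    return shows, [estimate(i) for i in range(len(true_ctrs))]

# Three hypothetical headlines; the third is the strongest.
shows, estimates = epsilon_greedy([0.05, 0.08, 0.20], pulls=20_000)
best = max(range(len(estimates)), key=estimates.__getitem__)
print("shows per headline:", shows)
print("estimated CTRs:", [round(e, 3) for e in estimates])
print("best headline index:", best)
```

Unlike a fixed A/B split, the bandit shifts traffic towards the stronger headline while the test is still running, at the cost of noisier estimates for the weaker ones.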

At least in the tech industry, data scientists lay a solid data foundation in order to perform robust analytics. Then they use online experiments, among other methods, to achieve sustainable growth. They build machine learning pipelines and personalized data products to better understand their business and customers and to make better decisions. In other words, in tech, data science is about infrastructure, testing, machine learning for decision making, and data products.

Great strides are being made in industries other than tech. The interviewees all understand that working practitioners earn their daily bread through data collection and data cleaning; building apps and dashboards; statistical inference; communicating results to key stakeholders; and convincing decision makers of their results.

Many are skeptical not only of the fetishization of artificial general intelligence by mainstream media but also of the buzz around machine learning and deep learning.

A recurring theme is that skills, so necessary today, are likely to change on a relatively short timescale. The key skills are not the abilities to build and use deep-learning infrastructures. Instead they are the abilities to learn on the fly and to communicate well in order to answer business questions, explaining complex results to nontechnical stakeholders.

We’re approaching a consensus that ethical standards need to come from within data science itself, as well as from legislators, grassroots movements, and other stakeholders. Part of this movement involves a reemphasis on interpretability in models, as opposed to black-box models. That is, we need to build models that can explain why they make the predictions they make.
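As a rough illustration of what an interpretable model can look like, the sketch below uses a linear scoring model, where each feature's contribution to a prediction is simply its weight times its value. The feature names, weights, and the `explain` helper are hypothetical and chosen only for the example; real interpretability work covers far more than this.

```python
def explain(weights, bias, features):
    """Return the model score and each feature's contribution to it.

    For a linear model the score is bias + sum(weight * value), so the
    per-feature products are a complete explanation of the prediction.
    """
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    # Largest absolute contribution first: the main drivers of the score.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked

# Hypothetical churn-risk model and customer record.
weights = {"late_payments": 0.8, "years_as_customer": -0.3, "open_tickets": 0.5}
customer = {"late_payments": 3, "years_as_customer": 4, "open_tickets": 1}

score, ranked = explain(weights, bias=-1.0, features=customer)
print(f"risk score: {score:.2f}")
for name, contribution in ranked:
    print(f"  {name}: {contribution:+.2f}")
```

A stakeholder can read the output directly: the score is high mainly because of late payments, and tenure pulls it back down. A black-box model offers no such breakdown without extra tooling.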

Access to insights is not the same as access to raw data. The needs of the business are served when the work delivers insights. As a result, Data Science practitioners find themselves involved at every step, from collection through to decision making, to ensure integrity.

What Data Scientists Really Do


What is Machine Learning?

The science and art of giving computers the ability to learn from data without being explicitly programmed

Examples are everywhere. They include your e-mail spam filter (supervised) and document clustering (unsupervised). They include Siri, Alexa, and Facebook’s image recognition. In each case the algorithm is given a general family of mathematical models to search, and its results improve as more data arrives.
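The difference between supervised and unsupervised learning can be sketched on a toy data set. Assume, purely for illustration, that each e-mail is reduced to a single number: a count of suspicious words. The supervised learner is given labels ("ham" or "spam") and learns one centroid per label; the unsupervised learner sees only the numbers and splits them into two groups on its own. The data, the function names, and the choice of nearest-centroid and two-means are assumptions for the example, not how any real spam filter works.

```python
def train_nearest_centroid(values, labels):
    """Supervised: use the labels to compute one centroid (mean) per class."""
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2)."""
    low, high = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high

# Hypothetical suspicious-word counts for eight e-mails.
counts = [0, 1, 1, 2, 7, 8, 9, 10]
labels = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam"]

centroids = train_nearest_centroid(counts, labels)   # needs labels
print("prediction for a count of 6:", predict(centroids, 6))

group_low, group_high = two_means(counts)             # no labels at all
print("discovered groups:", group_low, group_high)
```

The clustering finds the same two groups but cannot name them; deciding that one group is "spam" still takes a human or a labeled example.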

On machine learning, Dr. Tom Mitchell of Carnegie Mellon:

Machine Learning Operational Definition
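The figure is not reproduced here. Assuming it shows Mitchell's widely cited definition, the idea is that a program learns from experience E, with respect to a task T and a performance measure P, if its performance on T, as measured by P, improves with E. The sketch below is one toy instance of that idea, not an example taken from Mitchell: the task is to classify numbers against a hidden cutoff, the experience is a set of labeled examples, and the performance measure is accuracy on held-out points. The cutoff of 0.37 and all function names are hypothetical.

```python
TRUE_THRESHOLD = 0.37  # hidden rule the learner must discover (hypothetical)

def label(x):
    """Task T: the correct answer for a point x."""
    return 1 if x >= TRUE_THRESHOLD else 0

def learn_threshold(n):
    """Experience E: n labeled examples, evenly spaced in [0, 1).

    The learner guesses the cutoff as the midpoint between the largest
    example labeled 0 and the smallest example labeled 1.
    """
    xs = [(i + 0.5) / n for i in range(n)]
    largest_zero = max((x for x in xs if label(x) == 0), default=0.0)
    smallest_one = min((x for x in xs if label(x) == 1), default=1.0)
    return (largest_zero + smallest_one) / 2

def accuracy(threshold, test_points=1000):
    """Performance P: share of held-out points classified correctly."""
    xs = [j / test_points for j in range(test_points)]
    correct = sum((1 if x >= threshold else 0) == label(x) for x in xs)
    return correct / test_points

sizes = [2, 8, 32, 128]
accuracies = [accuracy(learn_threshold(n)) for n in sizes]
for n, acc in zip(sizes, accuracies):
    print(f"{n:>4} examples -> accuracy {acc:.3f}")
```

Nothing in the code states the cutoff to the learner directly; the improvement in accuracy comes only from seeing more labeled examples.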

The vast majority of business applications of machine learning are supervised, leveraging labeled training data. Sources of training data include historical data, experiments, and even crowd-sourcing (such as reCAPTCHA, Facebook, and Amazon Mechanical Turk).

Machine learning is widespread across industry verticals. Quants use it in financial trading, and cybersecurity teams use it to detect network breaches. It is used widely in health care, for example in epidemiology and disease diagnosis. It is used in transport and hospitality, at Uber and Airbnb. It is the recommendation engine behind Netflix. One day soon, the insurance business will see damage inspection with AI: automated claims processing from mobile phone images.


Ethics and Bias

The impacts of AI decision tools can be both desirable and undesirable. Some recent headlines:

Amazon scraps secret AI recruiting tool that showed bias against women

The Algorithmic Colonization of Africa

Google Plans Not to Renew Its Contract for Project Maven, a Controversial Pentagon Drone AI Imaging Program

Given the many possible ethical challenges, every enterprise with an AI strategy should also focus on widening the diversity of its pool of practitioners and build in a proper ethics review.


Foundations of a Successful Data Strategy

In a 2019 DataFramed podcast episode, Taras Gorishnyy of McKinsey says he guides his clients to secure:

  • Executive Support

  • Analytics Vision

  • Data Foundations

  • Distribution of Skills & a Data Culture

  • Early Impact from Analytics

All five components must be orchestrated together.


How Are Firms Facing a Talent Shortage Going to Fill Their Requirements for Data Scientists?

There is certainly a huge bottleneck today in hiring qualified practitioners. The best workaround for the data skills gap is a company culture that embraces learning, so that the experienced professionals already in place can deliver the business results.

Again, the tools are likely to change. The key skills are not the deep-learning algorithms. They are the abilities to learn on the fly and to communicate well in order to answer business questions, explaining complex results to nontechnical stakeholders.

