Predictions for 2019

It’s that time of year again. Time to drive stakes in the ground about the year ahead.

Looking Back

First, a brief look back to see if last year’s predictions aged well. You can read them here. Recap below.

My predictions come with a lifetime guarantee: if you are not completely satisfied with them, I will return to you, in cash, the amount you paid to read them.

Data Science Matures. It’s hard to measure data science maturity, which makes this an excellent topic for predictions. A sign that the Wild West days of data science are over: growing interest in collaborative data science platforms. Platforms like Cloudera Data Science Workbench, Dataiku Data Science Studio, and Domino Data Science Platform are doing well in the marketplace. There is enough commercial interest that Forrester now evaluates these tools separately from tools that target “citizen” data scientists.

Automated Machine Learning Gets Real. You might think that last year I was plugging DataRobot, my current employer. It’s the other way around. I wrote the prediction before joining DataRobot.

Here is a list of vendors in the data science and machine learning space who claim to offer automation:

  • Amazon Web Services
  • Ayasdi
  • BigML
  • Dataiku
  • Databricks
  • DataRobot
  • Google Cloud Platform
  • IBM
  • Microsoft Azure
  • RapidMiner
  • Salesforce
  • SAP
  • SAS
  • SparkBeyond
  • SparkCognition
  • Willis Towers Watson

That’s not a complete list by any means. As of this writing, there are 3,808 companies listed in Crunchbase in the Machine Learning category. Many of these claim to automate the process.

Some of those claims are BS. But the proliferating message tells you two things about automation. First, people who say that you cannot automate machine learning are losing the argument. Second, the idea of automation resonates with customers.

Data Scientists Discover GDPR Applies to Data Scientists. Much of the current interest in interpretability stems from a perception that GDPR mandates explainable models. It actually doesn’t do that — there is much nonsense written about GDPR — but it’s fair to say that a lot of hair caught fire on May 25.

Much of the nonsense about GDPR stems from the fact that GDPR itself sets out broad aims, not compliance detail. GDPR designates the European Data Protection Board (EDPB) as the compliance authority. EDPB is still getting up and running: looking for office space in Brussels, shopping for furniture, checking out the Moules Marinières at Chez Leon, figuring out how to give directions to the loo in 24 languages.

So far, how many binding decisions has the EDPB issued? None. Aucun. Keiner. Nessuna. Ninguna. Nenhum. Geen. Nici unul. Żaden. Žádný. Jih ni. Nijedan. Bat ere ez. Ingen. Puudub. Κανένας. Níl aon cheann. Nė vienas. Нито един. Nav. Nikto. Egyet sem. Ei mitään.

In other words, you may think you know what GDPR means for data science practice, but you don’t. So STFU.

Update. A previous version of this blog included an incorrect translation of “None” in Hungarian. We apologize for the error, blame Google Translate, and note that the food in Budapest is really good.

Cloud, Blah, Blah, Blah, Blah…. Yeah. That was a layup.

IBM: Four More Quarters of Decline. Oh, Wait… After a brief flirtation with revenue growth in Q4 2017, IBM returned to its declining ways in Q1 2018. I’m sure that the 18% increase in Q4 2017 sales immediately followed by a 15% decline in Q1 2018 sales was purely happenstance and not the result of IBM sellers gaming 2017 sales.

Note that IBM’s hardware business is its strong point. In Q3 2018, revenue for “Cognitive Solutions,” including all of the Watson stuff, declined 6%. Technology Services and Cloud Platforms declined 2%.

Looking Ahead

So, what’s coming in 2019? Here are three predictions. (A reminder that I work for DataRobot. My bias is obvious. Opinions are mine, not DataRobot’s.)

AI strategy moves to the front of the queue. What’s the most significant constraint holding up investment in AI? Hiring skilled people, right? Wrong. According to McKinsey, the biggest obstacle to AI adoption is the lack of a clear strategy.

Big whoop, you think. A company that sells strategy says you need more strategy. My local fish merchant says I should eat more herring.

But McKinsey is right. Many organizations simply do not know what they want to do with AI.

Moreover, the absence of strategy complicates hiring. Do you need more data engineers or do you need folks who understand the ins and outs of recurrent neural networks? You can’t possibly know how to staff up if you haven’t decided what you’re going to do.

Chief AI Officers Emerge. Who owns AI in the organization?

Some folks think that the Chief Data Officer should own AI. That’s a terrible idea.

Data is like office furniture. You need data for AI. You also need furniture, so your data scientists have a place to sit. But nobody talks about putting the office manager in charge of AI.

Put an executive in charge of X, and you get more X. CDOs want your business to hoover up every speck of data and store it in cavernous data warehouses or vast data lakes that nobody uses. They measure success in petabytes.

Hey, look at all these petabytes.

My data lake has more petabytes than your data lake.

As if you need more petabytes. You don’t use 80% of the petabytes you already have.

Put a CDO in charge of AI, and she’ll spend her time trying to prove that AI justifies her previous investment in a vast data lake. Hey, look. Our new AI says people buy beer and diapers together. Isn’t that special? We would not know that if we didn’t have this data lake.

By the way, the “beer and diapers” story isn’t apocryphal. In 1992, a Teradata team analyzed 1.2 million market baskets from 25 drug stores. They discovered that between 5:00 p.m. and 7:00 p.m. customers purchased beer and diapers together. Teradata spent the next ten years touting that discovery at trade shows.

The most important part of the story is the part Teradata didn’t talk about: the customer never did anything with that “insight.” The customer, it seems, didn’t give two fucks what people put in their baskets as long as they paid.

In every large database, there are millions of patterns. Of these, people care about a select few.
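The beer-and-diapers "insight" is just co-occurrence arithmetic: count how often a pair of items shows up in the same basket, and compare that to what chance alone would predict (the "lift"). A minimal sketch, using a handful of made-up baskets rather than the Teradata data:

```python
from itertools import combinations
from collections import Counter

# Hypothetical toy baskets; the 1992 Teradata study used 1.2 million real ones.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "wipes", "beer"},
    {"milk", "diapers"},
]

n = len(baskets)
# How often each item, and each pair of items, appears across baskets.
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(frozenset(p) for b in baskets for p in combinations(sorted(b), 2))

def lift(a, b):
    """Lift > 1 means a and b co-occur more often than chance would predict."""
    support_ab = pair_counts[frozenset((a, b))] / n
    return support_ab / ((item_counts[a] / n) * (item_counts[b] / n))

print(round(lift("beer", "diapers"), 3))  # > 1, so the pair co-occurs above chance
```

Run this over a million baskets and every pair of items, and you get exactly the problem described above: thousands of "significant" pairs, almost none of which anyone will act on.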

You need a Chief AI Officer tasked with building AI that people actually use. Put an executive in charge of AI adoption, and you will get more AI adoption. This is a good thing.

Democratization gets real. Remember way back in 1994, when ISL introduced Clementine? Clementine was the first machine learning package to offer a drag-and-drop icon-based user interface. (SPSS acquired ISL, and eventually rebranded the software as SPSS Modeler. IBM acquired SPSS, and rebranded it as Watson Machine Learning.)

SPSS Modeler and SAS Enterprise Miner were supposed to democratize data science. Same for Alpine, Alteryx, BigML, Dataiku, IBM Watson Analytics, IBM Watson Studio, KNIME, Microsoft Azure Machine Learning Studio, Predixion Software, RapidMiner, and Statistica. With so many drag-and-drop tools on the market for so many years, data science must be entirely democratized by now.

Oh, wait. It isn’t.

Analysts who keep predicting the triumph of citizen data scientists ought to pause and ponder why it hasn't happened.

First, people have to want to do data science. It might surprise you to learn that not everyone cares about machine learning. A friend and former colleague is a senior executive at one of the larger consumer banks in North America. She plays a crucial role in risk management and credit policy. She doesn’t tell her friends that the bank needs more LSTM neural nets for text analytics, because they’re so much better than bag-of-words.

Second, making something easier doesn’t necessarily end the need for specialists. We’ve had practical robotic surgery for fifteen years. That doesn’t mean I’m going to rent a surgical robot at Home Depot and remove my own gall bladder if I get acute cholecystitis. I’ll go and see Dr. Cutter at Mass General. If she uses a robot, that’s her business.

Third, people who are serious about putting AI to work in your organization don’t care whether data science tools are “easy to use.” They will walk across beds of hot coals to get the data they need and eat nails to tune a model. “Code-free” isn’t a benefit for people who learned how to code ten years ago.

AI democratization isn’t all about tools. It’s about your organization:

  • Redefining roles
  • Rethinking the workflow
  • Finding better ways to identify and prioritize AI opportunities

Democratization does not mean that the receptionist will build models between phone calls. It means that the receptionist contributes to an AI project that will improve customer call handling. People contribute to projects in many different ways. They don’t all need to know how to hyperparameterize a Generalized Linear Model.

Vendors that understand this will thrive. They will help customers define strategy, define roles, rethink the workflow, and identify AI opportunities.

Vendors that think democratization is all about software won’t thrive. They will continue to believe that if they can add one more icon to the drag-and-drop palette, data science will be totally democratized. They will wonder why customers don’t flock to their tools.

That is all.


  • Once heard at a customer meeting for one of the big players:

    “With this tool, the citizen scientist can build predictive models using drag-and-drop modules!” proclaimed the egomaniac in charge of the wonder software.

    “We don’t want people mucking about in models. They are complicated and unless you know what you’re doing, you will wreak a lot of havoc! This is a bad idea!” responded a statistician from a major customer. Statisticians from other customers agreed.

    The egomaniac stood there, listened to the complaints, and remained silent. “This changes nothing!” he thought to himself. The development train rolled on as if the customers had never spoken a word….

  • Statisticians don’t like other people muscling in on their business. Like doctors who don’t want nurses to give out aspirin.

    • If I remember correctly, they were noting that it’s important to know the strengths and limitations of each technique to ensure that the model generalizes appropriately.

      There are many fields that “look easy”, but in reality, require a great deal of background knowledge that isn’t always obvious or apparent. I found their arguments valid. I believe this particular company will be a footnote in history within 10 years, perhaps even 5.

      • Ensuring that a model generalizes requires an understanding of sampling and validation methods and not a detailed understanding of the algorithms themselves. Black boxes can generalize.

        Users need a detailed understanding of the algorithm to prepare data prior to model training. Fortunately, it’s possible to automate this. It’s aspirin, not brain surgery.
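The point about black boxes generalizing can be sketched in a few lines. Cross-validation estimates out-of-sample performance while treating the model as opaque; you need to understand sampling and holdout discipline, not the boosting math. (This sketch assumes scikit-learn; the dataset and model are arbitrary stand-ins.)

```python
# Sketch: validating a black-box model requires sampling/validation discipline,
# not a detailed understanding of the algorithm's internals.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = GradientBoostingClassifier(random_state=0)  # treat as a black box

# 5-fold cross-validation: each fold is held out once, so every accuracy
# estimate comes from data the model never saw during training.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Nothing in that snippet depends on knowing how gradient boosting works, which is the commenter's point: the validation procedure, not the algorithm, is what tells you whether the model generalizes.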
