2018 in AI/ML
Well, 2018 is dead and gone. Time to take a look back at the year in AI/ML.
A reminder that I work for DataRobot. This is my personal blog. Opinions are mine.
On the Move
It’s hard to believe that Amazon Web Services introduced Amazon SageMaker just a year ago, but here we are. AWS moved aggressively to enhance the service with new native and partner capabilities. The service still targets the experienced AWS developer, but AWS can move upmarket if it chooses. Joyent CTO Bryan Cantrill tweets:
Am waiting for the year that reInvent goes full Red Wedding, locking the doors and announcing that every attendee’s product or service is now a forthcoming AWS offering. Or maybe that was this year?
Reinforcement learning is among the new bits in SageMaker. If you want to push your toddler into an AI career, AWS announced DeepRacer, an autonomous model race car. Just in time for Christmas.
AWS also announced Amazon Forecast at re:Invent. Amazon Forecast is a managed service for time series analysis. As a global retailer, Amazon has a huge forecasting problem and skills to match. Marketing materials stress AWS’ deep experience in retail forecasting, a credential that would appeal to retailers if retailers wanted to do business with AWS. They don’t, however, so AWS might want to shut up about its retailing chops.
Dataiku did not do well in Gartner’s 2018 MQ, dropping like a stone to the bottom of the Ability to Execute axis. Not quite the bottom: Dataiku did better than Teradata. “We beat Teradata” gives cold comfort when you note that Teradata deprecated Aster and exited the category.
To its credit, Dataiku responded to the issues that Gartner surfaced, adding a deployment API and containerized engines. Customer ratings in Gartner PeerInsights look strong, which bodes well for Dataiku’s position in the 2019 MQ.
Update: Dataiku announces a $101 million Series C round of venture capital. Iconiq Capital leads the round, with Alven Capital, Battery Ventures, Dawn Capital and FirstMark Capital also participating. Alven, Battery, and FirstMark all participated in Dataiku’s B round in 2017.
Databricks wants to be more than “the people who invented Apache Spark.” That’s a self-limiting value proposition: to the extent that Spark matures and stabilizes, customers are less likely to need Matei Zaharia to tune their RDDs.
In its first few years, Databricks was little more than Spark on AWS, and its main emphasis was on data engineering. Starting last year, the company pivoted towards the AI and machine learning market. This meant supporting a wider range of Java, Python, R, and Scala packages as well as Spark ML and Spark packages. Databricks also broadened its support for deep learning frameworks, including TensorFlow, MXNet, Keras, PyTorch, Caffe and Microsoft Cognitive Toolkit.
The pivot is paying dividends. Earlier this year, Databricks scored a “Visionary” rating in Gartner’s 2018 MQ, thanks to its innovation and scalability. In June, Databricks introduced MLflow for tool integration, experiment tracking, reproducibility, and model deployment. That’s a positive step towards an integrated data science and machine learning platform.
DataRobot (my employer) had a good year. The company released time series forecasting, model management, and model monitoring, among other things. In October, DataRobot closed a $100 million Series D round led by Sapphire Ventures and Meritech Capital Partners. Privately held startups don’t release financials, so the only way to tell which startups are going somewhere is to follow the money. Venture capitalists don’t back losers.
H2O.ai opened the champagne when Gartner released its 2018 MQ. Getting named to the Leader quadrant is a big deal, and it takes a lot of work to get there. Driverless AI is now available in all three cloud marketplaces. You can also install it on-premises on an NVIDIA GPU-accelerated box or an IBM “Minsky” server.
H2O.ai shipped 32 versions of Driverless AI this year, a fact that can be interpreted in more than one way. Agile sprints sound cool at hacker meetups, but enterprise customers don’t like to upgrade software every six days. Major product enhancements include GLM, time series, an alpha release of TensorFlow support, and a text analytics capability that depends on TensorFlow, so I guess that’s in alpha too.
H2O World attracted “record audiences” when it played New York and London. It was the first time the show played those venues, so an audience greater than zero breaks the record.
You have to give Alteryx credit: they know how to bang the cash register and collect money. Alteryx sellers delivered 50% revenue growth and a thousand new logos each quarter this year. That’s a great track record.
However, the average starting sale is small, and account expansion is minuscule. Sooner or later you run out of new logos to land.
In AI/ML, Alteryx did little with the Yhat assets it acquired last year. This is not too surprising. When your flagship product runs only on Windows and you acquire software that runs only on Linux, it can take time to consummate the marriage.
Speaking of marriage, it’s an open secret that Alteryx plans to partner with H2O.ai. Alteryx CEO Dean Stoecker will keynote H2O World SFO in February. We’ll see how that works out.
Google just muddling through? WTF? Google deprecated its Prediction API earlier this year. The AutoML product line will support vision, speech, and translation if it ever gets out of beta. And Cloud Datalab isn’t winning any prizes. So right now Google doesn’t have a horse in the predictive analytics race.
Nobody cares about TPUs unless you can do something with them.
IBM shoved IBM Watson Studio out the door in time to make the key analyst reports. Forrester awarded IBM a top rating in the revamped “Wave” for MultiModal Predictive Analytics, which is nice. Gartner’s MQ isn’t available until February, so we’ll just have to wait and see what Gartner thinks. Gartner gives more weight to customer feedback, which has been a dumpster fire for IBM the last few MQs. It’s the main reason Gartner kicked IBM out of the “Leaders” quadrant earlier this year. Oh, the shame of it.
In my view, IBM Watson Studio exemplifies what IBM does best: taking old things and calling them new things. The service seems like a quodlibet of existing services, with an additional splash of a new SPSS that looks strikingly like the old SPSS. It’s available in IBM Cloud, everyone’s last choice in cloud platforms.
Incidentally, IBM peddles the line that they are the biggest contributor to Spark’s machine learning library. They might want to reconsider using that factoid. In the past two years, I can’t think of a single new feature added to Spark ML. And data scientists surveyed by KDnuggets in 2018 were less likely to use Spark than those polled in 2017.
MathWorks seems to be doing a better job at analyst relations. I don’t have much to say about MathWorks. Like SAS, it’s the object of desire for a cult of users who will give it up when you wrench it from their cold dead hands.
Microsoft shuffled the executive chairs in AI/ML leadership. This could mean something or it could mean nothing. Microsoft made a splash a couple of years ago when it introduced Azure Machine Learning Studio and acquired Revolution Analytics. Since then, all seems quiet in Redmond. The company has dozens of products and services for machine learning and AI, all of which seem to be managed in silos. There’s no coherent product strategy that bridges cloud and on-premises computing, which is surprising given the market strength of Azure.
RapidMiner launched new enhancements for data prep, feature engineering, and automated model training. Even so, and despite endorsements from Forrester and Gartner, the company hasn’t landed new venture capital in more than two years. It’s unclear why this is so. Some sources attribute it to the history of litigation with TIBCO. Others think that RapidMiner’s adoring users of its free software aren’t willing to pay for what they use. You know, the classic “students and hoboes” problem that makes it hard to monetize open source software.
Perhaps TIBCO will buy RapidMiner. Speaking of which, TIBCO seems to be doing better at analyst relations. Forrester awarded them an above-average rating in strategy, FWIW.
SAS published a lot of blog posts this year. The company has a blog farm that spews content like Mount Vesuvius in full eruption. Product-wise, however, all seems quiet in Cary. The June Viya release was underwhelming.
SAS revenue in 2017 was $3.24 billion. If it exceeds $3.40 billion in 2018, I’ll buy Dave Mac lunch at Yum Yum Sushi Thai on Harrison Oaks.
Not Really Muddling
Ayasdi killed off its Federal business and lost its CEO. Headcount declined from 130 to 109 over the year. Ayasdi’s last funding round was three and a half years ago. As a rule, startups that can’t or won’t raise new funding aren’t thriving.
Cloudera announced plans to offer Cloudera Data Science Workbench in the cloud. I doubt that AWS is shaking in its boots about that. CDSW’s main advantage is its integration with Cloudera. Put the same product in the cloud and you have a notebook-based platform for Python and R that doesn’t support Jupyter. I could make a joke about that, but out of respect for my former colleagues at Cloudera, I won’t.
Oracle acquired DataScience.com in June, and on the strength of that secured a “Leader” rating in the Forrester Wave for notebook-based data science platforms. Then everyone died or something because Oracle Data Science Cloud won’t be available until June 2019.
Notice how most of the vendors in the Forrester “Wave” have little circles, but Oracle has just a little dot?
Now flip to the table on page 8 of the report. See that 0.00 under Customer adoption for Oracle? It means that nobody uses the Oracle product.
Oracle Data Science Cloud. It’s the leading product that nobody uses.
Regular readers of this blog may recall that we celebrate the passing of AI/ML vendors with bye-ku. Here’s one for DataScience.com.
A bright future for
Data Science dot com? No…
No, no, no, no, no.
Gartner rated KXEN a Challenger the year before SAP bought the company in 2013. Since then, SAP just keeps sinking deeper into Niche Vendor territory. (The niche, it seems, is “SAP bigots.”) According to Gartner:
SAP has one of the lowest overall customer satisfaction scores in this Magic Quadrant. Its reference customers indicated that their overall experience with SAP was poor, and that the ability of its products to meet their needs was low. SAP continues to struggle to gain mind share for PA across its traditional customer base. SAP is one of the most infrequently considered vendors, relative to other vendors in the Magic Quadrant, by those choosing a data science and machine-learning platform.
In other words, existing customers think SAP sucks, and everyone else stays well clear.
The product itself seems little changed since 2013 or, for that matter, since first launched by KXEN in 1999. Meanwhile, SAP plays this fun game called “Where’s Leonardo?” where they tease everyone with all the wonderful capabilities that will be built into SAP Leonardo while hiding the actual product. Quoting Gartner again:
SAP Leonardo Machine Learning and other components of the SAP Leonardo ecosystem did not contribute to SAP’s Ability to Execute position in this Magic Quadrant.
Translation: we can’t evaluate slideware.
Teradata deprecated Aster, finally, and no longer has a product in the data science and machine learning category. Curiously, at Teradata events, top management still talks about machine learning. Here’s a bye-ku for Teradata’s machine learning business:
It has machine learning but
Sadly it does not.
Goodbye, Teradata. You have enough work to do getting those database customers to stick.