Back in January, I published this post with predictions for 2014. Thought it would be fun to validate how well the crystal ball works.
(1) Apache Spark matures as the preferred platform for advanced analytics in Hadoop.
I wrote this just after attending the 2013 Spark Summit in December; it was clear then that Spark would own 2014. But I had no idea just how fast Spark would catch fire.
Spark will achieve top-level project status in Apache by July; that milestone, together with inclusion in Cloudera CDH5, will validate the project’s rapid maturation.
The Apache Foundation announced top-level status for Spark in February; Cloudera announced immediate support for Spark in February, before it released CDH5; and every other Hadoop distributor followed suit.
At least one commercial software vendor will release software using Spark as a foundation.
There are now thirteen vendors with products certified on Spark.
Apache Mahout is so done that speakers at the recent Spark Summit didn’t feel the need to stick a fork in it.
Not quite. But the Mahout team has announced that all new projects must use a standard DSL that runs the job in Spark.
(2) “Co-location” will be the latest buzzword.
Well, not so much.
Most analytic tools can connect with Hadoop, extract data and drag it across the corporate network to a server for processing; that capability is table stakes. Few, however, can integrate directly with MapReduce for advanced analytics with little or no data movement. YARN changes the picture, though, as it enables integration of MapReduce and non-MapReduce applications.
Co-locating your analytics in the Hadoop cluster is less attractive than integrating your analytics with Hadoop. With Spark fully integrated with Hadoop storage APIs, co-located solutions seem much less attractive.
It’s no coincidence that Hortonworks’ partnership with SAS is timed to coincide with the release of HDP 2.0 and production YARN support.
SAS has such deep pockets, one would think it unwise to bet against it. And yet, seven months into HDP 2.0 and umpteen months into production for SAS HPA, SAS still can’t seem to produce a public success story for advanced analytics in Hadoop.
(3) Graph engines will be hot.
Not that long ago, graph engines were exotic. No longer: a wide range of maturing applications, from fraud detection and social media analytics to national security, rely on graph engines for graph-parallel analytics.
Graph analysis is really useful in the right hands, but organizations are still trying to figure out what to do with it. That is why we still see posts like this; when something is hot, nobody writes articles about what to do with it, because everyone already knows.
The other issue with graph analysis is that it’s not easy to learn. Graph techniques are quite different from the predictive analytics algorithms most analysts learn, and the method tends to require specialized knowledge.
GraphLab leads in the space, with Giraph and Tez well behind; Spark’s GraphX is still in beta. GraphX has already achieved performance parity with Giraph and it has the advantage of integration with the other pieces of Spark. As the category matures, analysts will increasingly see graph analysis as one more arrow in the quiver.
Oops. Tez isn’t really comparable to Giraph and GraphLab. And right after I wrote this, the GraphLab open source project pretty much died. GraphLab Inc., the venture founded to commercialize the open source project, is fiddling around with other stuff. Meanwhile, top contributors to open source GraphLab are now working on Spark.
Since Apache Giraph has flatlined, Spark’s GraphX project appears to be the only game in town, at least in open source scalable graph analytics.
(4) R approaches parity with SAS in the commercial job market.
Hard to evaluate this one until Bob Muenchen updates his analysis for 2014. But the trend is your friend:
R already dominates SAS in broad-based analyst surveys, but SAS still beats R in commercial job postings. Job postings for R programmers are growing rapidly, however, while SAS postings are declining. New graduates decisively prefer R over SAS, and organizations increasingly recognize the value of R for “hard money” analytics.
Speaking with enterprise customers, I like to ask why they switched from SAS to R. The #1 response: the people we hire know R already, not SAS. SAS’ free “University Edition” is an attempt to stem the bleeding; it might make a difference in ten years or so.
(5) SAP emerges as the company most likely to buy SAS.
Hmm. Not really.
“Most likely” as in “only logical” suitor. IBM no longer needs SAS, Oracle doesn’t think it needs SAS, and HP has too many other issues to address before taking on another acquisition. A weak dollar favors foreign buyers, and SAS does substantial business outside the US. SAP lacks street cred in analytics (and knows it), and is more likely to agree to Jim Goodnight’s inflated price and terms.
After a flurry of announcements last fall (combined with optimistic predictions from SAS executives), all is quiet on the SAS+SAP front; my Google Alert grows cobwebs. SAS has delivered an ACCESS engine to HANA but not much else considering the talk about joint solutions. SAP bought a Platinum sponsorship at the 2014 SAS Global Forum, which is an improvement over 2013 when they didn’t show up at all.
Meanwhile, though, SAP continues to invest in HANA PAL and KXEN for predictive analytics, and recently announced support for Spark. That makes the SAS/SAP alliance look more like a handshake than an embrace.
Will a transaction take place this year? Hard to say; valuations are peaking, but there are obstacles to sale, as I’ve noted previously.
Almost certainly not. Goodnight brags that he’s “having too much fun to step down”, which is nice to know but misses the point; succession plans are only useful when they are transparent. Anyone investing in SAS’ proprietary platform should wonder what happens next.
(6) Competition heats up for “easy to use” predictive analytics.
It’s a crowded market for “code-free” analytics.
For hard money analytics, programming tools such as SAS and R continue to dominate. But organizations increasingly seek alternatives to SAS and SPSS for advanced analytic tools that are (a) easy to use, and (b) relatively inexpensive to deploy on a broad scale. SAS’ JMP and Statistica are existing players, with Alteryx, Alpine and RapidMiner entering the fray. Expect more entrants as BI vendors expand offerings to support more predictive analytics.
According to Crunchbase, entrepreneurs have started 142 analytic startups in the past 18 months, and all of them want you to know that they make analytics easy. The likely result is that analytics will be easy and cheap; tools for the casual user should cost no more than $500 per user.
Software firms like to target the easy analytics space because the fastest way to build a customer base is to attract new users who never used analytics in the past. Experienced analysts tend to have established “sticky” preferences for analytic software, and switching is rare.
The obvious users to target already use BI tools, so the major BI players are all trying to embed analytics in their tooling; some have already done so. For most of these startups, the best exit will be a tender offer from IBM.
Vertical and horizontal solutions will be key to success in this category. It’s not enough to have a visual interface; “ease of use” means “ease of use in context”. It is easier to develop a killer app for one use case than for many. Competitive forces require smaller vendors to target use cases they can dominate and pursue a niche strategy.
This seems to be the trend. Of the 142 startups mentioned above, 11 have completed two or more funding rounds. Most of these, like MarketMuse, QuantifiedSkin and ThetaRay, offer highly specialized applications with embedded analytics.