2014 Predictions: Advanced Analytics
A few predictions for the coming year.
(1) Apache Spark matures as the preferred platform for advanced analytics in Hadoop.
Spark will achieve top-level project status in Apache by July; that milestone, together with inclusion in Cloudera CDH5, will validate the project’s rapid maturation. Organizations will increasingly question the value of “point solutions” for Hadoop analytics versus Spark’s integrated platform for machine learning, streaming, graph engines and fast queries.
At least one commercial software vendor will release software using Spark as a foundation.
Apache Mahout is so done that speakers at the recent Spark Summit didn’t feel the need to stick a fork in it.
(2) “Co-location” will be the latest buzzword.
Most analytic tools can connect to Hadoop, extract data, and drag it across the corporate network to a server for processing; that capability is table stakes. Few, however, can run advanced analytics inside MapReduce with little or no data movement.
YARN changes the picture by allowing MapReduce and non-MapReduce applications to run side by side on the same cluster. In practice, that means it will be possible to stand up co-located server-based analytics (e.g., SAS) on a few memory-rich nodes inside the Hadoop cluster. This asymmetric architecture adds some latency (data still moves from the HDFS data nodes to the analytic nodes), but far less than moving data outside of Hadoop entirely. For most analytic use cases, that remaining cost will be more than offset by the performance of in-memory iterative processing.
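A back-of-envelope calculation makes the trade-off concrete. The throughput and data-volume figures below are illustrative assumptions of my own, not benchmarks of any vendor's product:

```python
# Back-of-envelope comparison of data-movement cost for the two
# architectures discussed above. All figures are illustrative
# assumptions, not measurements.

def transfer_seconds(data_gb, link_gbps):
    """Seconds to move data_gb gigabytes over a link_gbps gigabit/s link."""
    return data_gb * 8 / link_gbps

DATA_GB = 500              # working set pulled from HDFS (assumed)

CORP_NET_GBPS = 1          # Hadoop -> external analytic server (assumed)
CLUSTER_NET_GBPS = 10      # HDFS data nodes -> co-located nodes (assumed)

external = transfer_seconds(DATA_GB, CORP_NET_GBPS)       # leave Hadoop
co_located = transfer_seconds(DATA_GB, CLUSTER_NET_GBPS)  # stay inside

print(f"external server: {external:.0f} s, co-located nodes: {co_located:.0f} s")
# Either way the move is paid once; an iterative algorithm then reuses
# the in-memory copy on every pass, which is what offsets the latency.
```

Under these assumptions the co-located move is an order of magnitude cheaper, and the one-time cost is amortized over every subsequent in-memory iteration.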
It’s no coincidence that Hortonworks’ partnership with SAS is timed to coincide with the release of HDP 2.0 and production YARN support.
(3) Graph engines will be hot.
Not that long ago, graph engines were exotic. No longer: a wide range of maturing applications, from fraud detection and social media analytics to national security rely on graph engines for graph-parallel analytics.
GraphLab leads the space, with Giraph well behind; Spark’s GraphX is still in beta, but it has already achieved performance parity with Giraph and has the advantage of integration with the rest of the Spark stack. As the category matures, analysts will increasingly see graph analysis as one more arrow in the quiver.
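Engines like Giraph and GraphX express algorithms vertex-centrically: in each superstep, every vertex sends values along its edges and updates its own state from what it receives. A minimal single-machine sketch of that model, using PageRank on a toy graph (the graph, damping factor, and iteration count are my own illustration, not taken from any of these engines):

```python
# Single-machine sketch of the vertex-centric ("think like a vertex")
# model behind graph engines such as Giraph and GraphX.
# Toy graph and parameters are illustrative assumptions.

edges = {            # adjacency list: vertex -> outgoing neighbors
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
vertices = list(edges)
DAMPING = 0.85

ranks = {v: 1.0 / len(vertices) for v in vertices}

for _ in range(30):
    # Superstep: each vertex splits its rank among its out-edges...
    incoming = {v: 0.0 for v in vertices}
    for v, outs in edges.items():
        share = ranks[v] / len(outs)
        for dst in outs:
            incoming[dst] += share
    # ...then every vertex updates its rank from the messages it received.
    ranks = {v: (1 - DAMPING) / len(vertices) + DAMPING * incoming[v]
             for v in vertices}

print(max(ranks, key=ranks.get))  # "c": it receives the most inbound links
```

The distributed engines run the same superstep pattern, but partition the vertices across the cluster and turn the message-passing loop into network shuffles.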
(4) R approaches parity with SAS in the commercial job market.
R already dominates SAS in broad-based analyst surveys, but SAS still beats R in commercial job postings. That gap is closing: postings for R programmers are growing rapidly while SAS postings decline. New graduates decisively prefer R over SAS, and organizations increasingly recognize the value of R for “hard money” analytics.
(5) SAP emerges as the company most likely to buy SAS.
“Most likely” as in “only logical” suitor. IBM no longer needs SAS, Oracle doesn’t think it needs SAS, and HP has too many other issues to address before taking on another acquisition. A weak dollar favors foreign buyers, and SAS does substantial business outside the US. SAP lacks street cred in analytics (and knows it), making it the suitor most likely to accept Jim Goodnight’s inflated price and terms.
Will a transaction take place this year? Hard to say; valuations are peaking, but there are obstacles to sale, as I’ve noted previously.
(6) Competition heats up for “easy to use” predictive analytics.
For hard money analytics, programming tools such as SAS and R continue to dominate. But organizations increasingly seek alternatives to SAS and SPSS for advanced analytic tools that are (a) easy to use, and (b) relatively inexpensive to deploy on a broad scale. SAS’ JMP and StatSoft’s Statistica are the established players, with Alteryx, Alpine and RapidMiner entering the fray. Expect more entrants as BI vendors expand their offerings to support more predictive analytics.
Vertical and horizontal solutions will be key to success in this category. It’s not enough to have a visual interface; “ease of use” means “ease of use in context”. It is easier to develop a killer app for one use case than for many. Competitive forces require smaller vendors to target use cases they can dominate and pursue a niche strategy.