Big Analytics Roundup (March 9, 2015)

Here’s a roundup of interesting Big Analytics news and analysis from the past week.  Featured this week: Hortonworks, Alpine, Spark and H2O.

Hortonworks

  • Matt Asay, writing in InfoWorld, deconstructs Hortonworks’ earnings fiasco, and with it the “100% open source” business model.

Alpine Data Labs

  • VentureBeat reports that Alpine Data Labs claims 10X year-over-year growth in user count and billings.
  • MarketWired reports the same story.
  • ITBusinessNet too.

There is no supporting press release from Alpine Data Labs.  The VentureBeat story includes the nugget that Alpine currently has “more than 60” customers; an insider tells me that the number is closer to 75, roughly twice as many as last year.  Alpine has changed its selling model, hiring its own sales force instead of selling through EMC and Pivotal.  This also means that Alpine has changed its messaging from “we run on Greenplum and PostgreSQL, but mostly on Greenplum” to “we run on anything.”  This is an aspiration, to be sure, but a good one.

Alpine has also changed its pricing model from a perpetual server-based model to a user-based subscription model.

Separately, Ventana Research publishes a positive review of Alpine Chorus 5.0.

Apache Spark

  • Jonathan Buckley of Qubole argues that the three open source projects that transformed Hadoop are Hive, Spark and Presto.  It’s an odd choice.  Hive is certainly a key project and Spark is red hot; Presto, not so much.
  • Data prep engine vendor Paxata announces a new release that runs on Spark, releases benchmark report showing significant performance improvements.
  • Databricks announces selection of Databricks Cloud as preferred platform for B2B vendor Radius Intelligence, publishes case study.
  • Forbes profiles Databricks CEO Ion Stoica.
  • Ian Lumb offers eight reasons why Spark is hot.
  • Databricks published a slideshare about Spark DataFrames, which will be available in Spark 1.3 later this month.
  • From the Cloudera blog, an excellent post showing how to build an application for financial markets risk calculations in Spark.

H2O

  • In an interview with KDNuggets, Ted Dunning touts Mahout and H2O over Spark.
  • H2O.ai announces Cloudera certification for its Sparkling Water interface to Spark.

General

CMSWire rehashes the Gartner Magic Quadrant without adding value.   The author notes breathlessly that “many KNIME enthusiasts are data miners”, and “on the downside, (RapidMiner’s) user base is mostly data scientists”; as if these points are news, and as if there is something extraordinary about data miners and data scientists using data mining and data science tools.

Gartner Advanced Analytics Magic Quadrant 2015

Gartner’s latest Magic Quadrant for Advanced Analytics is out; for reference, the 2014 report is here; analysis from Doug Henschen here.  Key changes from last year:

  • Revolution Analytics moves from Visionary to Niche
  • Alpine and Microsoft move from Niche to Visionary
  • Oracle, Actuate and Megaputer drop out of the analysis
Gartner 2015 Magic Quadrant, Advanced Analytics

Gartner changed its evaluation criteria this year to reflect only “native” (i.e. proprietary) functionality; as a result, Revolution Analytics dropped from Visionary to Niche.  Other vendors, it seems, complained to Gartner that the old criteria were “unfair” to those who don’t leverage open source functionality.  If Gartner applies this same reasoning to other categories, it will have to drop coverage of Hortonworks and evaluate Cloudera solely on the basis of Impala.  🙂

Interestingly, Gartner’s decision to ignore open source functionality did not impact its evaluation of open source vendors RapidMiner and KNIME.

Based on modest product enhancements from Version 4.0 to Version 5.0, Alpine jumped from Niche to Visionary.  Gartner’s inclusion criteria for the category mandate that “a vendor must offer advanced analytics functionality as a stand-alone product…”; this appears to exclude Alpine, which runs in the Pivotal Greenplum database (*).  Gartner’s criteria are flexible, however, and I’m sure it’s purely coincidental that Gartner analyst Gareth Herschel flacks for Alpine.

(*) Yes, I know: Alpine supports other databases and Hadoop as well.  The Alpine customers who use it on anything other than Pivotal could meet at one of the little tables in the back of a Starbucks.

Gartner notes that Alpine “still lacks depth of functionality. Several model techniques are either absent or not fully developed within its tool.”  Well, yes, that does seem important.   Alpine’s promotion to Visionary appears to rest on its Chorus collaboration capability (originally developed by Greenplum).  It seems, however, that customers don’t actually use Chorus very much; as Gartner notes, “adoption is currently slow and the effort to boost it may divert Alpine’s resources away from the core product.”

Microsoft’s reclassification from Niche to Visionary rests purely on Azure Machine Learning (AML), a product still in beta at the time of the evaluation.  Hardly anyone uses MSFT’s “other” offering for analytics (SQL Server Analysis Services, or SSAS), apparently for good reason:

  • “The 2014 edition of SSAS lacks breadth, depth and usability, in comparison with the Leaders’ offerings.”
  • “Microsoft received low scores from SSAS customers for its willingness to incorporate their feedback into future versions of the product.”
  • “SSAS is a low-performing product (with poor features, little data exploration and questionable usability).”

On paper, AML is an attractive product, though it maxes out at 10GB of data; however, it seems optimistic to rate Microsoft as “Visionary” purely on the basis of a beta product.  “Visionary” is a stretch in any case: analytic software that runs exclusively in the cloud is by definition a niche product, as it appeals only to a certain segment of the market.  AML’s most attractive capability is its ability to run Python and R, and, as noted above, open source functionality no longer carries any weight with Gartner.

Dropping Actuate and Megaputer from the MQ simply recognizes the obvious.  It’s not clear why these vendors were included last year in the first place.

It appears that Oracle chose not to participate in the MQ this year.  Analytics that run in a single database platform are by definition niche products — you can’t use Oracle Advanced Analytics if you don’t have Oracle Database, and few customers will choose Oracle Database because it has Oracle Advanced Analytics.