Big Analytics Roundup (February 15, 2016)

We have a nice harvest of explainers this week, plus eight hard news stories:

  1. Gartner Updates Advanced Analytics MQ
  2. DataRobot Lands Cash
  3. MapR and Hortonworks Report Robust Revenue
  4. StreamAnalytix Adds Spark Support
  5. BlueData Announces Something
  6. Looker Delivers SQL on Hadoop
  7. New RStudio Release
  8. New Anaconda Release

R has a new logo.

On the Databricks blog, Tim Hunter and Joseph Bradley announce the release of the spark-sklearn package. Among other tools, the package enables users to train and evaluate multiple scikit-learn models in parallel; convert Spark DataFrames to NumPy ndarrays; and distribute SciPy sparse matrices as a dataset of sparse vectors.  A friend writes: “it’s sad — using Spark for embarrassingly parallel jobs.”
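
For anyone wondering what "embarrassingly parallel" means here: in a parameter search, each candidate model is fit and scored without reference to any other, so the work splits cleanly across workers. The sketch below is not spark-sklearn and uses none of its API. It is a standard-library Python illustration with invented toy data and hypothetical names (`fit_and_score`, `grid_search`), and it swaps in a one-dimensional ridge regression for a real scikit-learn estimator.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data, invented for illustration: y is roughly 2 * x.
TRAIN = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
TEST = [(5.0, 10.1), (6.0, 11.9)]


def fit_and_score(alpha):
    """Fit a 1-D ridge regression (no intercept); return (alpha, test MSE)."""
    sxy = sum(x * y for x, y in TRAIN)
    sxx = sum(x * x for x, _ in TRAIN)
    w = sxy / (sxx + alpha)  # closed-form ridge solution for a single weight
    mse = sum((y - w * x) ** 2 for x, y in TEST) / len(TEST)
    return alpha, mse


def grid_search(alphas, workers=4):
    """Score every candidate independently; return (best, all results)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No candidate depends on another, so map() can fan them out freely.
        results = list(pool.map(fit_and_score, alphas))
    best = min(results, key=lambda r: r[1])
    return best, results


ALPHAS = [0.0, 0.1, 1.0, 10.0, 100.0]
best, results = grid_search(ALPHAS)
print("best alpha:", best[0], "test MSE:", round(best[1], 5))
```

The only coordination is collecting the scores at the end, which is presumably why the friend thinks a cluster framework is more machinery than the job needs.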

In TechRepublic, Brian Taylor discovers that Spark is the most active open source project in Big Data.  Dude, that was true 20 months ago.

Dato’s Yucheng Low reports on the state of the SFrame, Dato’s distributed data structure.  FWIW.


— In InfoWorld, Databricks’ Joseph Bradley, Xiangrui Meng and Denny Lee explain why you should use Spark for machine learning.

— In TechTarget, Craig Mullins explains the Actian Analytics Platform.

— In DataInformed, Frank D. Evans explains Topic Modeling.

— At the Toptal blog, Radek Ostrowski explains how he built an app in Spark and Docker.

— On the Databricks blog:

  • Sujit Pal explains how Elsevier Labs implemented dictionary annotation at scale with Spark in Databricks.
  • Andrew Ray explains the new Pivot feature in Spark 1.6.

— In InformationWeek, Charles Babcock explains how eBay uses Apache Kylin for OLAP on Hadoop.


— At Motley Fool, Adam Levy wonders whether Microsoft can catch up with Amazon Web Services in the cloud.  In overall revenue, that would certainly be a stretch.  But in terms of capabilities for data scientists, MSFT is already ahead of AWS.

— On the IBM developerWorks blog, IBM’s Martin Keen lists five things to know about Apache Spark.  Doubtful that anyone who doesn’t work for IBM would agree that “Spark runs on a mainframe” should make the top five.

(1) Gartner Updates Advanced Analytics MQ

Gartner publishes its 2016 Magic Quadrant for Advanced Analytics Platforms.  You can get a free copy here from RapidMiner (registration required).

The report is a muddle that mixes up products in different categories that don’t compete with one another, includes marginal players, excludes important startups and largely ignores open source analytics.

Gartner evaluated BigML, Business-Insight, Dataiku, Dato, MathWorks, Oracle, Rapid Insight, Salford Systems, Skytree and TIBCO but did not include them because they “did not meet one or more of the inclusion criteria.”  This suggests that Gartner’s inclusion criteria are ridiculous.

Other than that, it’s a fine report.

Changes from last year are relatively small.  Some detailed comments:

— Accenture makes the analysis this year, according to Gartner, because it acquired Milan-based i4C Analytics, a company whose customer base can sit comfortably at one of those little tables near the bookstores in the Galleria Vittorio Emanuele Due.  One wonders: if it makes sense to include Analytic Service Providers like Accenture in the MQ, where is Palantir?

Alpine Data Labs declines a lot in “Ability to Deliver,” which makes sense since they appear to be running out of money.

Alteryx declined a little, which is surprising since its new release is strong and the company just scored a pile of venture cash.

Angoss improved a lot, moving from Niche to Challenger, largely on the basis of its WPL-based SAS integration and better customer satisfaction.  Data prep was a gap for Angoss, so the WPL partnership is a positive move.

— Dell: Arguing that Dell has “executed on an ambitious roadmap during the past year”, Gartner moves Dell into the Leaders quadrant.   That “execution” is largely invisible to everyone else, as the product seems to have changed little since Dell acquired Statistica, and I don’t think too many people are excited that the product interfaces with Boomi.  Customer satisfaction has declined and pricing is a mess, but Gartner is all giggly about Boomi, Kitenga and Toad.  Gartner rightly cautions that software isn’t one of Dell’s core strengths, and the recent EMC acquisition “raises questions” about the future of software at Dell.  Which raises questions about why Gartner thinks Dell qualifies as a Leader in the category.

FICO fades.

IBM stays at about the same position in the MQ.  Gartner rightly notes the “market confusion” about IBM’s analytics products, and dismisses yikyak about cognitive computing.

— KNIME was a Leader last year and remains a Leader, moving up a little.  Gartner notes that many customers choose KNIME for its cost-benefit ratio, which is unsurprising since the software is free.

Lavastorm makes it to the MQ this year, for some reason.

Megaputer, a text mining vendor, makes it to the MQ for the second year running despite being so marginal that they lack a record in Crunchbase.  Gartner notes that “Megaputer scores low on viability and visibility and there is a lack of awareness of the company outside of text analytics in the advanced analytics market.”  Just going out on a limb, here, Mr. Gartner, but maybe that’s your cue to drop them from the MQ, or cover them under text mining.

Microsoft gets Gartner’s highest scores on Completeness of Vision on the strength of Azure Machine Learning (AML) and Cortana Analytics Suite.  Some customers aren’t thrilled that AML is only available in the cloud, presumably because they want hackers to steal their data from an on-premises system, where most data breaches happen.  Microsoft’s hybrid on-premises cloud should render those arguments moot.  Existing customers who use SQL Server Analysis Services are less than thrilled with that product.

Predixion Software improves on “Completeness of Vision” because it can “deploy anywhere” according to Gartner.  Wut?  Anywhere you can run Windows.

Prognoz markets software not used outside the former Soviet Union.

RapidMiner moves up on both dimensions.  Gartner recognizes the company’s “Wisdom of Crowds” feature and the recent Series C funding, but neglects to note RapidMiner’s excellent Hadoop and Spark integration.

SAP stays at pretty much the same place in the MQ.  Gartner notes that SAP has the lowest scores in customer satisfaction, analytic support and sales relationship, which is about what you would expect when an ankle-biter like KXEN gets swallowed by a behemoth like SAP, where analytics goes to die.

SAS declines slightly in Ability to Deliver.  Gartner notes that SAS’ licensing model, high costs and lack of transparency are a concern.  No kidding.

(2) DataRobot Lands Cash

Machine learning startup DataRobot announces that it has closed on a $33M “B” round led by New Enterprise Associates.  New investors Intel Capital and Recruit Strategic Partners joined existing investors Accomplice (previously Atlas Venture), IA Ventures and New York Life.

(3) MapR and Hortonworks Report Robust Revenue

MapR reports 2015 billings increased “more than 100%” over 2014.  Meanwhile, the slackers at Hortonworks report an increase of only 90%.  HDP’s billings are $40MM higher than revenue, since customers prepay contracts; the same is likely true for MapR, which doesn’t report revenue.

(4) StreamAnalytix Adds Spark Support

StreamAnalytix announces Release 2.0 of its streaming analytics platform, with Spark support and a few other enhancements.  More stories here.

(5) BlueData Announces Something

BlueData announces that it has added a bundle of Spark, Kafka and Cassandra to its EPIC software for on-premises elastic provisioning.

(6) Looker Delivers SQL on Hadoop

Looker announces support for Presto and Spark SQL,  Darryl Taft reports.  Jessica Davis draws broad conclusions about the state of Spark adoption from the Looker announcement.

(7) New RStudio Release

RStudio announces a new release for its eponymous IDE for R.  Key new bits:

  • RStudio Addins, a mechanism for executing custom R functions.
  • Improvements to R Markdown authoring and parameterized reports.
  • Multiple ways to open source windows.
  • Custom keyboard shortcuts.
  • Emacs keybindings.

Improvements to the Server edition include support for multiple concurrent R sessions, multiple R versions and shared projects.

(8) New Anaconda Release

Continuum Analytics announces Anaconda Release 2.5.  The new release includes the Intel Math Kernel Library, Windows debugging information files, an option to install on Windows without touching the registry and a security update to OpenSSL version 1.0.2f.

Continuum also announces plans to bundle Microsoft R Open with Anaconda later this month.

New Data Science Studio Release

Dataiku releases Data Science Studio V2.3, a major upgrade.  New bits:

  • Updated visual interactive data prep studio.
  • Workflow visualization.
  • Instance-wide data catalogue.
  • Notebook for SQL, Hive and Impala.

More detail available here, in the Release notes.
