Big Analytics Roundup (October 26, 2015)
Fourteen stories this week, beginning with an announcement from IBM. This week, IBM celebrates 14 straight quarters of declining revenue at its IBM Insight conference, appropriately enough at the Mandalay Bay in Vegas, where the restaurants are overhyped and overpriced.
Meanwhile, the first Spark Summit Europe meets in Amsterdam, in the far more interesting setting of the Beurs van Berlage. There will be a live stream on Wednesday and Thursday — details here. Sadly, I can’t make this one — the first Spark Summit I’ve missed — but am looking forward to the live stream.
(1) IBM Announces Spark on Bluemix
At its IBM Insight beauty show, IBM announces availability of its Apache Spark cloud service. Actually, IBM announced it back in July, but that was a public beta. On ZDNet, Andrew Brust gushes, noting that IBM has DB2, Watson, Netezza, Cognos, TM1, SPSS, Informix and Cloudant in its portfolio. He fails to note that of those products, exactly one — Cloudant — actually interfaces with Spark.
There were rumors that IBM would have an exciting announcement about Spark at this show, but if this is it — yawn. Looking at IBM’s “Spark in the cloud” offering, I don’t see anything that sets it apart from other available offerings unless you have a Blue fetish.
Update: Rod Reicks of IBM writes to note that IBM’s new release of SPSS Analytics Server runs processes in Spark. For the uninitiated, Analytics Server is a product you license from IBM that enables SPSS Modeler users to run selected operations in Hadoop. Previous versions ran through MapReduce only. Reicks claims that the latest version runs through Spark when available.
I say “claims” because there is no reference to this feature in IBM’s Release Notes, Installation Guide or User’s Guide. Spark is mentioned deep in the Administrator Guide, under Troubleshooting. So the good news is that if the product fails, IBM has some tips — one of which should be “Install Spark.”
You’d think that with IBM’s armies of people they could at least find someone to write documentation.
(2) Mahout Book FAIL
(3) Concurrent Adds Spark Support
Concurrent announces Release 2.0 of Driven, its oddly-named performance management software, which now includes support for Apache Spark.
(4) Flink Founder Touts Streaming Analytics
At Big Data Spain, Data Artisans co-founder Kostas Tzoumas argues that streaming is the basis for all analytics, which is a bit over the top: as they say, if all you have is a hammer, the world looks like a nail. Still, his deck is a nice intro to Flink, which has made some progress this year.
(5) AtScale Announces Release 3.0
AtScale, one of the more interesting startups in the BI space, delivers Release 3.0 of its OLAP-on-Hadoop platform. Rather than introducing a new user interface into the mix, AtScale makes it possible for BI users to work with Hadoop tables without jumping back and forth to programming tools. The product currently supports Tableau, Excel, Qlik, Spotfire, MicroStrategy and JasperSoft, and runs on CDH, HDP or MapR with Impala, Spark SQL or Hive on Tez. The new release includes enhanced role-based security, with authentication through Kerberos, username/password or LDAP.
(6) Neo: Graphs are Eating the World
Graph database leader Neo announces immediate availability of Neo4j 2.3, which includes what it calls “intelligent applications at scale” and Docker support. Exactly what Neo means by “intelligent applications at scale” is unclear, but if Neo is claiming that you no longer have to dump a graph into Spark to run a PageRank, I’ll believe it when I see it.
(7) New Notebook Sharing for Databricks
Databricks announces new notebook sharing capabilities for its eponymous product. On the Databricks blog, Denise Li and Dave Wang explain.
(8) Teradata: Blah, Blah, Blah, IoT, Blah, Blah Blah…
At its annual user conference, Teradata announces that it’s heard about IoT. Teradata also announces that it will make Aster available on Hadoop, which would have been interesting in 2012. Aster, for the uninitiated, includes a SQL-on-MapReduce engine, now rendered obsolete by fast SQL engines like Presto, which Teradata has just embraced.
(9) Flink Forward Redux
As I noted last week, the first Flink Forward conference met in Berlin two weeks ago. William Benton records his impressions.
Presentations are here. Some highlights:
- Dongwon Kim benchmarks Flink against MR, MR on Tez and Spark. Flink wins.
- Kostas Tzoumas outlines the Flink development roadmap through Release 1.0.
- Martin Junghanns explains graph analytics with Flink.
- Anwar Rizal demonstrates streaming decision trees with Flink.
Henning Kropp offers resources for diving deeply into Flink.
(10) Pyramid Analytics Lands New Funding
Amsterdam-based BI startup Pyramid Analytics announces a $30 million “B” round to help it try to explain why we need more BI software.
(11) Harte Hanks Switches from CDH to MapR
John Leonard explains why Harte Hanks switched from Cloudera to MapR. Most likely explanation: they were able to cut a cheaper deal with MapR.
(12) Audience Modeling with Spark
Guest posting on the Databricks blog, Eugene Zhulenev explains audience modeling with Spark ML pipelines.
(13) New Functions in Drill
On the MapR blog, Neeraja Rentachintala describes new capabilities in Drill Release 1.2, including SQL window functions.
(14) Integrating Spark and Redshift
“Redshift is where data goes to die.” — Rob Ferguson, Spark Summit East
On the Databricks blog, Sameer Wadkar of Axiomine explains how to use the spark-redshift package, first introduced in March of this year and now in version 0.5.2. So you can yank your data out of Redshift and do something with it. (h/t Hadoop Weekly)