Big Analytics Roundup (April 27, 2015)

In the news this week: ODP, Spark Summit and a culinary FAIL from IBM Watson.

MapR to ODP: Get Lost

On the MapR blog, CEO John Schroeder describes ODP as “a Hortonworks marketing vehicle that provides a graceful market exit for Greenplum Pivotal,”  thus voicing thoughts shared by everyone not employed by Hortonworks and Pivotal.  (Additional coverage here.)  Schroeder notes that ODP adds a redundant layer of opaque pay-to-play governance, solves problems that don’t need solving and misdefines the Hadoop core in ways that serve the interests of Hortonworks.

Other than that, he’s for it.

In Datanami, Alex Woodie covers the “debate”, writing that ODP’s launch “effectively split the Hadoop community down the middle.”  Eighteen paragraphs later, he notes that Cloudera and MapR support 75% of the Hadoop implementations.  In other words, on one side we have Hadoop’s leaders and, on the other, we have ODP.

Spark Summit 2015 Posts Agenda

The organizers of Spark Summit 2015, to be held in San Francisco June 15-17, have posted the agenda.  Keynotes are still TBD.  On the first two days there will be three tracks, one each targeting developers, data scientists and people like me who care mostly about applications.  Among the presenters: NBC Universal, Netflix, Capital One, Beth Israel Deaconess, Shopify, OpenTable, AutoTrader, Uber, Under Armour, Thomson Reuters, and Duke University, thus demonstrating that Spark really is enterprise-ready.

Predixion Lands Cash?

Predixion Software announces a “D” Round, does not disclose amount.  In other words, they’re still negotiating.

The “C” round 22 months ago drew $21 million.

Applications of Note

Bots that report on other bots.

Apache Spark Updates

Lindsay Clarke profiles Spark, gets it right.

Arush Kharbanda delivers an excellent guide to Spark Streaming.

The bloggers at Sematext say they see Spark Streaming displacing Storm.  Hortonworks, are you listening?

On the Databricks blog:

  • Reynold Xin summarizes recent Spark performance improvements.
  • Ion Stoica and Vida Ha demonstrate analysis of Apache Access logs with Databricks Cloud.
  • Daniel Darabos of Lynx Analytics touts LynxKite, a graph analytics solution that leverages Spark.

Kay Ewbank writes a positive review of Learning Spark, the recently released book by Holden Karau et al.

Kay Ousterhout et al. test three workloads in Spark, conclude that performance is CPU-bound, not disk- or network-bound.  (Republished in The Morning Paper.)

Other Updates

The R Core Team has announced availability of R 3.2.0.

For those so inclined, the Mahout team has posted a guide to building an app in Mahout.

Google adds stream processing capabilities to BigQuery.

MapR releases on-demand training for Apache Drill.

Microsoft releases a free ebook on Azure Machine Learning.  It’s nicely written.
