Big Analytics Roundup (April 27, 2015)
In the news this week: ODP, Spark Summit and a culinary FAIL from IBM Watson.
MapR to ODP: Get Lost
On the MapR blog, CEO John Schroeder describes ODP as “a Hortonworks marketing vehicle that provides a graceful market exit for Greenplum Pivotal,” thus voicing thoughts shared by everyone not employed by Hortonworks and Pivotal. (Additional coverage here.) Schroeder notes that ODP adds a redundant layer of opaque pay-to-play governance, solves problems that don’t need solving and misdefines the Hadoop core in ways that serve the interests of Hortonworks.
Other than that, he’s for it.
In Datanami, Alex Woodie covers the “debate”, writing that ODP’s launch “effectively split the Hadoop community down the middle.” Eighteen paragraphs later, he notes that Cloudera and MapR support 75% of the Hadoop implementations. In other words, on one side we have Hadoop’s leaders and, on the other, we have ODP.
Spark Summit 2015 Posts Agenda
The organizers of Spark Summit 2015, to be held in San Francisco June 15-17, have posted the agenda. Keynotes are still TBD. On the first two days there will be three tracks, one each targeting developers, data scientists and people like me who care mostly about applications. Among the presenters: NBC Universal, Netflix, Capital One, Beth Israel Deaconess, Edmunds.com, Shopify, OpenTable, AutoTrader, Uber, Under Armour, Thomson Reuters, Salesforce.com and Duke University, thus demonstrating that Spark really is enterprise-ready.
Predixion Lands Cash?
Predixion Software announces a “D” Round, does not disclose amount. In other words, they’re still negotiating.
The “C” round 22 months ago drew $21 million.
Applications of Note
Bots that report on other bots.
Apache Spark Updates
At ComputerWeekly.com, Lindsay Clarke profiles Spark, gets it right.
Arush Kharbanda delivers an excellent guide to Spark Streaming for opensource.com.
The bloggers at Sematext say they see Spark Streaming displacing Storm. Hortonworks, are you listening?
On the Databricks blog:
- Reynold Xin summarizes recent Spark performance improvements.
- Ion Stoica and Vida Ha demonstrate analysis of Apache access logs with Databricks Cloud.
- Daniel Darabos of Lynx Analytics touts LynxKite, a graph analytics solution that leverages Spark.
Kay Ewbank writes a positive review of Learning Spark, the recently released book by Holden Karau et al.
Kay Ousterhout et al. test three workloads in Spark and conclude that performance is CPU-bound, not disk- or network-bound. (Republished in The Morning Paper.)
The R Core Team has announced availability of R 3.2.0.
For those so inclined, the Mahout team has posted a guide to building an app in Mahout.
Google adds stream processing capabilities to BigQuery.
MapR releases on-demand training for Apache Drill.
Microsoft releases a free ebook on Azure Machine Learning. It’s nicely written.