Big Analytics Roundup (September 7, 2015)
Top news for this week: SAP and Syncsort release Spark connectors, and HP says it wants to develop one; Pivotal abandons Hawq; Flink, Spark and H2O publish agendas for upcoming events.
Peter Rose, Site Head of the RCSB Protein Data Bank West at SDSC, just landed a $1.4 million grant from the National Institutes of Health under its Big Data to Knowledge (BD2K) initiative. Peter writes to say that the team plans to use Apache Spark and is seeking postdocs to join the group. See the flyer here: Postdoc Big Data UCSD flyer
SQL on Hadoop
Pivotal gives up trying to sell Hawq and donates it to Apache, where it is now an incubator project. Hawq is a SQL tool that federates queries across Hadoop and the Greenplum database, introduced with much hoopla two years ago.
Apache Flink/ Data Artisans
…announces Flink 0.9.1, a maintenance release.
Flink Forward’s organizers publish the program for the conference, which will be held October 12-13 at Berlin’s KulturBrauerei in the Prenzlauer Berg district. Conference organizers have arranged discounted rooms at the nearby Hotel4Youth, where presumably old folks like me are unwelcome.
On Slideplayer, Data Artisans co-founder Stephan Ewen presents an overview of Flink.
Dataconomy interviews Romeo Kienzler of IBM, who plans to present at the Flink Forward conference in October.
Apache Spark/ Databricks
Databricks publishes agenda for the Spark Summit Europe. I’ve attended every Spark Summit to date, but will skip this one.
HP announces its future “commitment” to integrate Vertica with Spark, featuring accelerated data transfer and a model scoring capability. Translation: HP wants some Spark buzz to counter IBM, but they don’t have a working release or even a definite plan.
SAP announces SAP HANA Vora, an in-memory query engine that runs on Spark and supports OLAP and drill-down analysis. SAP’s PR engine kicks into overdrive. Timothy Prickett Morgan touts it; additional coverage here, here, and here.
Syncsort contributes an IBM z-system mainframe connector to Spark Packages. Analysis ensues: here, here, here, and here. Some analysts confuse this announcement with IBM’s plan to support Spark on its mainframes. The two things are completely different: the point of the connector is to extract data from the mainframe and move it to Spark for processing.
Bernard Marr asks whether the rise of Spark spells the end of Hadoop, and explains why the answer is no.
H2O.ai
…publishes schedule and speaker lineup for H2O World 2015 (November 9-11). Featured speakers include Hilary Mason, Monica Rogati, Stephen Boyd and Rob Tibshirani.
Revolution R/ Microsoft
…announces Revolution R Open 3.2.2, an enhanced R distribution.