Big Analytics Roundup (September 7, 2015)

Top news for this week: SAP and Syncsort release Spark connectors, and HP says it wants to develop one; Pivotal abandons Hawq; Flink, Spark and H2O publish agendas for upcoming events.

Postdoc Opportunity

Peter Rose, Site Head of the RCSB Protein Data Bank West at SDSC, just landed a $1.4 million grant from the National Institutes of Health under its Big Data to Knowledge (BD2K) initiative. Peter writes to say that the team plans to use Apache Spark and is seeking postdocs to join the group. See the flyer here: Postdoc Big Data UCSD flyer

SQL on Hadoop

Pivotal gives up trying to sell Hawq and donates it to Apache, where it is now an incubator project. Hawq is a SQL tool that federates queries across Hadoop and the Greenplum database, introduced with much hoopla two years ago.

Apache Drill

On the MapR blog, Carol McDonald offers the ultimate guide to Drill’s architecture.  (h/t Hadoop Weekly)

Apache Flink/ Data Artisans

The Apache Flink project announces Flink 0.9.1, a maintenance release.

Flink Forward’s organizers publish the program for the conference, which will be held October 12-13 at Berlin’s KulturBrauerei in the Prenzlauer Berg district. Conference organizers have arranged discounted rooms at the nearby Hotel4Youth, where presumably old folks like me are unwelcome.

On Slideplayer, Data Artisans co-founder Stephan Ewen presents an overview of Flink.

On the Data Artisans blog, Robert Metzger and Kostas Tzoumas present a practical guide to integrating Flink and Kafka.  Go here for video.

Dataconomy interviews Romeo Kienzler of IBM, who plans to present at the Flink Forward conference in October.

Apache Spark/ Databricks

Databricks publishes agenda for the Spark Summit Europe.  I’ve attended every Spark Summit to date, but will skip this one.

HP announces its future “commitment” to integrate Vertica with Spark, featuring accelerated data transfer and a model scoring capability.  Translation:  HP wants some Spark buzz to counter IBM, but they don’t have a working release or even a definite plan.

MapR’s Carol McDonald publishes a guide to integrating Spark Streaming with HBase.  (h/t Hadoop Weekly)

SAP announces SAP HANA Vora, an in-memory query engine that runs on Spark and supports OLAP and drill-down analysis. SAP’s PR engine kicks into overdrive. Timothy Prickett Morgan touts it; additional coverage here, here, and here.

Syncsort contributes an IBM z-system mainframe connector to Spark Packages.   Analysis ensues: here, here, here, and here.  Some analysts confuse this announcement with IBM’s plan to support Spark on its mainframes.  The two things are completely different: the point of the connector is to extract data from the mainframe and move it to Spark for processing.

Bernard Marr asks whether the rise of Spark spells the end of Hadoop, and explains why the answer is no.


H2O

H2O.ai publishes the schedule and speaker lineup for H2O World 2015 (November 9-11). Featured speakers include Hilary Mason, Monica Rogati, Stephen Boyd and Rob Tibshirani.


Hadley Wickham announces dplyr 0.4.3, with 30 minor improvements and bug fixes.  Release notes here.

Revolution R/ Microsoft

…announces Revolution R Open 3.2.2, an enhanced R distribution.
