Big Analytics Roundup (May 25, 2015)

This week features new releases from Drill and Hive, plus announcements from DataStax and MemSQL.

Andrew Brust summarizes the SQL options presented by Drill, Hive and Spark, noting that Drill’s “SQL everywhere” approach and DBMS vendors’ federated engines make the term “SQL on Hadoop” obsolete.

Gartner surveys its panel of 284 people who rely on Gartner and concludes that Hadoop is less ubiquitous than Microsoft Office, because it’s hard.  (One wonders how many Hadoop deployments were overlooked because members of the “Gartner Research Circle” don’t know about them.)  Gartner analysts Nick Heudecker and Merv Adrian look at the numbers and tut-tut about the future of Hadoop.  But if 26% say they are investing today and 11% say they plan to invest in the next twelve months, that sounds to me like a 42% growth rate for Hadoop distributors (11 points added to a base of 26), which is not too shabby.
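For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch.  It assumes every respondent who plans to invest actually does, nobody investing today drops out, and survey share is a fair proxy for distributor business; the function name is mine, not Gartner’s.

```python
def implied_growth(current_pct: float, planned_pct: float) -> float:
    """Growth in the investing base if all planned investors follow through.

    Assumption: no current investor drops out and every planner converts.
    """
    return planned_pct / current_pct


# Figures quoted from the Gartner survey: 26% investing today, 11% planning to.
growth = implied_growth(26, 11)
print(f"Implied growth: {growth:.0%}")
```

If some of the planners never follow through, the real number comes in lower, so treat this as a ceiling on what the survey implies.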

Quant recruiter Linda Burtch surveys her network and sees a big shift in preference between SAS and R. (h/t Oliver Vagner)  In other wars, Datacamp pits R versus Python, concludes nothing.

In VentureBeat, Mark Lorion trolls the Boston tech community.  Speaking of which, the Open Data Science Conference meets this weekend in Boston at the Convention Center.

Apache Drill

The team announces Release 1.0, which includes improvements to documentation and bug fixes.

Apache Flink

OK, maybe Flink does something. (h/t Hadoop Weekly)

Apache Hive

Hive Release 1.2 is available.  Hortonworks’ announcement ignores Cloudera’s Hive on Spark beta.

Apache Spark

Ian Stirk reviews the second edition of Fast Data Processing with Spark, by Krishna Sankar and Holden Karau, and pans it.

Altiscale posts part one of a series of tips and tricks for running Spark co-located with Hadoop.  Repeat after me, kids: Spark is co-located with Hadoop, not “on” Hadoop.

Revolution Analytics (now part of Microsoft) publishes a beta release of a dplyr-Spark interface with more disclaimers than a pack of cigarettes.


DataStax

DataStax announces GA for DSE 4.7, which includes Cassandra 2.1, certification with Spark 1.2 and integrated search.  Coverage here, here and here.


MemSQL

MemSQL announces Release 4 of its eponymous in-memory database, which includes a Community Edition and some previously announced features, including Spark integration and geospatial capability.  Coverage here, here, here, here and here.


Skytree

Skytree gets an award for something, which is good because they haven’t produced any other news lately.


Or Yet Another Machine Learning Library.  This time, it’s Keystone, which is pretty much the same as Spark MLlib.
