Big Analytics Roundup (May 25, 2015)
This week features new releases from Drill and Hive, plus announcements from DataStax and MemSQL.
Andrew Brust summarizes the SQL options offered by Drill, Hive and Spark. He notes that Drill's "SQL everywhere" approach and the federated engines from DBMS vendors make the term "SQL on Hadoop" obsolete.
Gartner surveys 284 members of its Research Circle, a panel of people who rely on Gartner, and concludes that Hadoop is less ubiquitous than Microsoft Office because it's hard. (One wonders how many Hadoop deployments were overlooked because members of the "Gartner Research Circle" don't know about them.) Gartner analysts Nick Heudecker and Merv Adrian look at the numbers and tut-tut about the future of Hadoop. But if 26% say they are investing today and another 11% say they plan to invest in the next twelve months, that sounds to me like a 42% growth rate for Hadoop distributors (11 / 26 ≈ 0.42), which is not too shabby.
The Apache Drill team announces Release 1.0, which includes documentation improvements and bug fixes.
Altiscale posts part one of a series of tips and tricks for running Spark co-located with Hadoop. Repeat after me, kids: Spark is co-located with Hadoop, not “on” Hadoop.
Revolution Analytics (now part of Microsoft) publishes a beta release of a dplyr-Spark interface with more disclaimers than a pack of cigarettes.
MemSQL announces Release 4 of its eponymous in-memory database. The release includes a Community Edition and some previously announced features, including Spark integration and geospatial capability. Several outlets covered the announcement.
Skytree gets an award for something, which is good because they haven’t produced any other news lately.
Yet Another Machine Learning Library: this time it's Keystone, which is pretty much the same as Spark MLlib.