Big Analytics Roundup (May 4, 2015)

Light news this week; presumably everyone is out enjoying the weather. SAS and Microsoft put on shows, Pivotal announces a port, and there are some helpful bits for Spark users.
Late addition: Seth Grimes seeks text analytics unicorns, finds one in Clarabridge.
SDTimes offers a nice primer on the Hadoop ecosystem.
There seems to be some tension between devotees of Apache Ignite (aka GridGain) and Tachyon.
Pivotal Ports Hawq to HDP
Pivotal announces that Hawq, a federated SQL engine, is now supported on Hortonworks HDP. Pivotal donated its HD distribution to open source and handed customer support over to Hortonworks, so the choices for Hawq boiled down to port it or kill it. Presumably, porting to HDP would only be difficult if Pivotal had previously forked HD.
This reminds me of the announcement back in 1954 that you could buy a Packard at your Studebaker dealer.
Apache Spark
On the Databricks blog, Reynold Xin and Josh Rosen review Project Tungsten, an effort to improve the memory and CPU efficiency of Spark applications.
On video (h/t Hadoop Weekly):
- Jon Haddad of DataStax offers a guide to getting started with Spark and Cassandra
- Sandy Ryza of Cloudera advises on how to debug a Spark job
Ludwine Probst summarizes her analysis of accelerometer data with Spark MLlib and Cassandra.
In a two-part series on the Cloudera blog, Sean Owen and Juliet Hougland compare the MapReduce and Spark APIs.
Microsoft Build Conference
Microsoft unveils Azure Data Lake, a data lake-in-the-box offering that includes the recently acquired Revolution R. Alex Woodie has more.
SAS Global Forum
SAS held its annual beauty show in Dallas last week. Puff pieces to the contrary, the company announced very little. Releasing the results of a paid study that concluded most enterprises are “Analytically Challenged,” SAS touted two existing products, SAS Visual Analytics and SAS Visual Statistics. Also announced: two niche solutions, one for cybersecurity, the other for model risk management.