Big Analytics Roundup (July 27, 2015)
Top stories this week: Palantir’s valuation grows, Continuum Analytics gets a bump, Cloudera announces a Python interface for Impala, and we have a winner in KDD Cup 2015.
Conversion Logic wins the KDD Cup 2015, which L.A. Biz characterizes as the “Nerd Olympics”.
Here’s a picture from an actual Nerd Olympics held in 2011 at Brentwood College School in British Columbia.
The EduPristine blog explains how to choose the right value of k for k-means. The approach I learned at a strategy consulting firm thirty years ago:
- Start with the elbow method
- Eyeball cluster means for the active variables
- If you can’t concisely say how each cluster differs from the others, reduce k by 1 and re-run
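The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the EduPristine post: it uses a plain Lloyd's-algorithm k-means written from scratch, synthetic two-dimensional data, and a simple "largest bend in the inertia curve" rule as a stand-in for eyeballing the elbow. The function names (`kmeans`, `best_kmeans`, `elbow_k`, `demo_points`) are hypothetical and exist only for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def best_kmeans(points, k, n_init=10):
    """Run kmeans from several random starts and keep the lowest inertia."""
    return min((kmeans(points, k, seed=s) for s in range(n_init)),
               key=lambda result: result[2])


def elbow_k(inertias):
    """Pick the k where the improvement in inertia drops off the most.

    `inertias` maps consecutive values of k to inertia. This second-difference
    rule is an assumption of this sketch; in practice you look at the plot.
    """
    ks = sorted(inertias)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (inertias[prev] - inertias[k]) - (inertias[k] - inertias[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


def demo_points(seed=42, n_per_blob=40):
    """Synthetic data: three Gaussian blobs (made up for illustration)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(n_per_blob)]


if __name__ == "__main__":
    points = demo_points()
    # Step 1: the elbow method, inertia for k = 1..6.
    inertias = {k: best_kmeans(points, k)[2] for k in range(1, 7)}
    k = elbow_k(inertias)
    # Step 2: print cluster means so a human can eyeball them.
    centroids, labels, _ = best_kmeans(points, k)
    for j, centroid in enumerate(centroids):
        print(j, [round(c, 2) for c in centroid], labels.count(j))
    # Step 3 is a judgment call: if the clusters can't be described
    # concisely, rerun with k - 1.
```

The last step stays manual on purpose: the code can print the cluster means, but only the analyst can decide whether each cluster has a story that distinguishes it from the others.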
On O’Reilly Radar, Ellen Friedman summarizes Drill’s capabilities.
The Smaato blog publishes a Q&A about Flink with the founders of Data Artisans, the commercial venture supporting Flink.
John Hammink, who enjoys travel to unusual places, summarizes Flink on the Treasure Data blog.
On InfoQ, Mesosphere founder Benjamin Hindman explains how to build and run distributed systems with Mesos.
The Hammer Lab announces Spree, a live-updating UI for Spark.
On the MapR blog, Nitin Bandugula compares Spark to MapReduce. His conclusion: Spark is faster, easier to use and does more.
Adrian Bridgewater, writing in Forbes, profiles Huawei’s use of Spark.
Speaking of Huawei, the company announces availability of its Spark SQL on HBase connector, which it brands as Astro.
In InfoWorld, Serdar Yegulalp explains why Spark is spiking in the cloud. (Spoiler: it’s for the same reasons Spark is spiking on premises.)
On his personal blog, Eugene Zhulenev explains interactive audience analytics with Spark SQL, with a discussion of Spark’s advantages over Hive and Impala.
…lands a generous $24 million “A” round. Continuum distributes Anaconda, Python for science and machine learning.
On the Dato blog, Tim Muss describes “major advances” in Release 1.5.1 of the Dato Machine Learning Platform.
For a laugh, look up “Dato” in the Urban Dictionary. (NSFW)
On Slideshare, Edward Agarwala and Scott Marsh describe machine learning at Progressive Insurance with H2O.
…rolls out version 2.0 of Fusion, its enterprise search application.
…benchmarks Hawq against Hive on Tez and Impala, and declares victory.
…announces availability of Revolution R Open 3.2.1, the latest release of Revolution’s enhanced free distribution of R.
…updates its customer page for the first time since coming out of stealth in 2013. Significant adds: American Express, Discover, Equifax, Intuit, MasterCard, PayPal, among others.
…launches its Early Access Program, which gives customers access to new capabilities developed at Zoomdata Labs. This includes:
- Zoomdata Fusion, data federation and blending capability with a drag-and-drop interface
- SQL access to HBase through Apache Phoenix
- “Smart” connectors to popular SaaS platforms, including Google Analytics, Marketo, Salesforce.com, Sendgrid and Zendesk