Big Analytics Roundup (July 27, 2015)

Top stories this week: Palantir’s valuation grows, Continuum Analytics gets a bump, Cloudera announces a Python interface for Impala, and we have a winner in KDD Cup 2015.
Nate Desmond chronicles Palantir’s $15 billion growth story just as the company hits $20 billion.
Conversion Logic wins the KDD Cup 2015, which L.A. Biz characterizes as the “Nerd Olympics”.
Here’s a picture from an actual Nerd Olympics held in 2011 at Brentwood College School in British Columbia.
The EduPristine blog explains how to choose the right value of k for k-means. The approach I learned at a strategy consulting firm thirty years ago:
- Start with the elbow method
- Eyeball cluster means for the active variables
- If you can’t concisely say how each cluster differs from the others, reduce k by 1 and re-run
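The first two steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the EduPristine method or anyone's production code: the toy data set, the helper names (`farthest_first_centers`, `kmeans`, `elbow_k`), the deterministic farthest-first initialization, and the "largest second difference of inertia" rule for locating the elbow are all choices made here for the example. The third step, deciding whether each cluster can be described concisely, remains a human judgment.

```python
import numpy as np

def farthest_first_centers(X, k):
    """Deterministic init (an assumption of this sketch): start at X[0],
    then repeatedly add the point farthest from the centers chosen so far."""
    idx = [0]
    d = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d.argmax())
        idx.append(nxt)
        d = np.minimum(d, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].astype(float)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm; returns (centers, labels, inertia)."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    inertia = float(d.min(axis=1).sum())  # within-cluster sum of squares
    return centers, labels, inertia

def elbow_k(ks, inertias):
    """Heuristic elbow: the k where the inertia curve bends most sharply,
    measured by the largest second difference. Needs at least three k values."""
    w = np.asarray(inertias, dtype=float)
    curvature = w[:-2] - 2 * w[1:-1] + w[2:]
    return ks[int(curvature.argmax()) + 1]

# Toy data (hypothetical): three tight groups of five points each.
offsets = np.array([[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.0, 0.0]])
X = np.vstack([np.array(c) + offsets for c in ([0.0, 0.0], [10.0, 0.0], [0.0, 10.0])])

# Step 1: elbow method over a range of k.
ks = list(range(1, 6))
results = {k: kmeans(X, k) for k in ks}
inertias = [results[k][2] for k in ks]
best_k = elbow_k(ks, inertias)

# Step 2: print cluster means and sizes for eyeballing.
centers, labels, _ = results[best_k]
for j, center in enumerate(centers):
    print(f"cluster {j}: n={int((labels == j).sum())}, means={np.round(center, 2)}")
```

If the printed means do not tell a clear story, rerun with `best_k - 1` and compare, as in the last step of the list. On real data you would normally standardize the active variables first and use several random restarts rather than a single deterministic start.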
Apache Drill
On O’Reilly Radar, Ellen Friedman summarizes Drill’s capabilities.
Apache Flink
The Smaato blog publishes a Q&A about Flink with the founders of Data Artisans, the commercial venture supporting Flink.
John Hammink, who enjoys travel to unusual places, summarizes Flink on the Treasure Data blog.
Apache Kylin
Andrea Mostosi lists Kylin on his page of Useful Stuff, which means it has arrived.
Apache Mesos
On InfoQ, Mesosphere founder Benjamin Hindman explains how to build and run distributed systems with Mesos.
Apache Spark
The Hammer Lab announces Spree, a live-updating UI for Spark.
On the MapR blog, Nitin Bandugula compares Spark to MapReduce. His conclusion: Spark is faster, easier to use and does more.
Adrian Bridgewater, writing in Forbes, profiles Huawei’s use of Spark.
Speaking of Huawei, the company announces availability of its Spark SQL on HBase connector, which it brands as Astro.
In InfoWorld, Serdar Yegulalp explains why Spark is spiking in the cloud. (Spoiler: it’s for the same reasons Spark is spiking on premises.)
On his personal blog, Eugene Zhulenev explains interactive audience analytics with Spark SQL, with a discussion of Spark’s advantages over Hive and Impala.
Cloudera
…announces Ibis, a Python interface for Impala. Get started here.
Continuum Analytics
…lands a generous $24 million “A” round. Continuum distributes Anaconda, Python for science and machine learning.
Databricks
…announces that sales “accelerator” Yesware has “selected” Databricks Cloud for its production pipeline. Case study here.
Dato
On the Dato blog, Tim Muss describes “major advances” in Release 1.5.1 of the Dato Machine Learning Platform.
For a laugh, look up “Dato” in the Urban Dictionary. (NSFW)
H2O
On Slideshare, Edward Agarwala and Scott Marsh describe machine learning at Progressive Insurance with H2O.
Lucidworks
…rolls out version 2.0 of Fusion, its enterprise search application.
Pivotal
…benchmarks Hawq against Hive on Tez and Impala, and declares victory.
Revolution Analytics
…announces availability of Revolution R Open 3.2.1, the latest release of Revolution’s enhanced free distribution of R.
Skytree Software
…updates its customer page for the first time since coming out of stealth in 2013. Significant adds: American Express, Discover, Equifax, Intuit, MasterCard, PayPal, among others.
Zoomdata
…launches its Early Access Program, which gives customers access to new capabilities developed at Zoomdata Labs. This includes:
- Zoomdata Fusion, data federation and blending capability with a drag-and-drop interface
- SQL access to HBase through Apache Phoenix
- “Smart” connectors to popular SaaS platforms, including Google Analytics, Marketo, Salesforce.com, Sendgrid, Zendesk and others