Big Analytics Roundup (July 27, 2015)

Top stories this week:  Palantir’s valuation grows, Continuum Analytics gets a bump, Cloudera announces a Python interface for Impala, and we have a winner in KDD Cup 2015.

Nate Desmond chronicles Palantir’s $15 Billion growth story just as the company hits $20 Billion.

Conversion Logic wins the KDD Cup 2015, which L.A. Biz characterizes as the “Nerd Olympics”.

Here’s a picture from an actual Nerd Olympics held in 2011 at Brentwood College School in British Columbia.


The EduPristine blog explains how to choose the right value of k for k-means.  The approach I learned at a strategy consulting firm thirty years ago:

  • Start with the elbow method
  • Eyeball cluster means for the active variables
  • If you can’t concisely say how each cluster differs from the others, reduce k by 1 and re-run
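For readers who want to try the steps above, here is a minimal sketch in plain Python. It is not the EduPristine method or any library's API: the helper names (`kmeans`, `elbow_curve`), the random restarts, and the demo data are all assumptions made for illustration. It prints the within-cluster sum of squares for each k so you can look for the elbow, then prints the cluster means so you can eyeball them.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, inertia

def elbow_curve(points, max_k, restarts=5):
    """Best inertia over a few random restarts, for k = 1..max_k."""
    return {k: min(kmeans(points, k, seed=s)[2] for s in range(restarts))
            for k in range(1, max_k + 1)}

# Made-up demo data: three well-separated blobs in two dimensions.
rng = random.Random(42)
data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in [(0, 0), (10, 10), (0, 10)] for _ in range(50)]

# Step 1: print inertia for each k and look for the elbow.
for k, inertia in elbow_curve(data, 6).items():
    print(k, round(inertia, 1))

# Step 2: eyeball the cluster means (and sizes) for the chosen k.
centroids, labels, _ = kmeans(data, 3)
for j, c in enumerate(centroids):
    print(j, [round(v, 2) for v in c], labels.count(j))
```

The third step stays manual: if you can't describe how the printed means differ from one another, drop k by 1 and run it again.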

Apache Drill

On O’Reilly Radar, Ellen Friedman summarizes Drill’s capabilities.

Apache Flink

The Smaato blog publishes a Q&A about Flink with the founders of Data Artisans, the commercial venture supporting Flink.

John Hammink, who enjoys travel to unusual places, summarizes Flink on the Treasure Data blog.

Apache Kylin

Andrea Mostosi lists Kylin on his page of Useful Stuff, which means it has arrived.

Apache Mesos

On InfoQ, Mesosphere founder Benjamin Hindman explains how to build and run distributed systems with Mesos.

Apache Spark

The Hammer Lab announces Spree, a live-updating UI for Spark.

On the MapR blog, Nitin Bandugula compares Spark to MapReduce.  His conclusion: Spark is faster, easier to use and does more.

Adrian Bridgewater, writing in Forbes, profiles Huawei’s use of Spark.

Speaking of Huawei, the company announces availability of its Spark SQL on HBase connector, which it brands as Astro.

In InfoWorld, Serdar Yegulalp explains why Spark is spiking in the cloud.  (Spoiler: it’s for the same reasons Spark is spiking on premises.)

On his personal blog, Eugene Zhulenev explains interactive audience analytics with Spark SQL, with a discussion of Spark’s advantages over Hive and Impala.


Cloudera

announces Ibis, a Python interface for Impala.  Get started here.

Continuum Analytics

lands a generous $24 million “A” round.  Continuum distributes Anaconda, Python for science and machine learning.


Databricks

announces that sales “accelerator” Yesware has “selected” Databricks Cloud for its production pipeline.  Case study here.


On the Dato blog, Tim Muss describes “major advances” in Release 1.5.1 of the Dato Machine Learning Platform.

For a laugh, look up “Dato” in the Urban Dictionary.  (NSFW)


On Slideshare, Edward Agarwala and Scott Marsh describe machine learning at Progressive Insurance with H2O.


Lucidworks

rolls out version 2.0 of Fusion, its enterprise search application.


Pivotal

benchmarks Hawq against Hive on Tez and Impala, and declares victory.

Revolution Analytics

announces availability of Revolution R Open 3.2.1, the latest release of Revolution’s enhanced free distribution of R.

Skytree Software

…updates its customer page for the first time since coming out of stealth in 2013.  Significant adds: American Express, Discover, Equifax, Intuit, MasterCard, PayPal, among others.


Zoomdata

launches its Early Access Program, which gives customers access to new capabilities developed at Zoomdata Labs.  This includes:

  • Zoomdata Fusion, data federation and blending capability with a drag-and-drop interface
  • SQL access to HBase through Apache Phoenix
  • “Smart” connectors to popular SaaS platforms, including Google Analytics, Marketo, Sendgrid, Zendesk and others.
