Big Analytics Roundup (July 13, 2015)

Light news this week, likely due to summer vacations. Story of the week: Microsoft announces Spark in Azure.

Shivon Zilis spends three months compiling a list of 2,529 analytic startups and summarizes them in a chart.


Apache Flink

Dan Gray reports on the upcoming Flink Forward 2015 conference in Berlin.

At the Chicago Flink Meetup, Slim Baltagi delivers an overview.

On YouTube, Robert Metzger of DataArtisans dives deeply into Flink.

Apache Kafka

Confluent, the commercial venture driving Kafka development, announces $24M Series B round.  On ZDNet, Toby Wolpe covers the story.  Jonathan Vanian writes it up for Fortune.

Apache Spark

Microsoft announces public preview of Spark for Azure HDInsight. Microsoft’s Spark offering includes Spark 1.3.1, Anaconda, Spark Job Server (an open source tool developed by Ooyala), the Zeppelin and Jupyter notebooks, and the Microsoft Spark ODBC driver for connectivity with tools like Power BI and Tableau. The offering also includes out-of-the-box integration with Azure Event Hubs for streaming analytics.

On ZDNet, Andrew Brust covers Microsoft’s Spark announcement.

Databricks announces SparkHub, a site that aggregates Spark news, events, video and other Spark resources.

On his blog, Dan Osipov fills in the details of running Spark on EMR. (h/t Hadoop Weekly)

In InfoWorld, Andrew Oliver mostly drops his previous Spark skepticism and advises you to use Spark most of the time. The exception, he notes, is that Spark sometimes spills to disk; however, he misses the point that when it does so, it is still faster than MapReduce.

On the Cloudera blog, Ted Malaska describes data quality checks using Spark DataFrames.

Vincent D. Warmerdam explains how to provision Spark 1.4 with RStudio.

Databricks’ Reynold Xin delivers a presentation about DataFrames; the slides are on Slideshare.


On the Dato blog, Robert Voyer explains the Autotagging feature in GraphLab Create, using Hacker News topics as data.


On Slideshare, H2O data scientist Hank Roark offers an overview of data science, machine learning and H2O.



  • Jim Goodnight, 2013: Big Data is hype.
  • Jim Goodnight, 2015: We help people with Big Data.


Teradata recognizes reality, offers Cloudera option for its oxymoronic Hadoop Appliance.  Alex Woodie reports.


Curt Monash explains Zoomdata.

