Big Analytics Roundup (July 13, 2015)
Light news this week, likely due to summer vacations. Story of the week: Microsoft announces Spark in Azure.
Shivon Zilis spends three months compiling a list of 2,529 analytic startups and creates a chart mapping the landscape.
Dan Gray reports on the upcoming Flink Forward 2015 conference in Berlin.
At the Chicago Flink Meetup, Slim Baltagi delivers an overview.
On YouTube, Robert Metzger of DataArtisans dives deeply into Flink.
Microsoft announces a public preview of Spark for Azure HDInsight. Microsoft's Spark offering includes Spark 1.3.1, Anaconda, Spark Job Server (an open source tool developed by Ooyala), the Zeppelin and Jupyter notebooks, and the Microsoft Spark ODBC driver for connectivity with tools like Power BI and Tableau. The offering also includes out-of-the-box integration with Azure Event Hubs for streaming analytics.
In ZDNet, Andrew Brust covers Microsoft’s Spark announcement.
In InfoWorld, Andrew Oliver mostly drops his previous Spark skepticism and advises you to use Spark most of the time. The exception, he notes, is that Spark sometimes spills to disk; however, he misses the point that even when it does so, it is still faster than MapReduce.
On the Cloudera blog, Ted Malaska describes data quality checks using Spark DataFrames.
Vincent D. Warmerdam explains how to provision Spark 1.4 with RStudio.
Databricks' Reynold Xin delivers a presentation about DataFrames. The slides are available on SlideShare.
On SlideShare, H2O data scientist Hank Roark offers an overview of data science, machine learning and H2O.
Curt Monash explains Zoomdata.