Big Analytics Roundup (July 13, 2015)

Light news this week, likely due to summer vacations. Story of the week: Microsoft announces Spark in Azure.
Shivon Zilis spends three months compiling a list of 2,529 analytic startups and creates a chart from it.
Apache Flink
Dan Gray reports on the upcoming Flink Forward 2015 conference in Berlin.
At the Chicago Flink Meetup, Slim Baltagi delivers an overview.
On YouTube, Robert Metzger of DataArtisans dives deeply into Flink.
Apache Kafka
Confluent, the commercial venture driving Kafka development, announces a $24M Series B round. On ZDNet, Toby Wolpe covers the story. Jonathan Vanian writes it up for Fortune.
Apache Spark
Microsoft announces a public preview of Spark for Azure HDInsight. Microsoft’s Spark offering includes Spark 1.3.1, Anaconda, Spark Job Server (an open source tool developed by Ooyala), the Zeppelin and Jupyter notebooks, and the Microsoft Spark ODBC driver for connectivity with tools like Power BI and Tableau. The offering also includes out-of-the-box integration with Azure Event Hubs for streaming analytics.
In ZDNet, Andrew Brust covers Microsoft’s Spark announcement.
Databricks announces SparkHub, a site that aggregates Spark news, events, video and other Spark resources.
On his blog, Dan Osipov fills in the details of running Spark on EMR. (h/t Hadoop Weekly)
In InfoWorld, Andrew Oliver mostly drops his previous Spark skepticism and advises you to use Spark most of the time. The exception, he notes, is that Spark sometimes spills to disk; however, he misses the point that when it does so, it is still faster than MapReduce.
On the Cloudera blog, Ted Malaska describes data quality checks using Spark DataFrames.
Vincent D. Warmerdam explains how to provision Spark 1.4 with RStudio.
Databricks’ Reynold Xin delivers a presentation about DataFrames; the slides are on SlideShare.
Dato/GraphLab
On the Dato blog, Robert Voyer explains the Autotagging feature in GraphLab Create, using Hacker News topics as data.
H2O
On Slideshare, H2O data scientist Hank Roark offers an overview of data science, machine learning and H2O.
SAS
Evolution:
Teradata
Teradata recognizes reality, offers Cloudera option for its oxymoronic Hadoop Appliance. Alex Woodie reports.
Zoomdata
Curt Monash explains Zoomdata.