Big Analytics Roundup (August 1, 2016)
There are two big stories this week: Apache Spark 2.0 and Apache Mesos 1.0. There’s also a new release from Kylin, and a nice crop of explainers.
IEEE Spectrum publishes its third annual ranking of top programming languages, based on twelve metrics drawn from Google Search, Google Trends, Twitter, GitHub, Stack Overflow, Reddit, Hacker News, CareerBuilder, Dice, and the IEEE Xplore Digital Library. Among analytic languages, Python ranks third; R, fifth; Matlab, fourteenth; Scala, fifteenth; and Julia, thirty-third. SAS ranks thirty-ninth, good enough to qualify at the tail end of a NASCAR race.
Spark 2.0 General Availability
- Improved memory management and performance.
- Unified DataFrames and Datasets APIs.
- SQL 2003 support.
- Pipeline persistence for machine learning.
- Structured Streaming, a declarative streaming API (in experimental release).
Databricks immediately announces support for the release.
Matei Zaharia explains continuous applications, noting that real-world use cases combine streaming and static data. For example, real-time fraud detection applications leverage information about the individual transaction together with information about the customer, the merchant and the item purchased.
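To make the fraud detection example concrete, here is a minimal sketch in plain Python of the underlying idea: each streaming event is scored together with static reference data about the customer and the merchant. This is not Spark code and not the Structured Streaming API. The `Transaction` fields, the `CUSTOMERS` and `MERCHANTS` lookup tables, and the scoring rule are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Transaction:
    """One streaming event (hypothetical schema)."""
    txn_id: str
    customer_id: str
    merchant_id: str
    amount: float


# Static reference data (hypothetical values). In a real system these
# would be tables or files that change far more slowly than the stream.
CUSTOMERS: Dict[str, dict] = {
    "c1": {"home_country": "US", "typical_spend": 80.0},
    "c2": {"home_country": "DE", "typical_spend": 500.0},
}
MERCHANTS: Dict[str, dict] = {
    "m1": {"country": "US"},
    "m2": {"country": "BR"},
}


def score(txn: Transaction) -> Tuple[str, bool]:
    """Return (txn_id, flagged) using the event plus static context.

    Illustrative rule (an assumption, not a real fraud model): flag when
    the merchant is outside the customer's home country AND the amount
    exceeds three times the customer's typical spend.
    """
    customer = CUSTOMERS.get(txn.customer_id)
    merchant = MERCHANTS.get(txn.merchant_id)
    if customer is None or merchant is None:
        # Unknown customer or merchant: flag for manual review.
        return txn.txn_id, True
    foreign = merchant["country"] != customer["home_country"]
    unusual = txn.amount > 3 * customer["typical_spend"]
    return txn.txn_id, foreign and unusual


def detect(stream: Iterable[Transaction]) -> Iterator[Tuple[str, bool]]:
    """Score events one at a time, as they arrive from the stream."""
    for txn in stream:
        yield score(txn)


if __name__ == "__main__":
    events = [
        Transaction("t1", "c1", "m1", 50.0),
        Transaction("t2", "c1", "m2", 400.0),
        Transaction("t3", "c2", "m2", 400.0),
    ]
    for txn_id, flagged in detect(events):
        print(txn_id, flagged)
```

The same 400.00 purchase at the same merchant is flagged for one customer and not for the other. That difference comes entirely from the static data, which is why the transaction alone is not enough.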
Matei, Tathagata Das, Michael Armbrust and Reynold Xin explain Structured Streaming.
Apache Mesos Release 1.0
The Apache Mesos team announces the availability of Mesos 1.0.
— Maria Deutscher reports.
— Timothy Prickett Morgan details Mesos vs. Kubernetes.
— Serdar Yegulalp notes that Mesos is not a clone of Kubernetes, which is certainly true.
— Gabriela Motroc says Mesos 1.0 is full of surprises, which sounds ominous.
Explainers
— Kaggle Grandmaster Abhishek Thakur details best practices for predictive modeling.
— H2O.ai’s Arno Candel explains new developments in H2O.
— Kypriani Sinaris interviews Databricks’ Xiangrui Meng, who explains Spark MLlib.
— TIBCO’s Hayden Schultz explains TIBCO’s Accelerator for Apache Spark.
— Bob Grossman of the University of Chicago and the Open Data Group explains best practices for predictive model deployment.
— Allstate’s Rob Nendorf explains DevOps for Data Science.
— Doug Henschen blogs on Workday’s plans for Platfora.
— Andrew Psaltis argues for a unified stream processing model and touts Apache Beam.
— Martin Heller reviews Google Cloud Machine Learning and likes what he sees.
— Janakiram MSV touts Microsoft’s machine learning initiatives.
Open Source News
— Apache Kylin announces release 1.5.3, with bug fixes, improvements, and a few new features.