Big Analytics Roundup (August 1, 2016)

There are two big stories this week: Apache Spark 2.0 and Apache Mesos 1.0. There’s also a new release from Kylin, and a nice crop of explainers.

IEEE Spectrum publishes its third annual ranking of top programming languages, based on twelve metrics drawn from Google Search, Google Trends, Twitter, GitHub, Stack Overflow, Reddit, Hacker News, CareerBuilder, Dice, and the IEEE Xplore Digital Library. Among analytic languages, Python ranks third; R ranks fifth; Matlab, fourteenth; Scala, fifteenth; Julia thirty-third. SAS ranks thirty-ninth, good enough to qualify at the tail end of a NASCAR race.

Spark 2.0 General Availability

The Spark team announces general availability for Spark 2.0. My full report here.  Key new bits:

  • Improved memory management and performance.
  • Unified DataFrames and Datasets APIs.
  • SQL 2003 support.
  • Pipeline persistence for machine learning.
  • Structured Streaming, a declarative streaming API (in experimental release.)

Databricks immediately announces support for the release.

Matei Zaharia explains continuous applications, noting that real-world use cases combine streaming and static data. For example, real-time fraud detection applications leverage information about the individual transaction together with information about the customer, the merchant and the item purchased.

Matei, Tathagata Das, Michael Armbrust and Reynold Xin explain Structured Streaming.

More stories herehereherehereherehereherehere, and here.

Apache Mesos Release 1.0

The Apache Mesos team announces the availability of Mesos 1.0.

— Maria Deutscher reports.

— Timothy Prickett Morgan details Mesos vs. Kubernetes.

— Serdar Yegualp notes that Mesos is not a clone of Kubernetes, which is certainly true.

— Gabriela Motroc says Mesos 1.0 is full of surprises, which sounds ominous.

Explainers

— Kaggle Grandmaster Abhishek Thakur details best practices for predictive modeling.

— H2O.ai’s Arno Candel explains new developments in H2O.

— Kypriani Sinaris interviews Databricks’ Xiangrui Meng, who explains Spark MLlib.

— TIBCO’s Hayden Schultz explains TIBCO’s Accelerator for Apache Spark.

— Bob Grossman of the University of Chicago and the Open Data Group explains best practices for predictive model deployment.

— Allstate’s Rob Nendorf explains DevOps for Data Science.

Perspectives

— Doug Henschen blogs on Workday’s plans for Platfora.

— Andrew Psaltis argues for a unified stream processing model, touts Apache Beam.

— Martin Heller reviews Google Cloud Machine Learning and likes what he sees.

— Janakiram MSV touts Microsoft’s machine learning initiatives.

Open Source News

— Apache Kylin announces release 1.5.3, with bug fixes, improvements, and a few new features.

Commercial Announcements

— MapR announces a third place ranking in a Gartner report. Ask yourself this: who came in third at Daytona?

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s