Thomas Dinsmore's Blog

Spark Summit 2015: Preliminary Report

June 18, 2015

Written by:

Thomas W. Dinsmore

So I guess Spark really is enterprise ready. Nick Heudecker, call your office.

There are several key themes coming from the Summit:

Spark Continues to Mature

Spark and its contributors deserve a round of applause. Some key measures of growth since the 2014 Summit:

Contributor headcount increased from 255 to 730
Committed lines of code increased from 175K to 400K
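
For a sense of scale, here is a quick sketch of the arithmetic behind those two figures. The numbers come straight from the list above; the `measures` dictionary and the `growth` helper are names made up for illustration.

```python
# Figures quoted above for the 2014 and 2015 Summits: (before, after).
measures = {
    "contributors": (255, 730),
    "lines_of_code": (175_000, 400_000),
}


def growth(before, after):
    """Return (multiple, percent increase) between two counts."""
    multiple = after / before
    return multiple, (multiple - 1) * 100


for name, (before, after) in measures.items():
    multiple, pct = growth(before, after)
    print(f"{name}: {multiple:.2f}x ({pct:.0f}% increase)")
```

By this reckoning, contributor headcount grew by a bit less than 3x and the code base by a bit more than 2x in a year.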

There is increasing evidence of Spark’s scalability:

Largest cluster: 8,000 nodes
Largest job: 1 petabyte
Top streaming intake: 1TB/hour

Project Tungsten aims to make Spark faster and prepare it for the next five years; the project has already delivered significant performance improvements through better use of memory and CPU.

IBM and Spark

IBM drops the big one with its announcement. Key bits from the announcement:

IBM will build Spark into the core of its analytic and commerce products, including IBM Watson Health Cloud
IBM will open source its machine learning library (SystemML) and work with Databricks to port it to Spark
IBM will offer Spark as a Cloud service on Bluemix
IBM will commit 3,500 developers to Spark-related projects
IBM (and its partners) will train more than a million people on Spark

I will post separately on this next week.

Spark is Enterprise-Ready

If IBM’s announcement is not sufficient to persuade skeptics, presentations from Adobe, Airbnb, Baidu, Capital One, CIA, NASA/JPL, NBC Universal, Netflix, Thomson Reuters, Toyota and many others demonstrate that Spark already supports enterprise-level workloads.

In one of the breakouts, Arsalan Tavakoli-Shiraji of Databricks presented results from his analysis of more than 150 production deployments of Spark. As expected, organizations use Spark for BI and advanced analytics; the big surprise is that 60% use non-HDFS data sources. These organizations use Spark for data consolidation on the fly, decoupling compute from storage, with unification taking place on the processing layer.
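
To make "unification on the processing layer" concrete, here is a minimal sketch in plain Python. This is not Spark code, and it is not taken from the Databricks presentation. The two inline strings are hypothetical stand-ins for separate non-HDFS sources (say, a file export and a service that returns JSON), and every name in it is invented for illustration. The point is only that the data is combined in the compute step, not consolidated in a shared store first.

```python
import csv
import io
import json

# Hypothetical stand-ins for two independent, non-HDFS data sources.
ORDERS_CSV = "customer_id,amount\n1,20.0\n2,35.5\n1,4.5\n"
CUSTOMERS_JSON = '[{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]'


def load_orders(text):
    """Parse order rows from CSV text into typed dictionaries."""
    return [
        {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def load_customers(text):
    """Parse customer records from JSON text into an id -> name lookup."""
    return {c["id"]: c["name"] for c in json.loads(text)}


def total_by_customer(orders, customers):
    """Join the two sources in the processing step and sum amounts per name."""
    totals = {}
    for order in orders:
        name = customers[order["customer_id"]]
        totals[name] = totals.get(name, 0.0) + order["amount"]
    return totals


totals = total_by_customer(load_orders(ORDERS_CSV), load_customers(CUSTOMERS_JSON))
print(totals)
```

In a real deployment each loader would be a connector to an external system, and the join would run in parallel on a cluster; the storage systems themselves stay separate.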

Databricks Cloud is GA

Enough said.

SparkR

Spark 1.4 includes R bindings, opening Spark to the large community of R users. Out of the gate, the R interface enables the R user to leverage Spark DataFrames; the Spark team plans to extend the capability to include machine learning APIs in Spark 1.5.

Spark’s Expanding Ecosystem

Every major Hadoop distributor showed up this year, but there were no major announcements from the distributors (other than IBM’s bombshell).

In other developments:

Amazon Web Services announced availability of a new Spark on EMR service
Intel announced a new Streaming SQL project for Spark
Lucidworks showcased its Fusion product, with Spark embedded
Alteryx announced its plans to integrate with Spark in its Release 10

One interesting footnote: while there were a number of presentations about Tachyon last year, there were none this year.

These are just the key themes. I’ll publish a more detailed story next week.

