Comments by Thomas W. Dinsmore
Key highlights from the 2014 Spark Summit:
Last December, the 2013 Spark Summit pulled 450 attendees for a two-day event. Six months later, the Spark Summit 2014 sold out at more than a thousand seats for a three-day affair.
It’s always ironic when manual registration at a tech conference produces long lines.
Databricks CTO Matei Zaharia kicked off the keynotes with his recap of Spark progress since the last summit. Zaharia enumerated Spark’s two big goals: a unified platform for Big Data applications combined with a standard library for analytics. CEO Ion Stoica followed with a Databricks update, including an announcement of the SAP alliance and an impressive demo of Databricks Cloud, currently in private beta. Separately, Databricks announced $33 million in Series B funding.
Spark Release Manager Patrick Wendell delivered an overview of planned development over the next several releases. Wendell confirmed Spark’s commitment to stable APIs; patches that break the API fail the build. The project will deliver dot releases every three months beginning in August 2014, and maintenance releases as needed. In the near term, development will focus on the libraries.
Mike Franklin of Berkeley’s AMPLab summarized new developments in the Berkeley Data Analytics Stack (“BadAss”), including significant new work in genomics and energy, as well as improvements to Tachyon and MLBase. Dave Patterson elaborated on AMPLab’s work in genomics, providing examples showing how Spark has markedly reduced both cost and runtime for genomic analysis.
Cloudera, DataStax, MapR and SAP demonstrated that the first rule of success is to show up.
IBM wasted a Platinum sponsorship by sending some engineers to talk about “System T”, IBM’s text mining application, with passing references to Spark. Although IBM InfoSphere BigInsights is a certified Spark distribution, IBM appears uncommitted to Spark; the lack of executive presence at the Summit stood out in sharp contrast to Cloudera and MapR.
Silver sponsors Hortonworks and Pivotal hosted tables in the vendor area, but did not present anything.
Neuroscientist Jeremy Freeman, back by popular demand from the 2013 Spark Summit, presented the latest developments in his team’s research into animal brains using Spark as an analytics platform. Freeman’s presentations are among the best demonstrations of applied analytics that I’ve seen in any forum.
A number of vendors in the Spark ecosystem delivered presentations showing how their applications leverage Spark.
The most significant change from the 2013 Spark Summit is the number of reported production users for Spark. While the December conference focused on Spark’s potential, I counted several dozen production users among the presentations I attended.
Also among the sellout crowd: a SAS executive checking to see if there is anything to this open source and vendor-neutral stuff. Apparently, he did not get Jim Goodnight’s message that “Big Data is hype manufactured by media”.