In a press release and blog post, Databricks announces results from its 2016 Spark Survey. Databricks surveyed 1,615 Spark users and prospective users in July 2016. Respondents include data engineers, data scientists, architects, technical managers, and academics.
Key findings from the survey:
- Spark SQL remains the most widely used component.
  - 88% use Spark SQL
  - 71% use Spark Streaming
  - 71% use MLlib (machine learning)
- Respondents value Spark’s performance and advanced analytics.
  - 91% rate performance very important
  - 82% rate advanced analytics very important
  - 76% rate ease of programming very important
  - 69% rate ease of deployment very important
  - 51% rate real-time streaming very important
- Production use has increased markedly since 2015.
  - 40% use SQL in production, up from 24%
  - 38% use DataFrames in production, up from 15%
  - 22% use streaming in production, up from 14%
  - 18% use machine learning in production, up from 13%
- Usage in the public cloud has also increased.
  - 61% said they use Spark in the public cloud, up from 51% in 2015
- Usage of Spark deployed on-premises has declined.
  - 42% use Spark in a standalone deployment, down from 48%
  - 36% use Spark under YARN, down from 40%
  - 7% use Spark on Apache Mesos, down from 11%
- The Scala API remains the most popular, followed closely by the Python API.
  - 65% use Scala, down from 71% in 2015
  - 62% use Python, up from 58%
  - 44% use SQL, up from 36%
  - 29% use Java, down from 31%
  - 20% use R, up from 18%
- While Linux remains the most popular OS, Mac and Windows usage is growing rapidly.
  - 74% use Linux/Unix, down from 75% in 2015
  - 32% use Windows, up from 23%
  - 22% use Mac OS X, up from 14%
The report also includes statistics about the Spark community at large.
- Databricks reports growth in the contributor base from 600 in 2015 to 1,000 in 2016, a figure that does not seem to square with the statistics reported in OpenHub.
- Spark Meetup membership grew from 66,000 in 2015 to 225,000 in 2016.
- Spark Summit attendance grew from 3,912 to 5,100.
For a copy of the report and an infographic, go here.