Here is a quick roundup of some recent Apache Spark news.
(1) Databricks and Typesafe released results from a survey of 2,136 individuals (mostly developers). Some key findings:
- 13% of respondents run Spark in production, 20% plan to use Spark in 2015
- 82% say they expect to use Spark core to replace MapReduce
- 88% say they use the Scala API
- Respondents split on deployment: 54% deploy Spark standalone, 42% co-located with Hadoop under YARN
- 62% load data from HDFS, 46% from unspecified databases, 41% from Apache Kafka, 29% from Amazon S3
Analysis from GigaOm here. Copy of the report available here (registration required).
(2) On the Databricks blog, Jeremy Freeman introduces streaming k-means, a capability included in Spark 1.2. Excellent article outlining some of the practical differences between streaming and static analytics.
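Streaming k-means updates its cluster centers batch by batch instead of re-clustering the whole dataset each time. The short Python sketch here illustrates a "forgetful" mini-batch update of that kind. Each point in a new batch is assigned to its nearest center. The accumulated point count for each cluster is discounted by a decay factor alpha. Each center then becomes a weighted average of its old position and the batch mean. This is not Spark's code. The function names `nearest` and `update` and the exact discounting rule are assumptions for illustration. In Spark 1.2 the real capability lives in MLlib's `StreamingKMeans` class.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def update(
    centers: Sequence[Sequence[float]],
    weights: Sequence[float],
    batch: Sequence[Sequence[float]],
    decay: float = 1.0,
) -> Tuple[List[Vector], List[float]]:
    """One streaming step (hypothetical helper, not Spark's API).

    decay = 1.0 keeps all history; decay = 0.0 uses only the new batch.
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]  # per-cluster sum of batch points
    counts = [0] * k                        # per-cluster batch size m_t

    # Assign every point in the batch to its nearest current center.
    for point in batch:
        j = nearest(point, centers)
        counts[j] += 1
        for i, v in enumerate(point):
            sums[j][i] += v

    new_centers: List[Vector] = []
    new_weights: List[float] = []
    for j in range(k):
        n = weights[j] * decay  # discounted history: alpha * n_t
        m = counts[j]
        if m == 0:
            # No new points: keep the center, but its weight still decays.
            new_centers.append(list(centers[j]))
            new_weights.append(n)
            continue
        total = n + m
        # c_{t+1} = (alpha * n_t * c_t + m_t * x_t) / (alpha * n_t + m_t),
        # where m_t * x_t (batch mean times batch size) is just sums[j].
        new_centers.append([(c * n + s) / total for c, s in zip(centers[j], sums[j])])
        new_weights.append(total)
    return new_centers, new_weights


if __name__ == "__main__":
    c, w = update([[0.0, 0.0], [10.0, 10.0]], [0.0, 0.0],
                  [[1.0, 1.0], [-1.0, -1.0], [9.0, 11.0], [11.0, 9.0]])
    print(update(c, w, [[4.0, 4.0]], decay=1.0))
    print(update(c, w, [[4.0, 4.0]], decay=0.5))
```

With `decay=1.0`, the point at (4, 4) pulls the first center only to about (1.33, 1.33), because the two earlier points still count fully. With `decay=0.5`, history is halved first, so the center moves to (2.0, 2.0). Down-weighting the past this way is one way a streaming model can follow data that drifts over time.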
(3) NewSQL vendor MemSQL announced availability of its Spark Connector, which it claims offers seamless connectivity with Spark. More coverage here, here and here; analysis here.
(4) Learning Spark, by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia, is now available from O’Reilly Media.
(5) InfoWorld selected Apache Spark (along with 31 other products and open source projects) for its 2015 Technology of the Year Award.