Today, the Spark project announced the availability of Apache Spark 1.0.0, the first major release since the Apache Software Foundation named Spark a top-level project. (Additional announcements here, here and here.) With 117 contributors, Spark continues to build critical mass and engagement in the data science community.
Features of the new release include:
- API stability
- Integration with YARN security
- Operational and packaging improvements
- Spark SQL (Alpha)
- MLlib enhancements, including:
  - Support for sparse feature vectors
  - Scalable decision trees for classification and regression
  - Distributed SVD and PCA
  - Model evaluation functions
  - L-BFGS optimization primitive
- GraphX enhancements, including performance improvements in graph loading, edge reversal and neighborhood computation
- Streaming enhancements, including optimized performance for stateful stream transformations, improved Flume support and automated state cleanup for long-running jobs
- Extended Java and Python support
- Significant improvements to documentation
…and many small improvements, documented in the Release Notes.
For more information on Spark, read this backgrounder.