Apache Spark is 1.0

Today, the Spark project announced the availability of Apache Spark 1.0.0, the first major release since the Apache Software Foundation named Spark a top-level project. (Additional announcements here, here and here.) With 117 contributors, Spark continues to build critical mass and engagement in the data science community.

Features of the new release include:

  • API stability
  • Integration with YARN security
  • Operational and packaging improvements
  • Spark SQL (Alpha)
  • MLlib enhancements, including
    • Support for sparse feature vectors
    • Scalable decision trees for classification and regression
    • Distributed SVD and PCA
    • Model evaluation functions
    • L-BFGS optimization primitive
  • GraphX enhancements, including performance improvements in graph loading, edge reversal and neighborhood computation
  • Streaming enhancements, including optimized performance for stateful stream transformations, improved Flume support and automated state cleanup for long-running jobs
  • Extended Java and Python support
  • Significant improvements to documentation

…and many small improvements, documented in the Release Notes.

For more information on Spark, read this backgrounder.
