Thomas Dinsmore's Blog

Spark 1.4 Released

June 12, 2015

Written by:

On June 11, the Spark team announced availability of Release 1.4. More than 210 contributors from 70 different organizations contributed more than 1,000 patches. Spark continues to expand its contributor base, the best measure of health for an open source project.

Spark Core

The Spark team continues to improve Spark operability, performance and compatibility. Key enhancements include:

The first phase in Project Tungsten performance improvements, a cache-friendly sort algorithm
Also for improved performance, serialized shuffle output
For the Spark UI, visualization for Spark DAGs and operational monitoring
A REST API for application information, such as job, stage, task and storage status
For Python users, support for Python 3.x, plus external spilling for Python groupByKey operations
Two YARN enhancements: support for YARN on EC2 and security for long-running YARN applications
Two Mesos enhancements: Docker support and cluster mode.

DataFrames and SQL

This release includes extensions of analytic functions for DataFrames, operational utilities for Spark SQL and support for ORCFile format.

Hive-style partitioning support in data sources API
Ability to optimize very large joins through sort-merge
Improved DataFrame API for equijoins and self-joins
Dedicated UI for SQL JDBC
Improved DataFrame and SQL error message reporting
Window function support
Column arithmetics in DataFrames
Summary statistics (count, mean, std, min and max) for DataFrames
Rollup and cube functions

A complete list of enhancements to the DataFrame API is here.

R Interface

AMPLab released a developer version of SparkR in January 2014. In June 2014, Alteryx and Databricks announced a partnership to lead development of this component. In March, 2015, SparkR officially merged into Spark.

SparkR offers an interface to use Apache Spark from R. In Spark 1.4, SparkR supports operations like selection, filtering and aggregation on large datasets. It’s important to note that as of this release SparkR does not support an interface to MLLib, Streaming or GraphX.

Machine Learning

In Spark 1.4, ML pipelines graduate from alpha release, add feature transformations (Vector Assembler, String Indexer, Bucketizer etc.) and a Python API. Additional enhancements to ML include:

ElasticNet: Binary Logistic Regression with L1/L2
OnevsRest: Multiclass to Binary Reduction
Standardized developer APIs

There appears to be an effort under way to rebuild MLLib’s supervised learning algorithms in ML.

Enhancements to MLLib include:

Stabilized API for decision trees, RandomForests and GradientBoosted Trees
API for feature attributes
Bernoulli Naive Bayes algorithm
Latent Dirichlet Allocation with online variational inference
Support recommendAll in matrix factorization model
PMML export for k-means, linear regression, ridge regression, lasso, support vector machines and binary logistic regression

There is a single enhancement to GraphX in Spark 1.4, a personalized PageRank. Spark’s graph analytics capabilities are comparatively static.

Streaming

The enhancements to Spark Streaming include improvements to the UI plus enhanced support for Kafka and Kinesis and a pluggable interface for write ahead logs. Enhanced support for Kafka includes better error reporting, support for Kafka 0.8.2.1 and Kafka with Scala 2.11, input rate tracking and a Python API for Kakfa direct mode.

2 responses to “Spark 1.4 Released”

Big Analytics Roundup (June 15, 2015) | The Big Analytics Blog

June 15, 2015 at 8:45 am

[…] announces Release 1.4. (See post here.) Key […]

Reply
Live Notes from Spark Summit | The Big Analytics Blog

June 15, 2015 at 1:31 pm

[…] release manager Patrick Wendell recaps features of 1.4 (see separate post here). Coming in 1.5: more Project Tungsten, more streaming features, expansion of SparkR to cover […]

Reply

Leave a comment Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Thomas Dinsmore's Blog

Spark 1.4 Released

AI Is Coming For Your Job!!!

Spring 2024 Preview

More on AI Venture Funding

Spark 1.4 Released

Share this:

2 responses to “Spark 1.4 Released”

Leave a comment Cancel reply

AI Is Coming For Your Job!!!

Spring 2024 Preview

More on AI Venture Funding