Tag Archives: MLLib

Big Analytics Roundup (June 22, 2015)

Last week’s Spark Summit is the big news driver for this roundup. On the Databricks blog, Scott Walent recaps the summit here. Anmol Rajpurohit writes KDnuggets’ play-by-play for Day One and Day Two. My preliminary report is here; a full report will follow when slides from the sessions are available. Spark will be one of several technologies featured at the inaugural In-Memory Computing


Spark 1.4 Released

On June 11, the Spark team announced availability of Release 1.4. More than 210 contributors from 70 different organizations contributed more than 1,000 patches. Spark continues to expand its contributor base, the best measure of health for an open source project.

Spark Core

The Spark team continues to improve Spark operability, performance and compatibility. Key enhancements include: The first phase in


Distributed Analytics: A Primer

Can we leverage distributed computing for machine learning and predictive analytics? The question keeps surfacing in different contexts, so I thought I’d take a few minutes to write an overview of the topic. The question is important for four reasons: Source data for analytics frequently resides in distributed data platforms, such as MPP appliances or Hadoop; In many cases, the
