Tag Archives: MapReduce

2016 Big Analytics Predictions Roundup

Before publishing my own predictions for 2016 later this week, I thought it would be fun to round up published predictions on analytics and Big Data. Looking through this list, I see a few patterns:
- Streaming is hot. Analysts do not seem to understand distinctions between streaming data, streaming analytics and real-time decisioning.
- “Data Science” continues to be a

Read more

Benchmark: Spark Beats MapReduce

A group of scientists affiliated with IBM and several universities report on a detailed analysis of MapReduce and Spark performance across four different workloads. In this benchmark, Spark outperformed MapReduce on Word Count, k-Means and PageRank, while MapReduce outperformed Spark on Sort. On the ADT Dev Watch blog, Dave Ramel summarizes the paper, arguing that it “brings into question..Databricks Daytona GraySort claim”. This point refers to Databricks’ record-setting

Read more

Spark is Too Big to Fail

In reaction to growing interest in Apache Spark, a contrarian meme is developing:
- David Ramel asks: are Spark and Hadoop friends or foes?
- Jack Vaughan compares Spark to the PDP-11, dismisses it as “just processing.”
- Doug Henschen praises Spark, pans Databricks.
- Nicole Laskowski complains that Spark Summit East “felt like a Databricks show.”
- Andrew Oliver thinks Spark needs to grow up.
- Andrew

Read more

Distributed Analytics: A Primer

Can we leverage distributed computing for machine learning and predictive analytics? The question keeps surfacing in different contexts, so I thought I’d take a few minutes to write an overview of the topic. The question is important for four reasons:
- Source data for analytics frequently resides in distributed data platforms, such as MPP appliances or Hadoop;
- In many cases, the

Read more

RevoScaleR Beats SAS, Hadoop for Regression on Large Dataset

Still catching up on news from the Strata conference. This post from Revolution Analytics’ blog summarizes an excellent paper jointly presented at Strata by Allstate and Revolution Analytics. The paper documents how a team at Allstate struggled to run predictive models with SAS on a data set of 150 million records. The team then attempted to run the same analysis using

Read more