Category Archives: Thoughtware

Benchmark: Spark Beats MapReduce

A group of scientists affiliated with IBM and several universities report on a detailed analysis of MapReduce and Spark performance across four different workloads. In this benchmark, Spark outperformed MapReduce on Word Count, k-Means and Page Rank, while MapReduce outperformed Spark on Sort. On the ADT Dev Watch blog, Dave Ramel summarizes the paper, arguing that it "brings into question...Databricks Daytona GraySort claim". This point refers to Databricks' record-setting

O’Reilly Data Science Survey 2015

O'Reilly releases its 2015 Data Science Salary Survey. The report, authored by John King and Roger Magoulas, summarizes results from an ongoing web survey. The 2015 survey includes responses from "over 600" participants, down from the "over 800" tabulated in 2014. The authors note that the survey includes self-selected respondents from the O'Reilly audience and may not generalize to the

Spark 1.4 Released

On June 11, the Spark team announced availability of Release 1.4. More than 210 contributors from 70 different organizations contributed more than 1,000 patches. Spark continues to expand its contributor base, the best measure of health for an open source project.

Spark Core

The Spark team continues to improve Spark operability, performance and compatibility. Key enhancements include: The first phase in

How to Buy SAS Visual Analytics

Stories about SAS Visual Analytics are among the most widely read posts on this blog. In the last two years I've received many queries from readers who complain that it's hard to get clear answers about the software from SAS. In software procurement, the customer has bargaining power until the deal closes; after that, power shifts to the vendor.

Software for High Performance Advanced Analytics

Strata+Hadoop World week is a good opportunity to update the list of platforms for high-performance advanced analytics. Vendors are hustling this week to announce their latest enhancements; I'll post updates as needed. First, some definitions. The scope of this analysis includes software with the following properties: support for supervised and unsupervised machine learning; support for distributed processing; open platform or multi-vendor