Tag Archives: Hadoop

Still More Comments on Microsoft and Revolution Analytics

Three full business days post-announcement, and stories continue to roll in. Stephen Swoyer of TDWI writes an excellent summary of what Microsoft will likely do with Revolution Analytics. He correctly notes, for example, that Microsoft is unlikely to develop a business user interface for R with code-generating capabilities (comparable to SAS Enterprise Guide). This is difficult to do,

Read more

SAS in Hadoop: An Update

SAS supports several different products that run “inside” Hadoop based on two different in-memory architectures: (1) The SAS High Performance Analytics suite, originally designed to run in dedicated Teradata and Greenplum appliances, includes five modules: Statistics, Data Mining, Text Mining, Econometrics and Optimization. (2) A second set of products — SAS Visual Analytics, SAS Visual Statistics and SAS In-Memory Statistics for Hadoop

Read more

Spark Summit 2014 Roundup

Key highlights from the 2014 Spark Summit: Spark is the single most active project in the Hadoop ecosystem; among Hadoop distributors, Cloudera and MapR are clear leaders with Spark; SAP now offers a certified Spark distribution and integration with HANA; Datastax has delivered a Cassandra connector for Spark; Databricks plans to offer a cloud service for Spark; Spark SQL will absorb

Read more

Python for Analytics

A reader complains that I did not include Python in a survey of Machine Learning in Hadoop.  It’s a fair point.  There was a lively debate last year between R and Python advocates, variously described as a war or a boxing match.  Matt Asay argued that Python is displacing R; Sharon Machlis and David Smith countered.  In this post I review the

Read more

Analytic Startups: Skytree

Skytree started out as an academic machine learning project developed at Georgia Tech’s Fastlab.  Leadership shopped the software to a number of software vendors prior to 2011 and, finding no buyers, launched as a standalone venture in 2012. In April 2013, Skytree announced Series A funding of $18 million, with backing from U.S. Venture Partners, UPS, Javelin Venture Partners and

Read more

Apache Spark for Big Analytics (Updated for Spark Summit and Release 1.0.1)

Updated and bumped July 10, 2014. For a PowerPoint version on Slideshare, go here.

Introduction

Apache Spark is an open source distributed computing framework for advanced analytics in Hadoop. Originally developed as a research project at UC Berkeley’s AMPLab, the project achieved incubator status in Apache in June 2013 and top-level status in February 2014. According to one analyst, Apache Spark is among the five

Read more