Tag Archives: Apache Mahout

Big Analytics Roundup (August 24, 2015)

Lots of Mesos news this week (thanks to MesosCon in Seattle), including reports that Microsoft wants to acquire Mesosphere. Rashid Jamal surveys the battle space for the next-generation big data analysis framework. It’s a good overview of how some of the top projects and vendors are positioning themselves. On LinkedIn, Bernard Marr reports on the “top ten” Hadoop distributions, including Cloudera, …

Big Analytics Roundup (April 27, 2015)

In the news this week: ODP, Spark Summit and a culinary FAIL from IBM Watson.

MapR to ODP: Get Lost

On the MapR blog, CEO John Schroeder describes ODP as “a Hortonworks marketing vehicle that provides a graceful market exit for Greenplum Pivotal,” thus voicing thoughts shared by everyone not employed by Hortonworks and Pivotal. (Additional coverage here.) Schroeder notes …

Software for High Performance Advanced Analytics

Strata+Hadoop World week is a good opportunity to update the list of platforms for high-performance advanced analytics. Vendors are hustling this week to announce their latest enhancements; I’ll post updates as needed. First, some definitions. The scope of this analysis includes software with the following properties: support for supervised and unsupervised machine learning; support for distributed processing; open platform or multi-vendor …

Distributed Analytics: A Primer

Can we leverage distributed computing for machine learning and predictive analytics? The question keeps surfacing in different contexts, so I thought I’d take a few minutes to write an overview of the topic. The question is important for four reasons: source data for analytics frequently resides in distributed data platforms, such as MPP appliances or Hadoop; in many cases, the …

2014 Predictions: Advanced Analytics

A few predictions for the coming year.

(1) Apache Spark matures as the preferred platform for advanced analytics in Hadoop. Spark will achieve top-level project status in Apache by July; that milestone, together with inclusion in Cloudera CDH5, will validate the project’s rapid maturation. Organizations will increasingly question the value of “point solutions” for Hadoop analytics versus Spark’s integrated platform …

Apache Spark for Big Analytics (Updated for Spark Summit and Release 1.0.1)

Updated and bumped July 10, 2014. For a PowerPoint version on SlideShare, go here.

Introduction

Apache Spark is an open source distributed computing framework for advanced analytics in Hadoop. Originally developed as a research project at UC Berkeley’s AMPLab, the project achieved incubator status in Apache in June 2013 and top-level status in February 2014. According to one analyst, Apache Spark is among the five …