Updated and bumped April 11, 2014. The emergence of Apache Spark is a key development for Big Analytics in 2014.   Spark, a top-level Apache project, is an open source distributed computing framework for advanced analytics in Hadoop.  Originally developed as a research project at UC Berkeley’s AMPLab, the project achieved incubator status in Apache in June 2013 and […]


0xdata (“Hexa-data”) is a small group of smart people from Stanford and Silicon Valley with VC backing and an open source software project for advanced analytics (H2O).  Founded in 2011, 0xdata first appeared on analyst dashboards in 2012 and has steadily built a presence in the data science community since then. 0xdata operates on a […]


A reader complains that I did not include Python in a survey of Machine Learning in Hadoop.  It’s a fair point.  There was a lively debate last year between R and Python advocates, variously described as a war or a boxing match.  Matt Asay argued that Python is displacing R; Sharon Machlis and David Smith countered.  In […]


Analytic users are not all the same; in most organizations, there are a number of different user “personalities”, or personas, with distinct needs.  If you develop an analytics architecture for your organization or develop analytic software to sell to others, it is important to understand these personas.  In this essay, I profile four personas: Power […]


A colleague asks: can we automate predictive modeling? How we answer the question depends on the context. Consider the two variations on the question below, with more precise wording: (1) Can we completely eliminate the need for expertise in predictive modeling, so that an “ordinary business user” can do it? (2) Can we make expert […]


Dell announced this morning that it has acquired StatSoft, a privately held company that distributes Statistica, a suite of software for statistics and data mining. Terms of the sale were not announced. Founded by academics in 1984, StatSoft has developed a loyal following at the low end of the analytics market, where it offers a […]


This is the second of a three-part series on the current state of play for machine learning in Hadoop.  Part One is here.  In this post, we cover open source options. As we noted in Part One, machine learning is one of several technologies for analytics; the broader category also includes fast queries, streaming analytics […]


Much has changed since I last blogged on this subject a year ago (here and here).  This is the first of a three-part blog covering the current state of play for machine learning in Hadoop.  I use the term “machine learning” deliberately, to refer to tools that can learn from data in an automated or […]


Funding for analytic ventures remained robust in January, with 17 significant funding transactions and three acquisitions. Key themes: outcomes-based medicine and health care; vertical solutions for the energy industry; solutions for risk management; mobile analytics, including location-based targeting and app metrics; social media sentiment analysis; graph engines (and solutions based on graph engines); in-memory […]