Tag Archives: AMPLab

Big Analytics Roundup (November 16, 2015)

Just three main stories this week: possible trouble for a pair of analytic startups; Google releases TensorFlow to open source; and H2O delivers new capabilities at its annual meeting. In other news, the Spark team announces Release 1.5.2, a maintenance release; and the Mahout team announces Release 0.11.1, with bug fixes and performance improvements. (h/t Hadoop Weekly) Two items of

Read more

Big Analytics Roundup (June 29, 2015)

The Sparkalanche continues; plus we have new releases from Flink and H2O. And, in case you thought Spark was the last word in Big Analytics, well, think again: here comes Splash, from AMPLab. In the Wall Street Journal’s Saturday Essay, Sean Parker calls for philanthropists to focus on “hackable problems,” a message that should resonate with data scientists. (Link may require registration.) On

Read more

Automated Predictive Modeling

A colleague asks: can we automate predictive modeling? How we answer the question depends on the context. Consider the two variations on the question below, with more precise wording: Can we completely eliminate the need for expertise in predictive modeling, so that an “ordinary business user” can do it? Can we make expert analysts more productive by automating

Read more

R Interface to Apache Spark

The team at AMPLab has announced a developer preview of SparkR, an R package enabling R users to run jobs on an Apache Spark cluster. Spark is an open source project that supports distributed in-memory computing for advanced analytics, such as fast queries, machine learning, streaming analytics and graph engines. Spark works with every data format supported in Hadoop, and supports YARN 2.2. SparkR exposes the Spark

Read more