Big Analytics Roundup (April 13, 2015)

This week:  Microsoft closes on the acquisition of Revolution Analytics, plus lots of cloud news driven by the AWS Summit in San Francisco.

But the top item for the week is this History of Hadoop, from Marko Bonaci.

Update:  OK, the top item is actually this piece from Dave McClure on unicorns and dinosaurs.

Amazon Web Services

If you thought Amazon would let Microsoft own the cloud-based machine learning space, think again.  Amazon introduces Amazon Machine Learning. (h/t Oliver Vagner)

Apache Drill

In Big Data Quarterly, Jim Scott offers an excellent summary of Apache Drill and its significance for the Hadoop ecosystem.

Apache Mahout

The Mahout team announces Release 0.10, which includes a distributed algebraic optimizer, a Scala API and the Spark interface.  The team has optimistically re-branded these capabilities as Samsara, which suggests that we can escape from Mahout by following the Buddhist path.

Apache Spark

Advanced Analytics with Spark, the new book by Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills, is now available.

Writing in insideBIGDATA, MemSQL CEO Eric Frenkiel champions Spark working together with MemSQL.


Writing in ITBusinessEdge, Arthur Cole says analytics is heading toward the cloud.  Newsflash: analytics is already in the cloud, big time.  There are organizations today that run most or all of their advanced analytics in the cloud, and the most sophisticated have done so for years.

Cloud is eating the analytics world because predictive modeling requires large-scale computing power in short bursts; organizations that scale up on-premises computing power to meet peak requirements will own a lot of unused server capacity.  Moreover, cloud enables analysts to radically reduce cycle time and build better models with massively parallel test-and-learn operations.

In an InfoWorld piece headlined "Big Data is All About the Cloud", Matt Asay argues that Big Data is about other things, too, like streaming and dedicated task clusters.  He interviews Matt Wood of Amazon Web Services, who thinks cloud is a good thing.


Databricks announces that it is now an Amazon Web Services Advanced Technology Partner.

On the Databricks blog, Andy Konwinski recaps Spark Summit East.


News of Informatica’s plan to go private produces a slew of overwrought articles about “generational shifts” in data integration, like this one from Alex Woodie in Datanami.  Venture capitalists pay for potential and Wall Street pays for growth, but private owners want recurring revenue and profit margins; hence, private ownership is the best model for firms that are well along in the hype cycle, past the “Trough of Disillusionment” and well into the “Slope of Enlightenment”.  It shouldn’t surprise anyone that SnapLogic, Alteryx, ClearstoryData, Trifacta and Paxata all have higher growth rates than Informatica; after all, 1+1 equals 100% growth.  Nevertheless, the total revenue of those companies amounts to rounding error on Informatica’s 10-K, so grave-dancing seems premature.



Microsoft closes on its acquisition of Revolution Analytics (previously discussed here, here and here).  Financial terms are undisclosed, so we will just have to troll through MSFT’s next 10-Q to confirm rumors about the price.  Additional coverage here and here.  Dave Rich, CEO of Revolution Analytics, assumes the role of General Manager, Advanced Analytics for Microsoft.
