My inbox continues to fill with Google Alerts about Microsoft’s announced purchase of Revolution Analytics — too numerous to link.
Most of these stories simply repackage the Microsoft announcement.
Clint Boulton of the WSJ’s CIO Journal writes one of the best analyses:
Microsoft is betting on the timeliness of its acquisition as more businesses adopt analytics. Revolution’s software helps companies use R, an open source programming language that more than two million programmers use daily to build predictive models. R is popular among university computer science students, many of whom continue to use it in their careers as data scientists.
Data scientists who extract data from a data warehouse or Hadoop processing system, use R to slice and dice it for insights, and visualize the results. But businesses analyzing financial, social media and other data often need to scale the analytics across clusters of computers.
Several analysts pass along the factoid that two million people use R. The truth is that nobody has any idea how many people use R; we don’t even know how many have downloaded the software. The New York Times pointed out the difficulty in its piece five years ago:
While it is difficult to calculate exactly how many people use R, those most familiar with the software estimate that close to 250,000 people work with it regularly.
It’s possible that R has gained 1,750,000 users in the intervening five years. It’s also possible that R has gained 10,000,000 users. “Those most familiar with the software” are simply guessing.
While most analysts are neutral to positive on Microsoft’s move, Mr. Dan Woods takes a contrary view. In an article published in Forbes and cross-posted on multiple platforms, Mr. Woods argues that Microsoft was wrong to buy Revolution Analytics, and instead should buy Tibco. (That is the implication of his argument that Microsoft should “emulate” Tibco, since the only way to “emulate” Tibco is to own the clump of software Tibco packages up as TERR.)
Mr. Woods is a “content specialist”, as freelance writers call themselves today, and his expertise in analytics is exemplified by his most recent book, Wikis for Dummies, published in 2007. One suspects that the private equity firm that acquired Tibco in September is peddling the pieces, and has engaged “content specialists” to bang the drum.
Mr. Woods gets two things right. It’s true that R is a mess, and it is also true that the GPL makes R difficult to commercialize. R’s messiness is a byproduct of crowdsourced development; it is a feature to its devotees and a bug for everyone else. (For those who simply cannot tolerate R’s messiness there is a simple solution: use Python.) Under the GPL, enhancements that you distribute must carry the same license, so if you distribute a product that incorporates R you must share the source code of your product as well.
At the crux of his argument, though, Mr. Woods gets it wrong:
Revolution Analytics has made a business, like many open source-based companies, of supporting Open Source R.
This is factually incorrect. Revolution only recently started to offer a consulting service for open source R users; for most of its history, its business was built around Revolution R Enterprise, a commercially supported enhanced R distribution. This is not a trivial distinction. Cloudera Hadoop, for example, is based on Apache Hadoop, but it is not the same thing; while many enterprises use commercially supported Hadoop distributions (from vendors like Cloudera, Hortonworks or MapR), hardly anyone uses open source Apache Hadoop in production.
The same is true for R; while many enterprises are reluctant to use open source R, they are willing to deploy commercially supported R distributions (such as Oracle R or Revolution R). This is the business Microsoft enters by acquiring Revolution Analytics.
As for Mr. Woods’ point about the need to rebuild R from the ground up, that is neither possible nor necessary. The GPL prevents anyone from “rebuilding” R as a commercial venture; if anyone “rebuilds” the language it will be the open source development team itself.
In any case, one need not “make R scale”; one need only provide an R API to other platforms (such as Apache Spark or dbLytix) that can scale, so that R users can interface with them. This is the approach taken by Revolution Analytics’ ScaleR software, which is actually written in C, but includes an interface from the R programming language. By building this component into Azure, Microsoft can offer those who use R locally a scalable back end.
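For readers who want to see the shape of that idea, here is a rough sketch in Python rather than R. It is not ScaleR's API or anyone else's; the names `ChunkedBackend` and `scaled_mean` are invented for illustration, and a thread pool stands in for a real cluster. The point is only that the user-facing call stays tiny while a separate back end does the heavy lifting chunk by chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional, Sequence, Tuple


def _partial_sum(chunk: Sequence[float]) -> Tuple[float, int]:
    """Back-end worker: reduce one chunk of data to (sum, count)."""
    return (sum(chunk), len(chunk))


class ChunkedBackend:
    """Hypothetical back end that never needs all the data in one place.

    A thread pool stands in for a cluster; this is an assumption made
    for the sketch, not how any real product is built.
    """

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def mean(self, chunks: Iterable[Sequence[float]]) -> float:
        # Each chunk is reduced independently, then the partials are combined.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            partials: List[Tuple[float, int]] = list(pool.map(_partial_sum, chunks))
        total = sum(s for s, _ in partials)
        count = sum(n for _, n in partials)
        if count == 0:
            raise ValueError("mean of empty data")
        return total / count


def scaled_mean(
    chunks: Iterable[Sequence[float]],
    backend: Optional[ChunkedBackend] = None,
) -> float:
    """Thin front-end call: the user sees one function, the back end scales."""
    backend = backend or ChunkedBackend()
    return backend.mean(chunks)


if __name__ == "__main__":
    print(scaled_mean([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]))
```

Swapping the thread pool for a genuinely distributed engine would change `ChunkedBackend` and nothing the caller sees, which is the whole argument in miniature.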
Update: Mr. Woods doubles down here.