Embrace Open Source Analytics

Suppose you could implement an analytics platform with comprehensive out-of-the-box capabilities, a flexible programming environment, good visualization capabilities and a growing body of skilled users.  Suppose this platform leveraged a massively parallel architecture for high performance and scalability.  And suppose you could do this without investing in software fees.

You don’t have to suppose, because IBM Netezza helps you leverage the power and capability of R.

R is the best known open source analytics project, but there are many other open source analytics available, including the Data Mining Template Library, the dlib and Orange C++ libraries and the Java Data Mining Package.  In this article, we’ll focus on R.

There are three main reasons R should be part of your enterprise analytics architecture:

  • R has capabilities not available in commercial analytics software
  • Usage of R by analysts is growing rapidly
  • R’s total cost of ownership is attractive

R functionality is a superset of the functionality available in commercial analytics packages. There are currently 3,047 packages published in the CRAN repository, and almost 5,000 packages in all repositories worldwide.  Moreover, the number of available packages is growing rapidly.  While commercial software vendors must prioritize development effort towards features with predictable demand and broad appeal, R developers work under no such constraints.  As a result, new, cutting-edge and niche applications tend to be published in R before they are available in commercial packages.

A customer we’re working with in the life sciences industry wants to apply four new methods to their analytic toolkit.  This customer spends almost a billion dollars each year to run hundreds of thousands of experiments; very small improvements in precision directly impact this customer’s bottom line.  Right now, all of these new methods are available in R, and none are available in commercial packages.

Interest in R is growing exponentially.   According to the most recent Rexer Analytics survey, R is the preferred analytics package for more respondents than for any other analytic software.  R outperforms all other analytics packages on various measures of mindshare, including listserv activity, website popularity, page rank and blogging activity.

Some customers we work with express concerns that open source software may be full of bugs, trojan horses or other security risks.  This view is based on the mistaken belief that developers can publish anything they like in R.  In fact, the R Project has a highly-developed review and testing process, and well-defined procedures for bug tracking and fixing.  R’s large and highly engaged user community ensures that R packages receive as much scrutiny and testing as many commercial software packages.

Like many analytical packages, R performs calculations in memory, which limits the amount of data that can be used in analysis to the size of memory on the host.  IBM Netezza partner Revolution Analytics has developed a commercial version of R (Revolution R Enterprise) that combines the capability and value of open source R with the quality assurance and technical support of vendor-supported software.   Revolution has also developed a set of enhancements that enable R to scale to terabyte-sized problems.  The combination of Revolution R Enterprise and Netezza’s massively parallel architecture provides a truly scalable and high-performance analytics platform.

Open source analytics like R offer firms rich capabilities, a flexible platform and great value.   With Netezza and Revolution Analytics, R is a scalable and high performance platform.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.