2013 Rexer Data Miner Survey
Rexer Analytics published its 2013 Data Miner Survey just before the holidays, and it’s an excellent read.
As always when working with survey research, one should use some caution in interpreting the results; it’s very difficult to build a representative sample of analysts and data miners. While it is easy to find fault with Rexer’s sample — which vendors who are unhappy with some of the findings will likely try to do — there is no better survey of working analysts available today.
- Customer Analytics is the most frequently cited application for analytics:
  - Understanding customers
  - Improving customer experience
  - Customer acquisition, upsell and cross-sell
- Respondents recognize growing data volumes, but the size of their analytic data sets is stable
  - In other words, one should not confuse managing Big Data with analyzing Big Data
- R is the most widely used analytic software
  - 70% of respondents say they use R
  - 24% say R is their primary tool, more than any other software
- Text mining is mainstream; 70% of respondents say they mine text now or plan to start
- Time to deployment remains an issue; respondents report deployment cycles ranging from weeks to a year or more
One of the most interesting pieces of analysis in the survey is a clustering based on the importance ratings of tool selection criteria. Rexer’s analysis reveals two principal dimensions in the data, one labeled as “Cost” and the other labeled as “Ease of Use and Interface Quality”. The largest cluster, which includes respondents who rated everything important, should be discounted as an artifact of questionnaire design; it reflects a phenomenon known as the “wrist effect”, where respondents simply check all of the boxes on one end of the scale. Of the remaining respondents:
- Respondents who value the ability to write one’s own code generally do not value ease of use, and vice versa. These respondents are most likely to cite SAS or R as their primary software
  - Among these users, those who cite the importance of cost are much more likely to cite R as their primary tool
  - Those who place a lower value on cost tend to value the quality of the user interface
- Respondents who value ease of use and the quality of the user interface are more likely to be new to analytics
  - These respondents are most likely to cite Statistica, RapidMiner or IBM SPSS Modeler as their primary tool
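To make the "wrist effect" screening concrete, here is a minimal sketch of how one might flag respondents who rate every criterion at the same end of the scale before clustering. It is not Rexer's method. The criteria names, the 1-to-5 scale and the ratings are all invented for illustration and are not survey data.

```python
from typing import Dict, List, Tuple

# Hypothetical importance ratings on an assumed 1-5 scale
# (1 = not important, 5 = very important). Column order is assumed to be:
# cost, ease of use, interface quality, ability to write own code.
# These values are made up for illustration; they are not survey results.
ratings: Dict[str, List[int]] = {
    "resp_a": [5, 5, 5, 5],  # everything at the top of the scale
    "resp_b": [5, 2, 2, 5],
    "resp_c": [2, 5, 5, 1],
    "resp_d": [1, 1, 1, 1],  # everything at the bottom of the scale
    "resp_e": [3, 3, 3, 3],  # uniform, but not at either end
}


def rates_one_end_only(row: List[int], low: int = 1, high: int = 5) -> bool:
    """True when every rating sits at the same end of the scale."""
    if not row:
        return False
    return all(r == low for r in row) or all(r == high for r in row)


def split_respondents(
    data: Dict[str, List[int]],
) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
    """Separate respondents to keep from those flagged for review."""
    kept: Dict[str, List[int]] = {}
    flagged: Dict[str, List[int]] = {}
    for respondent_id, row in data.items():
        target = flagged if rates_one_end_only(row) else kept
        target[respondent_id] = row
    return kept, flagged


kept, flagged = split_respondents(ratings)
print("flagged:", sorted(flagged))
print("kept:", sorted(kept))
```

Whether to drop flagged respondents or merely set them aside is a judgment call; the sketch only separates them so that any later clustering runs on the remaining rows.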
For more information about the survey, or to get a copy, see the Rexer Analytics website.