Statwing: A Review

In every enterprise that uses analytics, there are a few power users who need the most advanced tools all of the time, and an army of casual users who need to do simple analysis now and then.  For the latter group, cloud-based analytics make perfect sense; users get the tools they need when they need them, and the organization gets out of the business of licensing, hosting, distributing and maintaining infrequently used software.

Statwing launched in 2012, and has recently scored some buzz and funding; it wants to “make your data…dreams come true.”  A review of the service seems timely.

Registration is simple: no credit card is needed for a trial license; just plug in your email address and go. Statwing lets you try its Silver plan for fourteen days at no charge; after that, you can pay $25 per month to stay on the Silver plan, upgrade to the Gold plan for $100 per month, or downgrade to the free public plan. The Silver and Gold plans keep your data private and let you share charts; the Gold plan also lets you upload more data.

I tried uploading a few data sets. The 1998 KDD Cup data was too large for the Silver plan, but a couple of smaller data sets uploaded in seconds.

If you don’t have any data to work with, no problem: Statwing offers a few of its own to try:

[Screenshot: sample data sets offered by Statwing]

Once you select a data set, Statwing displays the variables in your data set on the left, with most of the screen available for your charts.  There is a video tutorial narrated by a robot, which is marginally useful and not really necessary since the application is very intuitive and easy to use.  (Statwing: is it all that expensive to hire someone to read the script?)

[Screenshot: Statwing workspace, with variables listed on the left]

Statwing does two things well: one-way profiles and two-way tests of correlation. (Statwing claims to do crosstabs, but after watching the video and reading the available help, I can’t figure out how.)

Univariate profiles offer the user a nice graphic and the option to toggle statistical measures on or off:

[Screenshot: univariate profile with optional statistical measures]

Bivariate analysis gives the user a “plain English” interpretation of the statistical tests, which is helpful.

[Screenshot: bivariate analysis with a “plain English” interpretation]

Like any other statistical package, Statwing discovered a statistically significant relationship between two columns of random numbers in one of my test data sets.  This simply illustrates that making analytics “easy” isn’t helpful unless the users have an actual clue about what they are doing:

[Screenshot: “significant” relationship reported between two columns of random numbers]
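The random-numbers result is what one should expect: at 95% confidence, roughly one test in twenty run on pure noise will come out “significant.” The minimal Python sketch below simulates many pairs of independent random columns and counts how often a correlation test fires. It assumes Statwing’s test behaves like a standard two-sided Pearson correlation test; the Fisher z approximation used here is a stand-in, since I don’t know exactly which test Statwing runs, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, confidence=0.95):
    """Two-sided test of r != 0 using the Fisher z approximation (an assumption)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return abs(z) > critical


def false_positive_rate(trials=2000, n=100, confidence=0.95, seed=1):
    """Share of pairs of independent random columns flagged as 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Two columns of pure noise: any "relationship" found is spurious.
        x = [rng.random() for _ in range(n)]
        y = [rng.random() for _ in range(n)]
        if is_significant(pearson_r(x, y), n, confidence):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Expect roughly 5% at 95% confidence and roughly 10% at 90% confidence.
    print(f"95%: {false_positive_rate(confidence=0.95):.3f}")
    print(f"90%: {false_positive_rate(confidence=0.90):.3f}")
```

Loosening the threshold to 90% roughly doubles the share of spurious hits; that is the price of the looser standard some commercial work calls for, and a reason users need to understand what the test is telling them.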

Some caveats:

  • Statwing does not handle time series data — a problem since many enterprise users work with time series
  • By default, Statwing treats coded variables as numeric variables; the user can override this, but see my comment about users having a clue
  • Statwing lacks even the most basic tools for data processing, so you will need to prepare your data table in some other tool
  • Significance tests appear to be hard coded at 95% confidence, which is relatively “tight” for commercial work

Overall, this service is well implemented and easy to use. It does very little that other tools can’t do; for example, if you use SurveyMonkey or a similar tool to conduct an online survey, you can simply do the analysis there and forget about Statwing. Given its limited functionality, Statwing is seriously overpriced; the Gold plan will run you $1,200 per year ($800 if billed in advance), and at that price there are a number of alternatives that are just as easy to use.

To crack the enterprise market, Statwing will need to add more analytic features to current capabilities and offer enterprise licensing with concurrent user pricing.

Book Review: Big Data Big Analytics

Big Data Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses, by Michael Minelli, Michele Chambers and Ambiga Dhiraj.

Books on Big Data tend to fall into two categories: they are either “strategic” and written at a very high level, or they are cookbooks that tell you how to set up a Hadoop cluster.  Moreover, many of these books focus narrowly on data management — an interesting subject in its own right for those who specialize in the discipline, but yawn-inducing for managers in Sales, Marketing, Risk Management, Merchandising or Operations who have businesses to run.

Hey, we can manage petabytes of data.  Thank you very much.  Now go away.

Big Data Big Analytics appeals to business-oriented readers who want a deeper understanding of Big Data, but aren’t necessarily interested in writing MapReduce code.   Moreover, this is a book about analytics — not just how we manage data, but what we do with it and how we make it drive value.

The authors of this book — Michael Minelli, Michele Chambers and Ambiga Dhiraj — combine in-depth experience in enterprise software and data warehousing with real-world experience delivering analytics for clients.  Building on interviews with a cross-section of subject matter experts — there are 58 people listed in the acknowledgements — they map out the long-cycle trends behind the explosion of data in our economy, and the expanding tools to manage and learn from that data.  They also point to some of the key bottlenecks and constraints enterprises face as they attempt to deal with the tsunami of data, and provide sensible thinking about how to address these constraints.

Big Data Big Analytics includes rich and detailed examples of working applications.  This is refreshing; books in this category tend to push case studies to the back of the book, or focus on one or two niche applications.  This book documents the disruptive nature of Big Data analytics across numerous vertical and horizontal applications, including Consumer Products, Digital Media, Marketing, Advertising, Fraud and Risk Management, Financial Markets and Health Care.

The book includes excellent chapters on the technology of Big Data, Information Management, Business Analytics and Human Factors (people, process, organization and culture). The final chapter is a good summary of Privacy and Ethics.

The Conclusion aptly summarizes this book: it’s not how much data you have, it’s what you do with it that matters.  Big Data Big Analytics will help you get started.

Recent Books on Analytics

For your Christmas gift list, here is a brief roundup of four recently published books on analytics.

Business Intelligence in Plain Language by Jeremy Kolb (Kindle Edition only) is a straightforward and readable summary of conventional wisdom about Business Intelligence. Unlike many guides to BI, this book devotes some time and attention to data mining. As an overview, however, the book devotes too little attention to the most commonly used techniques in predictive analytics, and too much attention to more exotic methods. There is nothing wrong with this per se, but given the author’s conventional approach to implementation it seems eccentric. At $6.99, though, even an imperfect book is a pretty good value.

Tom Davenport’s original Harvard Business Review article Competing on Analytics is one of the ten most-read articles in HBR’s history; Google Trends shows a spike in search activity for the term “analytics” concurrent with its publication, and steady growth in interest since then. Mr. Davenport’s latest book, Enterprise Analytics: Optimize Performance, Process, and Decisions Through Big Data, is a collection of essays by Mr. Davenport and members of the International Institute for Analytics, a commercial research organization funded in part by SAS. (Not coincidentally, SAS is the most frequently mentioned analytics vendor in the book.) Mr. Davenport defines enterprise analytics in the negative, i.e., as analytics not “sequestered into several small pockets of an organization — market research, or actuarial or quality management”. Ironically, though, the best essays in this book are about narrowly focused applications, while the worst essay, The Return on Investments in Analytics, is little more than a capital budgeting primer for first-year MBA students, with the word “analytics” inserted. This book would benefit from a better definition of enterprise analytics, a clearer case for the value of “unsequestering” analytics from departmental silos, and more guidance on exactly how to make that happen.

Jean-Paul Isson and Jesse Harriott have hit a home run with Win with Advanced Business Analytics: Creating Business Value from Your Data, an excellent survey of the world of Business Analytics.   This book combines an overview of traditional topics in business analytics (with a practical “what works/what does not work” perspective) with timely chapters on emerging areas such as social media analytics, mobile analytics and the analysis of unstructured data.  A valuable contribution to the business library.

The “analytical leaders” featured in Wayne Eckerson’s  Secrets of Analytical Leaders: Insights from Information Insiders — Eric Colson, Dan Ingle, Tim Leonard, Amy O’Connor, Ken Rudin, Darren Taylor and Kurt Thearling — are executives who have actually done this stuff, which distinguishes them from many of those who write and speak about analytics.  The practical focus of this book is apparent from its organization — departing from the conventional wisdom of how to talk about analytics, Eckerson focuses on how to get an analytics initiative rolling, and keep it rolling.  Thus, we read about how to get executive support for an analytics program, how to gain momentum, how to hire, train and develop analysts, and so forth.  Instead of writing about “enterprise analytics” from a top-down perspective, Eckerson writes about how to deploy analytics in an enterprise — which is the real problem that executives need to solve.

Book Review: Antifragile

There is a (possibly apocryphal) story about space scientist James Van Allen.  A reporter asked why the public should care about Van Allen belts, which are layers of particles held in place by Earth’s magnetic field.    Dr. Van Allen puffed on his pipe a few times, then responded:  “Van Allen belts?  I like them.  I make a living from them.”

One can imagine a similar conversation with Nassim Nicholas Taleb, author of The Black Swan and most recently Antifragile: Things That Gain From Disorder.

Reporter: Why should the public care about Black Swans?

Taleb: Black Swans?  I like them.  I make a living from them.

And indeed he does. Born in Lebanon, educated at the University of Paris and the Wharton School, Mr. Taleb pursued a career in trading and arbitrage (UBS, CS First Boston, Banque Indosuez, CIBC Wood Gundy, Bankers Trust and BNP Paribas), where he developed the practice of tail risk hedging, a technique designed to insure a portfolio against rare but catastrophic events. Later, he established his own hedge fund (Empirica Capital), then retired from active trading to pursue a writing and academic career. Mr. Taleb now holds positions at NYU and Oxford, together with an assortment of adjunct appointments.

Antifragile is Mr. Taleb’s third book in a series on randomness. The first, Fooled by Randomness, published in 2001, made Fortune’s 2005 list of “the 75 smartest books we know.” The Black Swan, published in 2007, elaborated Mr. Taleb’s theory of Black Swan Events (rare and unforeseen events of enormous consequence) and how to cope with them; the book has sold three million copies to date in thirty-seven languages. Mr. Taleb was elevated to near rock-star status on the speaker circuit in part due to his claim to have predicted the recent financial crisis, a claim that would be more credible had he published his book five years earlier.

I recommend this book; it is erudite, readable and full of interesting tidbits, such as an explanation of Homer’s frequent use of the phrase “the wine-dark sea”.   (Mr. Taleb attributes this to the absence of the word ‘blue’ in Ancient Greek.  I’m unable to verify this, but it sounds plausible.)  Erudition aside, Antifragile is an excellent sequel to The Black Swan because it enables Mr. Taleb to elaborate on how we should build institutions and businesses that benefit from unpredictable events.  Mr. Taleb contrasts the “too big to fail” model of New York banking with the “fail fast” mentality of Silicon Valley, which he cites as an example of antifragile business.

Some criticism is in order.  Mr. Taleb’s work sometimes seems to strive for a philosophical universalism that explains everything but provides few of the practical heuristics which he says are the foundation of an antifragile order.  In other words, if you really believe what Mr. Taleb says, don’t read this book.

Moreover, it’s not exactly news that there are limits to scientific rationalism; the problem, which thinkers have grappled with for centuries, is that it is difficult to build systematic knowledge outside of  a rationalist perspective.   One cannot build theology on the belief that the world is a dark and murky place where the gods can simply zap you at any time for no reason.  Mr. Taleb cites Nietzsche as an antifragile philosopher, and while Nietzsche may be widely read among adolescent lads and lassies, his work is pretty much a cul-de-sac.

One might wonder what the study of unpredictable events has to do with predictive analytics, where many of us make a living. In Reckless Endangerment, Gretchen Morgenson and Joshua Rosner document how risk managers actually did a pretty good job identifying financial risks, but bank leadership chose to ignore, obfuscate or shift those risks to others. Mr. Taleb’s work offers a more compelling explanation for this institutional failure than the customary “greedy robber baron” theory. Moreover, everyone in the predictive analytics business (and every manager who relies on predictive analytics) should remember that predictive models have boundary conditions, which we ignore at our peril.