Analytic Applications (Part Two): Managerial Analytics

This is the second in a four-part taxonomy of analytics based on how the analytic work product is used.  In the first post of this series, I covered Strategic Analytics, or analytics that support the C-suite.  In this post, I will cover Managerial Analytics: analytics that support middle management, including functional and regional line managers.

At this level, questions and issues are functionally focused:

  • What is the best way to manage our cash?
  • Is product XYZ performing according to expectations?
  • How effective are our marketing programs?
  • Where can we find the best opportunities for new retail outlets?

There are differences in nomenclature across functions, as well as distinct opportunities for specialized analytics (retail store location analysis, marketing mix analysis, new product forecasting), but managerial questions and issues tend to fall into three categories:

  • Measuring the results of existing entities (products, programs, stores, factories)
  • Optimizing the performance of existing entities
  • Planning and developing new entities

Measuring existing entities with reports, dashboards, drill-everywhere (etc.) is the sweet spot for enterprise business intelligence systems.  Such systems are highly effective when the data is timely and credible, reports are easy to use and the system reflects a meaningful assessment framework.  This means that metrics (activity, revenue, costs, profits) reflect the goals of the business function and are standardized to enable comparison across entities.
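To make the idea of standardized metrics concrete, here is a minimal sketch in Python. The `Entity` fields, the store names and the figures are all hypothetical, and margin is only one possible choice of common yardstick; the point is simply that every entity is scored by the same rule.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A hypothetical measured entity (store, product, program)."""
    name: str
    revenue: float
    cost: float


def scorecard(entities):
    """Return (name, profit, margin) rows, sorted by margin, highest first."""
    rows = []
    for e in entities:
        profit = e.revenue - e.cost
        # Margin puts entities of different sizes on a common scale.
        margin = profit / e.revenue if e.revenue else 0.0
        rows.append((e.name, profit, round(margin, 3)))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Illustrative figures only.
stores = [
    Entity("Store A", 1200.0, 900.0),
    Entity("Store B", 400.0, 260.0),
    Entity("Store C", 800.0, 720.0),
]

for row in scorecard(stores):
    print(row)
```

In this made-up data, Store A earns the largest absolute profit while Store B has the best margin, so the ranking depends entirely on which metric the function has agreed to use.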

Given the state of BI technology, analysis teams within functions (Marketing, Underwriting, Store Operations, etc.) spend a surprisingly large amount of time preparing routine reports for managers. (For example, an insurance client asked my firm to perform an assessment of actual work performed by a group of more than one hundred SAS users. The client was astonished to learn that 80% of the SAS usage could be done in Cognos, which the client also owned.)

In some cases, this is simply due to a lack of investment by the organization in the necessary tools and enablers, a problem that is easily fixed. More often than not, though, the root cause is the absence of consensus within the function on what is to be measured and how performance should be compared across entities. In organizations that lack measurement discipline, assessment is a free-for-all where individual program and product managers seek out customized reports that show their program or product to the best advantage; in this environment, every program or product is a winner and analytics lose credibility with management. There is no technical “fix” for this problem; it takes leadership for management to set out clear goals for the organization and build consensus for an assessment framework.

Functional analysts often complain that they spend so much time preparing routine reports that they have little or no time to perform analytics that optimize the performance of existing entities.  Optimization technology is not new, but tends to be used more pervasively in Operational Analytics (which I will discuss in the next post in this series).   Functionally focused optimization tools for management decisions have been available for well over a decade, but adoption is limited for several reasons:

  • First, an organization stuck in the “ad hoc” trap described in the previous paragraph will never build the kind of history needed to optimize anything.
  • Second, managers at this level tend to be overly optimistic about the value of their own judgment in business decisions, and resist efforts to replace intuitive judgment with systematic and metrics-based optimization.
  • Finally, in areas such as Marketing Mix decisions, constrained optimization necessarily means choosing one entity over another for resources; this is inherently a leadership decision, so unless functional leadership understands and buys into the optimization approach it will not be used.
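The trade-off in the last point can be illustrated with a small sketch. It assumes, purely for illustration, that each program's return follows a diminishing-returns curve of the form a * sqrt(spend); the `allocate` function, the program names and the coefficients are hypothetical, and a real marketing mix model would estimate response curves from history.

```python
import math


def allocate(budget, programs, step=1.0):
    """Greedily assign budget, one `step` at a time, to whichever program
    currently offers the highest marginal return.

    programs maps a program name to the coefficient `a` in the ASSUMED
    response curve a * sqrt(spend).
    """
    spend = {name: 0.0 for name in programs}

    def marginal_gain(name):
        a, s = programs[name], spend[name]
        return a * (math.sqrt(s + step) - math.sqrt(s))

    remaining = budget
    while remaining >= step:
        best = max(programs, key=marginal_gain)
        spend[best] += step
        remaining -= step
    return spend


# Hypothetical programs and response coefficients.
plan = allocate(14.0, {"email": 3.0, "search": 2.0, "display": 1.0})
print(plan)
```

With a fixed budget, every unit given to one program is a unit withheld from another, which is why the result reads as a ranking of programs and needs leadership backing before anyone acts on it.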

Analytics for planning and developing new entities (such as programs, products or stores) usually require information from outside of the organization, and may also require skills not present in existing staff.  For both reasons, analytics for this purpose are often outsourced to providers with access to pertinent skills and data.  For analysts inside the organization, technical requirements look a lot like those for Strategic Analytics: the ability to rapidly ingest data from any source combined with a flexible and agile programming environment and functional support for a wide range of generic analytic problems.

In the next post in this series, I’ll cover Operational Analytics, defined as analytics whose purpose is to improve the efficiency or effectiveness of a business process.

Analytic Applications (Part One)

Conversations about analytics tend to get muddled because the word describes everything from a simple SQL query to climate forecasting.  There are several different ways to classify analytic methods, but in this post I propose a taxonomy of analytics based on how the results are used.

Before we can define enterprise best practices for analytics, we need to understand how they add value to the organization.  One should not lump all analytics together because, as I will show, the generic analytic applications have fundamentally different requirements for people, processes and tooling.

There are four generic analytic applications:

  • Strategic Analytics
  • Managerial Analytics
  • Operational Analytics
  • Customer-Enabling Analytics

In today’s post, I’ll address Strategic Analytics; the rest I’ll cover in subsequent posts.

Strategic Analytics directly address the needs of the C-suite.  This includes answering non-repeatable questions, performing root-cause analysis and supporting make-or-break decisions (among other things).   Some examples:

  • “How will Hurricane Sandy impact our branch banks?”
  • “Why does our top-selling SUV turn over so often?”
  • “How will a merger with XYZ Co. impact our business?”

Strategic issues are inherently not repeatable and fall outside of existing policy; otherwise the issue would be delegated.   Issues are often tinged with a sense of urgency, and a need for maximum credibility; when a strategic decision must be taken, time is of the essence, and the numbers must add up.   Answers to strategic questions frequently require data that is not readily accessible and may be outside of the organization.

Conventional business intelligence systems do not address the needs of Strategic Analytics, due to the ad hoc and sui generis nature of the questions and supporting data requirements.   This does not mean that such systems add no value to the organization; in practice, the enterprise BI system may be the first place an analyst will go to seek an answer.  But no matter how good the enterprise BI system is, it will never be sufficiently complete to provide all of the answers needed by the C-suite.

The analyst is key to the success of Strategic Analytics. This type of work tends to attract the best and most capable analysts, who are able to work rapidly and accurately under pressure. Backgrounds tend to be eclectic: an insurance company I’ve worked with, for example, has a strategic analysis team that includes an anthropologist, an economist, an epidemiologist and a graduate of the local community college who worked her way up in the Claims Department.

Successful strategic analysts develop domain, business and organizational expertise that lends credibility to their work.  Above all, the strategic analyst takes a skeptical approach to the data, and demonstrates the necessary drive and initiative to get answers.  This often means doing hard stuff, such as working with programming tools and granular data to get to the bottom of a problem.

More often than not, the most important contribution of the IT organization to Strategic Analytics is to stay out of the way.  Conventional IT production standards are a bug, not a feature, in this kind of work, where the sandbox environment is the production environment.  Smart IT organizations recognize this, and allow the strategic analysts some latitude in how they organize and manage data.   Dumb IT organizations try to force the strategic analysis team into a “Production” framework.  This simply inhibits agility, and encourages top executives to outsource strategic issues to outside consultants.

Analytic tooling tends to reflect the diverse backgrounds of the analysts, and can be all over the map. Strategic analysts use SAS, R, Stata, StatSoft, or whatever to do the work, and drop the results into PowerPoint. One of the best strategy analysts I’ve ever worked with used nothing other than SQL and Excel. Since strategic analysis teams tend to be small, there is little value in demanding use of a single tool set; moreover, most strategic analysts want to use the best tool for the job, and prefer to use niche tools that are optimized for a single problem.

The most important common requirement is the capability to rapidly ingest and organize data from any source and in any format. For many organizations, this has historically meant using SAS. (A surprisingly large number of analytic teams use SAS to ingest and organize the data, but perform the actual analysis using other tools.) Growing data volumes, however, pose a performance challenge for the conventional SAS architecture, so analytic teams increasingly look to data warehouse appliances like IBM Netezza, to Hadoop, or to a combination of the two.

In the next post, I’ll cover Managerial Analytics, which includes analytics designed to monitor and optimize the performance of programs and products.

Advanced Analytics in Hadoop, Part One

This is the first of a two-part post on the current state of advanced analytics in Hadoop.  In this post, I’ll cover some definitions, the business logic of advanced analytics in Hadoop, and summarize the current state of Mahout.  In a second post, I’ll cover some alternatives to Mahout, currently available and in the pipeline.

For starters, a few definitions.

I use the term advanced analytics to cover machine learning tools (including statistical methods) for the discovery and deployment of useful patterns in data.   Discovery means the articulation of patterns as rules or mathematical expressions;  deployment means the mobilization of discovered patterns to improve a business process.  Advanced analytics may include supervised learning or unsupervised learning, but not queries, reports or other analysis where the user specifies the pattern of interest in advance.  Examples of advanced analytic methods include decision trees, neural networks, clustering, association rules and similar methods.

By “In Hadoop” I mean the complete analytic cycle (from discovery to deployment) runs in the Hadoop environment with no data movement outside of Hadoop.

Analysts can and do code advanced analytics directly in MapReduce.  For some insight into the challenges this approach poses, review the slides from a recent presentation at Strata by Allstate and Revolution Analytics.

The business logic for advanced analytics in Hadoop is similar to the logic for in-database analytics.   External memory-based analytic software packages (such as SAS or SPSS) provide easy-to-use interfaces and rich functionality but they require the user to physically extract data from the datastore.  This physical data movement takes time and effort, and may force the analyst to work with a sample of the data or otherwise modify the analytic approach.  Moreover, once the analysis is complete, deployment back into the datastore may require a complete extract and reload of the data, custom programming or both.  The end result is an extended analytic discovery-to-deployment cycle.

Eliminating data movement radically reduces analytic cycle time.  This is true even when actual run time for model development in an external memory-based software package is faster, because the time needed for data movement and model deployment tends to be much greater than the time needed to develop and test models in the first place.  This means that advanced analytics running in Hadoop do not need to be faster than external memory-based analytics; in fact, they can run slower than external analytic software and still reduce cycle time since the front-end and back-end integration tasks are eliminated.
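A back-of-the-envelope calculation shows why. The hours below are hypothetical, chosen only to illustrate the argument; `cycle_time` is just a sum of the three stages discussed above.

```python
def cycle_time(extract_hours, model_hours, deploy_hours):
    """Total discovery-to-deployment time: data movement + modeling + deployment."""
    return extract_hours + model_hours + deploy_hours


# Hypothetical figures for an external memory-based package:
# fast modeling, but data must be extracted and the model redeployed.
external = cycle_time(extract_hours=8.0, model_hours=2.0, deploy_hours=16.0)

# Hypothetical figures for analytics running in Hadoop:
# modeling is three times slower, but there is no extract or reload.
in_hadoop = cycle_time(extract_hours=0.0, model_hours=6.0, deploy_hours=0.0)

print(f"external: {external} h, in Hadoop: {in_hadoop} h")
```

Under these assumed numbers, the in-Hadoop run takes three times as long to build the model and still finishes the whole cycle in less than a quarter of the time.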

Ideal use cases for advanced analytics in Hadoop have the following profile:

  • Source data is already in Hadoop
  • Applications that consume the analytics are also in Hadoop
  • Business need to use all available data (i.e., sampling is not acceptable)
  • Business need for minimal analytic cycle time; this is not the same as a need for minimal score latency, which can be accomplished without updating the model itself

The best use cases for advanced analytics running in Hadoop are dynamic applications where the solution itself must be refreshed constantly.  These include microclustering, where there is a business need to update the clustering scheme whenever a new entity is added to the datastore; and recommendation engines, where each new purchase by a customer can produce new recommendations.
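The recommendation case can be sketched in a few lines of single-machine Python. This is a simplified co-occurrence counter, not Mahout's implementation and not a distributed algorithm; the class name, customers and items are all hypothetical. It is meant only to show how each new purchase changes the model and therefore the recommendations.

```python
from collections import defaultdict


class CooccurrenceRecommender:
    """Toy recommender: items bought together by the same customer are related."""

    def __init__(self):
        self.baskets = defaultdict(set)  # customer -> items purchased
        self.cooccur = defaultdict(lambda: defaultdict(int))  # item -> item -> count

    def add_purchase(self, customer, item):
        """Update the co-occurrence counts incrementally for one new purchase."""
        owned = self.baskets[customer]
        if item in owned:
            return  # repeat purchases do not change the counts
        for other in owned:
            self.cooccur[item][other] += 1
            self.cooccur[other][item] += 1
        owned.add(item)

    def recommend(self, customer, n=3):
        """Rank items the customer lacks by co-occurrence with items they own."""
        owned = self.baskets.get(customer, set())
        scores = defaultdict(int)
        for item in owned:
            for other, count in self.cooccur.get(item, {}).items():
                if other not in owned:
                    scores[other] += count
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:n]]


rec = CooccurrenceRecommender()
for customer, item in [("ann", "tent"), ("ann", "stove"),
                       ("bob", "tent"), ("bob", "lantern"),
                       ("cat", "tent")]:
    rec.add_purchase(customer, item)

before = rec.recommend("cat")

# Two new purchases by another customer shift the counts.
rec.add_purchase("dan", "tent")
rec.add_purchase("dan", "stove")
after = rec.recommend("cat")

print(before, after)
```

The model here is nothing more than the count table, so refreshing it is cheap; the business case for doing this at scale inside Hadoop is that the same refresh has to happen across all customers and all items.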

Apache Mahout is an open source project to develop scalable machine learning libraries whose core algorithms are implemented on top of Apache Hadoop using the MapReduce paradigm.   Mahout currently supports classification, clustering, association, dimension reduction, recommendation and lexical analysis use cases.   Consistent with the ideal use cases described above, the recommendation engines and clustering capabilities are the most widely used in commercial applications.

As of Release 0.7 (June 16, 2012), the following algorithms are implemented:

Classification: Logistic Regression, Bayesian, Random Forests, Online Passive Aggressive and Hidden Markov Models

Clustering: Canopy, K-Means, Fuzzy K-Means, Expectation Maximization, Mean Shift, Hierarchical, Dirichlet Process, Latent Dirichlet Allocation, Spectral, Minhash, and Top Down

Association: Parallel FP-Growth

Dimension Reduction: Singular Value Decomposition and Stochastic Singular Value Decomposition

Recommenders: Distributed Item-Based Collaborative Filtering and Collaborative Filtering with Parallel Matrix Factorization

Lexical Analysis: Collocations

For a clever introduction to machine learning and Mahout, watch this video.

For more detail, review this presentation on Slideshare.

There are no recently released books on Mahout.  This book is two releases out of date, but provides a good introduction to the project.

Mahout is currently used for commercial applications by Amazon, Buzzlogic, Foursquare, Twitter and Yahoo, among others.   Check the Powered by Mahout page for an extended list.

Next post: Alternatives to Mahout, some partial solutions and enablers, and projects in the pipeline.

Agile Analytics: Overview

Is this the year of Agile Analytics?  Recent publications show growing interest in the application of Agile methods to analytics:

  • Ken Collier, an Agile pioneer, tackles analytics in his aptly named new book Agile Analytics.
  • A quick Google search surfaces a number of recent blogs and articles (here, here and here)
  • Curt Monash recently published an excellent two-part blog on the subject (here and here)

I’ve commented in the past on IBM’s Big Data Hub about techniques that contribute to Agile Analytics, such as in-database analytics, open source analytics and tighter integration with commercial packages like SAS. In addition, I’ve commented on some of the barriers to agility, such as limitations of the PMML standard.

In this series, I’ll cover these topics:

(1) What is Agile Analytics?

(2) What’s driving interest in Agile Analytics?

(3) What business practices enable Agile Analytics?