SAS Visual Analytics: FAQ (Updated 1/2014)

SAS charged its sales force with selling 2,000 licenses for Visual Analytics in 2013; the jury is still out on whether they met this target.  There’s been a lot of marketing activity from SAS about this product lately, so here’s an FAQ.

Update:  SAS recently announced 1,400 sites licensed for Visual Analytics.  In SAS lingo, a site corresponds roughly to one machine, but one license can include multiple sites, so the actual number of licenses sold in 2013 is fewer than 1,400.  In April 2013 SAS executives claimed two hundred customers for the product.   In contrast, Tableau reports that it added seven thousand customers in 2013, bringing its total customer count to 17,000.

What is SAS Visual Analytics?

Visual Analytics is an in-memory visualization and reporting tool.

What does Visual Analytics do?

SAS Visual Analytics creates reports and graphs that are visually compelling.  You can view them on mobile devices.

VA is now in its fifth dot release.  Why do they call it Release 6.3?

SAS Worldwide Marketing thinks that if they call it Release 6.3, you will think it’s a mature product.  It’s one of the games software companies play.

Is Visual Analytics an in-memory database, like SAP HANA?

No.  HANA is a standards-based in-memory database that runs on many different brands of hardware and supports a range of end-user tools.  VA is a proprietary architecture available on a limited choice of hardware platforms.  It cannot support anything other than the end-user applications SAS chooses to develop.

What does VA compete with?

SAS claims that Visual Analytics competes with Tableau, QlikView and Spotfire.  Internally, SAS leadership refers to the product as its “Tableau-killer”, but as the reader can see from the update at the top of this page, Tableau is alive and well.

How well does it compare?

You will have to decide for yourself whether VA reports are prettier than those produced by Tableau, QlikView or Spotfire.  On paper, Tableau has more functionality.

VA runs in memory.  Does that make it better than conventional BI?

All analytic applications perform computations in memory.  Tableau runs in memory, and so does Base SAS.   There’s nothing unique about that.

What makes VA different from conventional BI applications is that it loads the entire fact table into memory.  By contrast, BI applications like Tableau query a back-end database to retrieve the necessary data, then perform computations on the result set.

Performance of a conventional BI application depends on how fast the back-end database can retrieve the data.  With a high-performance database the performance is excellent, but in most cases it won’t be as fast as it would if the data were held in memory.
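To make the architectural difference concrete, here is a minimal Python sketch. It is not SAS or Tableau code: it uses an in-memory SQLite database as a hypothetical stand-in for the back-end warehouse, and the `facts` table and both class names are invented for illustration. `QueryOnDemand` pushes the aggregation to the database and receives only the small result set, while `LoadEverything` first copies every row of the fact table into application memory and computes from that copy.

```python
import sqlite3
from collections import defaultdict


def make_backend() -> sqlite3.Connection:
    """Hypothetical back-end warehouse holding a tiny (region, sales) fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (region TEXT, sales REAL)")
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 7.0)],
    )
    return conn


class QueryOnDemand:
    """Conventional BI style: the database does the aggregation and
    returns only the result set."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def sales_by_region(self) -> dict[str, float]:
        rows = self.conn.execute(
            "SELECT region, SUM(sales) FROM facts GROUP BY region"
        )
        return dict(rows)


class LoadEverything:
    """VA style, as described above: copy the entire fact table into memory
    once, then answer every request from that in-memory copy."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.rows = conn.execute("SELECT region, sales FROM facts").fetchall()

    def sales_by_region(self) -> dict[str, float]:
        totals: dict[str, float] = defaultdict(float)
        for region, sales in self.rows:
            totals[region] += sales
        return dict(totals)
```

Both approaches return the same totals. The difference is that `LoadEverything` must hold every fact row in RAM, so its capacity is bounded by memory rather than by the database's storage, which is the trade-off discussed next.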

So VA is faster?  Is there a downside?

There are two.

First, since conventional BI systems don’t need to load the entire fact table into memory, they can support usage with much larger datastores.  The largest H-P ProLiant box for VA maxes out at about 10 terabytes; the smallest Netezza appliance supports 30 terabytes, and scales to petabytes.

The other downside is cost; memory is still much more expensive than other forms of storage, and the machines that host VA are far more expensive than data warehouse appliances that can host far more data.

VA is for Big Data, right?

SAS and H-P appear to be having trouble selling VA in larger sizes, and are positioning a small version that can handle 75-100 gigabytes of data.  That’s tiny.

The public references SAS has announced for this product don’t seem particularly large.  See below.

How does data get into VA?

VA can load data from a relational database or from a proprietary SASHDAT file.  SAS cautions that loading data from a relational database is only a realistic option when VA is co-located in a Teradata Model 720 or Greenplum DCA appliance.

To use SASHDAT files, you must first create them using SAS.

Does VA work with unstructured data?

VA works with structured data, so unstructured data must be structured first, then loaded either to a co-located relational database or to SAS’ proprietary SASHDAT format.

Unlike products such as Datameer or IBM BigSheets, VA does not support “schema on read”, and it lacks built-in tools for parsing unstructured text.

But wait, SAS says VA works with Hadoop.  What’s up with that?

A bit of marketing sleight of hand.  VA can load SASHDAT files that are stored in the Hadoop Distributed File System (HDFS); but first, you have to process the data in SAS, then load it back into HDFS.  In other words, you can’t visualize and write reports from the data that streams in from machine-generated sources, the kind of live BI that makes Hadoop really cool.  You have to batch the data, parse it, structure it, then load it with SAS to VA’s staging area.

Can VA work with streaming data?

SAS sells tools that can capture streaming data and load it to a VA data source, but VA works with structured data at rest only.

With VA, can my users track events in real time?

Don’t bet on it.   To be usable, the data requires significant pre-processing before it is loaded into VA’s memory.  Moreover, once the data is loaded it can’t be updated; updating the data in VA requires a full truncate and reload.   Thus, however fast VA is in responding to user requests, your users won’t be tracking clicks on their iPads in real time; they will be looking at yesterday’s data.
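The cost of a truncate-and-reload constraint grows with the size of the table. The hypothetical Python model below (not a SAS API) counts how many rows must be moved into memory as daily batches arrive, comparing a table that can only be fully reloaded with one that can accept appended rows. The batch sizes are made-up numbers for illustration.

```python
class ReloadOnlyTable:
    """Hypothetical model of the constraint described above: rows cannot be
    updated in place, so every refresh discards the table and reloads it all."""

    def __init__(self) -> None:
        self.rows: list[int] = []
        self.rows_loaded_total = 0  # cumulative rows moved into memory

    def reload(self, all_rows: list[int]) -> None:
        self.rows = list(all_rows)  # truncate, then reload everything
        self.rows_loaded_total += len(self.rows)


class AppendableTable:
    """For comparison: a table that can accept just the new rows."""

    def __init__(self) -> None:
        self.rows: list[int] = []
        self.rows_loaded_total = 0

    def append(self, new_rows: list[int]) -> None:
        self.rows.extend(new_rows)
        self.rows_loaded_total += len(new_rows)


def simulate(days: int, rows_per_day: int) -> tuple[int, int]:
    """Return (rows loaded by full reloads, rows loaded by appends)."""
    source: list[int] = []  # the growing upstream fact table
    reload_only, appendable = ReloadOnlyTable(), AppendableTable()
    for day in range(days):
        batch = list(range(day * rows_per_day, (day + 1) * rows_per_day))
        source.extend(batch)
        reload_only.reload(source)
        appendable.append(batch)
    return reload_only.rows_loaded_total, appendable.rows_loaded_total


print(simulate(days=5, rows_per_day=1000))  # (15000, 5000)
```

Over n days, full reloads move n(n+1)/2 days' worth of rows versus n for appends, which helps explain why a reload-only system tends to be refreshed in periodic batches rather than continuously.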

Does VA do predictive analytics?

Visual Analytics 6.1 can perform correlation, fit bivariate trend lines to plots and do simple forecasting.  That’s no better than Tableau.  Surprisingly, given the hype, Tableau actually supports more analysis functions.

While SAS claims that VA is better than SAP HANA because “HANA is just a database”, the reality is that SAP supports more analytics through its Predictive Analytics Library than SAS supports in VA.

Has anyone purchased VA?

A SAS executive claimed 200 customers in early 2013, a figure that should be taken with a grain of salt.  If there are that many customers for this product, they are hiding.

There are five public references, all of them outside the US:

SAS has also recently announced selection (but not implementation) by

OfficeMax has also purchased the product, according to this SAS blog.

As of January 2014, the four customers who announced selection or purchase are not cited as reference customers.

What about implementation?  This is an appliance, right?

Wrong.  SAS considers an implementation that takes a month to be wildly successful.  Implementation tasks include the same tasks you would see in any other BI project, such as data requirements, data modeling, ETL construction and so forth.  All of the back-end feeds must be built to put data into a format that VA can load.

Bottom line, does it make sense to buy SAS Visual Analytics?

Again, you will have to decide for yourself whether the SAS VA reports look better than Tableau or the many other options in this space.  BI beauty shows are inherently subjective.

You should also demand that SAS prove its performance claims in a competitive POC.  Despite the theoretical advantage of an in-memory architecture, actual performance is influenced by many factors.  Visitors to the recent Gartner BI Summit who witnessed a demo were unimpressed; one described it to me as “dog slow”.  She didn’t mean that as a compliment.

The high cost of in-memory platforms means that VA and its supporting hardware will be much more expensive for any given quantity of data than Tableau or equivalent products. Moreover, its proprietary architecture means you will be stuck with a BI silo in your organization unless you are willing to make SAS your exclusive BI provider.  That makes this product very good for SAS; the question is whether it is good for you.

The early adopters for this product appear to be very SAS-centric organizations (with significant prior SAS investment).  They also appear to be fairly small.  If you have very little data, money to burn and are willing to experiment with a relatively new product, VA may be for you.

Fact-Check: SAS and Greenplum

Does SAS run “inside” Greenplum?  Can existing SAS programs run faster in Greenplum without modification?  Clients say that their EMC rep makes such claims.

The first claim rests on confusion about EMC Greenplum’s product line.  It’s important to distinguish between Greenplum Database and Greenplum DCA.  Greenplum DCA is a rack of commodity blade servers which can be configured with Greenplum Database running on some of the blades and SAS running on the other blades.  For most customers, a single DCA blade provides insufficient computing power to support SAS, so EMC and SAS typically recommend deployment on multiple blades, with SAS Grid Manager implemented for workload management.   This architecture is illustrated in this white paper on SAS’ website.

As EMC’s reference architecture clearly illustrates, SAS does not run “inside” Greenplum database (or any other database); it simply runs on server blades that are co-located in the same physical rack as the database.  The SAS instance installed on the DCA rack works just like any other SAS instance installed on freestanding servers.  SAS interfaces with Greenplum Database through a SAS/ACCESS interface, which is exactly the same way that SAS interacts with other databases.

Does co-locating SAS and the database in the same rack offer any benefits?  Yes, because when data moves back and forth between SAS and Greenplum Database, it does so over a dedicated 10 Gigabit Ethernet connection.   However, this is not a unique benefit; customers can implement a similar high-speed connection between a free-standing instance of SAS and any data warehouse appliance, such as IBM Netezza.
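A back-of-the-envelope calculation shows why link speed matters when SAS pulls data out of the database. The sketch below assumes decimal units (1 TB = 10^12 bytes) and ideal line-rate throughput; the `efficiency` factor is a hypothetical knob for protocol and I/O overhead, and real transfers will be slower.

```python
def transfer_seconds(n_bytes: int, link_gbps: float, efficiency: float = 1.0) -> float:
    """Time to move n_bytes over a link of link_gbps gigabits per second.

    efficiency < 1.0 models (hypothetical) protocol and I/O overhead.
    """
    bits = n_bytes * 8
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second


ONE_TB = 10**12
print(transfer_seconds(ONE_TB, 10))  # 800.0 seconds on 10 Gigabit Ethernet
print(transfer_seconds(ONE_TB, 1))   # 8000.0 seconds on 1 Gigabit Ethernet
```

The same arithmetic applies to any SAS server wired to any appliance at the same link speed, which is why the benefit of the dedicated connection is not unique to the DCA rack.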

To summarize, SAS does not run “inside” Greenplum Database or any other database; moreover, SAS’ interface with Greenplum is virtually the same as SAS’ interface with any other supported database.  EMC offers customers the ability to co-locate SAS in the same rack of servers as Greenplum Database, which expedites data movement between SAS and the database, but this is a capability that can be replicated cheaply in other ways.

The second claim — that SAS programs run faster in Greenplum DCA without modification — requires more complex analysis.   For starters, though, keep in mind that SAS programs always require at least some modification when moved from one SAS instance to another, if only to update SAS libraries and adjust for platform-specific options.  Those modifications are small, however, so let’s set them aside and grant EMC some latitude for sales hyperbole.

To understand how existing SAS programs will perform inside DCA, we need to consider the building blocks of those existing programs:

  1. SAS DATA Steps
  2. SAS PROC SQL
  3. SAS Database-Enabled PROCs
  4. SAS Analytic PROCs (PROC LOGISTIC, PROC REG, and so forth)

Here’s how SAS will handle each of these workloads within DCA:

(1) SAS DATA Steps: SAS attempts to translate SAS DATA Step statements into SQL.   When this translation succeeds, SAS submits the SQL expression to Greenplum Database, which runs the query and returns the result set to SAS.  Since SAS DATA Step programming includes many concepts that do not translate well to SQL, in most cases SAS will extract all required data from the database and run the required operations as a single-threaded process on one of the SAS nodes.

(2) SAS PROC SQL: SAS submits the embedded SQL to Greenplum Database, which runs the query and returns the result set to SAS.   The SAS user must verify that the embedded SQL expression is syntactically correct for Greenplum.

(3) SAS Database-Enabled PROCs: SAS converts the user request to database-specific SQL and submits it to Greenplum Database, which runs the query and returns the result set to SAS.

(4) SAS Analytic PROCs:  In most cases, SAS runs the PROC on one of the server blades.  A limited number of SAS PROCs are automatically enabled for Grid Computing; these PROCs will run multi-threaded.
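The four cases above amount to a small decision table. The Python sketch below simply encodes the description in this post; it is not SAS's actual dispatch logic, and the `translates_to_sql` and `grid_enabled` flags are hypothetical stand-ins for conditions SAS determines internally. Note that the function takes no input describing the hardware, mirroring the claim that the same rules apply whether SAS runs in a DCA rack or on free-standing servers.

```python
from enum import Enum


class Workload(Enum):
    DATA_STEP = "DATA step"
    PROC_SQL = "PROC SQL"
    DB_ENABLED_PROC = "database-enabled PROC"
    ANALYTIC_PROC = "analytic PROC"


def where_it_runs(kind: Workload, translates_to_sql: bool = False,
                  grid_enabled: bool = False) -> str:
    """Summarize where the work happens for each workload type, per the text."""
    if kind is Workload.DATA_STEP:
        # Pushed down only if the DATA step can be translated to SQL;
        # otherwise the required rows are extracted and processed in SAS.
        return "database" if translates_to_sql else "single-threaded on one SAS node"
    if kind in (Workload.PROC_SQL, Workload.DB_ENABLED_PROC):
        # SQL is sent to Greenplum Database; SAS receives the result set.
        return "database"
    if kind is Workload.ANALYTIC_PROC:
        # Only a limited set of PROCs is automatically grid-enabled.
        return "multi-threaded on the SAS grid" if grid_enabled else "one SAS server blade"
    raise ValueError(f"unknown workload: {kind}")


for w in Workload:
    print(w.value, "->", where_it_runs(w))
```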

In each case, the SAS workload runs in the same way inside DCA as it would if implemented in a free-standing SAS instance with comparable computing power.   Existing SAS programs are not automatically enabled to leverage Greenplum’s parallel processing; the SAS user must explicitly modify the SAS program to exploit Greenplum Database just as they would when using SAS with other databases.

So, returning to the question: will existing SAS programs run faster in Greenplum DCA without modification?  Setting aside minor changes when moving any SAS program, the performance of existing programs when run in DCA will be no better than what would be achieved when SAS is deployed on competing hardware with comparable computing specifications.

SAS users can only realize radical performance improvements when they explicitly modify their programs to take advantage of in-database processing.   Greenplum has no special advantage in this regard; conversion effort is similar for all databases supported by SAS.

Customer Endorsement for SAS High Performance Analytics

When SAS released its new in-memory analytic software last December, I predicted that SAS would have one reference customer in 2012.  I believed at the time that several factors, including pricing, inability to run most existing SAS programs and SAS’ track record with new products would prevent widespread adoption, but that SAS would do whatever it takes to get at least one customer up and running on the product.

It may surprise you to learn that SAS does not already have a number of public references for the product.  SAS uses the term ‘High Performance Analytics’ in two ways: as the name for its new high-end in-memory analytics software, and to refer to an entire category of products, both new and existing.  Hence, it’s important to read SAS’ customer success stories carefully; for example, SAS cites CSI-Piemonte as a reference for in-memory analytics, but the text of the story indicates the customer has selected SAS Grid Manager, a mature product.

Recently, a UnitedHealth Group executive spoke at SAS’ Analytics 2012 conference and publicly endorsed the High Performance Analytics product; a search through SAS press releases and blog postings appears to show that this is the first genuine public endorsement.  You can read the story here.

Several comments:

— While it appears the POC succeeded, the story does not say that UnitedHealth Group has licensed SAS HPA for production.

— The executive interviewed in the article appears to be unaware of alternative technologies, some of which are already owned and used by his employer.

— The use case described in the article is not particularly challenging.  Four million rows of data was a large data set ten years ago; today we work with data sets that are orders of magnitude larger than that.

— The reported load rate of 9.2 TB is good, but not better than what can be achieved with competing products.  The story does not state whether this rate measures loading from raw data into Greenplum or from Greenplum into SAS HPA’s memory.

— Performance for parsing unstructured data — “millions of rows of text data in a few minutes” — is not compelling compared to alternatives.

The money quote in this story: “this Big Data analytics stuff is expensive…”  That statement is certainly true of SAS High Performance Analytics, but not necessarily so for alternatives.   Due to the high cost of this software, the executive in the story does not believe SAS HPA can be deployed broadly as an architecture, but must be implemented in a silo that will require users to move data around.

That path doesn’t lead to the Analytic Enterprise.

EMC Announces Partnership with Alpine Data Labs

Catching up on the news here.

The keyword in the title of this post is “announces”.  It’s not news that EMC partners with Alpine Data Labs.   Alpine Miner is a nifty product, but in the predictive analytics market Alpine is an ankle-biter compared to SAS, SPSS, Mathsoft and other vendors.   Greenplum and Alpine were sister companies funded by the same VC before EMC entered the picture.  When EMC acquired Greenplum, they passed on Alpine because (a) it didn’t fit into EMC’s all-things-data warehousing strategy, and (b) EMC didn’t want to mess up their new alliance with SAS.

SAS does not look kindly on alliance partners that compete with them; this is, in part, a knee-jerk response.  In the analytics software market, clients rarely switch from one vendor to another, and growth opportunities in the analytic tools market are limited.  Most of the action is in emerging users and analytic applications, where SAS’ core strengths don’t play as well.  Nevertheless, SAS expects to own every category in which it chooses to compete and expects its partners to go along even as SAS invades new territory.

After EMC acquired Greenplum, Greenplum and Alpine reps continued to work together on a “sell-with” basis in a kind of “stealth” partnership.

So it’s significant that EMC entered into a reseller agreement with Alpine and announced it to the world.  It’s a smart move by EMC; as I said earlier, Alpine is a nifty product. But it suggests that EMC isn’t getting the traction it expected from the SAS alliance — a view that’s supported by scuttlebutt from inside both SAS and EMC.