Leverage the In-Database Capabilities of Analytic Software
Many analysts have a strong preference for commercial analytic workbenches such as SAS or SPSS. Both packages are widely used and respected, and each has strong advocates. The purpose of this article is to point out that analytic users can benefit from the performance and simplicity of IBM Netezza in-database analytics without abandoning their preferred interface.
Let’s start with SAS. One of the most frequent complaints from IT organizations about SAS users is the propensity for users to require significant amounts of storage space for SAS data sets. A leading credit card issuer, for example, reports that users have more than one hundred terabytes of SAS files – and the volume is growing rapidly.
But SAS users can store data tables in the Netezza appliance and run data preparation steps against those tables using the SAS Pass-Through Facility. In addition to centralizing storage, reducing data movement and simplifying security, users can realize 100X improvements in program runtime.
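To make the idea concrete, here is a minimal sketch of a data preparation step that runs where the data lives. It is written in Python, and the standard-library sqlite3 module is used purely as a stand-in for the appliance; the table and column names (transactions, account_summary and so on) are hypothetical. It is not SAS code and not the Pass-Through Facility itself, only an illustration of the same pattern: send SQL to the database, keep the derived table there, and bring back only a small result.

```python
import sqlite3

# sqlite3 is only a stand-in; a real deployment would connect to the appliance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (account_id INTEGER, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, 120.0, "posted"),
        (1, 35.5, "posted"),
        (2, 990.0, "void"),
        (2, 15.0, "posted"),
        (3, 60.0, "posted"),
    ],
)

# The preparation step is expressed as SQL, so filtering and aggregation
# happen inside the database and the result stays there as a new table.
conn.execute(
    """
    CREATE TABLE account_summary AS
    SELECT account_id,
           COUNT(*)    AS n_posted,
           SUM(amount) AS total_posted
    FROM transactions
    WHERE status = 'posted'
    GROUP BY account_id
    """
)

# Only the small summary crosses the connection, not the detail rows.
summary = conn.execute(
    "SELECT account_id, n_posted, total_posted "
    "FROM account_summary ORDER BY account_id"
).fetchall()
print(summary)
```

The detail rows never leave the database, which is the source of the storage and data movement savings described above.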
In-database PROCs are another SAS feature. SAS currently enables in-database execution of FREQ, MEANS, RANK, REPORT, SORT, SUMMARY, and TABULATE in a number of databases and data warehouse appliances, including Netezza. For the user, database-enabled PROCs operate like any other SAS PROC, but the PROC runs in the database instead of on the SAS server.
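The difference between running a frequency count on the server and running it in the database can be sketched as follows. This is a Python illustration using sqlite3 as a stand-in database with a hypothetical cards table; it does not show how SAS generates SQL for PROC FREQ, only that the two approaches give the same answer while moving very different amounts of data.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cards (card_type TEXT)")
conn.executemany(
    "INSERT INTO cards VALUES (?)",
    [("gold",), ("basic",), ("gold",), ("platinum",), ("basic",), ("gold",)],
)


def freq_client_side(conn):
    """Pull every row to the client, then count there."""
    rows = conn.execute("SELECT card_type FROM cards").fetchall()
    return dict(Counter(value for (value,) in rows))


def freq_in_database(conn):
    """Let the database count; one row per distinct value comes back."""
    rows = conn.execute(
        "SELECT card_type, COUNT(*) FROM cards GROUP BY card_type"
    ).fetchall()
    return dict(rows)


client = freq_client_side(conn)
pushed = freq_in_database(conn)
print(client == pushed, pushed)
```

With six rows the difference is invisible, but the client-side version transfers every row while the in-database version transfers one row per distinct value.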
SAS supports a number of other in-database capabilities through SAS/ACCESS, including the ability to pass functions and formats to Netezza, the ability to create temporary tables, and the ability to leverage Netezza’s bulk load/unload facility.
SAS users can make calls to Netezza in-database functions by invoking Netezza In-Database Analytics through PROC SQL. In-database functions are far more efficient for building analytic data sets, data cleansing and enhancement. Customers who have implemented this approach have observed remarkable improvements in overall runtime: jobs that ran in hours now run in minutes.
SAS customers using SAS Enterprise Miner or SAS Model Manager can also benefit from SAS Scoring Accelerator, which enables an Enterprise Miner user to export a scoring function that runs on Netezza. This capability helps the organization avoid a custom programming task, and enables the analyst to easily hand off model scoring to a production operation.
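The general idea of publishing a model so that scoring runs in the database can be sketched with a toy example. The code below is Python with sqlite3 as a stand-in database; the linear model, its weights, and the customers table are all hypothetical, and this is not how SAS Scoring Accelerator is implemented. It only shows a model being rendered as a SQL expression so that the scores are computed next to the data.

```python
import sqlite3

# Hypothetical linear model: an intercept plus one weight per input column.
INTERCEPT = 0.5
WEIGHTS = {"balance_k": 0.25, "late_payments": 1.5}


def scoring_sql(table, key, intercept, weights):
    """Render the model as SQL so the database computes the score.

    Table and column names are trusted constants in this sketch; they are
    interpolated directly and must never come from user input.
    """
    terms = " + ".join(f"{w!r} * {col}" for col, w in weights.items())
    return (
        f"SELECT {key}, {intercept!r} + {terms} AS score "
        f"FROM {table} ORDER BY {key}"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, balance_k REAL, late_payments INTEGER)"
)
rows = [(1, 4.0, 0), (2, 2.0, 3)]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

query = scoring_sql("customers", "id", INTERCEPT, WEIGHTS)
scores = conn.execute(query).fetchall()

# Reference scores computed in Python, for comparison only.
expected = [
    (cid, INTERCEPT + WEIGHTS["balance_k"] * bal + WEIGHTS["late_payments"] * late)
    for cid, bal, late in rows
]
print(query)
print(scores)
```

Once the model is expressed this way, a production job only needs to run the generated query; no separate scoring program has to be written or maintained.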
IBM SPSS Modeler also offers the capability to work directly with database tables in Netezza; like SAS, it can be configured to minimize storage on the SPSS server. Modeler also offers SQL Pushback capabilities, which enable the user to perform functions within the Netezza appliance, including table joins, aggregation, selections, sorting, field derivation, field projection and scoring. While the in-database functional capabilities of the two packages are similar, SPSS exposes them entirely within the graphical environment of the stream canvas.
As with SAS, SPSS Modeler users can leverage Netezza in-database analytics to build, score and store predictive models, either through custom nodes or through the out-of-the-box integration in Release 15.0. Again, a key difference between SAS and SPSS is that while SPSS Modeler surfaces Netezza in-database analytics through the graphical user environment, SAS users must have programming and SQL skills.
To summarize, leading commercial software packages like SAS and SPSS already offer the ability to manage files, perform data preparation, build models and run scoring processes entirely within the Netezza appliance. Users of these tools can significantly improve runtime performance by leveraging these existing capabilities.