Big Analytics Roundup (November 16, 2015)

Just three main stories this week: possible trouble for a pair of analytic startups; Google releases TensorFlow to open source; and H2O delivers new capabilities at its annual meeting.

In other news, the Spark team announces Release 1.5.2, a maintenance release; and the Mahout guy announces Release 0.11.1, with bug fixes and performance improvements. (h/t Hadoop Weekly)

Two items of note from the Databricks blog:

— Darin McBeath describes Elsevier’s Spark use case and introduces spark-xml-utils, a Spark package contributed by his team. The package enables the Spark user to filter documents based on an XPath expression, return specific nodes for an XPath/XQuery expression and transform documents using an XSLT stylesheet.
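spark-xml-utils runs these operations inside Spark. As a stdlib-only sketch of the underlying idea, the snippet below filters a few hypothetical documents with Python's `xml.etree.ElementTree`, which supports only a limited XPath subset (no XQuery or XSLT). The document ids and the helpers `filter_docs` and `extract_text` are illustrative assumptions, not the package's API.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents keyed by id; in a Spark job these would be RDD records.
docs = {
    "doc1": "<article><title>Spark</title><kw>xml</kw></article>",
    "doc2": "<article><title>Flink</title></article>",
    "doc3": "<article><title>Drill</title><kw>sql</kw><kw>xml</kw></article>",
}

def filter_docs(docs, path):
    """Keep ids of documents in which the (limited) XPath expression matches at least one node."""
    return [doc_id for doc_id, src in docs.items() if ET.fromstring(src).findall(path)]

def extract_text(docs, path):
    """Return the text of every node matching the expression, per document."""
    return {doc_id: [el.text for el in ET.fromstring(src).findall(path)]
            for doc_id, src in docs.items()}

# The [.='text'] predicate requires Python 3.7 or later.
print(filter_docs(docs, "./kw[.='xml']"))  # ['doc1', 'doc3']
print(extract_text(docs, "./title"))
```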

— Rachit Agarwal and Anurag Khandelwal of Berkeley’s AMPLab introduce Succinct, a distributed datastore for queries on compressed data. They announce release of Succinct Spark, a Spark package that enables search, count, range and random access queries on compressed RDDs. The authors claim a 75X performance advantage over native Spark when using Succinct as a document store.
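Succinct's search and count queries build on compressed suffix arrays. The sketch below skips compression entirely and only shows how a plain suffix array answers count and search queries with binary search; the helper names are hypothetical and this is not Succinct's implementation.

```python
def build_suffix_array(text):
    """Sort suffix start positions by the suffix text (naive, fine for toy inputs)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def _bounds(text, sa, pattern):
    """Binary-search the suffix array for the block of suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo

def count(text, sa, pattern):
    """Number of occurrences of pattern in text."""
    start, end = _bounds(text, sa, pattern)
    return end - start

def search(text, sa, pattern):
    """Every offset where pattern occurs, in text order."""
    start, end = _bounds(text, sa, pattern)
    return sorted(sa[start:end])

text = "banana"
sa = build_suffix_array(text)
print(count(text, sa, "ana"), search(text, sa, "ana"))  # 2 [1, 3]
```

Because all suffixes sharing a prefix sit next to each other in sorted order, both queries reduce to finding one contiguous block.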

Three interesting stories on streaming data:

  • In a podcast, Data Artisans CTO Stephan Ewen discusses Flink, Spark and the Kappa architecture.
  • Techalpine’s Kaushik Pal compares Spark and Flink for streaming data.
  • Will McGinnis helps you get started with Python and Flink.

(1) Analytic Startups in Trouble

In The Information, Steve Nellis and Peter Schulz explain why startups return to the funding well frequently — and why those that don’t may be in trouble.  Venture funding isn’t a perfect indicator of success, but is often the only indicator available.  On the list: Skytree Software and Alpine Data Labs.

(2) Google Releases TensorFlow for Machine Learning

On the Google Research blog, Google announces open source availability of TensorFlow. TensorFlow is Google’s second-generation machine learning system; it supports Deep Learning as well as any computation that can be expressed as a flow graph. Read this white paper for details of the system. At present, there are Python and C++ APIs; Google notes that the C++ API may offer some performance advantages.
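TensorFlow's own API is out of scope here, but the core idea of expressing a computation as a flow graph is easy to sketch. In the toy example below, which is plain Python and not TensorFlow code, each node names an operation and its inputs, and the executor evaluates a node only after evaluating what it depends on. The node names and the `OPS` table are assumptions for illustration.

```python
import operator

# A hypothetical dataflow graph computing y = w * x + b.
graph = {
    "x": ("const", 3.0),
    "w": ("const", 2.0),
    "b": ("const", 1.0),
    "wx": ("mul", "w", "x"),
    "y": ("add", "wx", "b"),
}

OPS = {"add": operator.add, "mul": operator.mul}

def evaluate(graph, node, cache=None):
    """Evaluate node after its inputs, memoizing results so shared inputs run once."""
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    op, *args = graph[node]
    if op == "const":
        value = args[0]
    else:
        value = OPS[op](*(evaluate(graph, a, cache) for a in args))
    cache[node] = value
    return value

print(evaluate(graph, "y"))  # 7.0
```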

Video intro here.

In Wired, Cade Metz reports; Erik T. Mueller dismisses; and Metz returns to note that Deep Learning can leverage GPUs, and that AI’s future is in data, as if we didn’t know these things already.

On Slate, Will Oremus feels the buzz.

On his eponymous blog, Sachin Joglekar explains how to do k-means clustering with TensorFlow.
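For readers who want the algorithm without the framework, here is a minimal plain-Python sketch of Lloyd's k-means iterations: assign each point to the nearest centroid, then move each centroid to the mean of its points. It is not Joglekar's TensorFlow code; the 2-D points and the choice of the first k points as initial centroids are assumptions made to keep the example deterministic.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm on 2-D points; initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [
            (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts)) if pts else centroids[i]
            for i, pts in enumerate(clusters)
        ]
    return centroids, clusters

points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
print(centroids)
```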

Separately, in VentureBeat, Jordan Novet rounds up open source frameworks for Deep Learning.

(3) H2O Releases Steam

It’s not a metaphor.  At its second annual H2O World event, H2O releases Steam, an open source data science hub that bundles model selection, model management and model scoring into a single container for elastic deployment.

On the H2O Blog, Yotam Levy wraps Day One, Day Two and Day Three of the H2O World event.  Speaker videos are here, slides here.  (Registration required.)  Some notable presentations:

— H2O: Tomas Nykodym on GLM; Mark Landry on GBM and Random Forests; Arno Candel on Deep Learning; Erin LeDell on Ensemble Modeling.

— Michal Malohlava of H2O and Richard Garris of Databricks explain how to run H2O on Databricks Cloud. Separately, Michal demonstrates Sparkling Water, a Spark package that enables a Spark user to call H2O algorithms; Nidhi Mehta leads a hands-on session with PySparkling Water; and Xavier Tordoir of Data Fellas exhibits Interactive Genomes Clustering with Sparkling Water on the Spark Notebook.

— Szilard Pafka of Epoch summarizes his work to date benchmarking R, Python, Vowpal Wabbit, H2O, xgboost and Spark MLlib. As reported previously, Pafka’s benchmarks show that H2O and xgboost are the best performers; they are faster and deliver more accurate models.

As reported in last week’s roundup, H2O also announces a $20 million “B” round.

Big Analytics Roundup (October 26, 2015)

Fourteen stories this week, beginning with an announcement from IBM.  This week, IBM celebrates 14 straight quarters of declining revenue at its IBM Insight conference, appropriately enough at the Mandalay Bay in Vegas, where the restaurants are overhyped and overpriced.

Meanwhile, the first Spark Summit Europe meets in Amsterdam, in the far more interesting setting of the Beurs van Berlage.  There will be a live stream on Wednesday and Thursday — details here.  Sadly, I can’t make this one — the first Spark Summit I’ve missed — but am looking forward to the live stream.

(1) IBM Announces Spark on Bluemix

At its IBM Insight beauty show, IBM announces availability of its Apache Spark cloud service.  Actually, IBM announced it back in July, but that was a public beta.   On ZDNet, Andrew Brust gushes, noting that IBM has DB2, Watson, Netezza, Cognos, TM1, SPSS, Informix and Cloudant in its portfolio.  He fails to note that of those products, exactly one — Cloudant — actually interfaces with Spark.

There were rumors that IBM would have an exciting announcement about Spark at this show, but if this is it — yawn.  Looking at IBM’s “Spark in the cloud” offering, I don’t see anything that sets it apart from other available offerings unless you have a Blue fetish.

Update: Rod Reicks of IBM writes to note that IBM’s new release of SPSS Analytics Server runs processes in Spark. For the uninitiated, Analytics Server is a product you license from IBM that enables SPSS Modeler users to run selected operations in Hadoop. Previous versions ran through MapReduce only. Reicks claims that the latest version runs through Spark when available.

I say “claims” because there is no reference to this feature in IBM’s Release Notes, Installation Guide or User’s Guide.  Spark is mentioned deep in the Administrator Guide, under Troubleshooting.  So the good news is that if the product fails, IBM has some tips — one of which should be “Install Spark.”

You’d think that with IBM’s armies of people they could at least find someone to write documentation.

(2) Mahout Book FAIL

Packt announces a book on Clustering with Mahout with an entire chapter devoted to Canopy Clustering, which the Mahout team just deprecated.

(3) Concurrent Adds Spark Support

Concurrent announces Release 2.0 of Driven, its oddly-named performance management software, which now includes support for Apache Spark.

(4) Flink Founder Touts Streaming Analytics

At Big Data Spain, Data Artisans co-founder Kostas Tzoumas argues that streaming is the basis for all analytics, which is a bit over the top: as they say, if all you have is a hammer, the world looks like a nail.  Still, his deck is a nice intro to Flink, which has made some progress this year.

(5) AtScale Announces Release 3.0

AtScale, one of the more interesting startups in the BI space, delivers Release 3.0 of its OLAP-on-Hadoop platform. Rather than introducing a new user interface into the mix, AtScale makes it possible for BI users to work with Hadoop tables without jumping back and forth to programming tools. The product currently supports Tableau, Excel, Qlik, Spotfire, MicroStrategy and JasperSoft, and runs on CDH, HDP or MapR with Impala, Spark SQL or Hive on Tez. The new release includes enhanced role-based security, with support for Kerberos, username/password or LDAP authentication.

(6) Neo: Graphs are Eating the World

Graph database leader Neo announces immediate availability of Neo4j 2.3, which includes what it calls “intelligent applications at scale” and Docker support. Exactly what Neo means by “intelligent applications at scale” is unclear, but if Neo is claiming that you no longer have to dump a graph into Spark to run a PageRank, I’ll believe it when I see it.

(7) New Notebook Sharing for Databricks 

Databricks announces new notebook sharing capabilities for its eponymous product.  On the Databricks blog, Denise Li and Dave Wang explain.

(8) Teradata: Blah, Blah, Blah, IoT, Blah, Blah Blah

At its annual user conference, Teradata announces that it’s heard about IoT.    Teradata also announces that it will make Aster available on Hadoop, which would have been interesting in 2012.  Aster, for the uninitiated, includes a SQL on MapReduce engine, which is rendered obsolete by fast SQL engines like Presto, which Teradata has just embraced.

(9) Flink Forward Redux

As I noted last week, the first Flink Forward conference met in Berlin two weeks ago.  William Benton records his impressions.

Presentations are here.  Some highlights:

  • Dongwon Kim benchmarks Flink against MR, MR on Tez and Spark.  Flink wins.
  • Kostas Tzoumas outlines the Flink development roadmap through Release 1.0.
  • Martin Junghanns explains graph analytics with Flink.
  • Anwar Rizal demonstrates streaming decision trees with Flink.

Henning Kropp offers resources for diving deeply into Flink.

(10) Pyramid Analytics Lands New Funding

Amsterdam-based BI startup Pyramid Analytics announces a $30 million “B” round to help it try to explain why we need more BI software.

(11) Harte Hanks Switches from CDH to MapR

John Leonard explains why Harte Hanks switched from Cloudera to MapR.  Most likely explanation: they were able to cut a cheaper deal with MapR.

(12) Audience Modeling with Spark

Guest posting on the Databricks blog, Eugene Zhulenev explains audience modeling with Spark ML pipelines.

(13) New Functions in Drill

On the MapR blog, Neeraja Rentachintala describes new capabilities in Drill Release 1.2, including SQL window functions.
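For anyone new to window functions: unlike GROUP BY, they compute a value for every input row from a "window" of related rows. The plain-Python sketch below, using hypothetical sales rows, emulates what a standard SQL expression such as `SUM(amount) OVER (PARTITION BY region ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)` plus `ROW_NUMBER()` would return; it says nothing about how Drill executes them.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (region, day, amount)
rows = [
    ("east", 1, 10), ("west", 1, 5), ("east", 2, 20),
    ("west", 3, 7), ("east", 3, 5), ("west", 2, 1),
]

def running_totals(rows):
    """Emulate ROW_NUMBER() and a running SUM(amount) per region, ordered by day."""
    out = []
    # Sorting by (region, day) groups each partition and orders the rows inside it.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for region, part in groupby(ordered, key=itemgetter(0)):
        total = 0  # the running sum restarts for every partition
        for row_number, (_, day, amount) in enumerate(part, start=1):
            total += amount
            out.append((region, day, amount, row_number, total))
    return out

for row in running_totals(rows):
    print(row)
```

Every input row survives in the output, which is the key difference from an aggregate with GROUP BY.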

(14) Integrating Spark and Redshift

“Redshift is where data goes to die.”  — Rob Ferguson, Spark Summit East

On the Databricks blog, Sameer Wadkar of Axiomine explains how to use the spark-redshift package, first introduced in March of this year and now in version 0.5.2.  So you can yank your data out of Redshift and do something with it. (h/t Hadoop Weekly)

Big Analytics Roundup (October 19, 2015)

Ten stories this week. Don’t miss story #10, which recaps an analysis of collaboration and influence in the U.S. Congress using open source graph engines and a rich database of legislation.

(1) Rexer: R Continues to Lead

Rexer Analytics has released preliminary results from its 2015 survey of working analysts; Bob Muenchen reports. One interesting snippet: reported tool use, as shown in the graphic below.


Several interesting changes from the previous survey:

  • Reported primary and total use of R continues to increase
  • SPSS/Statistics declined slightly in reported usage, remains #2
  • RapidMiner is way down, from third to ninth.  Also interesting to note that ~95% of RapidMiner users say they use the free version.
  • SAS usage remained constant, but moved up in rank to third as RapidMiner fell
  • Reported usage of Excel Data Mining and Tableau is way up from previous rounds of the survey

Like most surveys on this topic, there are issues with Rexer’s sampling methodology that mandate careful interpretation.  Rexer’s methods are largely consistent from year to year, however, so changes between iterations of the survey are interesting and may reflect real-world trends.

(2) CfP for Spark Summit East Opens

Spark Summit East will meet at the New York Hilton February 16-18; I will be there, with bells on.  The Call for Presentations is now open, link here.

(3) DataTorrent Explains DAGs

On the DataTorrent blog, Thomas Weise explains directed acyclic graphs, or DAGs, which is a fancy name for a way to describe logical dependencies with dots and arrows.  It sounds prosaic, but DAGs are fundamental to Storm, Spark, Tez and Apex, all of which play a role in bringing high-performance computing to the Hadoop ecosystem.
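To make the dots-and-arrows idea concrete: a scheduler working from a DAG can run a task only after all of its upstream tasks finish, which amounts to a topological sort. The sketch below uses Kahn's algorithm on a hypothetical job graph; the task names are made up, and real engines such as Spark or Tez layer stage planning and parallel execution on top of this basic ordering.

```python
from collections import deque

# Hypothetical job: each task maps to the tasks it depends on (every dependency is also a key).
deps = {
    "load": [],
    "parse": ["load"],
    "filter": ["parse"],
    "join": ["parse", "lookup"],
    "lookup": [],
    "report": ["filter", "join"],
}

def topological_order(deps):
    """Kahn's algorithm: return tasks so that each appears after everything it depends on."""
    remaining = {task: len(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
    ready = deque(task for task, n in remaining.items() if n == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        # Some tasks never became ready, so the graph contains a cycle.
        raise ValueError("cycle detected: not a DAG")
    return order

print(topological_order(deps))
```

The "acyclic" part matters: with a cycle, no valid order exists, and the function refuses to produce one.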

(4) New Apache Drill Release

SQL platform Apache Drill announces Release 1.2.  Key new bits:

  • Relational database support (through JDBC)
  • Additional window functions
  • Parquet metadata caching
  • Performance improvements on HBase and Hive tables
  • Drop table capability for files and directories
  • Enhanced MongoDB integration

Plus many bug fixes.  Nice work, Drill team, but it feels like rearranging the deck chairs.  Drill lags the other SQL engines in Kerberos support, YARN integration and query fault tolerance; while Teradata is stepping in to do something with Presto, Drill is an orphan.  There is no UI, and no sign that the BI vendors are looking to build on Drill, so it’s not clear where Drill goes from here.

(5) Fans Flock to Flink Forward ’15

The first Flink Forward conference met for two days in Berlin last week. Data Artisans organized the program and delivered a number of the presentations. Capital One’s Slim Baltagi has kindly shared the deck from his keynote on Flink versus Spark.

(6) Big Data Spain Meets in Madrid

The 4th Edition of Big Data Spain met last week in Madrid.  On Slideshare, evil mad scientist Paco Nathan offers two decks:

Data Science in 2016, his keynote address, covers architectural design patterns; observations on trends; example applications and use cases; and offers a glimpse ahead.

Crash Introduction to Apache Spark, slides from a workshop, is exactly what it sounds like it is.

(7) MIT Researchers Build Data Science Machine

James Max Kanter and Kalyan Veeramachaneni of MIT develop an automated Data Science Machine (DSM), enroll it in three data science competitions and beat 615 out of 906 teams. DSM performed “nearly” as well as the human teams; but while humans spent months developing their models, the DSM spent 2-12 hours.

In a paper, Kanter and Veeramachaneni describe an approach to feature engineering they call Deep Feature Synthesis, which generates features based on automated analysis of a relational data model. The authors note that a naive grid search for the optimal model specification would require trillions of experiments; instead, they use Bayesian optimization to find the best model.
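Deep Feature Synthesis stacks aggregation and transformation primitives across the relationships in a database. A heavily simplified sketch of a single step, following one customer-to-orders relationship with a few aggregation primitives, looks like this; the tables, column names and primitive set are hypothetical, and this is not the authors' code.

```python
from statistics import mean

# Hypothetical parent and child tables linked by a customer id.
customers = ["c1", "c2", "c3"]
orders = [
    {"customer": "c1", "amount": 20.0},
    {"customer": "c1", "amount": 40.0},
    {"customer": "c2", "amount": 15.0},
]

# Aggregation primitives applied along the customers -> orders relationship.
PRIMITIVES = {"count": len, "sum": sum, "mean": mean, "max": max}

def synthesize_features(parents, children, child_name, key, column):
    """Return {parent: {"<primitive>(<child_name>.<column>)": value}} for every parent row."""
    features = {}
    for parent in parents:
        values = [row[column] for row in children if row[key] == parent]
        features[parent] = {
            # mean and max are undefined for parents without children; report None for them.
            f"{name}({child_name}.{column})": fn(values) if values or name in ("count", "sum") else None
            for name, fn in PRIMITIVES.items()
        }
    return features

feats = synthesize_features(customers, orders, "orders", "customer", "amount")
print(feats["c1"])
```

Generating features mechanically like this is what makes the search space large, which is why the model-selection step matters.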

(8) Spark-Based Security Platform Lands Funding

DataVisor, founded in 2013, announces a $14.5 million “A” round from GSR and NEA to develop its eponymous security analysis engine, which runs on Spark.  The company, based in Mountain View, claims that its software can process billions of events per hour, and boasts Yelp and Momo as customers.

(9) Dato Releases Spark-GraphLab Interface

On the Dato Blog, Emad Soroush introduces the spark-sframe package, which enables a GraphLab user to ingest Spark RDDs as GraphLab SFrames.  Dato introduced SFrames a couple of weeks ago.  As I noted at the time, it doesn’t really matter how cool the SFrame is, it’s YADF — Yet Another Data Format.

Rather than forcing data scientists to convert data to a new format, machine learning vendors need to figure out how to work with existing Hadoop formats.  Dato isn’t going to build a complete Business Analytics stack; it’s going to have to integrate with SQL engines and other tools, and YADF makes that harder, not easier.

I also have to wonder why Dato hasn’t registered this package on Spark Packages, like everyone else who integrates with Spark.

(10) Spark Plus GraphX Equals Mazerunner

On his personal blog, William Lyon demonstrates an analysis of influence in the U.S. Congress using the Neo4j graph database, Apache Spark GraphX and Mazerunner, an open source project that merges the capabilities of Neo4j and Spark. In a previous post, Lyon showed how he loaded data into Neo4j to build a rich graph of collaboration among different members of Congress.


Next, he uses Mazerunner’s PageRank tooling to calculate the influence of each Senator and Member of Congress. Mazerunner selects and extracts the relevant subgraph from Neo4j, runs a Spark GraphX job and writes the results back to Neo4j.
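Mazerunner delegates PageRank to GraphX. The plain-Python power-iteration sketch below shows what is being computed: each node keeps a baseline share and passes the rest of its rank along its out-links. This normalized variant (ranks sum to 1) differs in detail from GraphX's implementation, and the tiny edge list is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node keeps a baseline (1 - d) / n; the rest flows along out-links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank

ranks = pagerank([("A", "B"), ("B", "C"), ("C", "A"), ("D", "C")])
print(max(ranks, key=ranks.get))  # C
```

In this graph C collects rank from both B and D, so it comes out on top; D, with no incoming edges, keeps only the baseline 0.15/4.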

Mazerunner is free and open source under an Apache 2.0 license, and is distributed on GitHub. Currently, it supports algorithms for PageRank, Closeness Centrality, Betweenness Centrality, Triangle Counting, Connected Components and Strongly Connected Components.

Big Analytics Roundup (September 28, 2015)

Strata+Hadoop World NYC is upon us.  Andrew Brust opines that there will be three themes at Strata this year: (1) Spark “versus” Hadoop; (2) streaming goes mainstream; (3) data governance matters.  My take:

  1. “Spark versus Hadoop” is controversy for the sake of people who like controversy.  Spark works with Hadoop, and Spark works with other platforms, or by itself.  Use cases will determine the best platform.
  2. We’ve been hearing that streaming is mainstream for something like ten years now.  There are a half-dozen commercial products in the space, plus multiple open source frameworks.
  3. Data governance is a soporific.

Due to the spate of Spark stories this week, this week’s roundup has four sections: Spark, SQL, Machine Learning and Streaming.  The top story is Databricks’ Spark survey, which provoked a flurry of analysis.


2015 Spark Survey

Databricks released results of its 2015 Spark Survey, available here (registration required); an infographic is here. The “report” is a somewhat informative mashup of survey findings, plus other information, such as the headcount from Spark Summits. (Spoiler: it’s increasing.) On the Databricks blog, Matei Zaharia, Patrick Wendell and Denny Lee summarize key points. Additional analysis here, here, here, here, here, here, here and here.

Analysts, loving controversy, note that Spark users slightly prefer standalone configurations over Spark-on-YARN (i.e., co-located in Hadoop). Andrew Oliver, for example, commenting on Cloudera’s One Platform announcement earlier this month, argues that Databricks is actively marketing against Spark-on-YARN, citing results of this survey. But if you compare these results to the Typesafe/Databricks Spark survey published in January, you will note that respondents to the 2015 survey are slightly less likely to run Spark in a standalone cluster than last year’s respondents.

Other analysts, like Tony Baer, note that 11% of respondents run Spark on Mesos, hinting darkly that since the AMPLab team developed both Spark and Mesos, there must be some sort of conspiracy against Hadoop.  But in the earlier survey, 26% of respondents said they run on Mesos, so if someone is organizing a secret cabal to compete against Spark-on-YARN, it’s not working out too well.

The biggest news in the survey is the rapid growth in the share of respondents who use the Python API, from 22% to 58%, and the corresponding decline among those who use Scala or Java. The SQL and R interfaces are too new to compare to the previous survey, but it’s worth noting that in 2015 more respondents use the SQL interface than the Java interface.

Spark as a Service

Google announces Cloud Dataproc, a managed Spark and Hadoop service, currently available in beta. Key benefits claimed: cheap, fast, integrated with the other Google Cloud platform services, easy to manage, simple and familiar. Google claims that they can set up or knock down a cluster in ninety seconds or less. Billing is by the minute, which is cool. Stories here, here, here, here, here, here, here, here, here, here, here, here, here, here and here.

BlueData offers Yet Another Spark Service.

In case you’re not happy with available offerings for Spark-as-a-service from Databricks, Qubole, Amazon Web Services, Google and BlueData, MemSQL offers Streamliner.  Stories here, here, here, here and here.

Miscellaneous Spark Bits

Jim Scott enters the Spark vs. Hadoop fray and gets it wrong.  No, Spark does not need HDFS; it works perfectly well with other datastores.

Jim Scott (again) lists five use cases for Spark Streaming: credit card fraud detection, network security, genomic sequencing, real-time ad targeting and hospital readmission.

On the MapR blog, the ubiquitous Jim Scott explains why Spark is a great companion to Hadoop.

In IT Jungle, Alex Woodie wonders what IBM’s embrace of Spark means for the product line IBM now brands as “i-series” and everyone else calls “AS/400”. His answer: nothing, IBM has no plans to put Spark on these tired old boxes.

Writing for American Banker, Tom Groenfeldt interviews Tom Davenport, several vendors (Rob Thomas of IBM, David Wallace of SAS and Abhi Mehta of Tresata) and one banker.  Tom Davenport says that bankers use different things, touts Teradata; Rob Thomas talks about IBM’s Spark initiative; David Wallace says that banks use SAS, and the one banker talks about using Accenture.  From this muddle, Mr. Groenfeldt concludes that banks are turning to Spark.

In an article titled Retail Gains with Distributed Systems, Daniel Gutierrez talks about Hadoop and Spark, but provides no actual examples of retailers using these platforms.



MapR’s Drill team walks to start Dremio.

Jim Scott, who was quite busy last week, profiles Apache Drill.

On YouTube, a disembodied voice representing Syntelli Solutions offers you a Test Drive using Drill and Spotfire on AWS.


Cloudera benchmarks Impala with TPC-DS queries, concludes that maximum concurrency with good performance increases with the size of the cluster.  This does not seem surprising at all; more nodes in the cluster means more horsepower.


Harish Butani of Sparkline Data benchmarks TPC-H queries using Spark SQL on Druid and summarizes results on LinkedIn. Conclusion: Spark on Druid runs a lot faster than Spark on Parquet. Full report here. Sparkline publishes a Spark Druid interface in Spark Packages.

On the MapR blog, Michele Nemschoff touts the Hadoop and Spark platform for retail analytics that MapR sold to Quantium, an Australian analytic services provider.

Platfora announces Release 5.0, which leverages Spark behind the scenes for data preparation. Alex Woodie explains. More stories here, here, here and here.

ClearStory Data announces a triumph of branding (“Intelligent Data Harmonization”) and a few new features in a muddled press release.

Machine Learning


Carlos Guestrin announces that Dato is a big believer in open source software, which will make you feel good when you pay the subscription fees on Dato’s commercial software. Dato has released its SFrame columnar data frame to open source under a BSD license. SFrames are like pandas or R data frames, with some additional features useful to data scientists, like out-of-core operations on data larger than memory and support for wide datasets.

No doubt SFrames are cool, but the key challenge for companies in this space is to figure out how to make analytics work with mainstream data formats.  Any advantages of a new format are offset by the time and cost needed to ingest and export the data.


At the Moscow Data Fest, H2O argues that machine learning is the new SQL.

Sam Dean interviews H2O VP of Marketing Oleg Rogynskyy.


Two items from the Databricks blog cover improvements to Spark’s machine learning capabilities in Spark 1.5:

Cloudera’s Sandy Ryza et al. contribute Spark-Timeseries, a Python and Scala library for analyzing large-scale time series datasets. (h/t Hadoop Weekly)

Streaming Analytics

Flink/Data Artisans

Concurrent and Data Artisans announce a “strategic partnership” to support Cascading on Flink. Cascading touts.

On the MapR blog, Ellen Friedman introduces you to Flink.

TIBCO Streambase

TIBCO’s Kai Wähner presents a nice overview of stream processing frameworks and products. Not surprisingly, he likes TIBCO StreamBase, but the deck nicely summarizes differences between the commercial and open source options.

Big Analytics Roundup (August 31, 2015)

Top stories for the penultimate week of summer: an excellent SQL-on-Hadoop benchmark; a couple of stories about Gelly, Flink’s graph engine; Apache Ignite goes top-level; a preview of Spark 1.5; and new stuff from RStudio.

Also, on Slideshare, evil mad scientist Paco Nathan presents on “Uber for Education.”

SQL on Hadoop

I missed this story in June, but better late than never. The folks at Allegro, a Warsaw-based collaborative, published results of an excellent benchmark of SQL-on-Hadoop technologies. Scope of the analysis included Hive on MapReduce (the “control”), Hive on Tez, Presto, Impala, Drill and Spark SQL. (The authors note that they wanted to evaluate Hive on Spark, but could not make it work.)

The Allegro team first evaluated Kerberos support, YARN deployment and query fault tolerance, the available UI, JDBC support, and UDF and view support, as well as support for each of the CSV, JSON, Avro and Parquet formats. For benchmarking, they used 11 HiveQL queries testing a mix of typical analytic tasks.

Some key findings:

  • Hive on Tez: ran all queries with stable and satisfactory performance
  • Spark SQL: better than average performance overall, but could not run two queries
  • Presto: convenient to use, but performance was disappointing
  • Impala: fastest overall, but could not run one of the queries
  • Drill: very fast, but could not run three queries

Apache Flink/Data Artisans

On Slideshare, Vasia Kalavri presents an overview of Gelly, Flink’s graph engine. More about Gelly here.

Apache Ignite/GridGain

The Apache Software Foundation promotes Ignite to top-level project status. SD Times reports. Ignite is a high-performance, integrated and distributed in-memory platform. Ignite is the open source version of GridGain’s commercial product.

Apache Lens

ASF also promotes Lens to top-level status.  Apache Lens is a “Unified Analytics Platform”, whatever that is.  (h/t Hadoop Weekly)

Apache Spark/Databricks

Patrick Wendell of Databricks presented a preview of Spark 1.5 last Thursday.    Spark 1.5 will be available in mid-September (exact timing depends on Apache voting process).  Developers from more than 50 companies contributed to the build.  A preview is available in Databricks now.  Key enhancements:

  • Execution concepts will be exposed: tracking memory usage, visualizing DataFrame execution tree
  • Project Tungsten will be on by default: binary processing for memory management, code generation for CPU efficiency
  • Performance optimizations in SQL/DataFrames: Metadata discovery, predicate pushdown in Parquet, outer joins and window functions
  • First class UDAF support
  • Improved interoperability with Hive
  • Read Parquet files encoded by Hive, Impala, Pig, Avro, Thrift, Spark SQL
  • Additional Python interfaces for Spark Streaming
  • R bindings for linear models
  • Python bindings for Power Iteration Clustering
  • New algorithms and transforms for ML Pipelines

There will also be some new packages available concurrently with the 1.5 release, including support for AWS Redshift, Magellan support for spatial analytics and a convex solver package.

On Datanami, George Leopold covers the story.

Alex Woodie interviews some Spark users and discovers that they often use it together with Hadoop.

Jessica Twentyman notes that Spark looks set to replace MapReduce and inquires into the pace, scope and scale of that replacement. She finds a lot of smart people who are optimistic and a few who urge caution, citing Spark’s immaturity.

Darryl Taft explains how Spark transforms Big Data processing and development.  Spoiler: it’s faster.

In ReadWrite, Peter Schlampp provides six reasons that Apache Spark isn’t flickering out, thereby answering a question nobody is asking. For the record, his reasons are: advanced analytics, simplification, support for multiple languages, faster results, Hadoop distribution agnosticism and high-growth adoption.

On the Cloudera blog, Jeff Palmucci of TripAdvisor describes how his team uses Spark.

Google Cloud

Google announces a new release of BigQuery with UDF support.

On HomeAI, Arno Candel presents a Deep Learning Webinar.


RStudio adds a new starter plan for its cloud service for Shiny apps. Roger Oberg reports.