Notes from Strata 2013
Last week I attended the O’Reilly Strata 2013 Conference. Here are some notes on presentations pertinent to analytics, in four categories:
- Thought Provokers
We wouldn’t have trade shows without sponsors, and the big ones get ten minutes of fame. Some used their time well, others not so much. I’ll refrain from shaming the bloviators, but will single out three for applause:
- John Schroeder of MapR did a nice preso on the business case for Hadoop, with a refreshing focus on measurable revenue impact and cost reduction;
- Girish Juneja from Intel delivered a thoughtful summary of Intel’s participation in open source projects. Not a lot of sizzle, but refreshingly free of hype;
- Charles Zedlewski of Cloudera provided a terrific explanation of the history and direction of Hadoop and made a compelling case for the platform.
Someone should tell O’Reilly that Skytree is a vendor. Skytree managed to get a 45-minute slot in a non-vendor track, and Alexander Gray of Skytree used the time to say stuff that data miners learned years ago.
Several presenters spoke about how their organization uses analytics. At any conference, presentations like this are often the most compelling.
- Rajat Taneja from Electronic Arts spoke about the depth of information captured by gaming companies, and how they use this information to improve the gaming experience. Good presentation, with great visuals.
- Eric Colson of Stitch Fix (and formerly with Netflix) spoke about recommendation engines. Stitch Fix sends bundles of new clothing to buyers on spec, and they have finely tuned the bundling process using a mix of machine learning and human decisions. Eric spoke about the respective strengths of machine and human decisioning, and how to use them together effectively.
- Michael Bailey of Facebook gets credit for truth in packaging for “Introduction to Forecasting”. His presentation covered very basic content, the sort of thing covered in Stat 101, and he did a fine job presenting that. Michael hinted at Facebook’s complex forecasting problem — they have to simultaneously forecast eyeballs and ad placements — and it would be great to hear more about that in a future presentation.
It’s tough to deliver detailed content in a short session; most of the presenters I saw struck the right balance.
- Sharmila Shahani-Mulligan and others from ClearStory Data presented to an overflow audience interested in learning more about Spark and Shark. Spark is an open source in-memory distributed computational engine that can run on top of Hadoop. It is designed to support iterative algorithms, and supports Java, Scala and Python. Shark is compatible with Hive, runs on Spark, and offers a SQL interface.
- Dr. Vijay Srinivas Agneeswaran of Impetus Technologies delivered what I thought was the best presentation in the show. He summarized the limits of legacy analytics, discussed analytics in Hadoop (such as Mahout), and spoke about a third wave of distributed analytics based on technologies like Spark, HaLoop, Twister, Apache Hama and GraphLab.
- Jayant Shekhar of Cloudera delivered a very detailed presentation on how to build a recommendation engine.
Several presenters spoke on broad conceptual topics, with mixed results.
- James Hendler of RPI spoke on the subject of “broad data”. His presentation seemed thoughtful, but to be honest he lost me.
- Nathan Marz of Twitter has co-authored a book on Big Data coming out soon. After listening to his short preso on data modeling (“Human Fault Tolerance”), I added the book to my wish list.
- Kate Crawford of Microsoft presented on the subject of hidden biases in big data. Although her presentation covered material well known to seasoned analysts (“hey, did you know that your data may be biased?”), it was excellent and full of good examples.
Overall, an excellent show.