Big Analytics Roundup (May 16, 2016)
This week we have more insight into Spark 2.0, scheduled for release just before Spark Summit 2016. (Yes, I’m going.) Also, kudos to BI-on-Hadoop startup AtScale for a new round of funding; Amazon releases YADLF (Yet Another Deep Learning Framework); and there are a number of new faces at H2O.ai.
Plus, we have an extended review of the Palantir story.
Buzzfeed on Palantir
Last week, I deemed Buzzfeed’s story on Palantir too dumb to link. (“Forget it, Jake. It’s Buzzfeed.”) Buzzfeed “news” reporter William Alden, who was all over a story about maggots in Facebook lunches, breathlessly mines a cache of “secret internal documents” and discovers:
- Palantir expects employee turnover of around 20% for 2016.
- Palantir lost some clients.
- Palantir books more work than it bills.
Does Palantir have an employee turnover problem? No. A 20% turnover rate is slightly above the 17% reported for all industries in 2015, and about on track for Silicon Valley. (There are companies in SV with 100% turnover rates.) On Glassdoor, employees give Palantir high marks.
Does Palantir have a client retention problem? Not exactly. The story cites four clients (American Express, Coca-Cola, Kimberly-Clark and Nasdaq) who engaged Palantir to conduct a pilot, then decided not to proceed with a long-term contract. In other words, lost sales and not cancelled contracts. The document Buzzfeed obtained is Palantir’s won/lost analysis, which shows that the company is attempting to learn from its lost sales.
Does Palantir have a revenue problem? No. Palantir’s 2015 revenue was up 50% from the previous year. Buzzfeed obsesses over the difference between Palantir’s bookings of $1.7 billion and its revenue of $420 million. A high book-to-bill ratio is typical for consultancies that pursue large multi-year projects; it is a sign of strong demand for the company’s services. Under GAAP, companies can recognize revenue only as work is performed, even if they bill the work in advance. Note that consulting giant Accenture’s bookings exceeded its revenue for its most recent quarter.
Does Palantir have a profitability problem? Possibly. Buzzfeed reports that the company lost $80 million last year on revenue of $420 million. Consulting margins tend to be fairly high, so a loss means that Palantir is “investing” in a lot of unbillable work. It’s hard to say if these “investments” will pay off. Palantir closed another round of funding in December, 2015, so people with more and better information than Buzzfeed obviously think they will, and are backing up their belief with cash.
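For anyone who wants to check the arithmetic behind those figures, here is a minimal sketch in Python. It uses only the numbers quoted above from the Buzzfeed story; the variable names are mine, and dividing total bookings by one year of revenue is a rough ratio that assumes the two figures cover comparable periods, which the story does not confirm.

```python
# Figures as quoted above from the Buzzfeed story, in millions of US dollars.
bookings = 1700.0       # reported bookings (period assumed comparable to revenue)
revenue_2015 = 420.0    # reported 2015 revenue
loss_2015 = 80.0        # reported 2015 loss
revenue_growth = 0.50   # 2015 revenue up 50% from the previous year

# Rough book-to-bill ratio: bookings divided by revenue.
book_to_bill = bookings / revenue_2015

# Loss expressed as a share of revenue.
loss_margin = loss_2015 / revenue_2015

# Prior-year revenue implied by the 50% growth figure.
implied_revenue_2014 = revenue_2015 / (1 + revenue_growth)

print(f"book-to-bill: {book_to_bill:.2f}")
print(f"loss as share of revenue: {loss_margin:.1%}")
print(f"implied 2014 revenue: ${implied_revenue_2014:.0f}M")
```

On these numbers, bookings run at roughly four times revenue, the loss is about a fifth of revenue, and the implied 2014 revenue is about $280 million.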
By the way, you know who has an actual revenue problem? Buzzfeed.
Roger Peng attempts to draw lessons for data scientists from the Buzzfeed story, without questioning its premises. He should stick to Biostatistics.
— Databricks announces preview of Apache Spark 2.0 on Databricks Community Edition.
— From last week: Reynold Xin explains what’s new in Spark 2.0.
— Dave Ramel summarizes the new features, including faster SQL; consolidation of the Dataset and DataFrame APIs; support for ANSI (2003) SQL; and Structured Streaming, an integrated view of tables and streams.
— Now that Spark 2.0 is in preview, MapR offers Spark 1.6.1.
— Four from Adrian Colyer:
- Parameter-free data mining.
- Time Series classification.
- Dynamic Time Warping for scalable matching and for more accurate classification of time series.
— Richard Williamson explains how to build a streaming prediction engine with Spark, MADlib, Kudu and Impala.
— On the Cloudera Vision blog, Santosh Kumar explains Hive-on-Spark.
— DataStax’ Dani Traphagen explains data processing with Spark and Cassandra.
— In ZDNet, Andrew Brust explains Microsoft’s R strategy, and gets it right.
— For a planted article in Linux.com, Pam Baker interviews IBM’s Mike Breslin, who answers questions nobody is asking about using Spark and Cloudant.
— Joyce Wells recaps a presentation by Booz Allen’s Jair Aguirre, who touts Apache Drill.
— Alex Woodie attends the Apache: Big Data 2016 conference and discovers open source projects.
— In Business Insider, Sam Shead describes FBLearnerFlow, a workbench for machine learning and AI.
— Leslie D’Monte describes some ways companies use machine learning in their operations.
Open Source Announcements
— Google announces release to open source of SyntaxNet, a framework for natural language understanding. Included in the release: an English parser dubbed Parsey McParseface. Journalists respond to the latter like dogs to a squirrel.
— Salesforce donates PredictionIO to Apache.
— Apache Storm announces two new maintenance releases:
— Apache Flink announces Release 1.0.3, with bug fixes and improved documentation.
— Apache Apex pushes a release to resolve a security issue.
— H2O.ai announces new hires with a strong orientation towards visualization, suggesting the company plans to add a more robust user interface to its best-in-class machine learning engine.