Big Analytics Roundup (August 8, 2016)
So, Apple acquires Turi for $200 million. Hopefully, Apple did not pay for brand equity.
— Flink Forward announces the schedule for its second annual event, to be held September 12-14 in Berlin.
— Databricks announces the agenda for Spark Summit Europe 2016 in Brussels (October 25-27).
GraphLab Dato Turi
Geekwire breaks the story, reporting a purchase price of $200 million. According to TechCrunch, Turi notified customers that its products would no longer be available. Apple adds Turi to the portfolio of machine learning startups it has acquired in the past year, including Emotient, Perceptio, and VocalIQ. More reporting here.
GraphLab started in 2009 as an open source project led by Carlos Guestrin of Carnegie Mellon. (According to OpenHub, Guestrin never contributed any code.) In May 2013, Guestrin raised $6.75M to start an eponymous venture to provide commercial support for GraphLab. In October 2014, GraphLab announced the availability of GraphLab Create, a commercially licensed software product. Contributions to the open source project actually ended in 2013; while the code remains on GitHub, the project is dead.
GraphLab changed its name to Dato in January 2015. They should have googled the name; at the time, the top links in a search included Dato Foland, a gay porn star, and Datto Inc, a data backup and recovery company in Connecticut. The latter proved problematic; Datto sued, forcing Dato to rebrand as Turi earlier this month.
Turi’s open source SFrame project remains for those who think introducing another file system into the mix is a smart thing to do.
Teradata: 9 Straight Quarters of Declining Product Revenue
For the second quarter of 2016, declining data warehouse giant Teradata reports an 11% decline in product revenue compared to Q2 2015. (Product revenue includes revenue from licensing software and hardware — boxes with the Teradata brand.) Maintenance revenue increased slightly, which means that customers aren’t pulling the plug on Teradata databases as fast as they did last year. Consulting revenue declined by 1%, which casts doubt on TDC’s stated strategy to become a services powerhouse.
Count me as skeptical about the merits of that plan. Teradata’s consulting revenue remains highly correlated with product revenue; in other words, if Teradata can’t sell its boxes, it’s not going to sell billable hours for consultants to implement those boxes. Teradata is not a credible competitor in the market for consulting-led solutions; companies like Oracle, IBM and SAS have a twenty-year head start.
Since Teradata performed better than “expectations”, Wall Street rewarded the stock with a bounce above $30. It’s a dead-cat bounce. As the Wall Street Journal notes, companies routinely game analyst expectations. TDC currently trades at 32 times trailing earnings, well above its peers; moreover, its peers are growing rather than declining.
— Kaarthik Sivashanmugam explains how to develop Apache Spark applications in .NET with Mobius.
— On the Cloudera Engineering blog, Devadutta Ghat et al. explain the latest performance improvements in Impala 2.6.
— Parsey McParseface now has 40 cousins. On the Google Research Blog, Chris Alberti et al. explain.
— Ujjwal Ratan explains how to use Amazon Machine Learning to predict patient readmission.
— Curt Monash offers his assessment of Spark. Highlights:
- Spark replaces MapReduce, in particular for data transformation.
- Spark is becoming the default platform for machine learning.
- Spark SQL is OK as an adjunct for other analysis.
- Spark Streaming is doing well, but there are challengers. (See below.)
- Databricks’ managed service for Spark has more than 200 subscribers.
— Serdar Yegulalp deploys the tired old “pure streaming versus microbatch” argument to claim that Apache Apex, Heron, Apache Flink and Onyx are “contenders” versus Spark. Someone should show him this graph:
— In Datanami, Alex Woodie profiles Flink.
— Trevor Jones describes Microsoft Azure’s big data tools.
— Sam Dean champions Sparkling Water, H2O’s interface to Spark.
— John Snow Labs announces it will deliver curated data in Parquet format.
— Lexalytics announces the availability of its Semantria text analytics software on Azure.