Big Analytics Roundup (May 18, 2015)
Light news: announcements from Dato, Google, Oracle and Pentaho, plus other cool stuff.
On the PWC technology blog, Alan Morrison and Bo Parker interview Martin Van Ryswyk and Marko Rodriguez of Datastax about graph analytics. PWC’s headline writer gets it wrong; the article is about graph engines and not graph databases. Special-purpose graph databases, like special-purpose columnar databases, are a dead end; graph analytics will be incorporated into general-purpose tooling. The evidence? They’re interviewing guys from Datastax and not Neo.
In Data Science Central, “Data Science Girl” surveys top public data repositories, so we don’t have to keep using the 1998 KDD Cup data.
Adatao CEO blogs about why he’s placing his chips on Spark.
On the Flink blog, Fabian Hueske provides one more reason not to care about Flink.
Failing to sell GemFire, Pivotal open-sourced it as Apache Geode. InfoWorld reports.
On the Databricks blog, Masaru Dobashi et al. describe how NTT uses Spark on thousand-node clusters for operational analytics at scale. Nick Heudecker, call your office.
Nick Amato demonstrates how to classify customers with Spark MLlib.
Justin Kestelyn summarizes some lessons learned working with Spark.
The Spark team announces Spark Summit Europe, to be held October 27-29 in Amsterdam.
Dato announces the release of GraphLab Create, which includes support for scikit-learn models, a label propagation toolkit and a number of other new features.
By the way, it appears the folks at Dato forgot to Google the name before rebranding.
Hortonworks (HDP) releases its Q1 financials. Revenue more than doubled, while the operating loss doubled, a great example of negative operating leverage. Good news: HDP’s variable margin on services turned positive, which means they don’t have to give away consulting services as much as they did last year. Wall Street was pleased.
In VentureBeat, Barry Levine kills two birds with one stone, touting both the GrowthBeat Summit and Lattice Engines’ new features. One assumes the latter sponsors the former.
NoSQL vendor MarkLogic secures a generous $102 million Series F round.
Oracle announces spatial and graph analytics for Big Data. (h/t Oliver Vagner)
Pentaho announces integration with Apache Spark, enabling orchestration of Spark jobs. Coverage here, here, here, here, here, here, and here. Reporting this story, Alex Woodie trolls another spurious Spark “concern.”
Predixion’s Marcom people show they’ve heard about IoT.
In VentureBeat, Jordan Novet reviews Wolfram’s new image identification tool, which leverages deep learning.