Big Analytics Roundup (September 26, 2016)
Note to readers: Recently, I’ve noticed that news about events that occur on Tuesdays seems stale by the time I publish on Monday. Beginning this week, I’m shifting to a new publication model, posting analysis of events as they happen instead of a weekly roundup. You could say I’m switching from batch updates to real-time updates, which should please Nathan Marz.
IBM’s Rob Thomas writes about the end of tech companies. Well, I guess he would know.
In a turgidly written essay, Alpine’s Steve Hillion makes the cogent point that it takes too long to deploy predictive models. The cure for this, he argues, is an agile development process for data science. I agree, but I don’t see this happening with Alpine.
Tony Baer takes an appropriately skeptical look at Teradata’s “solutions journey.” Teradata aspires to move up in the value chain by delivering solutions. And the San Diego Padres aspire to win the World Series some day.
BigML adds logistic regression to its platform; I’ve been experimenting with BigML recently and am impressed. BigML also adds a voice command interface, FWIW. Maybe they can get Zooey Deschanel to cut an ad.
Adrian Colyer reflects on a new paper from AMPlab on performing graph computations on graphs that change over time.
Troubles at H2O.ai?
Sri Ambati, CEO of H2O.ai, announces layoffs, including inside sellers and account executives. He also announces that the company has no plans to build a vertical in IoT, which makes sense since IoT is not a vertical, and nobody in the industry believed that H2O.ai was building one.
In VentureBeat, Jordan Novet stirs the pot, claiming that “a source familiar with the matter” told him that H2O.ai booked $800,000 in Q1 revenue after promising investors millions. Ambati disputes Novet’s numbers but does not deny missing the target. It’s a moot point: missed targets in Q1 don’t cause layoffs in September. It’s more logical to conclude that H2O.ai is having trouble converting people who download and use its free software into buyers.
That isn’t a knock on H2O software, which is excellent; the problem is that H2O’s service value proposition is unclear. Big banks and insurers tend to be unexcited about the finer points of Silicon Valley hacker culture; they like to see consultants who can deliver on project plans. H2O.ai has never developed a professional services arm, which makes it difficult for the company to implement a services business model.
Novet goes on to generalize about the struggles of the open source business model, citing the acquisitions of Turi, PredictionIO and Revolution Analytics. It seems that Novet does not understand the concept of “exit”: most startups would kill to command the buyout valuations those companies received. Startups either exit through an IPO (which is extremely rare) or they are acquired; the rest join the “living dead” of small companies muddling through from year to year in office parks, or they go out of business.
In his on-the-record comments, Ambati speculates that H2O.ai will offer a cloud-based managed service next year. That would have been a good idea in 2014, but it’s a little late at this point. The H2O software works on all three cloud platforms, but there is no managed service available.
Ambati also muddies the waters about H2O.ai’s next product (branded as Steam). The company previously positioned Steam as commercially licensed software, but Ambati now says it will be open sourced “later.” If H2O.ai expects to distribute Steam under an open source software license, it makes sense to do so from the beginning. Current marketing materials position H2O as “fully open source” and the Steam components as “something else,” which makes no sense at all.
H2O.ai’s strategic waffling, delayed cloud platform and lack of investment in a professional consulting arm suggest that the company has not seriously thought through how to monetize its software, which means that the current hiccup won’t be the last.
Speaking of Exits
Apple, itching to spend that offshore cash before the EU grabs it, acquires Tuplejump Software. Tuplejump is a tiny company with no funding, whose workflow product has little to do with machine learning or artificial intelligence. You wouldn’t know that from reading reports of the deal in the tech press.
Forrester Surveys “BI on Hadoop.”
Forrester publishes its first “Wave” evaluation of “Native Hadoop” BI Platforms. You can pay Forrester $2,495 for a copy, get one free from Arcadia Data, or just look at the screenshot below. In the report, Forrester evaluates Arcadia Enterprise, Attivio, Datameer, Oracle Big Data Discovery Cloud Service and Zoomdata.
There are several problems with Forrester’s analysis.
First, “Native Hadoop BI Platforms” is a nonsense category. BI users don’t want “BI on Hadoop”; they just want BI. The platform you choose to manage data is your problem, not theirs, and it’s up to you to deliver data that is accessible with the tools they want to use. Business users should not have to switch from one BI tool to another to work with different data sources. Platfora failed precisely for this reason, and I suspect Datameer won’t be around much longer either.
Second, if the category is “native” Hadoop platforms, several of the entries don’t belong. Oracle and Zoomdata, for example, run on Spark and aren’t constrained to HDFS as a file system. Of course, both products can run on Hadoop, but since they also work elsewhere, Forrester’s category looks a little silly. (Forrester evaluated Oracle’s cloud-based version. It’s a managed service, so for all anyone cares the product employs an army of trained crickets under the covers.)
Third, the actual ratings are a puzzler. Forrester rates the Oracle product 4/5 on Advanced Analytics, which is problematic because it has none. (Think I’m kidding? Check the documentation.) The same holds for SQL support, where Forrester rates Oracle at 3/5 even though the product has no SQL support.
But this is merely quibbling. “BI on Hadoop” is just the wrong way to think about BI, because it creates new silos. BI users should be able to use the tools they want to use everywhere in the organization, regardless of the physical data storage platform. Anything less than that is a distraction.
data Artisans Offers Flink Distribution
Just in time for Strata, data Artisans, the Berlin-based startup driving the Apache Flink project, announces a supported distribution of Flink. Under the dA Platform brand, the distribution includes Flink, patches, hotfixes, and the promise of 24/7/365 support from a company with 17 people, limited funding, and offices located in Berlin, Germany. Alex Woodie reports.
ASF Announces Kudu 1.0
The Apache Software Foundation announces the availability of Apache Kudu 1.0. Kudu is an open source columnar storage system optimized for table scans. Kudu is agnostic concerning SQL engines and supports queries with Drill, Impala, and Spark.
Roundup of Roundups
For those of you who prefer the convenience of a weekly news summary, here are some good ones to follow:
- Data Science Central Weekly Digest
- The Data Science Roundup
- Hadoop Weekly
- icrunchdata Weekly
- O’Reilly Data Newsletter
Books to Read
- Python Machine Learning (Sebastian Raschka)
- Enterprise IoT: A Definitive Handbook (Naveen Balani)
- Enterprise IoT: Strategies and Best Practices (Dirk Slama, et al.)
- IoT Disruptions 2020 (Sudha Jamthe)