Big Analytics Roundup (May 11, 2015)
Lots of news this week, to compensate for last week’s lame haul.
In an excellent post on O’Reilly Radar, Ben Lorica surveys the landscape of workbooks, notebooks and workflow tools, which he categorizes by user persona.
On GitHub, a collection of links for streaming analytics (h/t O’Reilly Data).
In Wired, Cade Metz profiles Adatao founder Chris Nguyen.
In TechRepublic, Mary Shacklett reveals Alteryx’s “secret” for improving usability.
Analytics in the Cloud
Dave Wang of Databricks goes out on a limb, lists five reasons analytics in the cloud will be big in 2015.
Ian Pointer invents a bogus conflict between Flink and Spark, throws FUD at Spark in the process. Pointer’s celebration of Flink’s “pure” stream processing and Tez integration reads like a Hortonworks plant.
More on Flink here. Color me skeptical. I’ll buy into Flink when it does something.
Zubin Dowlaty of Mu Sigma unpacks the time machine, takes us back to 2011.
Cloudera announces that Phoenix, a project for SQL on HBase, has joined Cloudera Labs.
On this podcast, Matei Zaharia discusses Spark.
In Datanami, Alex Woodie dives into Databricks’ plans to speed up Spark.
CRN names virtualization startup BlueData to its Big Data 100.
BI and visualization startup ClearStory Data announces “further” integration of its Spark-based tooling with Cloudera CDH, which will surprise those who thought ClearStory had already integrated with CDH.
It’s not clear what “further” integration means. Integration is like pregnancy; you can’t be a little bit pregnant, and you can’t integrate just a little. ClearStory’s datasheet says it can use Hadoop as a data source without specifying a distribution, which most customers would take to mean it works with CDH.
I suspect that this announcement simply means that ClearStory has certified its software on CDH. Now if they would just get around to certifying on Spark.
Jordan Novet publishes a roundup of announcements at Ignite 2015.
By open-sourcing Greenplum and outsourcing Hadoop, Pivotal seems to be trying to move up the stack. That’s a reasonable strategy — it’s doubtful there is a need for another Hadoop distribution, and Pivotal was a marginal player. Greenplum was roadkill in the data warehouse appliance wars, permanently stuck in the Visionary box in Gartner’s Magic Quadrant, so open-sourcing makes sense. It’s like donating your old Volvo to charity to take the tax deduction.
But it’s not clear what they’re trying to move up to; when the top of your stack is Hawq and a query optimizer, you’re in deep trouble. EMC passed on buying Alpine when it bought Greenplum, which tells me that EMC execs do not “get” business analytics.
Here’s another triumph of packaging: Studebakers badged as Packards.
After much hoopla about its “D” round, Predixion announces that it has raised $4 million, which is just sad.
RapidMiner announces a new dot release.
In ITBusinessEdge, Mike Vizard describes some planned enhancements to SAP’s offerings for predictive analytics.
Dharmendra Kapadia is very excited about Scala.
Cool visualization vendor Zoomdata announces rapid bookings growth, new executive hires.