Big Analytics Roundup (July 20, 2015)
Top story of the week: MapR reports that it doubled bookings and billings year-over-year in the second quarter.
Dato holds its Data Science Summit in SFO starting today. See item below on GraphLab.
A general shout-out to Hadoop Weekly. If you don’t subscribe already, you should.
This isn’t news, but it’s cool: the Linking Open Data cloud diagram.
High Performance Computing
In Computerworld, Lucas Mearian reports that MIT researchers built a server network with FPGAs and flash memory that runs as fast as RAM, at a fraction of the cost.
Open Source Software
On ZDNet, Andrew Brust rounds up the latest Apache news:
- Atlas for data governance announces Release 0.5
- Columnar file format Parquet delivers Release 1.8
- Whirr is toast
On YouTube, Stephan Ewen delivers an intro to Flink.
On the Cloudera blog, Marcel Kornacker et al. summarize key milestones for Impala to date and outline the project’s roadmap through 2016.
In TechRepublic, Matt Asay interviews Mesosphere’s CEO, who explains how Mesos can make your data center run like Google’s.
On the Confluent blog, Neha Narkhede explains how to make Kafka elastic with Mesos.
Mesosphere announces an SDK for building distributed apps on DCOS.
The Spark team announces maintenance release 1.4.1.
In Datanami, Alex Woodie compares Python and R in Spark, but fails to note that there are currently no R bindings for Spark’s machine learning library, the principal draw for an R user.
Huawei contributes a Spark package for Spark SQL on HBase.
On LinkedIn Pulse, Bernard Marr asks: Hadoop or Spark? (Spoiler: the answer is “both”).
On the IBM Bluemix blog, Luis Arellano corrals some materials to help you get smarter about Spark.
Hacker, father and Jesus follower Evan Chan comprehensively summarizes deployment options for Spark.
Two blog posts cover new features in Spark 1.4:
- Yin Huai and Michael Armbrust describe Spark SQL’s window functions on the Databricks blog
- On the Hortonworks blog, Zhan Zhang, Cheng Lian and Patrick Wendell document ORC support
On the Microsoft BI blog, Theresa Palmer explains how to integrate Power BI with Spark on Azure.
ZeppelinHub, currently in beta, offers you the ability to share Zeppelin graphs and reports.
Databricks adds support for R notebooks to Databricks Cloud.
In the O’Reilly Data Show Podcast, Carlos Guestrin recaps GraphLab’s evolution.
On Slideshare, three presentations from useR! 2015:
- Amy Wang offers an overview of H2O.
- Spencer Aiello delivers an intro to R and H2O.
- Matt Dowle summarizes H2O design and infrastructure.