Big Analytics Roundup (July 20, 2015)

Top story of the week: MapR reports that it doubled bookings and billings year-over-year in the second quarter.

Dato holds its Data Science Summit in SFO starting today.  See item below on GraphLab.

A general shout-out to Hadoop Weekly.  If you don’t subscribe already, you should.

This isn’t news, but it’s cool.  The Linking Open Data cloud diagram:


High Performance Computing

In Computerworld, Lucas Mearian reports that MIT researchers built a server network with FPGA and flash that runs as fast as RAM, at a fraction of the cost.

Open Source Software

On ZDNet, Andrew Brust rounds up the latest Apache news:

  • Atlas for data governance announces Release 0.5
  • Columnar file format Parquet delivers Release 1.8
  • Whirr is toast

Apache Flink

On YouTube, Stephan Ewen delivers an intro to Flink.

Apache Impala

On the Cloudera blog, Marcel Kornacker et al. summarize key milestones for the project to date and outline the roadmap through 2016.

Apache Mesos

In TechRepublic, Matt Asay interviews Mesosphere’s CEO and explains how Mesos can make your data center run like Google’s.

On the Confluent blog, Neha Narkhede explains how to make Kafka elastic with Mesos.

Mesosphere announces an SDK for distributed apps on DCOS.

Apache Spark

The Spark team announces maintenance release 1.4.1.

In Datanami, Alex Woodie summarizes Python vs. R in Spark, but fails to note that there are currently no R bindings for Spark’s machine learning library, the principal draw for an R user.

Huawei contributes a Spark package for Spark SQL on HBase.

On LinkedIn Pulse, Bernard Marr asks: Hadoop or Spark?  (Spoiler: the answer is “both”).

On the Altiscale blog, Andrew Lee writes part two of his how-to series for Spark on Hadoop.  Part one is here.

On the IBM Bluemix blog, Luis Arellano corrals some materials to help you get smarter about Spark.

Hacker, father and Jesus follower Evan Chan comprehensively summarizes deployment options for Spark.

Two blog posts cover new features in Spark 1.4:

  • Yin Huai and Michael Armbrust describe Spark SQL’s window functions on the Databricks blog
  • On the Hortonworks blog, Zhan Zhang, Cheng Lian and Patrick Wendell document ORC support

On the Microsoft BI blog, Theresa Palmer explains how to integrate Power BI with Spark on Azure.

Apache Zeppelin

ZeppelinHub, currently in beta, offers you the ability to share Zeppelin graphs and reports.


Databricks adds support for R notebooks to Databricks Cloud.


In the O’Reilly Data Show Podcast, Carlos Guestrin recaps GraphLab’s evolution.


On Slideshare, three presentations from UseR!

  • Amy Wang offers an overview of H2O.
  • Spencer Aiello delivers an intro to R and H2O.
  • Matt Dowle summarizes H2O design and infrastructure.

