Big Analytics Roundup (April 25, 2016)

Mesosphere wins the internet this week with its announcement that it has open sourced DC/OS, its datacenter virtualization project built around Apache Mesos. While not an “analytics” project per se, DC/OS has the potential to transform how organizations provision and deploy their analytics platforms.

In a nutshell, Apache Mesos distributes workloads across physical IT resources. DC/OS adds a container orchestration platform; installation, management and monitoring tools; and improvements to networking, security, load balancing, security and other areas. For more details about DC/OS and why it matters, read this white paper by Benjamin Hindman and Edward Hsu of Mesosphere.

Mesosphere has assembled an alliance of 61 launch partners, including tech vendors, systems integrators and potential users. Big brands include Accenture, Capgemini, Cisco, EMC, HPE, Microsoft, MapR, Microsoft and Verizon. Notable startups include Alluxio, Canonical, Confluent, Lightbend and MemSQL.

Analysts chime in:

  • Gavin Clarke thinks Google forced Mesosphere’s hand by open sourcing Kubernetes.
  • Mike Wheatley, notes that many of the components were already open source.
  • On TechCrunch, Frederic Lardinois reports and comments.
  • In Computerworld, John Ribeiro reports.
  • Janakiram MSV wonders if DC/OS will emerge as an alternative to Kubernetes.
  • Sam Dean surveys the project and interviews Ben Hindman.
  • George Leopold notes the scope of the DC/OS ecosystem.
  • Joao Lima reports.

DC/OS ships with more than 30 open source packages ready to install as DC/OS services. Notable among them: Cassandra, Elasticsearch, Kafka, MemSQL, Spark, Storm and Zeppelin.

Explainers

— Andrie de Vries explains how he scraped CRAN to trace the growth in R packages.

— On the Cloudera Engineering blog, David Alves explains how to use Impala and Kudu for analytic workloads.

— Michael Hunger and William Lyon explain how they analyzed the Panama Papers with Neo4j.

— On the Microsoft Azure blog, Liam Cavanagh explains how to optimize document search in Azure.

— Adrian Colyer of the morning paper summarizes five papers on word vectors, reviews Global Vectors for Word Representation, delivers an overview of Deep Learning and covers ImageNet classification with deep convolutional neural networks.

— Mario Inchiosa and Roni Burd explain how Microsoft R Server delivers an R interface to Spark in HDInsight.

Perspectives

— In MIT Technology review, Tom Simonite interviews Google’s Jeff Dean, contributor to Spanner, Translate, BigTable, MapReduce, Google Brain. LevelDB and TensorFlow. They discuss the future of machine learning.

— David Weldon went to Strata and interviewed some people:

  • Ali Hodroj of GigaSpaces, a cloud enabling company. Hodroj is bullish on cloud.
  • H2O.ai’s Arno Candel, who is surprised that so many people are talking about Spark.
  • Nikita Ivanov of GridGain, who says that people are excited about in-memory computing.
  • DataArtisans’ Kostas Tzoumas, who thinks that more people would use Flink if they were better educated.

— Alex Woodie touts Apache Beam, the open source implementation of Google’s Cloud Dataflow, which aspires to unify everything.

— James Nunns surveys ten Big Analytics startups: Confluent, H2O.ai, AtScale, Interana, Tamr, Wavefront, BlueTalon, Cazena, DataTorrent and Databricks.

— In Silicon Angle, Wikibon’s Paul Gillin interviews Wikibon’s George Gilbert, who is bullish on Spark.

— John Leonard ruminates on Hadoop, noting the proliferation of cute animal logos, and the challenges of the open source business model.

— Sam Dean notices that there are quite a few new open source tools for machine learning.

— Jack Vaughan summarizes the educational challenges posed by machine learning.

Commercial Announcements

— Dataiku announces availability of Data Science Studio on Microsoft Azure.

— GridGain announces availability of a support package for Apache Ignite that includes its Professional Edition — essentially the same as Apache Ignite, with more frequent maintenance releases and some LGPL libraries.

— MemSQL announces closing on a $36 million “C” round. All existing investors participated, plus two new investors.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s