Thomas Dinsmore's Blog

Big Analytics Roundup (March 16, 2015)

March 16, 2015

Big Analytics news and analysis from around the web. Featured this week: a new Spark release, Spark Summit East, H2O, FPGA chips, Machine Learning, RapidMiner, SQL on Hadoop and Chemistry Cat.

A reminder to readers that Spark Summit East is coming up March 18-19.

Alteryx

On the Alteryx Blog, Michael Snow plugs Alteryx and Qlik for predictive analytics.
And again, the same combo for spatial analytics.
Adam Riley blogs on testing Alteryx macros.

Apache Spark

For an overview, see the Apache Spark Page.

The Spark team announces availability of Spark 1.3.0. Release notes here. Highlights of the new release include the DataFrames API, the graduation of Spark SQL from alpha, new algorithms in MLlib and Spark Streaming, and a direct Kafka API for Spark Streaming, plus additional enhancements and bug fixes. More on this release separately.
On Slideshare, Matei Zaharia outlines the 2015 roadmap for Apache Spark.
Also on Slideshare, Reynold Xin and Matei review lessons learned from running large Spark clusters.
In advance of Spark Summit, O’Reilly offers discounts on Spark video training and books.
On the Cloudera Engineering blog, Sandy Ryza, co-author of Advanced Analytics With Spark, writes on tuning Spark jobs.
Databricks announces that advertising automation vendor Sharethrough has selected Spark and Databricks Cloud to process terabyte-scale clickstream data. Case study published here.
Holden Karau publishes a Spark testing procedure on Git.
On RedMonk, Donnie Berkholz summarizes growing awareness and interest in Spark.

Buzzwords

In Wired, Patrick McFadin hits the trifecta with Apache Spark, NoSQL databases and IoT.

H2O

In Silicon Angle, Saroj Kar interviews H2O.ai’s CEO SriSatish Ambati; video here.

High Performance Computing

Datanami reports that a Ryft One FPGA chip (with limited functionality) offers throughput equivalent to 100-200 Spark nodes. More coverage here. Ryft’s Christian Shrauder blogs about FPGA.

Machine Learning

Ching and Daniel propose using Random Matrix Theory to analyze high-dimensional social media data.
Cheng-Tao Chu offers seven ways to mess up your next machine learning project.
AMPLab’s Jiannen Wang blogs on human-in-the-loop machine learning. Someone should write a book about that.

RapidMiner

Shaun McGirr posts on integrating RapidMiner and R.
Tobias Malbrecht performs heroics with RapidMiner.

SQL on Hadoop

On the Pivotal blog, a podcast about HAWQ.
The Apache Software Foundation announces release 0.10 of Apache Tajo; Silicon Angle reports with a backgrounder.
TechWorld reports that Airbnb has open-sourced Airpal, an application that runs on Facebook’s PrestoDB. According to the story, Airpal “allows…non-technical employees to work like data scientists”, which suggests that TechWorld thinks data scientists do nothing but SQL.
Splice Machine has updated FAQs for its RDBMS-on-Hadoop.

Zementis

Decision engine vendor Zementis announces a partnership with analytic services provider Cognizant.
Zementis also announces the introduction of the ADAPA decision engine for Software AG’s Apama Streaming Analytics Platform.
