Big Analytics Roundup (December 21, 2015)
With the holidays approaching, we still have some hard news, plus some explainers and end-of-2015 roundups. I’ll post my own roundup of 2015 later this week.
- In InfoWorld, H2O.ai’s Sri Ambati delivers a well-written practical introduction to machine learning.
- There’s a new look at RTInsights, a site that aggregates interesting content on real-time analytics.
- At Amigo Bulls, Giulio Prisco wonders what Facebook’s open sourcing of Big Sur means for Facebook stock.
- On Slideshare, H2O.ai’s Matt Dowle celebrates clean data.
- On the AWS Big Data blog, Nick Corbett explains how to tune your Titan Graph database on AWS.
- On the Confluent blog, Liquan Pei explains how to build ETL with Kafka Connect.
- Rick Van Der Lans delivers an excellent guide to SQL syntax with Apache Drill.
- On DZone, Henryk Konsek explains how to connect Apache Camel with Apache Spark. FWIW.
- On the Slalom blog, Kevin Feit and Oliver Asmus explain how to get started with Microsoft Azure Machine Learning. Not that much explaining is required; AML is very easy to use.
Best of 2015
- Eric Knorr summarizes the year 2015 in cloud. Key bits: AWS pulls ahead; machine learning moves to the cloud; Microsoft’s hybrid cloud.
- On the Apache Flink blog, Robert Metzger updates the community on the year 2015 in Flink. More on Slideshare.
(1) Time Series Analytics for Spark
(2) FuxiSort Smashes Sort Records
Here’s a story from October that I missed. A team from Alibaba demolishes the sort speed records in four categories with the unfortunately named FuxiSort.
(3) Qubole Adds Google Cloud Platform Support
Big Data-as-a-service provider Qubole announces Spark service for Google Cloud Platform. Qubole Data Service (QDS) offers persistent Spark notebooks and automatic provisioning for Spark clusters. QDS is now available on the three leading cloud platforms.
(4) TPC Releases Benchmark Standard for Big Data
The Transaction Processing Performance Council (TPC) releases TPC-DS 2.1, an industry standard benchmark for SQL-based Big Data systems. The standard models the complete decision support process, measuring query response time, throughput, data integration performance and data load for a given system configuration. Details here.
(5) New Stuff for Microsoft Azure Machine Learning
In other AML news, Azure customers can now apply Azure Machine Learning models as a function on streaming data. On Microsoft’s Machine Learning blog, Sudhesh Suresh reports. Gary Ericson delivers a cheat sheet.
(6) Small Improvements for BigInsights
IBM announces BigInsights 4.1 Fix Pack 2, which adds support for SLES and Spark 1.5.1, plus enhancements to Big SQL and Text Analytics.
(7) New Bits for Drill
The Drill team announces Release 1.4, with minor enhancements.