Big Analytics Roundup (July 25, 2016)

We have some more summer reading this week; plus, Splice Machine announces availability of its open source Community Edition, and Google launches two new machine learning APIs. There are so many Spark stories I’ve created a special section for them. Plus we have the usual explainers, perspectives, and news.

Quant headhunter Linda Burtch repeats her survey of working analysts in her network. Preference for using SAS has steadily declined over the three years she has conducted the poll; this year a clear majority chose R or Python over SAS. Preference for open source correlates with education; the more you know, the less likely you are to use SAS.

Oracle, IBM, SAP, and Microsoft have all reported Q2 revenue and earnings, but Teradata is still crunching the numbers. I’ll do a general earnings roundup when TDC gets around to reporting its numbers. TDC’s stock price has outperformed the others since June 30, which suggests the market expects a good second quarter. Meanwhile, TDC acquires another consultancy and reveals who bought Aprimo.

Summer Reading

Adrian Colyer lists his five favorite papers from the past several months and outlines his philosophy, which you must read. And here is another link to last week’s top paper on data bazaars versus data cathedrals.

Splice Machine Shifts to Open Core

Hadoop-based RDBMS vendor Splice Machine announces general availability for its open source community edition and offers a sandbox hosted on AWS.  Sam Dean approves; Andrew Brust reports; Dave Ramel explains. Jack Germain describes Splice Machine’s changing business model.

Spark Stories

— Databricks’ Spark survey is still accepting responses. Go and fill it out if you have not done so already.

— The Spark PMC has voted favorably on a release candidate for Spark 2.0, which is now in packaging for general availability.

— On the Databricks blog, Jules Damji corrals Spark news from the past two weeks.

— Alex Woodie touts LevyxSpark, an enhanced Spark distribution based on open source Apache Spark. LevyxSpark includes some open source enhancements, plus Levyx Helium, an SSD-based key-value store.

— In a webcast, Alexander Ulanov summarizes options for deep learning on Spark.

— Sam Weaver explains how to use the new MongoDB connector for Spark.

Explainers

— Nita Dembla and Gopal Vijayaraghavan explain improvements in Hive 2.1.

— Siddharth Anand introduces Apache Airflow (Incubating), a platform to author, schedule, and monitor DAGs. Sounds like Apache Beam.

— Data Artisans’ Stephan Ewan explains savepoints in Apache Flink.

Perspectives

— Jack Clark profiles Google’s land grab in deep learning. Short version: TensorFlow is blowing away Caffe, Torch, Theano, dl4j, CNTK, and DSSTNE.

— Greg Satell theorizes about Google’s open source strategy as if a “razor and blades” strategy is something new and brilliant.

— In Fortune, Barb Darrow profiles cloud computing’s disruptive impact.

— Sam Dean confuses machine learning with artificial intelligence.

— Syncsort’s Paige Roberts interviews Dr. Ellen Friedman.

— Drew Breunig poses a theory about the business implications of machine learning.

— BuzzFeed’s Adam Kelleher attempts to explain bias, fails.

— IBM exec Rob Thomas co-authors a blog about machine learning. It’s about what you would expect from an IBM exec.

Open Source News

— Open source columnar storage engine Apache Kudu graduates to top-level status.

— Apache Chukwa announces Release 0.8, with security bug fixes, FWIW. Chukwa captures logs from distributed systems for monitoring and analysis. No, I never heard of it either.

Commercial Announcements

— Google announces open beta for its Cloud Natural Language and Cloud Speech APIs.

Hardware News

— Inspur, which claims to be China’s largest server manufacturer, announces availability of the Memory1 line of servers for big analytics. Inspur uses high-capacity flash DIMMs and memory expansion software to deliver up to 2TB of memory per server and up to 80TB per rack.

— Startup Wave Computing announces plans for a family of deep learning computers. Good luck to them. The history of computing isn’t kind to special purpose machines, which tend to eventually get buried by general purpose machines.

Funding News

— Redis Labs lands a $14 million “C” round led by Bain Capital and Carmel Ventures. Redis claims 6,200 enterprise customers and 55,000 accounts for its cloud service.

— Sift Security emerges from stealth, announces $3.25 million in angel funding. Sift uses graph analytics running on Spark and TitanDB to identify linked threats and incidents.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s