Big Analytics Roundup (September 19, 2016)

Many thanks to Australia’s Dez Blanchfield for his contributions to this roundup. We set out to create a special “Australia/APAC” edition; however, most of the stories have a global interest: chips are chips and deep learning is deep learning wherever you live. We did find this story, profiling a Tasmanian oyster farm that uses Microsoft’s IoT hub.

Well, that’s embarrassing. MapR’s new ebook leads with success stories from comScore and Wells Fargo just as both companies hit the scandal sheets, the former for inflating revenue and the latter for operating a Potemkin cross-sell operation. Perhaps MapR can land VW for a hat-trick.

Self-proclaimed AI Hacker Ben Taylor explains the racist robot beauty contest. It’s not hard to explain. The folks who ran the event used supervised learning, and most of the reference images they used to train the models were white, so the algorithms learned “whiteness” as a predictor of “beauty”. Aside from the essential silliness of a robot beauty contest, this story illustrates the dangers of placing powerful tools in the hands of people who have no idea what they are doing.
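To make the mechanism concrete, here is a minimal, hypothetical sketch in Python. It does not reproduce the contest's actual models (we don't know what they used). It simply counts label frequencies by group in an invented, skewed training set, the crudest possible stand-in for a supervised learner picking up a feature that happens to track the label. The group names, counts and function name are all made up for illustration.

```python
from collections import Counter

def group_positive_rates(examples):
    """Estimate P(label = 1 | group) by simple counting.

    A deliberately crude stand-in for a supervised model: if a group
    attribute happens to correlate with the label in the training data,
    the model learns it as a predictor.
    """
    totals = Counter()
    positives = Counter()
    for group, label in examples:
        totals[group] += 1
        positives[group] += label
    return {group: positives[group] / totals[group] for group in totals}

# Hypothetical training set: the "beautiful" reference examples (label 1)
# come overwhelmingly from group A, while the other examples are split evenly.
training = (
    [("A", 1)] * 90 + [("B", 1)] * 10 +
    [("A", 0)] * 50 + [("B", 0)] * 50
)

rates = group_positive_rates(training)
print(rates)  # group A scores far higher than group B
```

Nothing in group B's examples says anything about beauty; the score gap comes entirely from who was in the reference set. Real image models learn from pixels rather than an explicit group label, but the same dynamic applies when a visual feature tracks the skew in the training data.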


Good Reads

— Adrian Colyer summarizes a paper on deep neural networks for YouTube recommendations.

— Databricks offers a white paper from Ovum explaining the need for just-in-time data platforms.

Shameless Commerce

Dez on Disruptive Analytics:

“Finally, the missing map for business and technology leaders struggling with the deluge of messages about Big Data, Analytics, and Disruption. Thomas Dinsmore compiles a no-nonsense, plain-speaking ‘must read’ embracing open source software, the Hadoop ecosystem, in-memory analytics, cloud platforms, streaming analytics, deep learning and self-service analytics. And if that isn’t enough, there is also a handbook for managers, which should be required reading for anyone tasked with deliverables in this brave new data-driven world.”

And I didn’t pay him to write that.

SAS Launches Viya

SAS announces General Availability for Viya, the third modern architecture SAS has introduced since 2012. (“Honest! We got it right this time!”) SAS unveiled Viya last April at SAS Global Forum; since then, SAS has repositioned Viya as a cognitive computing platform. Alex Woodie reports; Brian Jackson examines two new SAS products that run on Viya.

SAS touts Viya as “cloud-ready”, which is exactly how SAS positioned Release 9.4 when it was introduced two years ago. “Cloud-ready” is a meaningless concept; all software is “cloud-ready” in the sense that you can stand up any software in a cloud instance. What matters is whether the software is available as a managed service in the cloud, with elastic pricing; it appears that SAS is leaning towards elastic pricing for Viya, but details are not yet available.

While the SAS runtime engine is proprietary software, Viya exposes Python, Java, Lua and REST APIs. SAS touts Viya’s “run anywhere” capabilities, which is irrelevant since it’s only offered as a managed service.

Nvidia, Intel Announce Chips for Deep Learning, AI 

Nvidia announces the Tesla P4 and Tesla P40 chips for deep learning, plus TensorRT and DeepStream software for video inferencing. In April, Nvidia launched the 15-billion-transistor Tesla P100 chip.

The P100 supports model training; the P4 and P40 serve as platforms for speech, image or text recognition with previously trained models. The P4 can replace 13 CPU-based servers, while a server with eight P40s delivers performance equivalent to 140 CPU-based servers.
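The two claims use different units (one card versus one eight-card server), so a per-card comparison takes a little arithmetic. The sketch below takes Nvidia's figures at face value and assumes, purely for illustration, that the eight-P40 figure divides evenly across the cards; Nvidia does not say that.

```python
# Nvidia's claimed CPU-server equivalents (vendor figures, taken at face value)
P4_CPU_SERVERS = 13            # one P4 replaces 13 CPU-based servers
P40_SERVER_CPU_SERVERS = 140   # one server with eight P40s ~ 140 CPU-based servers
P40S_PER_SERVER = 8

# Assumption: the server's performance divides evenly across its eight cards
per_p40 = P40_SERVER_CPU_SERVERS / P40S_PER_SERVER
print(f"Per P40: {per_p40} CPU servers; per P4: {P4_CPU_SERVERS}")
```

On that assumption, each P40 stands in for 17.5 CPU-based servers, against 13 for each P4.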

TensorRT is a software library that optimizes deep learning models for production deployment. The DeepStream SDK simplifies development of high-performance video analytics, and will be included in the Nvidia Deep Learning SDK. Both packages will be available through an Early Access program in October.

Separately, Intel announces a next-generation version of its high-end Xeon Phi server chip for AI applications, which Baidu plans to use for its Deep Speech platform. (h/t Dez)

IBM Unveils Power8 Servers for Deep Learning

IBM announces three Power8 Linux servers that leverage Nvidia NVLink technology to move data faster. IBM designed the servers to support AI and deep learning applications, and hopes they will stem the bleeding in its Systems business unit, whose revenue declined 23% in Q2 2016.

With its typical marketing aplomb, IBM brands the servers IBM Power System S822LC, IBM Power System S821LC and IBM Power System S822LC for Big Data. (“Dude, you’re getting an IBM Power System S822LC!”) The S822LC connects an IBM Power8 processor with an Nvidia Tesla P100 GPU through the NVLink protocol. The other two boxes also leverage GPU accelerators. (h/t Dez)

Survey: Hardly Anyone Actually Uses IoT

IoT data integration vendor Bit Stew Systems releases results of an IDG Quick Pulse survey it commissioned among “senior IT executives.” Daniel Gutierrez reports. (h/t Dez)

According to the survey, most respondents believe that IoT will improve operating costs and uptime. However, few say they are doing as much as a pilot, so those benefits are abstract, like the folks who say they’re going to diet and get more exercise next year. (If respondents really believed in those benefits, they’d be knocking down doors to get pilots funded.)

Other points:

  • 64% say that integrating data from disparate sources and formats is the biggest challenge to adopting IoT. Wait, what? Isn’t that what Big Data tools were supposed to do?
  • 87% say that without a data management strategy they will be overwhelmed by data.

Pleasingly for Bit Stew, respondents aren’t happy with their existing data integration tools. Moreover, respondents who are actually doing something with IoT are least confident in existing tools.

Teradata Shuffles Deck Chairs

Doug Henschen reports from Teradata’s Partner Conference in Atlanta. Highlights of the conference: Teradata’s cloud and virtualization push; “Borderless Analytics,” Teradata’s catchall term for federating queries across platforms; and Teradata’s focus on solutions, an effort to get away from the bottom of the stack.

Teradata’s cloud initiatives were good ideas in 2011.

I’ll believe in borderless analytics when somebody figures out how to federate across Oracle, DB2, Teradata, and SQL Server. In other words, never.

Teradata’s leadership thinks they can climb out of the pit of declining product sales by pushing consulting and “solutions”. The premise of that strategy is that all of the other consultants and solution providers, many of whom have decades of experience, will lie down and let TDC eat their lunch. The only customers for Teradata “solutions” are folks who are stuck with old Teradata boxes; so unless Teradata can figure out how to invigorate the sale of those boxes, consulting and services won’t save the company.


— IBM’s Berni Schiefer explains improvements to Spark SQL in Spark 2.0.

— In a two-part post, Ryan Nienhuis explains how to write SQL on streaming data with Amazon Kinesis Analytics. Part one is here; part two is here.

— On the Cloudera Engineering Blog, Devadutta Ghat et al. explain Impala performance and cost considerations for S3 vs. EBS.

— On the Alluxio blog, Calvin Jia explains why you should use Alluxio to improve the performance and consistency of HDFS.

— Spark Committer Nick Pentreath explains Spark 2.0 machine learning.

— Kostas Tzoumas explains streaming analytics.

— Andrew Brust asks: Is this the age of Big OLAP? He doesn’t answer that question, but he does explain the differences between AtScale, Kyvos Insights, and Arcadia Data.

— Zementis CEO Michael Zeller explains how to deploy deep learning with PMML.


— In New Straits Times Online, Bilqis Bahari summarizes Malaysia’s efforts to become a Big Data hub for Southeast Asia. (h/t Dez)

— Aaron Frank whines that the world depends on technology no one understands. (h/t Dez) Please. Very few people understood steam locomotives, and we managed to muddle through.

— Paul X. McCarthy invents a new word for mining secondary data sources. (h/t Dez)

— James Nunns wonders what happened to innovation in Big Data.

Earnings Watch

— Oracle kicks off the Fall earnings season with its quarterly report for the period ending August 31. It’s a familiar story: cloud revenues up, software licensing revenues down, hardware revenues in free fall.

Open Source News

Hivemall, a machine learning library for Apache Hive, enters the Apache Incubator.

Commercial Announcements

— Qubole announces plans to offer its Big Data service for Oracle Cloud Platform.

— Google announces that it has acquired Urban Engines, a provider of location-based analytics. In VentureBeat, Ken Yeung reports. (h/t Dez)

— Galactic Exchange, whose ClusterGX product promises Hadoop/Spark clusters in five minutes, closes a $1.25M seed round. VentureBeat reports. (h/t Dez)

— Boston-based machine learning startup DataRobot announces Data Science for Executives, a half-day course designed to help execs identify opportunities for machine learning. In CIO, Thor Olavsrud reports. (h/t Dez)

— Microsoft selects ten machine learning and data science companies for the fourth class of its Seattle Accelerator.

— GridGain Systems offers its product on Microsoft Azure.
