Big Analytics Roundup (September 19, 2016)
Many thanks to Australia’s Dez Blanchfield for his contributions to this roundup. We set out to create a special “Australia/APAC” edition; however, most of the stories have a global interest: chips are chips and deep learning is deep learning wherever you live. We did find this story, profiling a Tasmanian oyster farm that uses Microsoft’s IoT hub.
Well, that’s embarrassing. MapR’s new ebook leads with success stories from comScore and Wells Fargo just as both companies hit the scandal sheets, the former for inflating revenue and the latter for operating a Potemkin cross-sell operation. Perhaps MapR can land VW for a hat-trick.
Self-proclaimed AI Hacker Ben Taylor explains the racist robot beauty contest. It’s not hard to explain. The folks at Beauty.ai who ran the event used supervised learning, and most of the reference images they used to train the models were white, so the algorithms learned “whiteness” as a predictor of “beauty”. Aside from the essential silliness of a robot beauty contest, this story illustrates the dangers of placing powerful tools in the hands of people who have no idea what they are doing.
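To make the mechanism concrete, here is a minimal sketch of how a skewed reference set teaches a model the wrong lesson. Everything in it is an assumption made for illustration: the group names, the counts, and the counting "model" are hypothetical stand-ins, not Beauty.ai's data or methods.

```python
from collections import Counter


def fit_rates(examples):
    """Estimate P(label == 1 | group) by counting; a toy stand-in for a trained model."""
    totals, positives = Counter(), Counter()
    for group, label in examples:
        totals[group] += 1
        positives[group] += label
    return {group: positives[group] / totals[group] for group in totals}


def make_training_set(pos_light, pos_dark, neg_light, neg_dark):
    """Build (group, label) pairs from hypothetical counts."""
    return (
        [("light", 1)] * pos_light
        + [("dark", 1)] * pos_dark
        + [("light", 0)] * neg_light
        + [("dark", 0)] * neg_dark
    )


# Skewed reference set: 90 of the 100 "beautiful" exemplars are light-skinned,
# while the unlabeled-as-beautiful pool is split evenly.
skewed = fit_rates(make_training_set(90, 10, 50, 50))

# Balanced reference set: exemplars are split evenly across both groups.
balanced = fit_rates(make_training_set(50, 50, 50, 50))

print("skewed:  ", skewed)
print("balanced:", balanced)
```

With the skewed set, the learned rate for the "light" group comes out far higher than for the "dark" group, even though nothing in the setup says one group is more attractive; with the balanced set the two rates are equal. The model is only reporting who was in the reference pile.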
- IBM DataFirst Launch Event
- Sydney (September 28-29):
- Las Vegas (October 23-27):
- Brussels (October 25-27): Spark Summit Europe
— Adrian Colyer summarizes a paper on deep neural networks for YouTube recommendations.
— Databricks offers a white paper from Ovum explaining the need for just-in-time data platforms.
Dez on Disruptive Analytics:
“Finally, the missing map for business and technology leaders struggling with the deluge of messages about Big Data, Analytics, and Disruption. Thomas Dinsmore compiles a no-nonsense, plain-speaking “must read” embracing open source software, the Hadoop ecosystem, in-memory analytics, cloud platforms, streaming analytics, deep learning and self-service analytics. And if that isn’t enough, there is also a handbook for managers, which should be required reading for anyone tasked with deliverables in this brave new data-driven world.”
And I didn’t pay him to write that.
SAS Launches Viya
SAS announces General Availability for Viya, the third modern architecture SAS has introduced since 2012. (“Honest! We got it right this time!”) SAS unveiled Viya last April at SAS Global Forum; since then, SAS has repositioned Viya as a cognitive computing platform. Alex Woodie reports; Brian Jackson examines two new SAS products that run on Viya.
SAS touts Viya as “cloud-ready”, which is exactly how SAS positioned Release 9.4 when introduced two years ago. “Cloud-ready” is a meaningless concept; all software is “cloud-ready” in the sense that you can stand up any software in a cloud instance. What matters is whether the software is available as a managed service in the cloud, with elastic pricing; it appears that SAS is leaning towards elastic pricing for Viya, but details are not yet available.
While the SAS runtime engine is proprietary software, Viya exposes Python, Java, Lua and REST APIs. SAS touts Viya’s “run anywhere” capabilities, which is irrelevant since it’s only offered as a managed service.
TensorRT is a software library that optimizes deep learning models for production deployment. The DeepStream SDK simplifies development of high-performance video analytics, and will be included in the Nvidia Deep Learning SDK. Both packages will be available through an Early Access program in October.
IBM Unveils Power8 servers for Deep Learning
With its typical marketing aplomb, IBM brands the servers IBM Power System S822LC, IBM Power System S821LC and IBM Power System S822LC for Big Data. (“Dude, you’re getting an IBM Power System S822LC!”) The S822LC connects an IBM Power8 processor with an Nvidia Tesla P100 GPU through the NVLink protocol. The other two boxes also leverage GPU accelerators. (h/t Dez)
Bit Stew releases the results of a survey it commissioned among “senior IT executives.” Daniel Gutierrez reports. (h/t Dez)
According to the survey, most respondents believe that IoT will improve operating costs and uptime. However, few say they are doing as much as a pilot, so those benefits are abstract, like the folks who say they’re going to diet and get more exercise next year. (If respondents really believed the benefits were real, they’d be knocking down doors to get pilots funded.)
- 64% say that integrating data from disparate sources and formats is the biggest challenge to adopting IoT. Wait, what? Isn’t that what Big Data tools were supposed to do?
- 87% say that without a data management strategy they will be overwhelmed by data.
Pleasingly for Bit Stew, respondents aren’t happy with their existing data integration tools. Moreover, respondents who are actually doing something with IoT are least confident in existing tools.
Teradata Shuffles Deck Chairs
Doug Henschen reports from Teradata’s Partner Conference in Atlanta. Highlights of the conference: Teradata’s cloud and virtualization push; “Borderless Analytics,” Teradata’s catchall term for federating queries across platforms; and Teradata’s focus on solutions, an effort to get away from the bottom of the stack.
Teradata’s cloud initiatives were good ideas in 2011.
I’ll believe in borderless analytics when somebody figures out how to federate across Oracle, DB2, Teradata, and SQL Server. In other words, never.
Teradata’s leadership thinks they can climb out of the pit of declining product sales by pushing consulting and “solutions”. The premise of that strategy is that all of the other consultants and solution providers, many of whom have decades of experience, will lie down and let TDC eat their lunch. The only customers for Teradata “solutions” are folks who are stuck with old Teradata boxes; so unless Teradata can figure out how to invigorate the sale of those boxes, consulting and services won’t save the company.
— IBM’s Berni Schiefer explains improvements to Spark SQL in Spark 2.0.
— On the Cloudera Engineering Blog, Devadutta Ghat et al. explain Impala performance and cost considerations for S3 vs. EBS.
— On the Alluxio blog, Calvin Jia explains why you should use Alluxio to improve the performance and consistency of HDFS.
— Spark Committer Nick Pentreath explains Spark 2.0 machine learning.
— Kostas Tzoumas explains streaming analytics.
— Zementis CEO Michael Zeller explains how to deploy deep learning with PMML.
— In New Straits Times Online, Bilqis Bahari summarizes Malaysia’s efforts to become a Big Data hub for Southeast Asia. (h/t Dez)
— Aaron Frank whines that the world depends on technology no one understands. (h/t Dez) Please. Very few people understood steam locomotives, and we managed to muddle through.
— Paul X. McCarthy invents a new word for mining secondary data sources. (h/t Dez)
— James Nunns wonders what happened to innovation in Big Data.
— Oracle kicks off the Fall earnings season with its quarterly report for the period ending August 31. It’s a familiar story: cloud revenues up, software licensing revenues down, hardware revenues in free fall.
Open Source News
— Hivemall, a machine learning library for Apache Hive, enters the Apache Incubator.
— Qubole announces plans to offer its Big Data service for Oracle Cloud Platform.
— Boston-based machine learning startup DataRobot announces Data Science for Executives, a half-day course designed to help execs identify opportunities for machine learning. In CIO, Thor Olavsrud reports. (h/t Dez)
— Microsoft selects ten machine learning and data science companies for the fourth class of its Seattle Accelerator.
— GridGain Systems offers its product on Microsoft Azure.