This is the first installment in a four-part review of 2016 in machine learning and deep learning.
In the first post, we look back at ML/DL news organized in five high-level topic areas:
- Concerns about bias
- Interpretable models
- Deep learning accelerates
- Supercomputing goes mainstream
- Cloud platforms build ML/DL stacks
In Part Two, we cover developments in each of the leading open source machine learning and deep learning projects.
Parts Three and Four will review the machine learning and deep learning moves of commercial software vendors.
Concerns About Bias
As organizations expand the use of machine learning for profiling and automated decisions, there is growing concern about the potential for bias. In 2016, reports in the media documented racial bias in predictive models used for criminal sentencing, discriminatory pricing in automated auto insurance quotes, an image classifier that learned “whiteness” as an attribute of beauty, and hidden stereotypes in Google’s word2vec algorithm.
Two bestsellers were published in 2016 that address the issue. The first, Cathy O’Neil’s Weapons of Math Destruction, is a candidate for the National Book Award. In a review for The Wall Street Journal, Jo Craven McGinty summarizes O’Neil’s arguments as “algorithms aren’t biased, but the people who build them may be.”
A second book, Virtual Competition, written by Ariel Ezrachi and Maurice Stucke, focuses on the ways that machine learning and algorithmic decisions can promote price discrimination and collusion. Burton Malkiel notes in his review that the work “displays a deep understanding of the internet world and is outstandingly researched. The polymath authors illustrate their arguments with relevant case law as well as references to studies in economics and behavioral psychology.”
Most working data scientists are deeply concerned about bias in the work they do. Bias, after all, is a form of error, and a biased algorithm is an inaccurate algorithm. The organizations that employ data scientists, however, may not commit the resources needed for testing and validation, which is how we detect and correct bias. Moreover, people in business suits often exaggerate the accuracy and precision of predictive models or promote their use for inappropriate applications.
In Europe, GDPR creates an incentive for organizations that use machine learning to take the potential for bias more seriously. We’ll be hearing more about GDPR in 2017.
Speaking of GDPR, beginning in 2018, organizations that use machine learning to drive automated decisions must be prepared to explain those decisions to the affected subjects and to regulators. As a result, in 2016 we saw considerable interest in efforts to develop interpretable machine learning algorithms.
— The MIT Computer Science and Artificial Intelligence Laboratory announced progress in developing neural networks that deliver explanations for their predictions.
— At the International Joint Conference on Artificial Intelligence, David Gunning summarized work to date on explainability.
— MIT selected machine learning startup Rulex as a finalist in its Innovation Showcase. Rulex implements a technique called Switching Neural Networks to learn interpretable rule sets for classification and regression.
— In O’Reilly Radar, Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin explained Local Interpretable Model-Agnostic Explanations (LIME), a technique that explains the predictions of any machine learning classifier.
— The Wall Street Journal reported on an effort by Capital One to develop machine learning techniques that account for the reasoning behind their decisions.
In Nautilus, Aaron M. Bornstein asked: Is artificial intelligence permanently inscrutable? There are several issues, including a lack of clarity about what “interpretability” means.
It is important to draw a distinction between “interpretability by inspection” versus “functional” interpretability. We do not evaluate an automobile by disassembling its engine and examining the parts; we get behind the wheel and take it for a drive. At some point, we’re all going to have to get behind the idea that you evaluate machine learning models by how they behave and not by examining their parts.
Deep Learning Accelerates
In a September Fortune article, Roger Parloff explains why deep learning is suddenly changing your life. Neural networks and deep learning are not new techniques; we see practical applications emerge now for three reasons:
— Computing power is cheap and getting cheaper; see the discussion below on supercomputing.
— Deep learning works well in “cognitive” applications, such as image classification, speech recognition, and language translation.
— Researchers are finding new ways to design and train deep learning models.
In 2016, the field of DL-driven cognitive applications reached new milestones:
— A Microsoft team developed a system that recognizes conversational speech as well as humans do. The team used convolutional and long short-term memory (LSTM) neural networks built with Microsoft Cognitive Toolkit (CNTK).
— On the Google Research Blog, a Google Brain team announced the launch of the Google Neural Machine Translation System, a system based on deep learning that is currently used for 18 million translations per day.
— In TechCrunch, Ken Weiner reported on advances in DL-driven image recognition and how they will transform business.
Venture capitalists aggressively funded startups that leverage deep learning in applications, especially those that can position themselves in the market for cognitive solutions:
— Zebra Medical Vision, an Israeli startup, uses DL to examine medical images and diagnose diseases of the bones, brain, cardiovascular system, liver, and lungs. Zebra disclosed a $12 million venture round led by Intermountain Health.
There is an emerging ecosystem of startups that are building businesses on deep learning. Here are six examples:
— Deep Genomics, based in Toronto, uses deep learning to understand diseases, disease mutations and genetic therapies.
— Cybersecurity startup Deep Instinct uses deep learning to predict, prevent, and detect threats to enterprise computing systems.
— Ditto Labs uses deep learning to identify brands and logos in images posted to social media.
— Enlitic offers DL-based patient triage, disease screening, and clinical support to make medical professionals more productive.
— Gridspace provides conversational speech recognition systems based on deep learning.
— Indico offers DL-driven tools for text and image analysis in social media.
And, in a sign that commercial development of deep learning isn’t all hype and bubbles, NLP startup Idibon ran out of money and shut down. We can expect further consolidation in the DL tools market as major vendors with deep pockets ramp up their programs. The greatest opportunity for new entrants will be in specialized applications, where the founders can deliver domain expertise and packaged solutions to well-defined problems.
Supercomputing Goes Mainstream
To make deep learning practical, you need a lot of computing horsepower. In 2016, hardware vendors introduced powerful new platforms that are purpose-built for machine learning and deep learning.
While GPUs are currently in the lead, there is a serious debate under way about the relative merits of GPUs and FPGAs for deep learning. Anand Joshi explains the FPGA challenge. In The Next Platform, Nicole Hemsoth describes the potential of a hybrid approach that leverages both types of accelerators. During the year, Microsoft announced plans to use Altera FPGAs, and Baidu said it intends to standardize on Xilinx FPGAs.
NVIDIA Launches the DGX-1
NVIDIA had a monster 2016, tripling its market value in the course of the year. The company released the DGX-1, a deep learning supercomputer. The DGX-1 includes eight Tesla P100 GPUs, each of which is 12X faster than NVIDIA’s previous benchmark. For $129K you get the throughput of 250 CPU-based servers.
NVIDIA also revealed a Deep Learning SDK with Deep Learning primitives, math libraries, tools for multi-GPU communication, a CUDA toolkit and DIGITS, a model training system. The system works with popular Deep Learning frameworks like Caffe, CNTK, TensorFlow, and Theano.
Tech media salivated:
- ExtremeTech: “Is there anything a computer can’t do?”
- Gizmodo: “Stupidly powerful.”
- Engadget: “Insane.”
- TechReport: “Holy mother of GPUs.”
MIT Technology Review interviewed NVIDIA CEO Jen-Hsun Huang, who is now Wall Street’s favorite tech celebrity.
Separately, Karl Freund reports on NVIDIA’s announcements at the SC16 supercomputing show.
Early users of the DGX-1 include BenevolentAI, PartnersHealthCare, Argonne and Oak Ridge Labs, New York University, Stanford University, the University of Toronto, SAP, Fidelity Labs, Baidu, and the Swiss National Supercomputing Centre. Nicole Hemsoth explains how NVIDIA supports cancer research with its deep learning supercomputers.
Cray Releases the Urika-GX
Cray launched the Urika-GX, a supercomputing appliance that comes pre-loaded with Hortonworks Data Platform, the Cray Graph Engine, OpenStack management tools and Apache Mesos for configuration. Inside the box: Intel Xeon Broadwell cores, 22 terabytes of memory, 35 terabytes of local SSD storage and Cray’s high-performance network interconnect. Cray ships 16, 32 or 48 nodes in a rack in the third quarter, larger configurations later in the year.
The headline on the Wired story about Google’s deep learning chip — Time for Intel to Freak Out — looks prescient. Intel acquired Nervana Systems, a 28-month-old startup working on hardware and software solutions for deep learning. Re/code reported a price tag of $408 million. The customary tech media unicorn story storm ensues.
Intel said it plans to use Nervana’s software to improve the Math Kernel Library and market the Nervana Engine alongside the Xeon Phi processor. Nervana neon is YADLF — Yet Another Deep Learning Framework — that ranked twelfth in usage among deep learning frameworks in KDnuggets’ recent poll. According to Nervana, neon benchmarks well against Caffe; but then, so does CNTK.
Paul Alcorn offers additional detail on Intel’s new Xeon CPU and Deep Learning Inference Accelerator. In Fortune, Aaron Pressman argues that Intel’s strategy for machine learning and AI is smart, but lags NVIDIA. Nicole Hemsoth describes Intel’s approach as “war on GPUs.”
Cloud Platforms Build ML/DL Stacks
Machine learning use cases are inherently well-suited to cloud platforms. Workloads are ad hoc and project oriented; model training requires huge bursts of computing power for a short period. Inference workloads are a different matter, which is one of many reasons one should always distinguish between training and inference when choosing platforms.
Amazon Web Services
After a head fake earlier in the year when it publishing DSSTNE, a deep learning project that nobody wants, AWS announces that it will standardize on MXNet for deep learning. Separately, AWS launched three new machine learning managed services:
— Rekognition, for image recognition
— Polly, for text to speech
— Lex, a conversational chatbot development platform
In 2014, AWS was first to market among the cloud platforms with GPU-accelerated computing services. In 2016, AWS added P2 instances with up to 16 Tesla K8- GPUs.
Released in 2015 as CNTK, Microsoft rebranded its deep learning framework as Microsoft Cognitive Toolkit and released Version 2.0, with a new Python API and many other enhancements. The company also launched 22 cognitive APIs in Azure for vision, speech, language, knowledge, and search. Separately, MSFT released its managed service for Spark in Azure HDInsight and continued to enhance Azure Machine Learning.
MSFT also announced the Azure N-Series compute instances powered by NVIDIA GPUs for general availability in December.
Azure is one part of MSFT’s overall strategy in advanced analytics, which I’ll cover in Part Three of this review.
In February, Google released TensorFlow Serving, an open source inference engine that handles model deployment after training and manages their lifetime. On the Google Research Blog, Noah Fiedel explained.
Later in the Spring, Google announced that it was building its own deep learning chips, or Tensor Processing Units (TPUs). In Forbes, HPC expert Karl Freund dissected Google’s announcement. Freund believes that TPUs are actually used for inference and not for model training; in other words, they replace CPUs rather than GPUs.
Google launched a dedicated team in October to drive Google Cloud Machine Learning, and announced a slew of enhancements to its services:
— Google Cloud Jobs API provides businesses with capabilities to find, match and recommend jobs to candidates. Currently available in a limited alpha.
— Cloud Translation API will be available in two editions, Standard and Premium.
— Cloud Natural Language API graduates to general availability.
In 2017, GPU-accelerated instances will be available for the Google Compute Engine and Google Cloud Machine Learning. Details here.
In 2016, IBM contributed heavily to the growing volume of fake news.
At the Spark Summit in June, IBM announced a service called the IBM Data Science Experience to great fanfare. Experienced observers found the announcement puzzling; the press release described a managed service for Apache Spark with a Jupyter IDE, but IBM already had a managed service for Apache Spark with a Jupyter IDE.
In November, IBM quietly released the service without a press release, which is understandable since there was nothing to crow about. Sure enough, it’s a Spark service with a Jupyter IDE, but also includes an R service with RStudio, some astroturf “community” documents and “curated” data sources that are available for free from a hundred different places. Big Whoop.
In IBM’s other big machine learning move, the company rebranded an existing SPSS service as Watson Machine Learning. Analysts fell all over themselves raving about the new service, apparently without actually logging in and inspecting it.
Of course, IBM says that it has big plans to enhance the service. It’s nice that IBM has plans. We should all aspire to bigger and better things, but keep in mind that while IBM is very good at rebranding stuff other people built, it has never in its history developed a commercially successful software product for advanced analytics.
IBM Cloud is part of a broader strategy for IBM, so I’ll have more to say about the company in Part Three of this review.