Roundup 12/12/2016

piz-daint-cscs1

ICYMI: Top machine learning (ML) and deep learning (DL) stories from last week.

Note to readers: Due to the slower cadence of news as the holidays approach, the daily roundup will be on hiatus until January. Watch for a roundup of the year in machine learning this week, and a look ahead to 2017. Thank you for reading.

News

— Microsoft announces Release 9.0 of Microsoft R Server, a bundle of components built on an enhanced R distribution. Highlights of the new release include MicrosoftML, a package of machine learning algorithms; simplified model deployment; support for Spark 2.0; Microsoft R Open (3.3.2) and Microsoft R Client (3.3.2). Serdar Yegulalp reports. Linkapalooza here.

— Software AG acquires Zementis for an undisclosed amount. The press release says that Zementis provides software for deep learning. This is incorrect; Zementis offers the ADAPA and UPPI scoring engines, which read PMML documents and produce record-level predictions.

— Uber acquires AI startup Geometric Intelligence for an undisclosed amount.

— NVIDIA offers a deep learning teaching kit for educators, complete with lecture slides, videos, hands-on labs, coding projects, source code solutions, e-books and GPU resources.

— Steve Ranger reports on work by researchers from Cray, Microsoft, and the Swiss National Supercomputing Centre to speed up deep learning on supercomputers. The team has successfully used the Microsoft Cognitive Toolkit (CNTK) to train deep learning models on a Cray XC50 (pictured above) with more than 1,000 NVIDIA Tesla P100 GPUs.

Good Reads

— On the Algorithmia blog, Matt Kiser explains why deep learning matters.

— In The Wall Street Journal’s CIO Journal, Sara Castellanos reports on Capital One’s pursuit of explainable machine learning models.

— McKinsey consultants Christoph Glatzel, Matt Hopkins, Tim Lange, and Uwe Weiss explain how retailers use machine learning to drive fresh food stocking.

— In IEEE Spectrum, Deliang Wang explains how his lab at OSU uses deep learning to improve hearing aids.

Explainers

— Mu Sigma’s Arpit Saxena asks: can weather data improve your predictive models? The answer must be yes, or the article wouldn’t amount to much. Saxena explains some things you should consider when you add weather data to a predictive model.

— On the Lab41 blog, “Patrick C.” argues that sometimes manual feature engineering is easier than feature learning with deep learning.

— In Part One of a series on the MapR blog, Carol McDonald explains how to use k-means in Spark to cluster Uber trips.

— On his personal blog, data scientist Burak Himmetoglu explains how to stack models for better predictions.

— Carlos Perez explains why deep learning is fundamentally different from machine learning. Carlos, co-founder of Intuition Machine, is writing a book called Deep Learning Design Patterns; he blogs regularly here.

Bottom Story of the Week

— The Facebook audience grows older and crankier, and this may harm the social media giant’s revenue.

Roundup 12/9/2016

Machine learning (ML) and deep learning (DL) content from the past 24 hours.

Microsoft Releases R Server 9.0

On the Cortana Intelligence and Machine Learning Blog, Microsoft announces Release 9.0 of Microsoft R Server, a bundle of components built on an enhanced R distribution. Highlights of the new release include:

— MicrosoftML, a package of machine learning algorithms

— Simplified model deployment through mrsdeploy, a package that converts R models to web services, and Swagger, a bundled open source language-agnostic interface to REST APIs

— Support for Spark 2.0 through ScaleR, a distributed machine learning package. ScaleR can now read Hive and Parquet data sources into Spark DataFrames

— The latest releases of Microsoft R Open (3.3.2) and Microsoft R Client (3.3.2)

The MicrosoftML package includes data transformation functions and machine learning algorithms developed internally at Microsoft. The transform functions enable the user to concatenate columns, hash categorical variables, convert categoricals to an indicator array, select features and featurize text. Algorithms include a fast linear model, logistic regression, single-class SVM, fast decision tree, fast random forest and neural networks.

Serdar Yegulalp reports. Linkapalooza here.
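To make the transform functions described above concrete, here is a minimal sketch in plain Python of three of them: hashing a categorical variable, converting a categorical to an indicator array, and concatenating columns. This is not the MicrosoftML API (which is an R package); the function names `hash_encode`, `indicator_encode`, and `concat_columns` are hypothetical and exist only for illustration.

```python
import hashlib
from typing import List, Sequence


def hash_bucket(value: str, n_buckets: int) -> int:
    """Map a category label to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hash_encode(value: str, n_buckets: int) -> List[int]:
    """Hashing trick: a fixed-width 0/1 vector with a single bucket set.

    Distinct labels can collide in one bucket; that is the price paid
    for not having to store the full list of levels.
    """
    vec = [0] * n_buckets
    vec[hash_bucket(value, n_buckets)] = 1
    return vec


def indicator_encode(value: str, levels: Sequence[str]) -> List[int]:
    """Indicator (one-hot) array: one slot per known level."""
    return [1 if value == level else 0 for level in levels]


def concat_columns(*columns: Sequence[int]) -> List[int]:
    """Concatenate several feature columns into one feature vector."""
    out: List[int] = []
    for col in columns:
        out.extend(col)
    return out


# Hypothetical example data: one categorical value encoded two ways.
levels = ["red", "green", "blue"]
row = concat_columns(indicator_encode("green", levels),
                     hash_encode("green", 8))
```

The indicator array needs the full list of levels up front, while the hashed version only needs a bucket count, which is why hashing is often preferred for high-cardinality variables.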

Issues

— William Vorhies warns data scientists about government regulations that will soon impact the field. Surprisingly, he does not mention the EU’s General Data Protection Regulation, set to go into effect May 25, 2018. I’m currently writing a piece on GDPR, which I expect to publish soon.

— In HBR, the Ivy Leaguers who produced the Vietnam War and Enron deliver a guide to solving social problems with machine learning. It’s actually not a bad piece, though it reads as if they took HBR’s recently published guide to solving business problems with machine learning, crossed out “business” and replaced it with “social.” Can we all agree that there is something called methodology?

Fundamentals

— In Forbes, Bernard Marr takes another whack at defining the differences between machine learning, deep learning, and artificial intelligence.

— In a publication that calls itself University Herald, Chris Brandt discovers that machine learning is a thing and artificial intelligence is a thing and they are two different things.

Methods and Techniques

— In a Databricks Webinar, Joseph K. Bradley and Jules S. Damji explain how to migrate workloads from Spark’s RDD-based machine learning API to the new DataFrames-based API. There are notebooks with working examples.

Software/Services

— RStudio’s Joseph Rickert lists his favorite new packages in R among the 189 added in November, including 9 packages for machine learning.

Hardware

— In an Inside HPC podcast, NVIDIA’s Bryan Catanzaro predicts where deep learning is going next.

Bottom Story of the Day

— Audi trains a toy car to park itself. Linkapalooza here. Now, if they can just bring a car to market without cheating on the emissions rules.

Roundup 12/8/2016

Machine learning (ML) and deep learning (DL) content from the past 24 hours.

Good Reads

— McKinsey consultants Christoph Glatzel, Matt Hopkins, Tim Lange, and Uwe Weiss explain how retailers use machine learning to drive fresh food stocking.

— In IEEE Spectrum, Deliang Wang explains how his lab at OSU uses deep learning to improve hearing aids.

People

— Unity Technologies hires Danny Lange, Uber’s head of machine learning.

Fundamentals

— Vincent Granville revisits the central limit theorem.

Research

— In an exclusive meeting, Apple reveals the state of its AI research: LiDAR, smaller neural networks and more. Apple promises to publish what it learns. No word about that headphone jack. Linkapalooza here.

— Steve Ranger reports on work by researchers from Cray, Microsoft, and the Swiss National Supercomputing Centre to speed up deep learning on supercomputers. The team has successfully used the Microsoft Cognitive Toolkit (CNTK) to train deep learning models on a Cray XC50 (pictured below) with more than 1,000 NVIDIA Tesla P100 GPUs.

piz-daint-cscs1

Deep learning hardware porn.

Methods and Techniques

— Mu Sigma’s Arpit Saxena asks: can weather data improve your predictive models? The answer must be yes, or the article wouldn’t amount to much. Saxena explains some things you should consider when you add weather data to a predictive model.

— Ben Frederickson offers an interactive tutorial on numerical optimization. It starts strong, but it’s all downhill from there. If you don’t get the joke, read the article.

— In the second part of a series, Sibanjan Das explains anomaly detection with H2O deep learning. Part one, a general introduction to deep learning, is here.

Software/Services

— Qulix’ Aleksandr Sliborsky touts Azure Machine Learning in what appears to be a Microsoft astroturf blog. It’s still an interesting read.

Hardware

— Inside HPC attends the Intel HPC Developer Conference and interviews a number of people on accelerating machine learning, anomaly detection, optimizing deep learning frameworks, distributed KNN, and other topics.

Applications

— Molly Olmstead explains how physicists use deep learning to identify subatomic particles.

Companies

— The BigML blog profiles contenders in the Brazilian AI Startup Battle.

Bottom Story of the Day

— IBM’s James Kobielus speculates about data science in 2017.

Roundup 12/7/2016

Machine learning (ML) and deep learning (DL) content from the past 24 hours.

On Thursday, December 8, Databricks’ Joseph Bradley and Jules Damji will deliver a webinar on migrating Spark ML workloads to DataFrames.

Good Reads

— On the Algorithmia blog, Matt Kiser explains why deep learning matters.

— In The Wall Street Journal’s CIO Journal, Sara Castellanos reports on Capital One’s pursuit of explainable machine learning models.

— Separately, WSJ’s Christopher Mims explores the challenges businesses face in delivering value from AI and machine learning. He cites three constraints: insufficient data; a shortage of business problems where AI/ML can make a difference; and the talent shortage. I would note that the data shortage problem isn’t a matter of volume but quality. Many of those petabytes that data warehousing people brag about are worthless.

Fundamentals

— Carlos Perez explains why deep learning is fundamentally different from machine learning. Carlos, co-founder of Intuition Machine, is writing a book called Deep Learning Design Patterns; he blogs regularly here.

— In ZDNet, George Anadiotis discovers that machine learning and predictive analytics go together as if this is news, thereby reinforcing the impression that the folks who publish ZDNet are clueless about both topics.

— Bernard Marr explains the difference between artificial intelligence and machine learning.

— NVIDIA offers a deep learning teaching kit for educators, complete with lecture slides, videos, hands-on labs, coding projects, source code solutions, e-books and GPU resources.

Software/Services

— The folks at Google Cloud Big Data and Machine Learning Blog — “GCBDMLB” for short — list their top ten favorite Google BigQuery user experiences of 2016.

— BigML publishes a webinar covering its Fall 2016 release.

— Google DeepMind releases its training environment to open source on GitHub. Jeremy Kahn reports in Bloomberg.

Hardware

— Kunal Jain explains how to build a machine learning/deep learning workstation for under $5,000.

— Ben Cotton discusses the ins and outs of GPUs, ASICs and FPGAs for machine learning, and profiles Graphcore, a startup with a distinctive approach.

Applications

— In Wired, Davey Alba profiles Amazon Go, a retail concept that uses RFID, sensors, and artificial intelligence to enable checkout-free shopping. You just grab what you want; the order posts to your Amazon account later. Currently, there is one Amazon Go store operating in Seattle, in beta for Amazon employees. Linkapalooza here.

— Aspire Health, a Medicare Advantage provider, develops an algorithm that can predict which patients are likely to die in the next year (based on medical records). The company uses the algorithm to offer palliative care to patients in lieu of heroic treatment and save itself a bundle of money. They should rename the company Soylent Green and take the program to the next step if you know what I mean, wink, wink.

— Aspectiva’s Rafi Mendelsohn explains how his company uses machine learning to identify fake online reviews.

— In Recode, Eric Johnson interviews AliveCor CEO Vic Gundotra, who opines about the emerging role of machine learning in medicine.

Bottom Story of the Day

— The Facebook audience grows older and crankier, and this may harm the social media giant’s revenue.

Roundup 12/6/2016

Machine learning (ML) and deep learning (DL) content from the past 24 hours.

On the AtScale blog, some old guy says BI-on-Hadoop is dead.

People

— The folks at DataCamp recognize the five top R package maintainers:

  • Hadley Wickham (ggplot2, dplyr, tidyr,…)
  • Yihui Xie (knitr, bookdown, rmarkdown, shiny, htmlwidgets,…)
  • Dirk Eddelbuettel (Rcpp, RPostgreSQL,…)
  • Jeroen Ooms (jsonlite, xml2,…)
  • Achim Zeileis (colorspace, zoo,…)

— In The Huffington Post, Lolita Taub interviews Dr. Satya Mallick, co-founder of Sight Commerce, who explains image recognition and other aspects of artificial intelligence.

Issues

— Stephen Hawking predicts that automation and AI are going to decimate middle-class jobs. He should stick to physics. His prediction is a good example of the Luddite Fallacy in economics, the belief that technological change necessarily produces mass unemployment.

— In a review of Virtual Competition, John Naughton wonders how you throw the book at an algorithm.

Research

— Allison Lynn explains the work of researchers seeking to democratize machine learning.

— Researchers at Dartmouth and the University of Sheffield use an MIT algorithm to map U.S. regions from census data about commuter paths.

Methods and Techniques

— On the Lab41 blog, “Patrick C.” argues that sometimes manual feature engineering is easier than feature learning with deep learning.

— In Part One of a series on the MapR blog, Carol McDonald explains how to use k-means in Spark to cluster Uber trips.

— On his personal blog, data scientist Burak Himmetoglu explains how to stack models for better predictions.

— On the Hyndsight blog, Rob J. Hyndman explains cross-validation for time series in R.
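For readers unfamiliar with the idea behind cross-validation for time series, the sketch below shows one common scheme, often called a rolling forecasting origin: train on all observations up to a point, test on the next few, then move the origin forward. It is written in plain Python rather than R, it is not the code from the post above, and the function name `rolling_origin_splits` and its parameters are hypothetical.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_obs: int, min_train: int, horizon: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs in time order.

    Each training set holds every observation before the forecast
    origin; each test set holds the next `horizon` observations.
    Unlike ordinary k-fold CV, no test point ever precedes a point
    in its own training set.
    """
    if min_train < 1 or horizon < 1:
        raise ValueError("min_train and horizon must be at least 1")
    origin = min_train
    while origin + horizon <= n_obs:
        train = list(range(origin))
        test = list(range(origin, origin + horizon))
        yield train, test
        origin += 1


# Hypothetical example: six observations, at least three for training.
splits = list(rolling_origin_splits(n_obs=6, min_train=3, horizon=1))
for train_idx, test_idx in splits:
    print(train_idx, "->", test_idx)
```

In practice you would fit a model on each training window, forecast the test window, and average the forecast errors across all splits.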

Hardware

— Intel reveals its AI strategy. On the Moor Insights blog, Patrick Moorhead dissects it.

— On the same blog, Karl Freund explains NVIDIA’s approach to reshaping computing.

Applications

— Reuters reports that Facebook has a project underway to detect fake news and offensive videos with machine learning. Linkapalooza here.

— In ZDNet, Bob Violino argues that deep learning will transform the future of the auto industry.

— Daily Mail reports that planetary researchers will use a recommendation engine to search for aliens, thus demonstrating the clickbait power of the words “machine learning.”

Companies

— Software AG acquires Zementis for an undisclosed amount. The press release says that Zementis provides software for deep learning. This is incorrect; Zementis offers the ADAPA and UPPI scoring engines, which read PMML documents and produce record-level predictions.

— Apple confirms that it is working on machine learning and autonomous vehicles, according to a report in The Wall Street Journal. Storylanche ensues.

— Fox Business summarizes the year-to-date stock price change for five AI stocks: Alphabet, Amazon, Baidu, IBM, and NVIDIA. NVIDIA is far and away the winner, up 185%. IBM is #2, up 19%, which demonstrates that even dead cats bounce.

— Media company Valassis partners with cloud analytics company Lityx.

— Uber acquires AI startup Geometric Intelligence for an undisclosed amount.

Bottom Story of the Day

— In a video, Dave Mark demonstrates what happens when you let Amazon Echo talk to Google Home.

Roundup 12/5/2016

ICYMI: Top machine learning (ML) and deep learning (DL) stories from last week.

News

— Baidu releases “Long Utterance,” a set of Chinese language APIs for its speech recognition technologies.

— Max Kuhn, author of Applied Predictive Modeling and progenitor of the caret package for machine learning, moves to RStudio.

— At AWS’ re:Invent Conference, Databricks announces HIPAA compliance for its Apache Spark managed service. Databricks has also achieved AWS Public Partner status.

— Amazon Web Services launches three new services.

— TensorFlow v0.12.0 RC0 is now available, and it runs on Microsoft Windows. Features available on Windows are a subset of the full feature set. For details, read the announcement.

— Health tech startup Health Catalyst launches Healthcare.ai, a suite of open source packages for healthcare machine learning with R and Python APIs. The R package works with any R distribution and RStudio; the Python package works with Anaconda.

Good Reads

— In The Next Platform, Nicole Hemsoth explains why Microsoft invests in FPGAs for compute-intensive applications like machine learning. Separately, Nicole investigates Intel’s strategy to integrate the deep learning assets it acquired when it bought Nervana earlier this year and explains the supercomputing vision of NVIDIA CEO Jen-Hsun Huang.

Explainers

— In MIT Technology Review, Will Knight describes how a Google eye scanning algorithm can diagnose diabetic retinopathy better than human experts can. On the Google blog, Lily Peng explains. The JAMA paper is here.

— In a podcast, Ben Lorica interviews Mike Franklin, co-director of Berkeley’s recently wrapped-up AMPLab project, who talks about AMPLab’s legacy. That legacy includes Spark, Alluxio, BlinkDB, KeystoneML, and Succinct, among other projects.

— In Harvard Business Review, Anastassia Fedyk explains how to tell if machine learning can solve your business problem.

— In the second installment of a planned series on deep learning research, Adit Deshpande explains reinforcement learning. The first installment covered generative adversarial nets.

— Adrian Sampson describes three common statistical mistakes and how to avoid them.

— Here is the complete series of posts on Topic Modeling from the BigML blog. If you don’t know what Topic Modeling is, read the series.

— Bioinformatics maven Shirin Glander asks: can we predict flu deaths with machine learning and R? She proceeds to answer the question by demonstrating multiple ways to do so in a tour de force post, with graphics and code snippets.

— Serdar Yegulalp explains why AWS standardized on MXNet for DL.

— In MIT Technology Review, Nicholas Diakopoulos and Sorelle Friedler propose a framework to ensure accountability for algorithms. They stress five principles: responsibility, explainability, accuracy, auditability, and fairness.

— On GitHub, Simon Brugman builds a collection of deep learning papers.

— In Data Science Central, William Vorhies asks: has AI gone too far? The context of his question is a paper that summarizes research into detecting criminality from facial images. In short, the researchers were able to successfully distinguish criminals from non-criminals in a sample of Chinese men aged 18 to 55 solely from facial measurements extracted from pictures. Vorhies notes that the research is rigorous, and, while the paper has evoked a chorus of criticism for its implications, critics have not yet identified a flaw in the methodology.

— In Forbes, Aaron Tilley chronicles NVIDIA’s transformation from a maker of gaming chips to a maker of AI chips.

Bottom Story of the Week

— In The Eponymous Pickle, Franz Dill reports on sex as an algorithm. I’m not kidding.

Roundup 12/2/2016

Machine learning (ML) and deep learning (DL) content from the past 24 hours.

ZDNet has a special section on AI and machine learning. I’ve pulled some of the interesting pieces and linked them in the appropriate sections below.

Issues

— In Data Science Central, William Vorhies asks: has AI gone too far? The context of his question is a paper that summarizes research into detecting criminality from facial images. In short, the researchers were able to successfully distinguish criminals from non-criminals in a sample of Chinese men aged 18 to 55 solely from facial measurements extracted from pictures. Vorhies notes that the research is rigorous, and, while the paper has evoked a chorus of criticism for its implications, critics have not yet identified a flaw in the methodology.

— In an article about fake news, Vincent Granville opines that it’s hard to detect fake news with machine learning because it’s hard to define fake news.

Fundamentals

— Alison DeNisco explains why AI and machine learning need to be part of your digital transformation strategy.

— Hope Reese lists five ways to get started implementing AI and ML.

Research

— MIT researchers develop a computational model of the human brain’s mechanism for face recognition.

— Cognitive scientist Joscha Bach ruminates on the elements of human intelligence we seem to be missing in AI.

Methods and Techniques

— In a podcast, Jon Bruner and Pete Skomoroch interview Richard Socher, chief scientist at Salesforce, and discuss how to make neural networks more accessible.

Software/Services

— Health tech startup Health Catalyst launches Healthcare.ai, a suite of open source packages for healthcare machine learning with R and Python APIs. The R package works with any R distribution and RStudio; the Python package works with Anaconda.

— Rescale’s Mark Whitney explains the ins and outs of running deep learning in the cloud. Rescale offers a managed service for deep learning in IBM Cloud.

— Reporting from AWS re:Invent, Doug Henschen argues that Amazon’s late entry into machine learning services will be offset by its scale. File that under Things That Ain’t Necessarily So. AWS has a well-deserved reputation as the Stupid Cloud, and three services don’t begin to match what Microsoft, Google, and IBM offer.

— Nick Heath asks if Microsoft should be your AI and machine learning platform. He answers his own question by enumerating the many different services in the Cortana Intelligence Suite.

— Natalie Gagliordi asks the same question of Google.

— Hope Reese wonders if AWS should be your AI and machine learning platform. She quotes Gartner’s Alexander Linden, which is a bad sign.

— Conner Forrest asks if declining tech giant IBM should be your AI and machine learning platform. He doesn’t really answer the question.

Applications

— In Forbes, Suparna Goswami explains how an Indian startup uses machine learning for smarter hiring.

Companies

— Also in Forbes, Aaron Tilley chronicles NVIDIA’s transformation from a maker of gaming chips to a maker of AI chips.

— Jeremy Hsu profiles Maluuba, a startup that uses deep learning to understand speech.

— Conner Forrest reports on five upstarts that are “leading the AI and machine learning revolution”: Uber, Tesla, Salesforce, NVIDIA, and Ayasdi. Wait, what? Ayasdi? Also, for the record, Salesforce may be buying companies, but it’s not exactly leading the charge in machine learning.

Bottom Story of the Day

— GE CEO Jeff Immelt says he’s ready for Trump.
