Is AI Failing?

Nobody believes that every AI project succeeds. Just ask MD Anderson. Anderson blew $60 million on a Watson project before pulling the plug.

That project was a clown show. A report published by University of Texas auditors found that project leadership:

  • Did not use proper contracting and procurement procedures
  • Failed to follow IT Governance processes for project approval
  • Did not effectively monitor vendor contract delivery
  • Overspent pledged donor funds by $12 million

IT personnel working on the project hesitated to report exceptions because the project leader’s husband was MD Anderson’s President. Project scope grew like kudzu. MD Anderson executed 15 contracts and amendments in a series of incremental expansions. The budget for many of these was just below the threshold for Board approval, which suggests deliberate structuring to avoid scrutiny.

Interestingly, the massive expansion in project scope coincided with a $50 million pledge from “billionaire party boy” Low Taek Jho. (Jho recently cut a deal with the US government to avoid prosecution on charges related to the 1MDB scandal.)

So it’s not news that some AI projects fail. 

Last week, Fast Company published this piece with the clickbait title of Why AI is Failing Business. The authors, an economist and the two co-founders of a tiny startup, want you to believe that failure is the norm for AI projects. 

The article exemplifies a genre I call Everyone is Stupid Except Us. Practitioners of this approach paint a dire picture of current practices. The implicit message is that they have a magic bean that will set things straight. 

Citing an IDC report, the authors write that “most organizations reported failures among their AI projects, with a quarter of them reporting up to a 50% failure rate.”

Wow. Fifty fucking percent.

That number sounds fishy, so I pulled the report and checked with the author. Here’s the pertinent page:

The first part of the authors’ claim is correct. About 92% of the organizations surveyed by IDC reported one or more AI project failures.

The rest is misconstrued. About 2% of respondents reported failure rates as high as 50%. 21% reported a failure rate of more than 30%.

Most respondents report a failure rate below 30%.

In an ideal world, no AI project would fail. But put that failure rate in context. According to a report from the Project Management Institute, only about 70% of all projects completed in 2017 met original goals and business intent.

In other words, AI projects are no more or less likely to fail than any other IT project.

The authors of the Fast Company piece bloviate for another 11 paragraphs about why AI projects fail. They could have just shifted their eyeballs to the right on the page they misquote, where IDC tabulates the reasons for AI project failure. The top five cited by respondents are, in descending order:

  1. AI technology didn’t perform as expected or as promised
  2. Lacked staff with the necessary expertise
  3. Unrealistic expectations
  4. The business case wasn’t well enough understood
  5. Lack of follow-up from the business units

That first reason needs unpacking. Projects rarely fail because technology does not do what it is supposed to do. Projects fail because the buyer wants something the technology isn’t designed to deliver, or the organization cuts corners on implementation. In most cases, the customer and vendor share responsibility for that failure. The vendor may make misleading or exaggerated claims, the customer may fail to define requirements, or the customer may not perform the necessary due diligence.

It’s easier to blame the technology, though.

AI projects are the same as ERP projects or any other IT project. They succeed or fail based on the organization’s project management processes.

Next time you’re at a trade show when some AI vendor starts braying about their magic bean, do yourself a favor. Move on to the next booth.

How to Write Good

Break rules. That is the first principle of good writing. Conventional style and predictable prose will bore your audience. There is no greater sin.

You think I don’t know the difference between good and well, and this blog will be a train wreck. Or you think the headline is a joke, and this blog will be fun to read. Either way, you’re reading this blog and not something else. Which proves my point.

In a different medium, Beethoven understood the principle. His Eroica Symphony begins with a simple tune in the key of E-flat. It’s the sort of tune that, in the hands of Beethoven’s contemporaries, such as Bocklet, Gänsbacher, Hüttenbrenner, or Schenk, would remain firmly in the key of E-flat. That’s the rule. You begin in one key, you stay in that key. At least until you prepare a modulation and introduce a new tune in B-flat.

Seven bars in, however, Beethoven breaks the rule. The music veers into…something strange. Definitely not in the key of E-flat:

German musicologists try to explain this gaffe. “It’s an unprepared modulation!” “It’s a chromatic passing tone!”

I’d insert a joke about German musicologists here, but I don’t want to offend my friends at KNIME and RapidMiner.

There is a simpler explanation. Beethoven broke the rules.

He broke them deliberately. Imagine the surprised faces when Beethoven premiered the work at the Palais Lobkowitz in 1804. Vienna’s petty aristocrats did not like revolutionary thought, flies in the strudel, or wrong notes in symphonies. They preferred the music of Bocklet, Gänsbacher, Hüttenbrenner, or Schenk, composers who eschewed wrong notes.

You never heard of Bocklet, Gänsbacher, Hüttenbrenner, or Schenk? I rest my case.

By the way, if “Beethoven” evokes nothing other than a large St. Bernard dog, you need to get out more.

Beethoven also demonstrates the second principle of good writing: break rules sparingly.

If you break rules too often, people assume that you don’t know the rules. Or they figure you’re a loon. Most of Beethoven’s work conforms to classical style; his “wrong” notes stand out. Anton von Webern, on the other hand, wrote nothing but wrong notes, which is why you’ve never heard of Anton von Webern.

The third principle: use a &^%$# grammar checker. People send me writing samples, blog posts, press releases, white papers, and so forth. I drop the text into Grammarly and oops. Overused words, passive voice, unclear antecedents, you name it.

This is what happens to those writing samples.

What to do with bad writing.

When a writing sample fails the Grammarly test, it means the author is too lazy to check their work.

It reminds me of the story about the executive who went to a Mercedes dealer to check out a new S560. (Stop me if you’ve heard this before.) The exec admires the car in the showroom and chats with the sales rep.

“Can I take it for a test drive?” she asks.

“Certainly!” says the rep. “Wait here, I’ll bring one around.”

After a few minutes, the rep pulls up out front in a Mercedes S560 that is completely covered with bird shit.

The executive recoils. “This car is filthy!”

The rep shrugs. “It’s just bird shit. Isn’t this a beautiful car?”

Don’t expect me to appreciate your ideas if your text is covered with bird shit grammar and style issues.

For the record, Grammarly does not pay me to shill for them. But they should.

The next principle: omit needless words. Yeah, I know. It’s not original. Strunk and White #13:

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell.

That paragraph is a thing of beauty.

So what? you say. Strunk wrote that a hundred years ago. Don’t you have any new suggestions?

If Strunk and White #13 is obvious, why do we see so much bloated and tumescent business prose?

Think of words as a tax on the reader’s brain. Every syllable needs a neuron. More syllables –> more neurons –> more work for the reader. Your reader has other things to do, like trolling people or binging on Netflix. Tax your reader’s brain too much and you lose them.

You know who really knew how to omit needless words? The Spartans.

According to Herodotus (The Histories, Book 3, Section 46), a delegation from Samos traveled to Sparta to seek food. There, before the magistrates, they delivered a long and passionate plea for help.

The magistrates turned them down flat. “We can no longer remember the first half of your speech, and thus can make nothing of the remainder.”

Regrouping, the Samians secured another audience. This time, they made a shorter speech.

Again, the magistrates dismissed them. “Too many words.”

Desperate, the Samians returned for a final audience. This time, instead of a speech, they held out an empty bag with a sign: this bag wants bread.

The Spartans agreed to help the Samians but admonished them: they could have omitted the words ‘this bag’.

Learn to write like a fucking Spartan.

That bit about the Spartans demonstrates the next principle: tell stories.

If you lack confidence in your story-telling abilities, cheer up: in business, there is only one story. It goes like this.

  1. There is a shining city on the hill, where everything is perfect. Let’s go there.
  2. Oops, there’s a dragon under the bridge who eats people.
  3. Wouldn’t it be great if there was something that could kill dragons?
  4. Fortunately, (product) kills dragons.
  5. Here’s proof.
  6. You can kill dragons and go to the shining city. All you need is (product). Here’s how to learn more.

Forget outlining. No battle plan ever survived the first shot, said Moltke. I say: no outline ever survived the first sentence. Just keep your story in mind while you write. Works every time.

The penultimate(*) principle: revise, revise, revise, revise. You want to write good? You have to revise and rewrite, rewrite and revise. Until you have something good enough to publish.

I’ve revised this post 25 times. It’s brilliant, right? Greatest blog post ever. But if I look at it again tomorrow, I’ll find something else to revise. It’s like detailing your car. There’s always one more spot that needs a buff.

What’s that? You have no time to revise because you’re on deadline?

Fuck your deadline. There are very few real emergencies in business. In Six-Sigma factories, anyone who spots a defect can stop the assembly line. More often than not, the “deadline” comes from some asshat who wants to juke the monthly eyeballs and needs you to create “content” on the spot. Plan ahead, and you will have time to revise all you want.

Good writing tomorrow is better than shitty writing today. If anyone tells you otherwise, find another platform.

(*) look it up, dummy.

The last principle of good writing: close with a bang. Don’t write like Wagner. Wagner dragged everything out.

You’ve heard the expression: it ain’t over ’til the fat lady sings. It’s not true. In Wagner’s Der Ring des Nibelungen, the fat lady is Brünnhilde. When she starts singing in Act II of Die Walküre, she doesn’t stop for sixteen hours. Except for a few short breaks, like when Wotan puts her to sleep and surrounds her with a ring of fire.

Even when she’s done singing, it’s not over. There’s another last gasp of Wagnerian mush while the Rhine overflows, Valhalla burns, Hagen tries to grab the Ring, the Rhinemaidens stop him, they grab the Ring, Hagen drowns, and Brünnhilde burns to a crisp. You waited ten years and paid a couple grand for the lamest seats in Bayreuth. You’re not going to run for the parking lot as soon as the fat lady stops singing. You’re going to wait to see the whole sorry mess collapse.

Stravinsky, on the other hand, knew how to close.

In the final section of Le Sacre du Printemps, after an ecstatic dance, the Chosen One collapses, dead. The orchestra delivers an enormous splat. Now that is an ending.

How GDPR Affects Data Science

Adapted from a post originally published on the Cloudera VISION Blog.

If your organization collects data about citizens of the European Union (EU), you probably already know about the General Data Protection Regulation (GDPR). GDPR defines and strengthens data protection for consumers and harmonizes data security rules within the EU. The European Parliament approved the measure on April 27, 2016. It goes into effect in less than a year, on May 25, 2018.

Much of the commentary about GDPR focuses on how the new rules affect the collection and management of personally identifiable information (PII) about consumers. However, GDPR will also change how organizations practice data science. That is the subject of this blog post.

One caveat before we begin. GDPR is complicated. In some areas, GDPR defines high-level outcomes, but delegates detailed compliance rules to a new entity, the European Data Protection Board. GDPR regulations intersect with many national laws and regulations; organizations that conduct business in the United Kingdom must also assess the unknown impacts of Brexit. Organizations subject to GDPR should engage expert management and legal counsel to assist in developing a compliance plan.  

GDPR and Data Science

GDPR affects data science practice in three areas. First, GDPR imposes limits on data processing and consumer profiling. Second, for organizations that use automated decision-making, GDPR creates a “right to an explanation” for consumers. Third, GDPR holds firms accountable for bias and discrimination in automated decisions.  

Data processing and profiling. GDPR imposes controls on data processing and consumer profiling; these rules supplement the requirements for data collection and management. GDPR defines profiling as:

Any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular, to analyse or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements.

In general, organizations may process personal data when they can demonstrate a legitimate business purpose (such as a customer or employment relationship) that does not conflict with the consumer’s rights and freedoms. Organizations must inform consumers about profiling and its consequences, and provide them with the opportunity to opt out.

The Right to an Explanation. GDPR grants consumers the right “not to be subject to a decision…which is based solely on automated processing and which produces legal effects (on the subject).” Experts characterize this rule as a “right to an explanation.” GDPR does not precisely define the scope of decisions covered by this section. The United Kingdom’s Information Commissioner’s Office (ICO) says that the right is “very likely” to apply to credit applications, recruitment, and insurance decisions. Other agencies, law courts or the European Data Protection Board may define the scope differently.

Bias and Discrimination. When organizations use automated decision-making, they must prevent discriminatory effects based on racial or ethnic origin, political opinion, religion or beliefs, trade union membership, genetic or health status, or sexual orientation, and must avoid measures that have such an effect. Moreover, they may not use specific categories of personal data in automated decisions except under defined circumstances.

How GDPR Affects Data Science Practice

How will the new rules affect the way data science teams do their work? Let’s examine the impact in three key areas.

Data Processing and Profiling. The new rules allow organizations to process personal data for specific business purposes, to fulfill contractual commitments, and to comply with national laws. A credit card issuer may process personal data to determine a cardholder’s available credit; a bank may screen transactions for money laundering as directed by regulators. Consumers may not opt out of processing and profiling performed under these “safe harbors.”

However, organizations may not use personal data for a purpose other than the original intent without securing additional permission from the consumer. This requirement could limit the amount of data available for exploratory data science.

GDPR’s constraints on data processing and profiling apply only to data that identifies an individual consumer.

The principles of data protection should therefore not apply to … personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.

The clear implication is that organizations subject to GDPR must build robust anonymization into data engineering and data science processes.
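To make “robust anonymization” a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical customer table; every field name and value is invented for illustration. The code drops direct identifiers, coarsens the quasi-identifiers (age and postcode), and reports the size of the smallest group of records that share the same coarsened values. Whether any given treatment meets GDPR’s standard for anonymous data is a question for counsel, not for a script.

```python
from collections import Counter

# Hypothetical field names; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "email"}


def generalize(record):
    """Drop direct identifiers and coarsen quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    decade = (out.pop("age") // 10) * 10
    out["age_band"] = f"{decade}-{decade + 9}"
    out["postcode_area"] = out.pop("postcode")[:2]  # keep only the prefix
    return out


def smallest_group(records, quasi_identifiers):
    """Size of the smallest set of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Invented sample data.
customers = [
    {"name": "A", "email": "a@example.com", "age": 34, "postcode": "SW1A 1AA", "spend": 120},
    {"name": "B", "email": "b@example.com", "age": 37, "postcode": "SW1A 2AB", "spend": 80},
    {"name": "C", "email": "c@example.com", "age": 52, "postcode": "M1 1AE", "spend": 45},
    {"name": "D", "email": "d@example.com", "age": 58, "postcode": "M1 4BT", "spend": 60},
]

anonymized = [generalize(c) for c in customers]
k = smallest_group(anonymized, ["age_band", "postcode_area"])
```

A small k means some people remain easy to single out, which is a signal to coarsen further or suppress those records before analysts see the data.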

Explainable Decisions. There is some controversy about the impact of this provision. Some cheer it; others disapprove; still others deny that GDPR creates such a right. One expert in EU law argues that the requirement may force data scientists to stop using opaque techniques (such as deep learning), which can be hard to explain and interpret.

There is no question that GDPR will affect how organizations handle certain decisions. The impact on data scientists, however, may be exaggerated:

— The “right to an explanation” is limited in scope. As noted above, one regulator interprets the law to cover credit applications, recruitment, and insurance decisions. Other regulators or law courts may interpret the rules differently, but it’s clear that the right applies in specific settings. It does not apply to every automated decision.

— In many jurisdictions, a “right to an explanation” already exists and has existed for years. For example, regulations governing credit decisions in the United Kingdom are similar to those in the United States, where issuers must provide an explanation for adverse credit decisions based on credit bureau information. GDPR expands the scope of these rules, but tools for compliance are commercially available today.

— Most businesses that decline some customer requests understand that adverse decisions should be explained to customers. This is already common practice in the lending and insurance industries. Smart businesses treat adverse decisions as an opportunity to position an alternate product.

— The need to deliver an explanation affects decision engines but need not influence the choice of methods for model training. Techniques available today make it possible to “reverse-engineer” interpretable explanations for model scores even if the data scientist uses an opaque method to train the model.
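The last point is easier to see with an example. The sketch below uses a simple perturbation approach, which is my choice for illustration and not any particular vendor’s tool. It treats the model as a black box, swaps each input for a baseline value one at a time, and ranks the inputs by how much the score would improve. The scoring function, the feature names, and the baseline are all hypothetical.

```python
def opaque_score(applicant):
    """Stand-in for a black-box model; the caller only sees the score."""
    score = 0.5
    score += 0.3 if applicant["income"] > 40000 else -0.2
    score -= 0.1 * applicant["missed_payments"]
    score += 0.05 if applicant["years_at_address"] >= 3 else 0.0
    return score


def reason_codes(score_fn, applicant, baseline):
    """Rank features by how much the score improves when each one is
    swapped, one at a time, for a baseline ("typical approved") value."""
    actual = score_fn(applicant)
    gains = {}
    for feature, typical in baseline.items():
        what_if = dict(applicant, **{feature: typical})
        gains[feature] = score_fn(what_if) - actual
    # Largest gain first: these features hurt the applicant most.
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Invented applicant and baseline values.
applicant = {"income": 30000, "missed_payments": 3, "years_at_address": 5}
baseline = {"income": 55000, "missed_payments": 0, "years_at_address": 5}

reasons = reason_codes(opaque_score, applicant, baseline)
```

The decision engine can turn the top one or two entries into plain-language reasons. Nothing in `reason_codes` depends on how the model was trained, which is the point.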

Nevertheless, there are good reasons for data scientists to consider using interpretable techniques. Financial services giant Capital One considers them to be a potent weapon against hidden bias (discussed below). But one should not conclude that GDPR will force data scientists to limit the techniques they use to train predictive models.

Bias and Discrimination. GDPR requires organizations to avoid discriminatory effects in automated decisions. This rule places an extra burden of due diligence on data scientists who build predictive models, and on the procedures organizations use to approve predictive models for production.

Organizations that use automated decision-making must:

  • Ensure fair and transparent processing
  • Use appropriate mathematical and statistical procedures
  • Establish measures to ensure the accuracy of subject data employed in decisions

GDPR expressly prohibits the use of personal characteristics (such as age, race, ethnicity, and other enumerated classes) in automated decisions. However, it is not sufficient to just avoid using this data. The mandate against discriminatory outcomes means data scientists must also take steps to prevent indirect bias from proxy variables, multicollinearity or other causes. For example, an automated decision that uses a seemingly neutral characteristic, such as a consumer’s residential neighborhood, may inadvertently discriminate against ethnic minorities.
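A rough way to test for this kind of indirect bias is to audit outcomes after the fact. The sketch below assumes a hypothetical audit sample in which the protected attribute is kept out of the model and joined back in only for testing. It compares approval rates across groups and across the “neutral” feature. The data, field names, and group labels are invented, and what counts as an acceptable ratio is a policy and legal judgment that this code does not make.

```python
from collections import defaultdict


def approval_rates(decisions, group_key):
    """Share of approved decisions within each value of group_key."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for d in decisions:
        totals[d[group_key]] += 1
        approved[d[group_key]] += d["approved"]
    return {g: approved[g] / totals[g] for g in totals}


def rate_ratio(rates):
    """Lowest group approval rate divided by the highest."""
    return min(rates.values()) / max(rates.values())


# Hypothetical audit sample. The model sees "neighborhood" but not "group".
decisions = [
    {"neighborhood": "north", "group": "x", "approved": 1},
    {"neighborhood": "north", "group": "x", "approved": 1},
    {"neighborhood": "north", "group": "y", "approved": 1},
    {"neighborhood": "north", "group": "x", "approved": 0},
    {"neighborhood": "south", "group": "y", "approved": 0},
    {"neighborhood": "south", "group": "y", "approved": 0},
    {"neighborhood": "south", "group": "y", "approved": 0},
    {"neighborhood": "south", "group": "x", "approved": 1},
]

by_group = approval_rates(decisions, "group")
by_neighborhood = approval_rates(decisions, "neighborhood")
ratio = rate_ratio(by_group)
```

In this toy sample the approval gap between neighborhoods mirrors the gap between groups, which is exactly the pattern a proxy variable produces.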

Data scientists must also take affirmative steps to confirm that the data they use when they develop predictive models is accurate; “garbage in/garbage out,” or GIGO, is not a defense. They must also consider whether biased training data on past outcomes can bias models. As a result, data scientists will need to concern themselves with data lineage, to trace the flow of data through all processing steps from source to target. GDPR will also drive greater concern for reproducibility, or the ability to accurately replicate a predictive modeling project.
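For a sense of what lineage and reproducibility can look like in practice, here is a minimal sketch. It records where the training data came from, a fingerprint of its contents, and the random seed, so that a later run can be replayed and compared. The `train` function is a stand-in rather than a real model, and the source name is hypothetical.

```python
import hashlib
import json
import random


def fingerprint(rows):
    """Stable SHA-256 digest of the training rows, independent of dict key order."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train(rows, seed):
    """Stand-in for model training: the mean of a seeded random sample."""
    rng = random.Random(seed)
    sample = rng.sample(rows, k=3)
    return sum(r["spend"] for r in sample) / len(sample)


def run_record(rows, source, seed):
    """Everything needed to replay this run and compare the result."""
    return {
        "source": source,  # hypothetical name of the upstream table
        "data_sha256": fingerprint(rows),
        "seed": seed,
        "result": train(rows, seed),
    }


rows = [{"id": i, "spend": 10 * i} for i in range(1, 6)]
first = run_record(rows, source="warehouse.customers_v1", seed=42)
replay = run_record(rows, source="warehouse.customers_v1", seed=42)
```

If the replay does not match the first record, either the data or the process changed, and the fingerprint tells you which.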

Your Next Steps

If you do business in the European Union, now is the time to start planning for GDPR. There is much to be done: evaluating the data you collect, implementing compliance procedures, assessing your processing operations and so forth. If you are currently using machine learning for profiling and automated decisions, there are four things you need to do now.

Limit access to personally identifiable information (PII) about consumers.

Implement robust anonymization, so that by default analytic users cannot access PII. Define an exception process that permits access to PII in exceptional cases under proper security.  

Identify predictive models that currently use PII.

In each case, ask:

  • Is this data analytically necessary?
  • Does the PII provide unique and irreplaceable information value?
  • Does the predictive model support a permitted use case?

Inventory consumer-facing automated decisions.

  • Identify decisions that require explanations.
  • Implement procedures to handle consumer questions and concerns.

Establish a data science process that minimizes the risk of errors and bias.

  • Implement a workflow that ensures proper model development and testing.
  • Consider the possibility of bias “built in” to training data.
  • Rigorously test and validate predictive models.
  • Implement peer review for an independent assessment of every model.

Even if your organization is not subject to GDPR, consider implementing these practices anyway. It’s the right way to do business.