2016 Big Analytics Predictions Roundup

Before publishing my own predictions for 2016 later this week, I thought it would be fun to round up published predictions on analytics and Big Data.  Looking through this list, I see a few patterns:

— Streaming is hot.  Analysts do not seem to understand distinctions between streaming data, streaming analytics and real-time decisioning.

— “Data Science” continues to be a term that means whatever you like.

— Security and anti-fraud analytics will be a thing in 2016.  (They were also a thing in 2015.)

— Industry analysts are divided about whether or not the analytics talent crunch will persist.

— IoT is a great concept for selling data management tools, but few know how to make sense of it.

On ZDNet, Andrew Brust summarizes 60 predictions from 17 executives and sees the following:

  1. Increased adoption of streaming analytics
  2. Maturation of IoT technologies
  3. Value and maturity in Big Data products
  4. Increased deployment of artificial intelligence and machine learning

On KDnuggets, Gregory Piatetsky reports on five predictions for 2016 from Tom Davenport of the International Institute of Analytics.  (Webinar replay here.)

  1. Cognitive technology will be the next thing after automated analytics.
  2. Analytical microservices will facilitate embedded analytics.
  3. Data Science and predictive analytics will merge.
  4. The analytics talent crunch will ease due to increased enrollment in graduate programs.
  5. Analytics will focus on data curation and management.

Davenport is smoking something if he thinks cognitive computing will be a thing in 2016.

In Forbes, Gil Press synthesizes the IIA’s predictions (above) with predictions from Forrester, IDC and Gartner to get six predictions:

  1. Analytics will be embedded everywhere.
  2. Machine learning will replace manual data wrangling.
  3. The shortage of analytics talent will persist.
  4. Analytics projects will be riskier than typical IT projects.
  5. Cognitive computing will be the next buzzword.  (Press clearly does not agree with Davenport.)
  6. Data monetization will take off.

Predictions (2) and (3) conflict with one another: if analysts spend 80% of their time on data wrangling, tooling that automates this step should relieve the talent shortage.

On Datanami, Alex Woodie wades through “dozens” of predictions and publishes the 33 most interesting.  Many of these are self-serving, obvious or nonsensical, so I will do the work Woodie’s editor did not do and distill the list to five:

  1. Streaming analytics will mature and prove its worth.
  2. Apache Kafka will be an essential integration point in enterprise infrastructure.
  3. Business user access to Hadoop data will improve.
  4. Spark will significantly displace MapReduce for Hadoop workloads.
  5. Spark processing outside of Hadoop will also increase significantly.

Teryn O’Brien of Silicon Angle reports on a webinar hosted by Alteryx that included Bob Laurent of Alteryx, Clarke Patterson of Cloudera and Francois Ajenstat of Tableau.  The panel offered three predictions:

  1. Analyst jobs will be hot and analysts will be everyday heroes.
  2. Spark, the cloud and IoT will be big in 2016.
  3. Advanced analytics will play a key role in the Presidential election.

On ITPortal, Dell’s Todd O’Brien predicts three things for 2016:

  1. The role of Citizen Data Scientists will expand and evolve.  (Me: WTF?)
  2. Analytics will significantly affect vertical markets, especially manufacturing.
  3. All innovation will trace back to analytics.

On the first point, I think that O’Brien is trying to say that companies should buy analytics software that is easy to use, like what Dell offers.

On the FICO blog, FICO’s chief analytics officer Scott Zoldi offers five predictions for 2016:

  1. Streaming analytics will come of age in 2016.
  2. “Prescriptive analytics” (his term for anomaly detection) will be a must-have security technology.
  3. “Lifestyle analytics” (predictions embedded in consumer interactions) will integrate prescriptive analytics into daily life.
  4. Businesses will rethink Big Data governance.
  5. Fake data scientists will emerge.

On a SAS blog, Polly Mitchell-Guthrie predicts five things:

  1. Machine learning (will be) established in the enterprise.
  2. IoT hype hits reality.
  3. Big Data moves beyond hype.
  4. Analytics improve cybersecurity.
  5. Analytics drives increased industry-academic interaction.

It’s standard practice at SAS to call any new IT trend “hype.”

In a press release, the health analytics vendor SCIO Health Analytics makes four predictions for 2016:

  1. Greater focus on educating health consumers.
  2. Demand for more precision in health analytics.
  3. More time will be spent on reimbursement strategies.
  4. The need for data and transparency across domains will increase.

Prediction #1 may be true, but it’s not really about health analytics.

On the Talend blog, CMO Ashley Stirrup predicts four things:

  1. Real-time analytics will take center stage
  2. New business threats will emerge
  3. CIO turnover will accelerate
  4. Businesses will retool

#2 and #4 aren’t really predictions; they simply state the obvious.

Big Analytics Roundup (April 6, 2015)

Late posting today due to holiday travel.

In the week following Spark Summit East, a number of Spark skeptics surfaced, a sign that people take Spark seriously.

The top item of the week, though, is Tiernan Ray’s interview with Michael Stonebraker in Barron’s, a must-read.

Analytic Software

Forrester published its latest “wave” for Big Data Predictive Analytics Solutions, an inaptly named report that lumps together solutions that can work with Big Data and those that cannot.  I’ll write a more detailed summary later this week.  Quick takes:  Alteryx, Oracle and RapidMiner did well, but Alpine and Microsoft clearly need to shift some of their analyst relations spending from Gartner to Forrester.

Apache Drill

Apache Drill announces Release 0.8.

Apache Spark

Analysis

On opensource.com, Jen Wike Huger interviews key Spark contributor Reynold Xin.

Mike Vizard, in the aptly named Talkin’ Cloud, describes the high potential for Spark in the cloud.  (Though he does not mention it, more than half of respondents to a recent Typesafe survey of Spark users said they deploy it in the cloud.)

Matei Zaharia, creator of Spark and CTO of Databricks, held an Ask Me Anything last week on Reddit.  Key takeaways: no, Matei is not a musician, and yes, he likes Nutella. 

Spark has clearly reached a point of inflection when skeptical analysis emerges.  Criticism is healthy, of course, but what the skeptics all seem to share is an ignorance of machine learning and streaming applications, and the challenge of making those applications work well in MapReduce.  In other words, they all seem to misunderstand the purpose of Spark, and would do well to learn more about the platform before quibbling on the margins.

  • Professional cat herder Andrew Oliver compares Spark to Tableau and, shockingly, finds it wanting.  Also, Andrew heard people say unflattering things about Hadoop at Spark Summit East.  Who knew that Hadoop devotees are so sensitive?
  • In DataMill, Nicole Leskowski asks if Apache Spark is the next big thing in Big Data Analytics, a question that would have been timely last year.
  • In TechTarget, Jack Vaughan wonders whether Spark is just a shiny new object, while ruminating about Digital Equipment and the PDP-11.  His point will be lost on most readers.
  • Returning to ZDNet from GigaOm, Andrew Brust asks if Spark is overhyped, citing unnamed second-hand sources that tell him Spark is “not ready for prime time.”   Note to Andrew: you can download the software here.

Spark Core

Matei Zaharia celebrates Spark’s fifth birthday with a brief history.

On the Cloudera blog, Sandy Ryza concludes his series on tuning Spark jobs.

Spark Streaming

On the Databricks blog, Cody Koeninger, Davies Liu and Tathagata Das describe the new direct Kafka API available in Spark 1.3.

Databricks

Databricks announced that Timeful, a startup specializing in intelligent time management, has deployed its recommendation engine in Databricks Cloud.  Case study available here.

Hadoop Ecosystem

In Datanami, Hadoop skeptic Alex Woodie asks if Hadoop needs a reality check, observing that the leading Hadoop distributors do not make money, a trait shared by most industries at comparable points of maturity.  Woodie cites Wikibon’s Big Data revenue summary as evidence that there is little money in Hadoop, without considering the validity of Wikibon’s data (which is self-reported by the vendors and lacks consistent definitions).  Even if we accept the Wikibon data at face value, Woodie also fails to note that startup Palantir (which is totally into Hadoop) now reports more Big Data revenue than industry leader SAS.  Another unanswered question: if Hadoop is so inconsequential, why has Teradata lost half its market value since 2012?

IBM

IBM announces BigInsights 4.0 just nine months after releasing BigInsights 3.0.  BigInsights includes the usual Hadoop bits, plus:

  • Big SQL, a federation engine for SQL across relational databases and Hadoop
  • BigSheets, a Datameer-like spreadsheet-on-Hadoop tool
  • SystemML, a home-grown machine learning library that runs in MapReduce
  • Text analytics capability
  • Big R, an interface that can push embarrassingly parallel R processing into Hadoop

Streaming and Real-Time Processing

On the O’Reilly Radar blog, Ben Lorica describes platforms and applications for processing data streams.