Big Analytics Roundup (June 13, 2016)

Spark Summit 2016 met last week in SFO. There were many cool things; I will publish a separate report when presentations and videos are available.

KDnuggets releases results of its annual poll on data science software. Key findings:

  • Python use is up 51%, almost catches up to R, the #1 choice.
  • Excel and Tableau usage are up 47% and 49%, respectively.
  • Spark usage is up 91%, overtakes Hadoop.
  • SAS is down big time, drops from the top ten.

Meanwhile, Alex Woodie wraps up statistics on Spark adoption, and Qubole’s Ari Amster reports on Spark usage among Qubole users.

Tim Spann recaps the week in Hadoop.

Spark Summit: Roundup of Roundups

— On the Databricks blog, Wayne Chan, Dave Wang, Jules Damji and Denny Lee recap highlights from the Summit.

— Jessica Davis rounds up the highlights.

— Jack Vaughan surrounds the story, quotes some old guy.

— Sam Dean summarizes what you need to know.

— Alex Handy collects the key bits.

— Andrew Brust separately corrals Day One and Day Two.

CFPs and Competitions

Flink Forward 2016, Berlin, September 12-14 (due June 30)

Parkinson’s Progression Markers Institute (PPMI) 2016 Challenge (due September 7)

Spark Summit Europe, Brussels, October 25-27 (closing date TBA)

Top Read

Adrian Colyer summarizes a paper on identifying architectural debt in software.

Explainers

— Deenar Torasker explains the new capabilities of HDFS.

— Ron Bodkin explains key considerations when designing continuous apps, in the second of a three-part series. Part one is here.

— On his eponymous blog, Jesse Steinweg-Woods explains Gradient Boosted Trees with XGBoost in Python.

— Adam Warski explains how Kafka Streams fits into the stream processing landscape.

Perspectives

— H2O.ai’s Vinod Iyengar objects to what he calls the fragmentation of Spark support, correctly noting that Cloudera and Hortonworks support different versions of Spark in their distributions. Of course, nobody is obligated to use Spark with Cloudera and Hortonworks.

— From the Spark Summit on YouTube: Ben Lorica leads a panel discussion of incredibly smart and distinguished people, plus some old guy.

— Altiscale’s Barbara Lewis presents ten use cases for Big Data.

— Tim Wallis believes that AI will relieve boredom.

— Sam Dean touts Grappa, Drill and Kafka as successors to Spark. Grappa is going nowhere. Drill is great if all you want to do is SQL, and Kafka is great if all you want to do is streaming. Pro tip: there are no real-world analytic applications where all you want to do is streaming.

— Allen Downey opines that statistical tests are inflexible and opaque. Funny, my college roommate said the same thing when he flunked his Stat 101 mid-term.

Open Source Announcements

— LinkedIn announces release of PhotonML, a machine learning library for Spark. Feature detail here.

— Google releases TensorFlow 0.9.0, with iOS support. Speculation about deep learning on your phone ensues.

— Twitter donates DistributedLog to Apache.

Commercial Announcements

— Databricks announces general availability for the Databricks Community Edition, and completion of the first phase of Databricks Enterprise Security framework.

— Microsoft announces general availability for its managed Spark service in HDInsight, and summer availability for the Spark pushdown capability in R Server. The company also announces PowerBI support for Spark Streaming, which is confusing for those who thought PowerBI already supported Spark Streaming.

— IBM announces limited preview of a managed service branded as the Data Science Experience. IBM is coy about the details; the service definitely includes Spark, Jupyter and RStudio, H2O and “curated data sets”, and may include other bits. The service itself looks promising, but IBM’s claim to offer the “first development environment for Apache Spark” is BS.

— In an oddly opaque press release, H2O announces that it is “working with” IBM. H2O is open source software, and IBM requires no permission from H2O.ai for use or distribution; presumably, H2O will offer support contracts to users. H2O.ai did not respond to a request for comment.

— Splice Machine announces plans to go open source; a company insider says they plan to donate the software to Apache. Dave Ramel reports.
