Tag Archives: Cloudera Impala

Big Analytics Roundup (November 23, 2015)

Eleven stories this week, including a new Flink release, new developments for Splice Machine, and a very big Spark HPC cluster in Warsaw. InfoWorld publishes a well-written practical guide to Deep Learning. Here are a couple of interesting articles on Spark: MapR’s Jim Scott offers a nice overview of Spark RDDs. Ian Pointer summarizes five things he hates about Spark.

Read more

Big Analytics Roundup (September 28, 2015)

Strata+Hadoop World NYC is upon us. Andrew Brust opines that there will be three themes at Strata this year: (1) Spark “versus” Hadoop; (2) streaming goes mainstream; (3) data governance matters. My take: “Spark versus Hadoop” is controversy for the sake of people who like controversy. Spark works with Hadoop, and Spark works with other platforms, or by itself. Use

Read more

Big Analytics Roundup (May 11, 2015)

Lots of news this week, to compensate for last week’s lame haul. In an excellent post on O’Reilly Radar, Ben Lorica surveys the landscape of workbooks, notebooks and workflow tools, which he categorizes by user persona. On GitHub, a collection of links for streaming analytics (h/t O’Reilly Data). In a “twofer”, VentureBeat plugs its GrowthBeat Summit and a report on

Read more

Apache Spark for Big Analytics (Updated for Spark Summit and Release 1.0.1)

Updated and bumped July 10, 2014. For a PowerPoint version on SlideShare, go here. Introduction: Apache Spark is an open source distributed computing framework for advanced analytics in Hadoop. Originally developed as a research project at UC Berkeley’s AMPLab, the project achieved incubator status in Apache in June 2013 and top-level status in February 2014. According to one analyst, Apache Spark is among the five

Read more