Tag Archives: Apache Tez

Big Analytics Roundup (October 19, 2015)

Ten stories this week. Don’t miss story #10, which recaps an analysis of collaboration and influence in the U.S. Congress using open source graph engines and a rich database of legislation.

(1) Rexer: R Continues to Lead

Rexer Analytics has released preliminary results from its 2015 survey of working analysts; Bob Muenchen reports. One interesting snippet: reported tool use, as

Big Analytics Roundup (June 29, 2015)

The Sparkalanche continues; plus we have new releases from Flink and H2O.  And, in case you thought Spark was the last word in Big Analytics, well, think again: here comes Splash, from AMPLab. In the Wall Street Journal’s Saturday Essay, Sean Parker calls for philanthropists to focus on “hackable problems,” a message that should resonate with data scientists.  (Link may require registration.) On

Apache Spark for Big Analytics (Updated for Spark Summit and Release 1.0.1)

Updated and bumped July 10, 2014. For a PowerPoint version on SlideShare, go here.

Introduction

Apache Spark is an open source distributed computing framework for advanced analytics in Hadoop. Originally developed as a research project at UC Berkeley’s AMPLab, the project entered the Apache Incubator in June 2013 and achieved top-level status in February 2014. According to one analyst, Apache Spark is among the five