Cloudera and Hortonworks just announced a merger.
You ask: how does the merger affect the machine learning marketplace?
My answer: not at all. Neither Cloudera nor Hortonworks competes in machine learning.
Yes, I’m aware. Cloudera and Hortonworks see themselves as part of the machine learning value chain. You can’t do machine learning without data. You need a few other things, too. But you definitely need data.
Take Cloudera’s Komatsu customer success story. IoT, analytics, machine learning, yada yada. Now read the fine print, and see what Komatsu users do. They copy data from Cloudera to MATLAB, Python, and R, and run analysis there.
That’s fine. It works for them. You might even make an argument that Komatsu’s MATLAB, Python, and R users have better access to data today. Maybe they do, maybe not. If they do, that’s an argument for data marts and self-service data access, not for machine learning.
I guess data marts and self-service data access aren’t cool.
For data platform vendors, talk about machine learning is demand generation. When you make and sell dog food, you encourage people to adopt dogs. “Hey, everyone! Look at this adorable cocker spaniel. Everyone’s getting a dog! Wouldn’t you like to have a sweet pooch?” That doesn’t mean you have cute puppies available, or that you offer an adoption service. You just figure if more people adopt dogs, they may buy your dog food.
That’s what the machine learning talk is about. ClouderaWorks execs want you to do more machine learning because they figure you’ll use the data sitting in your cluster.
After the merger, ClouderaWorks execs will talk about machine learning at least as much as they do today. But they won’t actually sell machine learning. Well, perhaps a little machine learning. For example, both vendors include Apache Spark in their distributions, and Spark has a machine learning module. Not a very good one, but it’s something.
Of course, if you’re a data scientist and you like working with Spark, you don’t need the rest of the bits that come with ClouderaWorks. Just use a free-standing Spark service, like Databricks.
Cloudera offers Cloudera Data Science Workbench, a containerized gateway for data scientists. CDSW is a nice product, and it works well. Unfortunately, it only works with Cloudera, so it doesn’t compete seriously with other machine learning platforms. ClouderaWorks will integrate CDSW with HDP and the “unity release” (when it’s available), but it still won’t compete seriously, for the same reason: data scientists have to work with data from all sources, including external databases.
Hortonworks lacks a machine learning story. The company pushed Apache Zeppelin for a while. Unfortunately, nobody uses Apache Zeppelin. Even its original contributors have moved on. I’d insert a joke about the Hindenburg here, but that zeppelin got off the ground, at least, and made it across the Atlantic a few times before it went up in flames.
Now Hortonworks partners with IBM to position IBM Data Science Experience. Insert IBM joke here.
A few other comments on the merger.
— It’s not a “merger of equals.” Cloudera will buy Hortonworks. Cloudera shareholders will own 60% of the combined entity. Cloudera CEO Tom Reilly will run the show, and keep a few Hortonworks execs for window dressing.
— The first thing to go overboard will be Hortonworks’ “pure open source” business model. That dog just won’t hunt.
— Tom Reilly says the new company will retain a pure open source distribution. I bet it will. Just like Cloudera’s free version of CDH, buried four levels deep on the website.
— Cloudera and Hortonworks products are identical to everyone not employed by Cloudera and Hortonworks. ClouderaWorks will not need two sales forces, two marketing organizations, and two finance departments. It won’t need two MapReduce teams. The press release touts $160 million in cost savings. Those savings won’t come from pencils and duplicate magazine subscriptions. Do the math.
Recruiters, take note.
CEO Tom Reilly says that “people expect us to be the next Oracle.” I’m not sure what people he has in mind. ClouderaWorks revenue will be just north of $700 million, which means it will rank #6 in data warehousing, behind Oracle, IBM, Microsoft, Teradata, and SAP.
I can hear the cheers at the first company meeting: “We’re number six! We’re number six!”