It’s been a tumultuous few weeks for big data vendors. First, MapR ran into problems (their update). Now Cloudera is having problems. Cloudera closed at $5.21 today (June 6, 2019). To put that in perspective, at its last valuation, Confluent was valued at almost twice what Cloudera is worth now. Put another way, Cloudera is trading at a 2x multiple of yearly earnings. A 2x multiple is really low for a technology company.
For the first time, we face the real prospect of going from 3 Hadoop distributions to 0. I’m worried that Cloudera will get bought by a private equity (PE) firm and sold off in pieces. One of those pieces, CDH/CDP/HDP, isn’t directly profitable, and I’m wondering if the PE firm will really understand that. If they don’t, CDH/CDP/HDP won’t receive the love it received before. Everyone freeloading on the big data train will wonder where all their free updates and major features went.
I’ve spent the last few days thinking about what all of this means. I’ve spent a longer amount of time thinking about Cloudera’s problems. I’ve spent an even longer amount of time thinking about the future of big data. I’m still convinced that you can only do big data with big data tools and that it is possible for companies to get massive value from their data. But…
Low Value
We have a big problem in big data. We generate woefully low amounts of value relative to the amount spent. It’s really a dirty little secret we’ve had for a while. Others and I are worried that companies with big data will wake up, calculate their costs, calculate the value created, see the low value, and start cutting people and big data projects. I don’t want to come across as all doom and gloom, because there are companies with big data problems creating massive value with their data using Hadoop, Spark, cloud, etc., but they had to put the effort in first.
There will be a bloodbath of sorts when the big data party stops. Everyone will say that big data was a fad and didn’t really do anything useful. Then we all move on.
I’ve been telling my clients about the value that big data can bring and helping them achieve it. Run the right way, big data can create significant value. It’s just far more difficult to achieve that value than Cloudera and other vendors told you. There was never an easy button that just magically made all of this open source easier or work better together. This notion of easy is what I spent much of my time combating: when it comes to distributed systems, don’t believe your vendor when they tell you things are easy. This lack of easiness meant that Cloudera couldn’t do everything itself and had to create a partner ecosystem, which they tried to do.
Achieving value from big data took more than choosing a technology or vendor or bringing all of your data together in one place. Yet this is what companies focused on, because their vendor told them Hadoop, Kafka, Spark, or whatever would just solve the problem. The company would buy the product and still achieve low value. The promise of value was never really achieved.
Cloudera didn’t tell its customers the whole story. To achieve value, the customer would need to hire competent people, train them, and continue to help them.
The customer’s management team would actually have to change and fix how they managed their data teams. In Silicon Valley this level of hand-holding doesn’t scale. Silicon Valley wants subscription businesses that are software-only, have high stickiness, and don’t require any consulting.
Before the IPO, Cloudera started to push this way. It started to reduce and eliminate the consulting to focus on subscriptions. But that didn’t make its customers successful. Unsuccessful customers don’t blame themselves – they blame the technology vendor and choose a new one. Cloudera loses another customer to the cloud or another company promising how easy it is to work with their technology.
What Really Is Hadoop?
I was reading and participating in this Twitter thread where they’re talking about MongoDB and Cloudera. The replies were super interesting: people still don’t know what Hadoop is. Maybe that’s really the bigger problem. Understanding Hadoop and its ecosystem is a big undertaking. Finding and solving a true big data problem is another undertaking. Put simply, big data (and not just Hadoop) is a long chain of big undertakings that most companies don’t realize they’ll have to do.
With MongoDB, you could understand most of it quickly. It is a database. Hadoop was lots of different things or, more correctly, a large ecosystem of things. It takes quite a while to understand it fully and in depth.
This is something I’ve been thinking about lately. People like simple things that are easy to understand. It doesn’t matter whether they’re using the right technology, whether the technology is well-architected, or whether it is a better fit for the use case overall. People want easy and they want it now. They want to install easily and start using it. Once things get complicated, they double down on those simple technologies whether they’re right or not. This leads to terrible workarounds and duct tape architectures. But the technology behind it is still simple!
The corollary to this is when an organization doesn’t have the time to really understand the technologies and their tradeoffs. They were looking for something easy and didn’t really want or have the ability to go deeper. These projects flounder and go nowhere, while the team blames the technology itself. If they do get into production, the terrible implementation, poor coding, and bad business decisions come up as production outages and projects that can’t be maintained or improved.
Does this mean Hadoop is dead and was purely hype? Were people right that Hadoop could only index the web like Google created it to do? Obviously not. But in people’s minds, this is what Hadoop is limited to, and therefore it is hype.
You should be judicious about adopting Hadoop MapReduce, for example, but the entire big data ecosystem isn’t dead.
The Hortonworks Merger
I don’t believe it’s been talked about enough, but the Hortonworks merger brought productivity to a standstill. I’ve talked to several people who are still there about how smooth the merger was. It wasn’t smooth at all. When it was first announced, I surmised it would be a really difficult merger. I was right, and many people left before or shortly after.
Publicly, Cloudera said how much overlap there was between the two companies and that they’re working well together. During the earnings call, Tom Reilly said: “However, our rapid execution on the Cloudera platform caused customers to wait until release to renew and expand their agreements.”
That struck me as interesting, and I think it shows the real problems of the merger. The merger made it difficult for customers to see the value of their contracts. Cloudera’s output really stopped, and customers could see that. Why should they continue to pay for something they may not get? Customers will give you some slack, but when they’re writing million-dollar checks, that patience wears thin and they want results. At my company, I have advised my clients not to believe the roadmaps they were given or make plans based on them.