There Are Several Hard Problems with Big Data

There’s a common misconception that changing just one thing in Big Data will make everything else easier. In reality, Big Data has several distinct hard problems, and solving one of them doesn’t solve the others.

Sometimes, I’ll see tweets or posts about how companies or vendors haven’t made Big Data easy. These posts assume that everything about Hadoop can be made simple. They also reinforce the assumption that there’s only one hard problem to solve.

Big Data is complex. In Chapter 2, “The Need for Data Engineering,” of Data Engineering Teams, I show how Big Data is 10-15x more complex than small data.

The three main problems for Big Data are: operations, development, and management.

Management

Setting up the team correctly is crucial to the success of the project. I make that point over 73 pages in Data Engineering Teams.

There isn’t much that can be done to make management easier. I’ve written the book giving the steps. If you still need help, we provide mentoring services for management and teams.

Problems in management tend to materialize early on. These problems are the culprits behind the early failures of Big Data projects. These projects just never go anywhere because they have the wrong people on the team.

Operations

Operational problems can be the easiest to reduce in complexity. You can move entirely to the cloud and remove the majority of operational overhead. You can use purpose-built software like Cloudera Manager or Apache Ambari. These allow you to have fewer people monitor and maintain a cluster, but don’t remove the need for operations people.

Operations problems tend to manifest after the first few months of the project.

Development

Development problems are the most difficult to reduce in complexity. Many people think that the move from Apache Hadoop to Apache Spark will reduce complexity. It doesn’t.

Others think that the complexity stems from Hadoop or Spark being immature. It actually comes from them being general-purpose systems.

Development problems tend to manifest throughout the project. A data pipeline is constantly being updated and added to. If the development team isn’t ready, these updates will take forever or the team will say they aren’t possible.

I stress the need for qualified Data Engineers. Without proper training and resources, data engineering projects never finish.

What to Do?

Some problems can be lessened and others require smart people. Don’t fall into the misconception that these problems can be magically made easy. In Big Data, an ounce of prevention is worth a ton of cure.

© 2026 Big Data Institute
