You’re Probably Not a Distributed Systems Engineer

As I’ve worked with software teams, I’ve found some interesting views on distributed systems. Some teams think they’re creators of distributed systems. They usually aren’t.

I think there are three main groups of teams that interact with distributed systems: users of end data products, users of existing distributed system frameworks, and creators of distributed systems frameworks.

These nuances make a big difference in how a team interacts with distributed systems. For example, a team that uses end data products will fail if they try to create their own distributed system. This is one of the more common ways I’ve seen teams fail with Big Data.

Users of End Data Products

Users of end data products are the people who work with data pipelines and data products that have already been created. These may be DBA- or SQL-focused teams, or software engineering teams. The difficult parts of creating the distributed system are done for them. They're given the data in an already usable form.

Users of Existing Distributed System Frameworks

Users of existing distributed systems frameworks are the people who use open source or other distributed systems to create data pipelines and data products. They’re using existing technologies like Apache Spark, Apache Hadoop, and Apache Kafka.

Creators of Distributed System Frameworks

Creators of distributed system frameworks are the people who create new distributed systems or improve existing distributed system frameworks. They're building everything themselves, including schedulers, resource managers, and harnesses.

Confused Teams

Sometimes teams get confused about their core competencies. An end data product team will think they're users of distributed system frameworks. A team that uses existing distributed system frameworks thinks they can create their own distributed system. Both of these scenarios will lead to failure.

I’ve written about the increase in complexity when using Big Data. An end data product team will experience a 10x increase in complexity when trying to use a Big Data framework. For most teams, this will lead to failure. They’ll need more guidance and mentoring to get through their Big Data journey.

That leads me to a somewhat common issue: teams that think they can create their own distributed system. There are all sorts of failures wrapped up in creating your own distributed system. This mostly stems from the fact that you’re probably not a distributed systems engineer. Very few people have the computer science, system design, and operational understanding to create a distributed system from scratch.

Creating your own distributed system may sound like a good idea initially. We’ll write our own, and it will do exactly what we want. Except:

You will have to spend the time to write it
Debugging and testing a distributed system is tough
There are so many unknown unknowns that only time and usage reveal
The operations team won’t be able to leverage existing knowledge
Any operational issue will be escalated to the development team
The development team will spend their time debugging their distributed system instead of creating new features

Do yourself and your team a favor. Take an honest look at your abilities before going down one of these routes. This will save you all kinds of time, money, and heartache. Using the wrong team for the job is always a bad idea.

© 2025 Big Data Institute