In my book Data Engineering Teams, I treat programming as a separate skill from distributed systems. The distinction appears in the section “Skills Needed in a Team,” which walks through the various skills a data engineering team needs.

Several people have emailed me for clarification about this distinction. Aren’t programming and distributed systems the same thing? How are they different?

Programming

I’ll start with my definition of programming within Big Data.

I find there are three general types of programmers: coders, simple programmers, and advanced programmers.

I wrote an entire article about the programming skills needed for Big Data. That article helps you determine which category each member of your team falls into.

Distributed Systems

A common misconception is that Big Data frameworks make Big Data dead simple. They make it easier, but not dead simple. Creating a solution is still very complicated. The frameworks just let you concentrate on your code instead of the RPCs and threading.
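To make that concrete, here is a minimal sketch of a distributed word count, using PySpark purely as an example framework (the file paths are placeholders). Notice that the code never mentions RPCs, threads, or serialization; the framework schedules the tasks and shuffles the intermediate results across the cluster.

```python
# Minimal PySpark sketch: a distributed word count.
# Assumes a working Spark installation; the HDFS paths below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()

counts = (
    spark.sparkContext.textFile("hdfs:///data/input.txt")  # placeholder input path
    .flatMap(lambda line: line.split())        # split each line into words
    .map(lambda word: (word, 1))                # emit (word, 1) pairs
    .reduceByKey(lambda a, b: a + b)            # sum counts per word; Spark handles the shuffle
)

counts.saveAsTextFile("hdfs:///data/output")    # placeholder output path
spark.stop()
```

Even in a toy example like this, the hard parts are still on you: how the data is partitioned, whether a few hot keys skew the shuffle, and how much memory each executor needs. That is the distributed systems skill the framework does not supply.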

In my experience, the companies that think Big Data frameworks make things easy are the most likely to fail. They assign teams and individual contributors who lack the skills to create the solution. That is the skills gap I talk about in the book, and skills gaps lead to failure.

How Are They Different?

The two skills are different and not often found in the same team member. For your team to succeed, you will need at least one person with both programming and distributed systems skills.

I gave a list of programmer types above. Let me show you how each one relates to distributed systems skills.

The “coders” don’t have the distributed systems skills to create a data pipeline. They’re usually the consumers of the data pipeline.

The simple programmers rarely have the distributed systems skills. They’re usually the consumers of the data pipeline.

The advanced programmers have the highest probability of having the distributed systems skills, though it’s not 100%. They’re the ones creating the data pipeline, and they’re also consuming it and creating value from it. They help the other programmers when they get stuck working with the data pipeline.