In my book, Data Engineering Teams, I talk about the right skills and people to have on a data engineering team. The right skills and people are incredibly important to the success, or failure, of a Big Data project.
Sometimes it’s easier to understand this point with some real examples. Instead of telling you what the team should look like, I’m going to share the stories of two teams that had the wrong skills and people. More importantly, I’m going to share the outcomes of their projects.
Data Warehousing Team Takes on Big Data
I taught at a large insurance company that was experiencing Big Data problems. They tried to solve these problems by using their existing data warehousing team. The idea was that they would train their SQL-focused team on Python programming and Big Data technologies.
The team had been told to memorize the Python API before I came. This showed the team’s deep misunderstanding of what programming is and what’s difficult about it. There was really only one student who spent any time learning to program before the class and she had the highest odds of getting anything accomplished.
The team told me about the systems they had created. They suffered from a fundamental lack of systems design and programming knowledge. As a direct result, the system was grossly inefficient and could barely accomplish the business goals. The team could only accomplish the business goals because the business acquiesced on every requirement. The system did about 10% of what the business needed.
Once the team started learning the Big Data side, the 10x increase in complexity just made their eyes glaze over. A student asked how a specific use case could use Big Data. I answered the question and told them how a correctly designed system could blow away their current system. During the break, an employee who wasn’t in the class asked a student who was in the class about the use case. The student said it wasn’t possible.
I circled back with the team a few months later. The project and the Big Data rollout weren’t successful.
Limping Along with Big Data
I tried to work with a medium-sized marketing company that told me about the current state of their Big Data project. The project kind of worked, but could only do a small percentage of what the business needed. The project was spearheaded by the company’s data warehousing team.
The project was in production and was super brittle. It broke all the time and much of their operations time was spent fixing the issues. The processing had to be done every hour and everything was manually kicked off.
The project was held together with duct tape in the form of bash scripts and Hive queries. This meant that any updates or fixes were incredibly difficult and untestable.
The company has plans to re-architect the solution with the same team. They should expect equally poor results. I decided not to engage with the company.
Why Is This Happening?
These are just two of many stories. They all have the common thread that a team that doesn’t have the right skills and people fails outright or limps along. I’ve been wanting to share this data for a while, but I wanted to get more data points to validate what I’ve been seeing.
When your only hammer is SQL, everything looks like a nail, and you get some real abominations. One of the biggest differences between data warehousing and Big Data is the programming and systems design involved. If a team lacks programming skills, they will try to solve the problem with the (wrong) tools that they know.
There is a 10x increase in complexity when going to Big Data. If a data warehousing team is barely able to keep up with the complexity of a small data system, they won’t be able to handle the increase in complexity.
As a direct result of the lack of programming and systems design skills and the increase in complexity, the team gets into a vicious cycle of low performance. The system is operationally fragile and breaks because it wasn’t well-designed. Because the system breaks all the time and the team lacks the programming skills, the team spends all of its time plugging holes and they’re unable to improve the system. Often, the team doesn’t know or understand how to improve the system.
If this sounds like a scenario that you’re currently experiencing or want to avoid, I mentor teams to fix failing projects.