I teach and mentor teams that have just started their projects or are several months in. I see what happens after they’ve hit problems. Viewing these teams from the outside looking in, I see the manifestations of the problems and have to figure out the root cause of each one.
These issues often come from management thinking Hadoop/Spark/Big Data is a silver bullet or that it’s going to be an easy rollout. Once they get deep into the guts of the project, management and engineering find out it isn’t easy. They’re faced with the difficult decision of delaying the project or doing a half-assed job.
These incorrect assumptions, made in a vacuum at the beginning of a project, lead to failure. If you’re embarking on a Big Data project, make sure you’ve read and applied the advice in my Data Engineering Teams book.
The team assumes that somehow they’ll have the time to go back and do it right. They never do, for two main reasons. First, teams are never given the time to revisit code that’s already shipped. Second, doing it right means changing data in flight or on disk.
If you’re changing data on disk and didn’t use a schema that can evolve, you’ll have all sorts of trouble changing code. This becomes a non-starter or pushes out development timelines. In enterprises, teams will also have to convince and coordinate with other teams on the code changes.
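To make the point concrete, here’s a minimal sketch of what an evolvable schema buys you, using Avro via the fastavro library (the format, library, and field names are my choices for illustration, not a prescription). A new field added with a default means new code can still read the old files on disk without a rewrite.

```python
# Minimal sketch: Avro schema evolution with fastavro.
# Old data written with schema v1 stays readable under schema v2
# because the new field carries a default.
import io
from fastavro import parse_schema, reader, writer

# Version 1: the schema the data on disk was written with.
schema_v1 = parse_schema({
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
    ],
})

# Version 2 adds a field WITH a default, so old files remain readable.
schema_v2 = parse_schema({
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "referrer", "type": ["null", "string"], "default": None},
    ],
})

# Simulate existing data: a record written under the old schema.
buf = io.BytesIO()
writer(buf, schema_v1, [{"user_id": "42", "url": "https://example.com"}])

# Read the old data back with the new schema; the missing field is
# filled from its default instead of breaking the job.
buf.seek(0)
for record in reader(buf, reader_schema=schema_v2):
    print(record)  # {'user_id': '42', 'url': 'https://example.com', 'referrer': None}
```

Without that default, or with a format that can’t evolve at all, every downstream job has to change in lockstep with a rewrite of the data itself, which is exactly the coordination problem described above.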
These are the types of projects and mentalities that get cancelled due to a lack of progress. Usually the post-mortem blames the technology. To the outside observer, that’s the reason things failed; there was some kind of technical issue. It takes an honest look at the whole project to figure out what truly caused the problems in the first place.