Last week, I gave two talks at Strata+Hadoop World. These talks covered some of the up-and-coming technologies in Big Data. I describe Strata as the Super Bowl of Big Data conferences. This is where you’ll find the best minds talking about the present and future state of Big Data.
My first session was a tutorial with Tyler Akidau from Google. We covered Apache Beam and some of the interesting features for Big Data.
My second session covered how Apache Spark and Java can be used together. There isn’t a great deal of material on using Spark and Java together. All of my classes teach how to use Spark using only Java. Unless there is a big need for dynamic languages in the use case, I don’t see the need for teams to learn Scala.
Strata Trends
The march towards real-time Big Data continues. I spoke about Kafka at last year’s Strata+Hadoop World. This year, we’re seeing more representation from Apache Flink and Apache Apex. Data engineers will need to keep up to date on the latest changes in real-time frameworks.
We’re also seeing more productization of Big Data use cases. There are mature products targeting specific IT use cases that require Big Data.
On the Apache Beam side, I noticed an uptick in early adopters looking at it. Many data engineering teams don’t want to rewrite code as they move from framework to framework. Data scientists are looking for a single API for their programming and analysis.
Broader Trends
I’m also seeing some broader trends in Big Data. People are starting to agree with my assertion that business value must be established before embarking on a Big Data project. Management teams need training just as much as technical teams.
Gartner wrote an article talking about the rise of the data executive. This is a C-level position that companies are putting in place. The title is often Chief Data Officer or Chief Analytics Officer. Companies are finally putting data in the C-suite.
ZDNET wrote an article talking about why Big Data projects fail. While I disagree that the Big Data boom is over, I agree that Big Data projects fail for specific reasons. I’ve seen these issues repeated over and over in companies. To help companies, I’ve created a curriculum that teaches business leaders about Big Data projects and why they’re different from small data ones. I teach how data engineering teams should be run and which people should be on them.
Big Data projects fail due to a lack of business value, lack of training, and having the wrong people on the team.
Photo by Alex Moundalexis