How Training is Delivered – From the Beginning to the End

Teams will often tell me how much better my training classes are than what they’ve had before. They go on to tell me how the training they’ve attended previously was useless. My students are surprised that I can answer programming questions, no matter how difficult they are. I want to share some of the behind […]

Two Halves Don’t Make a Whole

In Chapter 3 of my Data Engineering Teams book, I show you how to do a skill gap analysis. During the analysis of the team, you either say the person has the skill or not. It’s a very binary decision. Some people have written me asking if it can be a fraction. Instead of a […]

Apache Kafka and Amazon Kinesis

This post will focus on the key differences between Apache Kafka and Amazon Kinesis that a Data Engineer or Architect needs to know. Cloud vs DIY: Some of the contenders for Big Data messaging systems are Apache Kafka, Amazon Kinesis, and Google Cloud Pub/Sub (discussed in this post). While similar in many ways, there are enough […]

This is Useless (Without Use Cases)

Sometimes I’ll write a post and the comments will say something to the effect of “this is useless.” Other times I’ll be finishing up a class and a student will ask me why I didn’t cover what they’re trying to do. I’ve written example code and people will ask me why I didn’t write it on something […]

The Blame Game

When a Big Data project fails, there’s plenty of blame to go around. When I do the retrospectives with teams who are failing or about to fail, their blame is often misplaced. There’s a focus on blaming the technology. The more difficult consideration of looking inward at the team itself is often skipped. The teams […]

What It Looks Like From the Outside

I teach and mentor teams that have started or are several months into their projects. I see what happens after they’ve experienced problems. I view the teams from the outside looking in. I see the manifestations of problems and I have to figure out what the root of each problem is. These issues often come […]

Medium Data

Most companies aren’t experiencing Big Data or small data problems. They’re experiencing a witching hour of sorts. This is a point in their growth where their data is too big for small data and too small for Big Data. As I’m teaching at companies, I’m finding as much as 80% of use cases are falling into […]

Beam 2.0 Q and A

Apache Beam just had its first API stable release. Now that we have an API stable release, I want to give an update on what’s changed in the Beam ecosystem. I want to highlight the growth of Beam as a project and the increased usage of Beam in pre-production/development or production deployments. Each committer and user is sharing […]

The Difficulty of Transitioning to Data Pipelines

There’s a common difficulty that companies are having in transitioning to Big Data, especially Kafka. They’re coming from systems where everything is exposed as an RPC-esque call (remote procedure call/REST call/etc.). They’re transitioning to a data pipeline where everything is exposed as raw data. These data pipelines are a brand new concept. With RPCs, there […]

Five Dysfunctions of a Data Engineering Team

At Strata London, I premiered a new talk based on my Data Engineering Teams book. Companies are seeing great efficiency gains and ROI from using Big Data technologies. However, the vast majority of teams fail and never get something into production. I want to prevent that failure and here are the top 5 reasons why […]