Beam 2.0 Q and A

Apache Beam just had its first API-stable release. Now that we have an API-stable release, I want to give an update on what’s changed in the Beam ecosystem. I want to highlight the growth of Beam as a project and the increased usage of Beam in pre-production/development or production deployments. Each committer and user is sharing […]

The Difficulty of Transitioning to Data Pipelines

There’s a common difficulty that companies are having in transitioning to Big Data, especially Kafka. They’re coming from systems where everything is exposed as an RPC-esque call (remote procedure call, REST call, etc.). They’re transitioning to a data pipeline where everything is exposed as raw data. These data pipelines are a brand new concept. With RPCs, there […]

Five Dysfunctions of a Data Engineering Team

At Strata London, I premiered a new talk based on my Data Engineering Teams book. Companies are seeing great efficiency gains and ROI from using Big Data technologies. However, the vast majority of teams fail and never get something into production. I want to prevent that failure and here are the top 5 reasons why […]

How to Evaluate an Open Source Product

Open source is a great way to solve problems. Mostly we focus on the open source project from technical and architectural points of view. In this post, I’m going to talk about it from a business point of view. Sometimes you’re looking through 3-10 different open source projects on GitHub and/or Apache. If they […]

Kafka Topic Design Checklist

Designing data for consumption in a Kafka topic requires more forethought. Instead of the messages being consumed from point to point, there are many different consumers. You will need to decide on: Name Schema Contents Key/Ordering Number of Partitions Number of Replicas Name The choice of a topic name shouldn’t be difficult. I suggest […]

The Many Meanings of Event-Driven Architecture: Kafka Edition

I spoke at GOTO Chicago last week with Martin Fowler. He gave a keynote on The Many Meanings of Event-Driven Architecture. It wasn’t tied to or specific to any particular technology. In this post, I’m applying some of his points specifically to Kafka and Big Data. Change Events: Kafka is often used to publish changes as events. […]

Consumers and Creators of Technology

Having taught at companies around the world, I’ve found there are generally two types of companies. There are companies that consume technology and there are companies that create technology. Companies that consume technology are ones that take technology and use it in their business. Their primary product is not creating technology. Rather, they consume the […]

There Are Several Hard Problems with Big Data

There’s a common misconception that says if I just change one thing in Big Data, everything else will be easier. The reality is that there are several different hard problems in Big Data. Solving one problem doesn’t solve the others. Sometimes, I’ll see tweets or posts about how companies or vendors haven’t made Big […]

Asking Better Questions

Learning how to ask good questions is an important life skill. This skill doesn’t just help in business or professional life. It helps in your personal life too. When I’m teaching, I encourage questions. I ask students to ask questions all the time and as often as they’d like. There are two main reasons for […]

The Learning Disconnect

When educational material is created, it starts with a learning objective. It doesn’t matter whether it’s a book or a video series. This learning objective defines what the book is supposed to teach you. You’ll usually see this learning objective in the title or description of the book. This is supposed to help you choose the […]