Designing data for consumption in a Kafka topic requires more forethought. Instead of the messages being a consumed from point to point, there are many different consumers.

You will need to decide on:

Name
Schema
Contents
Key/Ordering
Number of Partitions
Number of Replicas

Name

The choice of a topic name shouldn’t be difficult. I suggest using a descriptive and long as necessary.

Don’t hardcode the name all over the place in your code. It’s a common early bug to misspell the topic name in several different places. I suggest using class that exposes topic names as public static final Strings.

Schema

You might have noticed that I broke out the actual schema of the topic apart from the contents or payload. A topic’s schema is different than the actual data sent.

Some examples of schema are JSON or Apache Avro. Don’t use XML as your post-ETL schema. Avro is the recommended format for post-ETL schema. Some organizations choose to use JSON. There are some big benefits to using a binary format such as Avro.

The contents are the actual payload of the message. Sometime this includes the key, but is primarily the value.

When you’re deciding on the contents of the value, remember not to focus on the first use case. Remember that data usage will grow over time and it’s easy to add more consumers. Other teams can, and will, write new consumers and require new data.

If the value’s contents are designed for one use case, you might have to change it for a new use case. My general suggestion is to add the data that makes sense to the value, even if all of the fields aren’t being used.

Key/Ordering

By default, your choice of key affects ordering in Kafka. In a worst case scenario, choosing the wrong key could make a future consumer’s use case impossible. It’s important to choose a key that makes sense given the value’s contents.

Number of Partitions

Partitions are how Kafka breaks down a topic into smaller pieces. Choosing the number of partitions affects the scalability of the topic for consumers.

It’s outside the scope of this post to say how to choose the number of partitions. You can change the number of partitions later on, but I highly suggest spending the time to figure out the right number of partitions.

Number of Replicas

The number of replicas comes down to: how important is your data? If don’t really care about the data, go for 1 replica. If you remotely care about your data, make you have 3 replicas.

Kafka Topic Design Checklist

Name

Schema

Contents

Key/Ordering

Number of Partitions

Number of Replicas

Get your free copy of Data Engineering Teams: Creating Successful Big Data Teams and Products

Data Engineering Teams Book

Would you like to know what I teach successful organizations to do?

Mentoring

We’re here to help make the process more successful and the outcome more effective.

Architecture Reviews

The right tool for the job saves countless hours, time, money. Are you using the right tool for the job?

Project Acceleration

Why do so few companies create enormous value from Big Data while most fail?

Company

Resources

Resources

Stay updated with the latest.

Have a question?

Send us a message

or give us a call at +1 775.393.9122

© 2025 Big Data Institute

Privacy

© 2025 Big Data Institute

Privacy

Have a question?

Send us a message

or give us a call at +1 775.393.9122