I’ve been teaching Kafka at companies that don’t have, and won’t ever have, textbook Big Data problems. As a result, my students ask whether Kafka is appropriate for their use cases. Put another way: is Kafka only a Big Data tool?

For most Big Data technologies, the absence of a Big Data problem, now or in the foreseeable future, is reason enough not to use them. With projects like Apache Hadoop or Apache Spark, it’s a pretty clear pass/fail: their technical and operational overhead immediately negates any other benefits. Using Big Data tools on small data isn’t just massive overkill; it wastes a lot of time and money.

For Kafka, it’s different. I define Kafka as a distributed publish/subscribe system. Companies without clear Big Data problems are still gaining value from it, because they’re able to use Kafka’s other interesting features.
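To make the publish/subscribe pattern concrete, here is a toy in-memory sketch of topic-based pub/sub in Python. This is not Kafka’s API (all the names below are illustrative); Kafka provides the same basic pattern, but distributed across brokers, persisted to disk, and replayable by consumers.

```python
from collections import defaultdict


class MiniPubSub:
    """Toy, in-memory illustration of topic-based publish/subscribe.

    Illustrative only: Kafka offers the same semantics in distributed,
    durable form; none of these names come from Kafka's client API.
    """

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber to a topic receives every message on it,
        # unlike a queue, where each message goes to one consumer.
        for callback in self.subscribers[topic]:
            callback(message)


bus = MiniPubSub()
received = []
bus.subscribe("orders", received.append)
bus.publish("orders", {"id": 1, "total": 9.99})
```

The key distinction from a point-to-point queue is fan-out: every subscriber sees every message on the topic, which is what lets independent downstream systems consume the same event stream.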

Here are some of the pros I see for using Kafka with small data:

Here are some of the cons I see for using Kafka compared to a traditional small data pub/sub:

With these pros and cons in mind, you can choose between Kafka and your small data pub/sub of choice. If the pros are compelling and outweigh the cons, I suggest you start looking at Kafka. If the cons outweigh the pros, you’re probably better off with your small data pub/sub.

Learn more about how Kafka works here: