mmciver13 | Big Data Institute

Creating Work Queues with Apache Kafka and Apache Pulsar

A common use case for Kafka and Pulsar is creating work queues. The two technologies implement work queues differently. I’ll discuss the ways of implementing work queues in Kafka and Pulsar as well as the relative strengths of each approach. What are work queues? A work queue is […]

InfiniteConf Keynote – Why Real-time is the Future

Here is my keynote from InfiniteConf 2018. I talk about why real-time is gaining so much momentum, what it does for businesses, how it helps data science, and some common use cases.

What is a Data Pipeline?

I’ve been seeing some questions about data pipelines lately. I realized I haven’t written a post that gives the level of detail necessary for a good definition of a data pipeline in the…

Professional Data Engineering Review – Sanjoy Roy

Note: this is a guest post from Sanjoy Roy, who is reviewing my Professional Data Engineering course. Since late 2014, I have been drawn into various analytics projects which required a good mix of skills for both data engineering and data science. There are a lot of good MOOCs available now which cover very focussed […]

Saying You Have Small Data Isn’t Belittling Your Use Case

There is a common beginner question from engineers starting out with Big Data. An engineer will post to a social media site saying “I need to know which Big Data technology to use. I have 3 billion rows in 10,000 files. The whole dataset is 100 GB. Is Big Data Technology X efficient […]

The Two Types of Data Engineering

There are two different types of data engineering, and two different types of jobs with the title data engineer. This is especially confusing to organizations and individuals who are starting out learning about data engineering. This confusion leads to the failure of many teams’ Big Data projects. Types of Data Engineering The first […]

Why Real-time is the Future

One of the benefits of teaching and consulting is the sheer number of organizations, teams, and people I get to work with. Since I deal with so many different groups, I can see patterns emerge much faster than others. One pattern I saw early on was real-time Big Data. Organizations wanted to do things in […]

The Four Types of Technologies You Need for Real-time Big Data Systems

Creating real-time data pipelines brings new challenges. There are new concepts and technologies that you’ll need to learn and understand. To help you understand the basic technologies you need in a real-time data pipeline, I break them down into 4 general types. These types are: processors, analytics, ingestion and dissemination, and storage. Processors A processor is […]

What Are Batch and Real-time Big Data?

The move from batch to real-time Big Data represents change. It will entail using brand new technologies and concepts that you haven’t dealt with before. Batch Big Data Let’s start off by defining batch Big Data. For batch, all data must be there when the processing starts. Batch processes can run over fixed periods of […]

Data engineers vs. data scientists

I wrote a post for the O’Reilly data blog going into my latest thoughts and views on data engineers versus data scientists. I continue on to talk about machine learning engineers.

Author: mmciver13

Have a question?

Send us a message

or give us a call at +1 775.393.9122

© 2025 Big Data Institute

Privacy