I Come Not To Bury Cloudera But To Praise It

It’s been a tumultuous few weeks for big data vendors. First, MapR is having problems (their update). Now Cloudera is having problems.
As of today, Cloudera closed at $5.21 (June …

Advice for Small Teams and Startups on Data Engineering

Small data engineering teams require different tactics. Much of my writing is geared towards larger companies and teams. How should a startup or small data engineering team in a big company be set up and work? What, if anything, should be done differently? Your First Data Engineer Your first data engineering hire is a crucial […]

Creating a Data Engineering Culture

At DataEngConf Barcelona, I premiered a new talk about the importance of creating a data engineering culture. I share what a data engineering culture is and what management needs to do to be successful with Big Data.
Here is the video from the conferen…

Why You Can’t Do All of Your Data Engineering with SQL

There is a common misunderstanding in data engineering that you can do everything you need to create a Big Data pipeline with SQL. This notion is being promoted by some vendors and companies. They’re wrong, and you can’t do all of your data engineering with SQL. You will eventually need a programming language to […]

Creating Work Queues with Apache Kafka and Apache Pulsar

A common use case for Kafka and Pulsar is creating work queues. The two technologies offer different implementations for accomplishing this use case. I’ll discuss the ways of implementing work queues in Kafka and Pulsar as well as the relative strengths of each approach. What are work queues? A work queue is […]

What is a Data Pipeline?

I’ve been seeing some questions about data pipelines lately. I realized I haven’t written a post that gives the level of detail necessary for a good definition of a data pipeline in the…

Professional Data Engineering Review – Sanjoy Roy

Note: this is a guest post from Sanjoy Roy, who is reviewing my Professional Data Engineering course. Since late 2014, I have been drawn into various analytics projects which required a good mix of skills for both data engineering and data science. There are a lot of good MOOCs available now which cover very focussed […]

Saying You Have Small Data Isn’t Belittling Your Use Case

There is a common beginner question from engineers starting out with Big Data. An engineer will post to a social media site saying “I need to know which Big Data technology to use. I have 3 billion rows in 10,000 files. The whole dataset is 100 GB. Is Big Data Technology X efficient […]