Blog, Big Data Institute, Page 3

Saving Money with Apache Pulsar Tiered Storage

As companies start to look at rolling out real-time messaging systems, it’s important to look at the overall hardware costs. With some forward planning, companies can save as much as 85% on their overall storage costs. Before we start getting into the cost comparisons, let me briefly show how Apache Kafka and Apache Pulsar store […]

Q and A: Viewpoints on Open Source

There are diverse viewpoints on open source and its usage as a service. I’ve attempted to give a synopsis of the issues and some background, but that’s only my viewpoint. I’m bringing in other people to share their viewpoints and give a more well-rounded picture. This stems from this Twitter thread. The […]

The Three Components of a Big Data Data Pipeline

There’s a common misconception in Big Data that you only need one technology to do everything a data pipeline requires. That’s incorrect. Data Engineering != Spark: the misconception that Apache Spark is all you’ll need for your data pipeline is common. The […]

Advice for Small Teams and Startups on Data Engineering

Small data engineering teams require different tactics. Much of my writing is geared towards larger companies and teams. How should a startup or a small data engineering team in a big company be set up and work? What, if anything, should be done differently? Your First Data Engineer: your first data engineering hire is a crucial […]

Creating a Data Engineering Culture

At DataEngConf Barcelona, I premiered a new talk about the importance of creating a data engineering culture. I share what a data engineering culture is and what management needs to do to be successful with Big Data.
Here is the video from the conferen…

Why You Can’t Do All of Your Data Engineering with SQL

There is a common misunderstanding in data engineering that you can do everything you need to create a Big Data data pipeline with SQL. This notion is being promoted by some vendors and companies. They’re wrong: you can’t do all of your data engineering with SQL. You will eventually need a programming language to […]

Thoughts on Cloudera Merging/Buying Hortonworks

Cloudera has merged with/purchased Hortonworks. As a former Clouderan, it’s interesting to see this move on several levels. I’m going to share my insights from the outside as a former insider. Full Disclosure: Although I’m a former Clouderan, I don’t own any shares of Cloudera or Hortonworks and don’t plan to purchase any in the short term. […]

Creating Work Queues with Apache Kafka and Apache Pulsar

A common use case for Kafka and Pulsar is creating work queues. The two technologies offer different implementations for this use case. I’ll discuss the ways of implementing work queues in Kafka and Pulsar as well as the relative strengths of each. What are work queues? A work queue is […]

InfiniteConf Keynote – Why Real-time is the Future

Here is my keynote from InfiniteConf 2018. I talk about why real-time is gaining so much momentum, what it does for businesses, how it helps data science, and some common use cases.

What is a Data Pipeline?

I’ve been seeing some questions about data pipelines lately. I realized I haven’t written a post that gives the level of detail necessary for a good definition of a data pipeline in the…

Have a question?

Send us a message

or give us a call at +1 775.393.9122

© 2024 Big Data Institute

Privacy
