mmciver13 | Big Data Institute

Doing Big Data ASAP

I had an interesting question at TDWI Boston that I hadn't been asked before: If you absolutely had to do something with Hadoop and Big Data tomorrow, how would you do it? I'll answer this from a technical and then a management point of view. Technical: I call Apache Hive the Big Data technology you […]

What Happens When You Hire a Data Scientist Without a Data Engineer

Sometimes I’ll train at a company that’s creating a data engineering team. The team often includes a Data Scientist. I’ll always make a note to talk to the Data Scientist about their experience and interactions with the team before I arrived. These Data Scientists are recent hires – within the last 6 months. A clear […]

Personal Project Data Sources

You don't have previous Big Data experience, but you want to get hired as a Data Engineer. Don't worry, you can get hired. You'll need a well-executed personal project that gets you noticed and shows your skills. I've verified this with hiring managers all over the place. They will hire a brand-new person if they […]

Kafka REST and jQuery Helper

I’m open sourcing one of the modules I wrote for my Real-time Data Engineering class. We use Apache Spark and Apache Kafka to process data. Then, we show the data in real-time on a webpage using this JavaScript module to pull in data from Kafka via the Kafka REST interface. The KafkaRESTHelper makes it easier […]

How Are Programming and Distributed Systems Different?

In my book Data Engineering Teams, I treat programming as a separate skill from distributed systems. The section is called "Skills Needed in a Team" and covers the various skills that a data engineering team needs. Several people have emailed me asking for clarification about this distinction. Aren't programming and distributed systems the same […]

Announcement: Data Engineering Teams Book

I'm really tired of seeing Big Data projects fail. They fail for both technical and managerial reasons. They tend to fail for similar reasons, and that's just sad, because those failures can be fixed or prevented. Gartner's research shows that 85% of Big Data projects don't even make it into production. Only 15% of businesses reported […]

Is Kafka Only a Big Data Tool?

I’ve been teaching Kafka at companies without the textbook definition of Big Data problems. They don’t have, and will not have in the future, what you’d define as Big Data problems. As a result, the students ask me if using Kafka is appropriate for their use cases. Put another way, is Kafka only a Big […]

What Do I Look for in Data Engineers?

I want to share with you some of the traits that I've found in especially good Data Engineers. Not every Data Engineer will have every one of these traits, but you will find several. I can't stress enough how important it is for a Data Engineer to have a strong programming background. Data Engineers […]

Q and A: Ingesting into Hadoop

Today's blog post comes from a question from a subscriber on my mailing list. The question comes from Guruprasad B.R.: What are the best ways to Ingest data in to Big Data (HBase/HDFS) from different sources like FTP, Web, Email, RDBMS,..etc There are a couple of parts to this question, and they're technical: How do I […]

Hadoop MapReduce Dedupe Algorithm

In this video, I live code a dedupe algorithm. If you're not familiar with it, the algorithm takes several data files and removes the duplicates. I show the simple version first. Then, I show a more complicated version that adds some custom logic. If you want to learn more about how to write code […]
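The video's code isn't reproduced here. As a minimal sketch of the simple version, assuming each line is one record and a duplicate means an exact (whitespace-trimmed) line match, the MapReduce dedupe pattern can be simulated in plain Python: the mapper emits each record as a key, the shuffle groups identical keys, and the reducer writes each key once. This is a single-process illustration, not Hadoop code, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, None]]:
    # Mapper: emit each record as the key; the value carries no information.
    for line in lines:
        record = line.strip()
        if record:  # skip blank lines
            yield record, None


def shuffle(pairs: Iterable[Tuple[str, None]]) -> Dict[str, List[None]]:
    # Shuffle/sort: group identical keys together and sort them,
    # mimicking what Hadoop does between the map and reduce phases.
    groups: Dict[str, List[None]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[None]]) -> List[str]:
    # Reducer: each distinct key is seen once, so emitting it drops duplicates.
    return [key for key in groups]


def dedupe(files: Iterable[Iterable[str]]) -> List[str]:
    # Chain all input files into one stream, like several input splits.
    all_lines = (line for f in files for line in f)
    return reduce_phase(shuffle(map_phase(all_lines)))


if __name__ == "__main__":
    file_a = ["alice\n", "bob\n", "alice\n"]
    file_b = ["carol\n", "bob\n"]
    print(dedupe([file_a, file_b]))  # ['alice', 'bob', 'carol']
```

Using the whole record as the key is what makes this work: the framework's grouping step does the deduplication, and the reducer only has to write each key once. A version with custom logic would typically change what goes into the key, for example a normalized subset of fields.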

Author: mmciver13


Get your free copy of Data Engineering Teams: Creating Successful Big Data Teams and Products

Data Engineering Teams Book

Would you like to know what I teach successful organizations to do?

Mentoring

We’re here to help make the process more successful and the outcome more effective.

Architecture Reviews

The right tool for the job saves countless hours and a lot of money. Are you using the right tool for the job?

Project Acceleration

Why do so few companies create enormous value from Big Data while most fail?


Stay updated with the latest.

Have a question?

Send us a message

or give us a call at +1 775.393.9122

© 2024 Big Data Institute

Privacy
