Q and A: Ingesting into Hadoop, Big Data Institute

Today’s blog post answers a question from a subscriber on my mailing list. The question comes from Guruprasad B.R.:

What are the best ways to Ingest data in to Big Data (HBase/HDFS) from different sources like FTP, Web, Email, RDBMS,..etc

There are a couple of parts to this question, and they’re all technical:

How do I get data into HDFS?
How do I get data into HBase?
How does the source of data dictate how it’s ingested?

Sqoop

I’ll start off with the easy one. How do you get data from an RDBMS into HDFS and HBase? You’d use Apache Sqoop. It can take data from an RDBMS and put it into either HDFS or HBase.

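It can go the other way around too. Sqoop can export data from HDFS and put it back into the RDBMS. Its export tool reads files in HDFS, so data that lives in HBase generally has to be written out to HDFS first.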
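As a rough illustration of both directions, the sketch below builds Sqoop command lines in Python instead of running them. The flags are standard Sqoop 1 options, but the JDBC URL, user, password file, table names, and HDFS paths are placeholders made up for this example, so treat it as a starting point and not a recommended configuration.

```python
import shlex

# Placeholder connection details: assumptions for illustration only.
JDBC_URL = "jdbc:mysql://db.example.com/sales"
COMMON = [
    "--connect", JDBC_URL,
    "--username", "etl",
    "--password-file", "/user/etl/.db-password",
]


def import_to_hdfs(table, target_dir, mappers=4):
    """Build a `sqoop import` that copies one RDBMS table into an HDFS directory."""
    return ["sqoop", "import", *COMMON,
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", str(mappers)]


def import_to_hbase(table, hbase_table, column_family, row_key_column):
    """Build a `sqoop import` that writes the table's rows into HBase instead."""
    return ["sqoop", "import", *COMMON,
            "--table", table,
            "--hbase-table", hbase_table,
            "--column-family", column_family,
            "--hbase-row-key", row_key_column]


def export_from_hdfs(table, export_dir):
    """Build a `sqoop export` that loads files from an HDFS directory into a table."""
    return ["sqoop", "export", *COMMON,
            "--table", table,
            "--export-dir", export_dir]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before anyone runs them.
    for cmd in (
        import_to_hdfs("orders", "/data/raw/orders"),
        import_to_hbase("orders", "orders", "d", "order_id"),
        export_from_hdfs("orders_summary", "/data/out/orders_summary"),
    ):
        print(shlex.join(cmd))
```

Building the argument list first makes it easy to log or review the command, and the same list can be handed to `subprocess.run` on a machine where Sqoop is installed.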

Simple File Transfer

There are a few ways to do simple file transfers into HDFS. You could use:

Apache Oozie to move files as part of a workflow
Hue’s REST interface
Hadoop’s WebHDFS REST or FUSE interfaces
A custom program that implements FTP, HTTP, etc. and puts the files into HDFS with the HDFS API

The right tool for the job depends on your use case.
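To make the WebHDFS option a little more concrete, here is a minimal sketch of a file upload using only Python’s standard library. WebHDFS creates a file in two steps: a PUT to the NameNode with `op=CREATE` is answered with a redirect to a DataNode, and the file contents are then sent to that second URL. The host name, port, user, and paths below are assumptions for illustration, and a secured cluster (Kerberos, HTTPS) would need more than this.

```python
import urllib.error
import urllib.request
from urllib.parse import quote, urlencode


def webhdfs_create_url(namenode, hdfs_path, user, overwrite=False):
    """Build the NameNode URL for step 1 of a WebHDFS upload (op=CREATE)."""
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    return f"http://{namenode}/webhdfs/v1{quote(hdfs_path)}?{query}"


def upload_file(namenode, local_path, hdfs_path, user):
    """Upload a local file to HDFS through WebHDFS and return the HTTP status."""
    step1 = urllib.request.Request(
        webhdfs_create_url(namenode, hdfs_path, user), method="PUT")
    try:
        urllib.request.urlopen(step1)
    except urllib.error.HTTPError as err:
        # urllib does not follow redirects for PUT, so the 307 arrives here.
        if err.code != 307:
            raise
        datanode_url = err.headers["Location"]
    else:
        raise RuntimeError("expected a 307 redirect from the NameNode")

    # Step 2: send the file contents to the DataNode named in the redirect.
    with open(local_path, "rb") as source:
        step2 = urllib.request.Request(
            datanode_url,
            data=source.read(),
            method="PUT",
            headers={"Content-Type": "application/octet-stream"},
        )
    with urllib.request.urlopen(step2) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical host, port, and paths; adjust them for a real cluster.
    print(webhdfs_create_url(
        "namenode.example.com:9870", "/data/incoming/report.csv", "etl"))
```

This version reads the whole file into memory, which is fine for small files. For large files, a streaming client or the `hdfs dfs -put` command is probably a better fit.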

Getting Data In

The far more difficult problem is how to use the data or get it into HBase. For that, you’ll need to write custom code. The suggestions above only get you to the point where you’re using HDFS as a backup. The real value is in working with the data.
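What that custom code looks like depends entirely on the data, but one piece of it is usually deciding how a record maps onto an HBase row key and columns. The sketch below shows one possible mapping. The field names, the row key layout, and the column family `d` are all hypothetical choices for this example, not a recommended schema.

```python
def to_hbase_put(record, column_family="d"):
    """Turn one parsed record (a dict) into a (row_key, cells) pair for an HBase put."""
    # Hypothetical row key: customer id, then timestamp, so that one
    # customer's rows sort next to each other in the table.
    row_key = f"{record['customer_id']}#{record['timestamp']}".encode("utf-8")

    # Every remaining field becomes a cell named b"<family>:<qualifier>".
    cells = {
        f"{column_family}:{name}".encode("utf-8"): str(value).encode("utf-8")
        for name, value in record.items()
        if name not in ("customer_id", "timestamp")
    }
    return row_key, cells


if __name__ == "__main__":
    # A made-up record, as it might look after parsing a file that landed in HDFS.
    record = {"customer_id": "c42", "timestamp": "2016-01-01T00:00:00", "amount": 19.99}
    row_key, cells = to_hbase_put(record)
    print(row_key, cells)
    # With a client such as the third-party happybase library (an assumption,
    # not something the post prescribes), the write would look roughly like:
    #   table.put(row_key, cells)
```

The row key is where most of the design effort goes, because it determines how the data can be read back efficiently.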

The programs you need to write and the right tools for the job depend on your use case. This is where qualified Data Engineers are important. They’ll help the team understand the use case and how the data pipeline should be created.

© 2025 Big Data Institute
