Q and A: Is a Data Engineer the same thing as a BI or DBA?

Today’s blog post answers a question from a subscriber to my mailing list. The question comes from Alpesh D.:

I have been getting your emails and they all seem to make sense. However, did I understand it correct that you believe all big data engineers need to be to use Java? I come from a heavy SQL, MPP data warehousing and BI background. With having done shell scripting from my days when I was a DBA I am able to pick up Python and move ahead but Java seems like a little too much. What are your thoughts?

I think your question could be restated as two questions:

Is a Data Engineer the same thing as a BI or DBA?
Does a Data Engineer need to use Java?

Is a Data Engineer the same thing as a BI or DBA?

A Data Engineer is someone who has specialized their skills in creating software solutions around data. Their skills are predominantly based around Hadoop, Spark, and the open source Big Data ecosystem. They usually program in Java, Scala, or Python. They have an in-depth knowledge of creating data pipelines. A data pipeline is how data is brought in, processed, and turned into some kind of business value. This business value usually takes the form of reports, analytics, and dashboards. More advanced examples are fraud analytics or predictive analytics pipelines.
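To make the three stages concrete, here is a minimal sketch of a pipeline in plain Python. It is only an illustration of the shape (bring data in, process it, produce something a business can use), not how a Hadoop or Spark job is written. The sample data and the function names `ingest`, `process`, and `report` are made up for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input; a real pipeline would read from files, queues, or a cluster.
RAW_ORDERS = """customer,amount
alice,30.50
bob,12.00
alice,9.50
carol,not_a_number
"""

def ingest(raw_text):
    """Bring data in: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def process(rows):
    """Process: keep only rows whose amount parses as a number."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["customer"], float(row["amount"])))
        except ValueError:
            continue  # skip malformed records
    return cleaned

def report(records):
    """Create business value: total spend per customer."""
    totals = defaultdict(float)
    for customer, amount in records:
        totals[customer] += amount
    return dict(totals)

totals = report(process(ingest(RAW_ORDERS)))
print(totals)
```

Even at this size, the pipeline has to decide what to do with a bad record. Those decisions are where most of the custom code in a real pipeline goes.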

A Data Engineer is not a DBA (Database Administrator), Business Intelligence (BI) developer, Data Analyst, or ETL Developer. That’s not to say a person with one of these titles couldn’t become a Data Engineer. Rather, people with these titles will need training, and probably entirely new skills, to become one. Usually, they’ll need more programming skills and Big Data skills than most people with these titles have.

Data Engineers are tasked with creating data pipelines and data products. Complex data pipelines are often outside the abilities of non-programmers because they require custom code.

Does a Data Engineer need to use Java?

A Data Engineer’s primary language needs to be Java. They’ll also need to know SQL, and I highly recommend they know at least one more language, such as Python (a dynamic language) or Scala.

If you look around the Big Data ecosystem, virtually every one of the projects has a Java API. Some projects support Java and one other language. That doesn’t mean everything in a data pipeline is limited to Java. Some pipelines will be a mix of Java, SQL, and a dynamic language.

I’ve trained at companies where the data team’s knowledge was limited to SQL. These teams were severely limited in what they could accomplish. You can do some interesting things with SQL, and I recommend using SQL for some operations. But when SQL is your only tool, you can’t use the ecosystem tools that don’t have a SQL interface. At these companies, if SQL couldn’t do it, it simply wasn’t done. They had no way to build anything else.
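Here is a small sketch of what mixing SQL with a general-purpose language can look like. It uses Python and the standard library’s `sqlite3` module as a stand-in for a real data store, purely to keep the example short; the same division of labor applies with Java. The log format, table name, and `parse` function are assumptions made up for this illustration. Custom code handles the messy text, and SQL handles the aggregation once the data is tabular.

```python
import re
import sqlite3

# Hypothetical log lines; the format is an assumption for illustration.
LOG_LINES = [
    "2024-01-01T10:00:00 user=alice action=login ms=120",
    "2024-01-01T10:00:05 user=bob action=login ms=340",
    "corrupted line without fields",
    "2024-01-01T10:01:00 user=alice action=purchase ms=95",
]

LINE_PATTERN = re.compile(r"^(\S+) user=(\w+) action=(\w+) ms=(\d+)$")

def parse(lines):
    """Custom code: turn free-form text into typed rows, dropping bad lines."""
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            ts, user, action, ms = match.groups()
            yield ts, user, action, int(ms)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, user TEXT, action TEXT, ms INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", parse(LOG_LINES))

# SQL: a good fit for the aggregation once the data is tabular.
rows = conn.execute(
    "SELECT user, COUNT(*), AVG(ms) FROM events GROUP BY user ORDER BY user"
).fetchall()
print(rows)
```

A team that only knows SQL would be stuck at the parsing step, before the first row ever reaches a table.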

Join my mailing list and I might answer your question next time.

© 2025 Big Data Institute
