Real-time Data Engineering in the Cloud

This class takes participants through the benefits and challenges of real-time Big Data systems. We cover real-time Big Data services that are either open source or managed by cloud providers. The class focuses on Apache Kafka and Apache Spark Streaming. It shows how to create producers and consumers in Kafka. Then, we see how to use Apache Spark Streaming to process the data in Kafka and send the results back to Kafka. Finally, the data is visualized in real time on a web page using Kafka REST.

Duration: 2 days

Intended Audience: Technical staff, including Software Engineers, QA, and Analysts

Prerequisites: Intermediate-Level Java

You Will Learn

  • How to create large-scale real-time systems using both Apache Kafka and Apache Spark Streaming.
  • How real-time distributed systems differ from batch systems.
  • How to create Kafka producers and consumers.
  • How to process data in Kafka with Spark Streaming and place the results back into Kafka.
  • How to visualize data in real time on a web page.

Course Outline

Real-time Data Pipelines
  Real-time Technologies
  Real-time Pipelines
  Pros and Cons of Real-time
Using the Cloud
  Cloud Providers
  Real-time Technologies
  Choosing a Provider
Ingesting Data
  Real-time Ingestion
  Real-time ETL
Kafka
  About Kafka
  Kafka Internals
  Kafka API
Processing Data
  Real-time Data Processing
  Real-time Processing Technologies
Spark Streaming
  Spark Streaming
  Streaming API
  Advanced Streaming
Data Products
  Analysis of Data
  Dashboarding

Technologies Covered

In-depth Coverage

  • Apache Spark Streaming
  • Apache Kafka

Covered

  • Amazon Web Services
  • Microsoft Azure
  • Google Cloud
  • IBM SoftLayer
  • Amazon Kinesis
  • Microsoft Event Hubs
  • Google Pub/Sub
  • Apache NiFi
  • Apache Flink
  • Apache Apex
  • Apache Storm
  • Heron
  • Azure Stream Analytics
  • Google Cloud Dataflow
  • Apache Beam
