About Professional Spark Development
Takes a participant from no knowledge of Apache Spark to being able to develop with Spark professionally. It begins with the core Hadoop technologies that Spark builds on, HDFS and MapReduce, and provides in-depth coverage of essential Big Data and Hadoop ecosystem technologies. The class ends by considering how to architect Big Data solutions with Spark, Hadoop, and their ecosystem.
Duration: 3 days
Intended Audience: Technical, Software Engineers, QA, Analysts
Prerequisites: Intermediate-Level Java
You Will Learn
- What exists in the Big Data ecosystem so you can use the right tool for the job.
- How HDFS works and how to interact with it.
- How MapReduce works and what each of its phases does.
- How Spark works and what each of its phases does.
- What Java 8 lambdas are and how they make your Spark code readable.
- The basics of coding a Spark job with Java to build your Big Data foundation.
- The various API methods in Spark and what they do.
- How SQL can be used with a Spark job and when that vastly improves your productivity and code clarity.
- How to write Java functions that run inside a Spark SQL command, so you can reuse existing Java code or perform use-case-specific queries.
- How to process data in real-time with Spark.
- How to integrate and use Spark with the rest of your Big Data systems.
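The lambda style listed above can be previewed without a Spark cluster: the sketch below uses plain java.util.stream, whose map/filter/reduce lambdas mirror the transformations and actions of Spark's JavaRDD API that the class covers. The class and method names here are illustrative, not taken from the course materials.

```java
import java.util.Arrays;
import java.util.List;

// Java 8 lambdas are what make Spark's Java API readable: Spark's
// JavaRDD.map/filter/reduce accept the same kind of functional
// interfaces that java.util.stream uses. This stand-alone example
// (no Spark dependency) shows the style you will write in Spark jobs.
public class LambdaDemo {

    // Sum the lengths of words longer than three characters,
    // written in the same filter/map/reduce style as a Spark job.
    static int totalLongWordLength(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)   // transformation: keep long words
                .mapToInt(String::length)      // transformation: element -> length
                .sum();                        // action: reduce to a single value
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "and", "hadoop", "rdd");
        System.out.println(totalLongWordLength(words)); // prints 11
    }
}
```

In a real Spark job the same pipeline would read `rdd.filter(...).map(...).reduce(...)`, with the lambdas unchanged in spirit; the difference is that Spark distributes the work across a cluster.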
Professional Spark Development
Thinking in Big Data
Introducing Big Data
What is Hadoop?
Introduction to HDFS
Introduction to MapReduce
Coding With Spark
Using Apache Maven
Built-In Transformations and Actions
Spark and Avro
Spark SQL API
Spark SQL UDFs
Using Spark With Hadoop MapReduce
Replacing Other Systems
- Apache Spark
- Apache Hadoop
- Apache Kafka