Course Curriculum

  • 1

    Apache Spark and Scala - Overview

    • Course Objectives

    • Target Audience

    • Course Prerequisites

    • Value to Professionals

    • Value to Professionals-2

    • Value to Professionals-3

    • Lessons Covered

    • Conclusion

  • 2

    Introduction to Spark

    • Objectives

    • Need for New-Generation Distributed Systems

    • Limitations of MapReduce in Hadoop

    • Limitations of MapReduce in Hadoop-2

    • Batch vs Real-Time Processing

    • Applications of Stream Processing

    • Applications of In-Memory Processing

    • Introduction to Apache Spark

    • History of Spark

    • Language Flexibility in Spark

    • Spark Execution Architecture

    • Automatic Parallelization of Complex Flows

    • Automatic Parallelization of Complex Flows-Important Points

    • APIs That Match User Goals

    • Apache Spark - A Unified Platform for Big Data Apps

    • More Benefits of Apache Spark

    • Running Spark in Different Modes

    • Installing Spark as a Standalone Cluster - Configuration

    • Demo - Install Apache Spark

    • Overview of Spark on a Cluster

    • Demo - Install Apache Spark-1

    • Tasks of Spark on a Cluster

    • Companies Using Spark - Use Cases

    • Hadoop Ecosystem vs Apache Spark

    • Hadoop Ecosystem vs Apache Spark-2

    • Summary

    • Summary-2

    • Conclusion

  • 3

    Introduction to Programming in Scala

    • Objectives

    • Introduction to Scala

    • Basic Data Types

    • Basic Literals

    • Basic Literals-2

    • Basic Literals-3

    • Introduction to Operators

    • Use Basic Literals and the Arithmetic Operator

    • Demo - Use Basic Literals and the Arithmetic Operator

    • Use the Logical Operator

    • Demo - Use the Logical Operator

    • Introduction to Type Inference

    • Type Inference for Recursive Methods

    • Type Inference for Polymorphic Methods and Generic Classes

    • Unreliability of the Type Inference Mechanism

    • Mutable Collection vs Immutable Collection

    • Functions

    • Anonymous Functions

    • Objects

    • Classes

    • Use Type Inference, Functions, Anonymous Functions and Classes

    • Demo - Use Type Inference, Functions, Anonymous Functions and Classes

    • Traits as Interfaces

    • Traits - Example

    • Collections

    • Types of Collections

    • Types of Collections-2

    • Lists

    • Perform Operations on Lists

    • Demo - Use Data Structures

    • Maps

    • Pattern Matching

    • Implicits

    • Implicits-2

    • Streams

    • Use Data Structures

    • Demo - Perform Operations on Lists

    • Summary

    • Summary-2

    • Conclusion

  • 4

    Using RDD for Creating Applications in Spark

    • Objectives

    • RDD API

    • Creating RDDs

    • Creating RDDs - Referencing an External Dataset

    • Referencing an External Dataset - Text Files

    • Referencing an External Dataset - Text Files-2

    • Referencing an External Dataset - Sequence Files

    • Referencing an External Dataset - Other Hadoop Input Formats

    • Creating RDDs - Important Points

    • RDD Operations

    • RDD Operations - Transformations

    • Features of RDD Persistence

    • Storage Levels of RDD Persistence

    • Invoking the Spark Shell

    • Importing Spark Classes

    • Creating the Spark Context

    • Creating the Spark Context

    • Loading a File in Shell

    • Performing Some Basic Operations on Files in Spark Shell RDDs

    • Packaging a Spark Project With SBT

    • Running a Spark Project with SBT

    • Demo - Build a Scala Project

    • Build a Scala Project-1

    • Demo - Build a Spark Java Project

    • Build a Spark Java Project-1

    • Shared Variables - Broadcast

    • Shared Variables - Accumulators

    • Writing a Scala Application

    • Demo - Run a Scala Application

    • Run a Scala Application

    • Write a Scala Application Reading the Hadoop Data

    • Write a Scala Application Reading the Hadoop Data

    • Demo - Run a Scala Application Reading the Hadoop Data

    • Run a Scala Application Reading the Hadoop Data

    • DoubleRDD Methods

    • PairRDD Methods - Join

    • PairRDD Methods - Others

    • JavaPairRDD Methods

    • JavaPairRDD Methods-2

    • General RDD Methods

    • General RDD Methods-2

    • Java RDD Methods

    • Common Java RDD Methods

    • Spark Java Function Classes

    • Method for Combining JavaPairRDD Functions

    • Transformations in RDD

    • Other Methods

    • Actions in RDD

    • Key-Value Pair RDDs in Scala

    • Key-Value Pair RDDs in Java

    • Using MapReduce and Pair RDD Operations

    • Reading Text File from HDFS

    • Reading Sequence File from HDFS

    • Writing Text Data to HDFS

    • Writing Sequence File to HDFS

    • Using groupBy

    • Using groupBy-2

    • Demo - Run a Scala Application Performing a groupBy Operation

    • Run a Scala Application Performing a groupBy Operation-1

    • Demo - Write and Run a Java Application

    • Write and Run a Java Application

    • Summary

    • Summary-2

    • Conclusion

  • 5

    Running SQL Queries Using Spark SQL

    • Objectives

    • Importance of Spark SQL

    • Benefits of Spark SQL

    • DataFrames

    • SQLContext

    • SQLContext-2

    • Creating a DataFrame

    • Using DataFrame Operations

    • Using DataFrame Operations-2

    • Demo - Run Spark SQL with a DataFrame

    • Run Spark SQL Programmatically-1

    • Save Modes

    • Saving to Persistent Tables

    • Parquet Files

    • Partition Discovery

    • Schema Merging

    • JSON Data

    • Hive Table

    • DML Operations - Hive Queries

    • Demo - Run Hive Queries Using Spark SQL

    • JDBC to Other Databases

    • Supported Hive Features

    • Supported Hive Features-2

    • Supported Hive Data Types

    • Case Classes

    • Case Classes-2

    • Summary

    • Summary-2

    • Conclusion

  • 6

    Spark Streaming

    • Objectives

    • Introduction to Spark Streaming

    • How Spark Streaming Works

    • Streaming Word Count

    • Micro-Batch

    • DStreams

    • DStreams-2

    • Input DStreams and Receivers

    • Input DStreams and Receivers-2

    • Basic Sources

    • Advanced Sources

    • Transformations on DStreams

    • Output Operations on DStreams

    • Design Patterns for Using foreachRDD

    • DataFrame and SQL Operations

    • DataFrame and SQL Operations-2

    • Checkpointing

    • Enabling Checkpointing

    • Socket Stream

    • File Stream

    • Stateful Operations

    • Window Operations

    • Types of Window Operations

    • Types of Window Operations-2

    • Join Operations - Stream-Dataset Joins

    • Monitoring a Spark Streaming Application

    • Performance Tuning - High Level

    • Demo - Capture and Process the Netcat Data

    • Capture and Process the Flume Data

    • Demo - Capture the Twitter Data

    • Capture the Twitter Data

    • Summary

    • Summary-2

    • Conclusion

  • 7

    Spark ML Programming

    • Objectives

    • Introduction to Machine Learning

    • Applications of Machine Learning

    • Machine Learning in Spark

    • DataFrames

    • Transformers and Estimators

    • Pipeline

    • How a Pipeline Works

    • How a Pipeline Works-2

    • DAG Pipelines

    • Runtime Checking

    • Parameter Passing

    • General Machine Learning Pipeline - Example

    • Model Selection via Cross-Validation

    • Supported Types, Algorithms and Utilities

    • Data Types

    • Feature Extraction and Basic Statistics

    • Clustering

    • K-Means

    • K-Means-1

    • K-Means-2

    • Demo - Perform Clustering Using K-Means

    • Perform Clustering Using K-Means-1

    • Gaussian Mixture

    • Power Iteration Clustering

    • Latent Dirichlet Allocation

    • Latent Dirichlet Allocation-2

    • Collaborative Filtering

    • Classification

    • Classification-2

    • Regression

    • Example of Regression

    • Demo - Perform Classification Using Linear Regression

    • Perform Classification Using Linear Regression

    • Demo - Run Linear Regression

    • Run Linear Regression

    • Demo - Perform Recommendation Using Collaborative Filtering

    • Perform Recommendation Using Collaborative Filtering

    • Demo - Run Recommendation System

    • Run Recommendation System

    • Summary

    • Summary-2

    • Conclusion

  • 8

    Spark GraphX Programming

    • Objectives

    • Introduction to Graph-Parallel Systems

    • Limitations of Graph-Parallel Systems

    • Introduction to GraphX

    • Introduction to GraphX-2

    • Importing GraphX

    • The Property Graph

    • The Property Graph-2

    • Creating a Graph

    • Demo - Create a Graph Using GraphX

    • Create a Graph Using GraphX

    • Triplet View

    • Graph Operators

    • List of Operators

    • List of Operators-2

    • Property Operators

    • Structural Operators

    • Subgraphs

    • Join Operators

    • Perform Graph Operations Using GraphX

    • Perform Graph Operations Using GraphX-1

    • Demo - Perform Subgraph Operations

    • Perform Subgraph Operations-1

    • Neighborhood Aggregation

    • MapReduce Triplets

    • Demo - Perform MapReduce Operations

    • Perform MapReduce Operations-1

    • Counting the Degree of a Vertex

    • Collecting Neighbors

    • Caching and Uncaching

    • Vertex and Edge RDDs

    • Graph System Optimizations-1

    • Summary

    • Summary-1

    • Conclusion