Course curriculum
-
1
Apache Spark and Scala - Overview
-
Course Objectives
-
Target Audience
-
Course Prerequisites
-
Value to the Professionals
-
Value to the Professionals-2
-
Value to the Professionals-3
-
Lessons Covered
-
Conclusion
-
-
2
Introduction to Spark
-
Objectives
-
Need for New-Generation Distributed Systems
-
Limitations of MapReduce in Hadoop
-
Limitations of MapReduce in Hadoop-2
-
Batch vs Real-Time Processing
-
Applications of Stream Processing
-
Applications of In-Memory Processing
-
Introduction to Apache Spark
-
History of Spark
-
Language Flexibility in Spark
-
Spark Execution Architecture
-
Automatic Parallelization of Complex Flows
-
Automatic Parallelization of Complex Flows - Important Points
-
APIs That Match User Goals
-
Apache Spark - A Unified Platform for Big Data Apps
-
More Benefits of Apache Spark
-
Running Spark in Different Modes
-
Installing Spark as a Standalone Cluster - Configuration
-
Demo - Install Apache Spark
-
Overview of Spark on a Cluster
-
Demo - Install Apache Spark-1
-
Tasks of Spark on a Cluster
-
Companies Using Spark - Use Cases
-
Hadoop Ecosystem vs Apache Spark
-
Hadoop Ecosystem vs Apache Spark-2
-
Summary
-
Summary-2
-
Conclusion
-
-
3
Introduction to Programming in Scala
-
Objectives
-
Introduction to Scala
-
Basic Data Types
-
Basic Literals
-
Basic Literals-2
-
Basic Literals-3
-
Introduction to Operators
-
Use Basic Literals and the Arithmetic Operator
-
Demo - Use Basic Literals and the Arithmetic Operator
-
Use the Logical Operator
-
Demo - Use the Logical Operator
-
Introduction to Type Inference
-
Type Inference for Recursive Methods
-
Type Inference for Polymorphic Methods and Generic Classes
-
Unreliability of the Type Inference Mechanism
-
Mutable Collection vs Immutable Collection
-
Functions
-
Anonymous Functions
-
Objects
-
Classes
-
Use Type Inference, Functions, Anonymous Function and Class
-
Demo - Use Type Inference, Functions, Anonymous Function and Class
-
Traits as Interfaces
-
Traits - Example
-
Collections
-
Types of Collections
-
Types of Collections-2
-
Lists
-
Perform Operations on Lists
-
Demo - Use Data Structures
-
Maps
-
Pattern Matching
-
Implicits
-
Implicits-2
-
Streams
-
Use Data Structures
-
Demo - Perform Operations on Lists
-
Summary
-
Summary-2
-
Conclusion
-
-
4
Using RDD for Creating Applications in Spark
-
Objectives
-
RDD API
-
Creating RDDs
-
Creating RDDs - Referencing an External Dataset
-
Referencing an External Dataset - Text Files
-
Referencing an External Dataset - Text Files-2
-
Referencing an External Dataset - Sequence Files
-
Referencing an External Dataset - Other Hadoop Input Formats
-
Creating RDDs - Important Points
-
RDD Operations
-
RDD Operations - Transformations
-
Features of RDD Persistence
-
Storage Levels of RDD Persistence
-
Invoking the Spark Shell
-
Importing Spark Classes
-
Creating the Spark Context
-
Creating the Spark Context-2
-
Loading a File in Shell
-
Performing Some Basic Operations on Files in Spark Shell RDDs
-
Packaging a Spark Project with SBT
-
Running a Spark Project with SBT
-
Demo - Build a Scala Project
-
Build a Scala Project-1
-
Demo - Build a Spark Java Project
-
Build a Spark Java Project-1
-
Shared Variables - Broadcast
-
Shared Variables - Accumulators
-
Writing a Scala Application
-
Demo - Run a Scala Application
-
Run a Scala Application
-
Write a Scala Application Reading the Hadoop Data
-
Demo - Run a Scala Application Reading the Hadoop Data
-
Run a Scala Application Reading the Hadoop Data
-
DoubleRDD Methods
-
PairRDD Methods - Join
-
PairRDD Methods - Others
-
JavaPairRDD Methods
-
JavaPairRDD Methods-2
-
General RDD Methods
-
General RDD Methods-2
-
Java RDD Methods
-
Common Java RDD Methods
-
Spark Java Function Classes
-
Method for Combining JavaPairRDD Functions
-
Transformations in RDD
-
Other Methods
-
Actions in RDD
-
Key-Value Pair RDD in Scala
-
Key-Value Pair RDD in Java
-
Using MapReduce and Pair RDD Operations
-
Reading Text File from HDFS
-
Reading Sequence File from HDFS
-
Writing Text Data to HDFS
-
Writing Sequence File to HDFS
-
Using GroupBy
-
Using GroupBy-2
-
Demo - Run a Scala Application Performing GroupBy Operation
-
Run a Scala Application Performing GroupBy Operation-1
-
Demo - Write and Run a Java Application
-
Write and Run a Java Application
-
Summary
-
Summary-2
-
Conclusion
-
-
5
Running SQL Queries Using Spark SQL
-
Objectives
-
Importance of Spark SQL
-
Benefits of Spark SQL
-
DataFrames
-
SQLContext
-
SQLContext-2
-
Creating a DataFrame
-
Using DataFrame Operations
-
Using DataFrame Operations-2
-
Demo - Run Spark SQL with a DataFrame
-
Run Spark SQL Programmatically-1
-
Save Modes
-
Saving to Persistent Tables
-
Parquet Files
-
Partition Discovery
-
Schema Merging
-
JSON Data
-
Hive Table
-
DML Operation - Hive Queries
-
Demo - Run Hive Queries Using Spark SQL
-
JDBC to other Databases
-
Supported Hive Features
-
Supported Hive Features-2
-
Supported Hive Data Types
-
Case Classes
-
Case Classes-2
-
Summary
-
Summary-2
-
Conclusion
-
-
6
Spark Streaming
-
Objectives
-
Introduction to Spark Streaming
-
Working of Spark Streaming
-
Streaming Word Count
-
Micro Batch
-
DStreams
-
DStreams-2
-
Input DStreams and Receivers
-
Input DStreams and Receivers-2
-
Basic Sources
-
Advanced Sources
-
Transformations on DStreams
-
Output Operations on DStreams
-
Design Patterns for Using foreachRDD
-
DataFrame and SQL Operations
-
DataFrame and SQL Operations-2
-
Checkpointing
-
Enabling Checkpointing
-
Socket Stream
-
File Stream
-
Stateful Operations
-
Window Operations
-
Types of Window Operations
-
Types of Window Operations-2
-
Join Operations - Stream-Dataset Joins
-
Monitoring a Spark Streaming Application
-
Performance Tuning - High Level
-
Demo - Capture and Process the Netcat Data
-
Capture and Process the Flume Data
-
Demo - Capture the Twitter Data
-
Capture the Twitter Data
-
Summary
-
Summary-2
-
Conclusion
-
-
7
Spark ML Programming
-
Objectives
-
Introduction to Machine Learning
-
Applications of Machine Learning
-
Machine Learning in Spark
-
DataFrames
-
Transformers and Estimators
-
Pipeline
-
Working of a Pipeline
-
Working of a Pipeline-2
-
DAG Pipelines
-
Runtime Checking
-
Parameter Passing
-
General Machine Learning Pipeline - Example
-
Model Selection via Cross-Validation
-
Supported Types, Algorithms and Utilities
-
Data Types
-
Feature Extraction and Basic Statistics
-
Clustering
-
K-Means
-
K-Means-1
-
K-Means-2
-
Demo - Perform Clustering Using K-Means
-
Perform Clustering Using K-Means-1
-
Gaussian Mixture
-
Power Iteration Clustering
-
Latent Dirichlet Allocation
-
Latent Dirichlet Allocation-2
-
Collaborative Filtering
-
Classification
-
Classification-2
-
Regression
-
Example of Regression
-
Demo - Perform Classification Using Linear Regression
-
Perform Classification Using Linear Regression
-
Demo - Run Linear Regression
-
Run Linear Regression
-
Demo - Perform Recommendation Using Collaborative Filtering
-
Perform Recommendation Using Collaborative Filtering
-
Demo - Run Recommendation System
-
Run Recommendation System
-
Summary
-
Summary-2
-
Conclusion
-
-
8
Spark GraphX Programming
-
Objectives
-
Introduction to Graph-Parallel Systems
-
Limitations of Graph-Parallel Systems
-
Introduction to GraphX
-
Introduction to GraphX-2
-
Importing GraphX
-
The Property Graph
-
The Property Graph-2
-
Creating a Graph
-
Demo - Create a Graph Using GraphX
-
Create a Graph Using GraphX
-
Triplet View
-
Graph Operators
-
List of Operators
-
List of Operators-2
-
Property Operators
-
Structural Operators
-
Subgraphs
-
Join Operators
-
Perform Graph Operations Using GraphX
-
Perform Graph Operations Using GraphX-1
-
Demo - Perform Subgraph Operations
-
Perform Subgraph Operations-1
-
Neighborhood Aggregation
-
MapReduce Triplets
-
Demo - Perform MapReduce Operations
-
Perform MapReduce Operations-1
-
Counting the Degree of a Vertex
-
Collecting Neighbors
-
Caching and Uncaching
-
Vertex and Edge RDDs
-
Graph System Optimizations-1
-
Summary
-
Summary-1
-
Conclusion
-