Course Description

Produced Exclusively for DASCA by John Wiley, USA

This resource, produced under the DASCA Data Science Knowledgeware project, is an integral part of the exam preparation kit provided to all individuals formally registered for the DASCA ABDE™ certification program. The courses in this program are intended solely to assist and complement learning and comprehension of important topics covered in the certification exam preparation kit. Access to these course modules is restricted to individuals formally registered in the DASCA ABDE™ certification program.

Course curriculum

  • 1

    Apache Spark and Scala - Overview

    • Course Objectives

    • Target Audience

    • Course Prerequisites

    • Value to the Professionals

    • Value to the Professionals-2

    • Value to the Professionals-3

    • Lessons Covered

    • Conclusion

  • 2

    Introduction to Spark

    • Objectives

    • Need for New-Generation Distributed Systems

    • Limitations of MapReduce in Hadoop

    • Limitations of MapReduce in Hadoop-2

    • Batch vs Real-Time Processing

    • Applications of Stream Processing

    • Applications of In-Memory Processing

    • Introduction to Apache Spark

    • History of Spark

    • Language Flexibility in Spark

    • Spark Execution Architecture

    • Automatic Parallelization of Complex Flows

    • Automatic Parallelization of Complex Flows-Important Points

    • APIs That Match User Goals

    • Apache Spark - A Unified Platform for Big Data Apps

    • More Benefits of Apache Spark

    • Running Spark in Different Modes

    • Installing Spark as a Standalone Cluster - Configuration

    • Demo - Install Apache Spark

    • Overview of Spark on a Cluster

    • Demo - Install Apache Spark-1

    • Tasks of Spark on a Cluster

    • Companies Using Spark - Use Cases

    • Hadoop Ecosystem vs Apache Spark

    • Hadoop Ecosystem vs Apache Spark-2

    • Summary

    • Summary-2

    • Conclusion

  • 3

    Introduction to Programming in Scala

    • Objectives

    • Introduction to Scala

    • Basic Data Types

    • Basic Literals

    • Basic Literals-2

    • Basic Literals-3

    • Introduction to Operators

    • Use Basic Literals and the Arithmetic Operator

    • Demo Use Basic Literals and the Arithmetic Operator

    • Use the Logical Operator

    • Demo Use the Logical Operator

    • Introduction to Type Inference

    • Type Inference for Recursive Methods

    • Type Inference for Polymorphic Methods and Generic Classes

    • Unreliability of the Type Inference Mechanism

    • Mutable Collection vs Immutable Collection

    • Functions

    • Anonymous Functions

    • Objects

    • Classes

    • Use Type Inference, Functions, Anonymous Functions, and Classes

    • Demo Use Type Inference, Functions, Anonymous Functions, and Classes

    • Traits as Interfaces

    • Traits - Example

    • Collections

    • Types of Collections

    • Types of Collections-2

    • Lists

    • Perform Operations on Lists

    • Demo Use Data Structures

    • Maps

    • Pattern Matching

    • Implicits

    • Implicits-2

    • Streams

    • Use Data Structures

    • Demo Perform Operations on Lists

    • Summary

    • Summary-2

    • Conclusion
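
The Scala features named in this lesson list (type inference, anonymous functions, immutable collections, pattern matching) can be previewed in a few lines of plain Scala; the values and names below are illustrative only, not taken from the course material.

```scala
// Type inference: no annotations needed, the compiler infers Int and String.
val answer = 42
val greeting = "hello"

// An anonymous function bound to a val; its inferred type is Int => Int.
val square = (x: Int) => x * x

// An immutable List transformed with a higher-order method.
val squares = List(1, 2, 3).map(square)

// An immutable Map; get returns an Option, handled with pattern matching.
val capitals = Map("France" -> "Paris", "Japan" -> "Tokyo")
val capital = capitals.get("Japan") match {
  case Some(city) => city
  case None       => "unknown"
}
```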

  • 4

    Using RDD for Creating Applications in Spark

    • Objectives

    • RDD API

    • Creating RDDs

    • Creating RDDs - Referencing an External Dataset

    • Referencing an External Dataset - Text Files

    • Referencing an External Dataset - Text Files-2

    • Referencing an External Dataset - Sequence Files

    • Referencing an External Dataset - Other Hadoop Input Formats

    • Creating RDDs - Important Points

    • RDD Operations

    • RDD Operations - Transformations

    • Features of RDD Persistence

    • Storage Levels of RDD Persistence

    • Invoking the Spark Shell

    • Importing Spark Classes

    • Creating the Spark Context

    • Creating the Spark Context-2

    • Loading a File in Shell

    • Performing Some Basic Operations on Files in Spark Shell RDDs

    • Packaging a Spark Project With SBT

    • Running a Spark Project with SBT

    • Demo - Build a Scala Project

    • Build a Scala Project-1

    • Demo - Build a Spark Java Project

    • Build a Spark Java Project-1

    • Shared Variables - Broadcast

    • Shared Variables - Accumulators

    • Writing a Scala Application

    • Demo - Run a Scala Application

    • Run a Scala Application

    • Write a Scala Application Reading the Hadoop Data

    • Demo - Run a Scala Application Reading the Hadoop Data

    • Run a Scala Application Reading the Hadoop Data

    • DoubleRDD Methods

    • PairRDD Methods- Join

    • PairRDD Methods- Others

    • JavaPairRDD Methods

    • JavaPairRDD Methods-2

    • General RDD Methods

    • General RDD Methods-2

    • Java RDD Methods

    • Common Java RDD Methods

    • Spark Java Function Classes

    • Method for Combining JavaPairRDD Functions

    • Transformations in RDD

    • Other Methods

    • Actions in RDD

    • Key-value Pair RDD in Scala

    • Key-value Pair RDD in Java

    • Using MapReduce and Pair RDD Operations

    • Reading Text File from HDFS

    • Reading Sequence File from HDFS

    • Writing Text Data to HDFS

    • Writing Sequence File to HDFS

    • Using GroupBy

    • Using GroupBy-2

    • Demo - Run a Scala Application Performing GroupBy Operation

    • Run a Scala Application Performing GroupBy Operation-1

    • Demo - Write and Run a Java Application

    • Write and Run a Java Application

    • Summary

    • Summary-2

    • Conclusion
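
The transformations listed above (flatMap, map, reduceByKey) have direct counterparts on plain Scala collections, so the classic RDD word count can be sketched without a cluster. This is an illustrative analogy, not Spark code: in Spark the same pipeline would start from sc.textFile(...) and use reduceByKey(_ + _) in place of the groupBy step.

```scala
// Word count over a toy, made-up dataset using plain collections.
val lines = Seq("spark and scala", "spark streaming")

val counts = lines
  .flatMap(_.split(" "))   // split each line into words
  .map(word => (word, 1))  // pair each word with a count of 1
  .groupBy(_._1)           // plain-collection stand-in for reduceByKey
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
```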

  • 5

    Running SQL Queries Using Spark SQL

    • Objectives

    • Importance of Spark SQL

    • Benefits of Spark SQL

    • DataFrames

    • SQLContext

    • SQLContext-2

    • Creating a DataFrame

    • Using DataFrame Operations

    • Using DataFrame Operations-2

    • Demo - Run Spark SQL with a DataFrame

    • Run Spark SQL Programmatically-1

    • Save Modes

    • Saving to Persistent Tables

    • Parquet Files

    • Partition Discovery

    • Schema Merging

    • JSON Data

    • Hive Table

    • DML Operation - Hive Queries

    • Demo - Run Hive Queries Using Spark SQL

    • JDBC to other Databases

    • Supported Hive Features

    • Supported Hive Features-2

    • Supported Hive Data Types

    • Case Classes

    • Case Classes-2

    • Summary

    • Summary-2

    • Conclusion
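
Case classes, covered in the last lessons above, are the mechanism Spark SQL uses to infer a DataFrame schema from Scala objects: each field becomes a column. This sketch sticks to plain Scala (made-up data, no Spark session) to show the case-class mechanics themselves.

```scala
// A case class gets equals, pattern-matching support, and copy for free.
case class Person(name: String, age: Int)

val people = Seq(Person("Ana", 34), Person("Ben", 19))

// A collection filter analogous to "SELECT name FROM people WHERE age > 21".
val adults = people.filter(_.age > 21).map(_.name)

// Pattern matching destructures a case class by its fields.
val described = people.map { case Person(n, a) => s"$n is $a" }
```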

  • 6

    Spark Streaming

    • Objectives

    • Introduction to Spark Streaming

    • Working of Spark Streaming

    • Streaming Word Count

    • Micro Batch

    • DStreams

    • DStreams-2

    • Input DStreams and Receivers

    • Input DStreams and Receivers-2

    • Basic Sources

    • Advanced Sources

    • Transformations on DStreams

    • Output Operations on DStreams

    • Design Patterns for Using foreachRDD

    • DataFrame and SQL Operations

    • DataFrame and SQL Operations-2

    • Checkpointing

    • Enabling Checkpointing

    • Socket Stream

    • File Stream

    • Stateful Operations

    • Window Operations

    • Types of Window Operations

    • Types of Window Operations-2

    • Join Operations - Stream-Dataset Joins

    • Monitoring Spark Streaming Application

    • Performance Tuning - High Level

    • Demo - Capture and Process the Netcat Data

    • Capture and Process the Flume Data

    • Demo - Capture the Twitter Data

    • Capture the Twitter Data

    • Summary

    • Summary-2

    • Conclusion
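
The window operations listed above group the last N micro-batches and slide forward by a step. The same idea can be previewed on a plain sequence of per-batch counts with Scala's sliding(size, step); the numbers are made up, and the reduceByWindow call named in the comment is the Spark Streaming analogue, not what this snippet runs.

```scala
// Toy per-micro-batch event counts.
val batchCounts = Seq(3, 1, 4, 1, 5, 9)

// A window of 3 batches sliding by 1 batch: conceptually like
// reduceByWindow(_ + _, <3-batch window>, <1-batch slide>) on a DStream.
val windowedSums = batchCounts.sliding(3, 1).map(_.sum).toList
```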

  • 7

    Spark ML Programming

    • Objectives

    • Introduction to Machine Learning

    • Applications of Machine Learning

    • Machine Learning in Spark

    • DataFrames

    • Transformers and Estimators

    • Pipeline

    • Working of a Pipeline

    • Working of a Pipeline-2

    • DAG Pipelines

    • Runtime Checking

    • Parameter Passing

    • General Machine Learning Pipeline - Example

    • Model Selection via Cross-Validation

    • Supported Types, Algorithms and Utilities

    • Data Types

    • Feature Extraction and Basic Statistics

    • Clustering

    • K-Means

    • K-Means-1

    • K-Means-2

    • Demo - Perform Clustering Using K-Means

    • Perform Clustering Using K-Means-1

    • Gaussian Mixture

    • Power Iteration Clustering

    • Latent Dirichlet Allocation

    • Latent Dirichlet Allocation-2

    • Collaborative Filtering

    • Classification

    • Classification-2

    • Regression

    • Example of Regression

    • Demo - Perform Classification Using Linear Regression

    • Perform Classification Using Linear Regression

    • Demo - Run Linear Regression

    • Run Linear Regression

    • Demo - Perform Recommendation Using Collaborative Filtering

    • Perform Recommendation Using Collaborative Filtering

    • Demo - Run Recommendation System

    • Run Recommendation System

    • Summary

    • Summary-2

    • Conclusion
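
One iteration of the K-Means algorithm covered above consists of two steps: assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster. A minimal sketch on made-up 1-D points (Spark's KMeans in MLlib runs this loop distributed over an RDD/DataFrame):

```scala
// Toy data: four 1-D points and two initial centroids.
val points    = Seq(1.0, 2.0, 9.0, 10.0)
val centroids = Seq(0.0, 8.0)

// Index of the centroid closest to point p.
def nearest(p: Double): Int =
  centroids.indices.minBy(i => math.abs(p - centroids(i)))

// Step 1: assign each point to a cluster.
val clusters = points.groupBy(nearest)

// Step 2: move each centroid to the mean of its cluster
// (keeping the old centroid if its cluster is empty).
val newCentroids = centroids.indices.map { i =>
  val members = clusters.getOrElse(i, Seq(centroids(i)))
  members.sum / members.size
}
```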

  • 8

    Spark GraphX Programming

    • Objectives

    • Introduction to Graph-Parallel Systems

    • Limitations of Graph-Parallel Systems

    • Introduction to GraphX

    • Introduction to GraphX-2

    • Importing GraphX

    • The Property Graph

    • The Property Graph-2

    • Creating a Graph

    • Demo - Create a Graph Using GraphX

    • Create a Graph Using GraphX

    • Triplet View

    • Graph Operators

    • List of Operators

    • List of Operators-2

    • Property Operators

    • Structural Operators

    • Subgraphs

    • Join Operators

    • Demo - Perform Graph Operations Using GraphX

    • Perform Graph Operations Using GraphX-1

    • Demo - Perform Subgraph Operations

    • Perform Subgraph Operations-1

    • Neighborhood Aggregation

    • Map Reduce Triplets

    • Demo - Perform Map Reduce Operations

    • Perform Map Reduce Operations-1

    • Counting Degree of Vertex

    • Collecting Neighbors

    • Caching and Uncaching

    • Vertex and Edge RDDs

    • Graph System Optimizations-1

    • Summary

    • Summary-2

    • Conclusion
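
The degree-counting lesson above (GraphX's graph.degrees) boils down to: each edge contributes one degree to its source vertex and one to its destination. A plain-Scala sketch of that computation on a made-up edge list:

```scala
// Toy undirected edge list; vertex IDs are Longs, as in GraphX.
val edges = Seq((1L, 2L), (2L, 3L), (1L, 3L), (3L, 4L))

// Emit both endpoints of every edge, then count occurrences per vertex.
val degrees = edges
  .flatMap { case (src, dst) => Seq(src, dst) }
  .groupBy(identity)
  .map { case (vertex, occurrences) => (vertex, occurrences.size) }
```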