
Spark 3 on Google Cloud Platform - Beginner to Advanced Level

  • 1. Course Introduction and Overview.mp4
    02:35
  • 2. GitHub repository for the course.html
  • 3. Setup a Trial GCP Account.mp4
    02:24
  • 4. Install and Setup the Gcloud SDK.mp4
    03:09
  • 1. Introduction to Dataproc on GCP.mp4
    02:58
  • 2. Overview of Spark's Architecture.mp4
    02:44
  • 3. Data Lake vs Data Warehouse.mp4
    02:39
  • 4. Role of Spark in Big Data Ecosystem.mp4
    04:39
  • 5. Overview of Spark APIs.mp4
    04:02
  • 6. What's new in Spark 3.mp4
    01:31
  • 7. Should I be learning Spark in 2023.mp4
    01:30
  • 1.1 DataframeAPI-Source-Code.zip
  • 1. Section Introduction.mp4
    00:59
  • 2. Lab - Create a Dataproc Cluster.mp4
    06:10
  • 3. Lab - Walkthrough of Jupyter Notebook and different components.mp4
    02:55
  • 4. Lab - Basic Dataframe Operations in PySpark.mp4
    15:32
  • 5. Lab - Typecasting & timestamp column extraction.mp4
    13:20
  • 6. Lab - Dataframe Aggregations.mp4
    10:00
  • 7. Assignment on Dataframe Aggregations.html
  • 8. Transformations and Actions in Spark.mp4
    02:40
  • 9. Lab - Advanced transformations using Window Functions.mp4
    17:40
  • 10. Lab - Rolling Window Operations.mp4
    10:41
  • 11. Lab - Write transformed data back to a sink GCS Bucket and BigQuery.mp4
    09:42
  • 12. Lab - Use Spark-Submit to submit jobs to Dataproc clusters.mp4
    06:34
  • 1.1 Data-For-Joins.zip
  • 1.2 SparkSql-Source-Code.zip
  • 1. Introduction to SparkSql.mp4
    03:07
  • 2. Different Types of Tables in Spark.mp4
    01:49
  • 3. Lab - Create Tables for SparkSql.mp4
    08:35
  • 4. Lab - Analytical Window Functions and creating permanent tables.mp4
    16:27
  • 5.1 Data-For-Joins.zip
  • 5. Lab - Perform Joins on Dataframes.mp4
    13:25
  • 6. What are Partitions in Spark Dataframes.mp4
    02:45
  • 7. Lab - Perform repartitioning of dataframes.mp4
    05:35
  • 8. Data Shuffling in Joins.mp4
    02:33
  • 9. Lab - User defined functions in Spark.mp4
    08:36
  • 1. What is the Catalyst Optimizer in Spark.mp4
    02:58
  • 2. Cache and Persist in Spark.mp4
    04:01
  • 3. What is Autoscaling in Spark and Dataproc.mp4
    01:54
  • 4. Lab - Apply Autoscaling Policies to Dataproc Clusters.mp4
    05:24
  • 5. Introduction to Dataproc Workflows.mp4
    01:39
  • 6. Lab - Execute GCP Workflows.mp4
    04:51
  • 7. Lab - Cloud Scheduler to automate Workflow Execution.mp4
    08:49
  • 8. What is Checkpointing in Spark.mp4
    01:22
  • 9. What are Broadcast Joins.mp4
    03:05
  • 10. Lab - Setup Alerting Policies for Spark Jobs.mp4
    04:21
  • 1.1 Project-Source-Code.zip
  • 1. Project Introduction.mp4
    01:28
  • 2. Lab - Setup MySql Instance and Database on GCP.mp4
    03:51
  • 3. Lab - Ingest Data into MySql.mp4
    03:46
  • 4. Lab - Setup Dataproc with initialization actions.mp4
    02:53
  • 5. Assignment Lab - Setup Connectivity from PySpark to MySql Db.mp4
    06:33
  • 6. Assignment Lab - Perform transformations using PySpark.mp4
    05:47
  • 7. Lab - Setup Workflows to execute end-to-end pipeline.mp4
    05:52
  • 1.1 Section-Source-Code.zip
  • 1. Section Introduction.mp4
    00:36
  • 2. Overview of Pub/Sub Lite.mp4
    01:52
  • 3. What are Tumbling Windows.mp4
    02:49
  • 4. What is Watermarking.mp4
    03:47
  • 5. What are Sliding Windows.mp4
    02:50
  • 6. Lab - Create PubSub Lite Reservation.mp4
    03:32
  • 7. Lab - Publish Data to PubSub and Testing using PySpark.mp4
    06:45
  • 8. Lab - Implement Tumbling Windows.mp4
    04:59
  • 9. Lab - Implement Tumbling Window with Watermarking.mp4
    03:31
  • 10. Lab - Implement Sliding Windows.mp4
    03:31
  • 1.1 Section-Source-Code.zip
  • 1. Overview of Joining Streaming Dataframes.mp4
    03:12
  • 2. Lab - Join Streaming Dataframe with Static Dataframe.mp4
    03:37
  • 3. Lab - Join 2 Streaming Dataframes.mp4
    05:24
  • 4. Lab - Use Watermarking in Streaming Joins.mp4
    03:39
  • 1.1 Project-Source-Code.zip
  • 1. Overview of the Use Case.mp4
    02:54
  • 2. Lab - Model Training using ML Library and Code Walkthrough.mp4
    06:28
  • 3. Lab - Code Walkthrough and Publish Data.mp4
    04:13
  • 4. Lab - Real Time Product Recommendation Model in Action.mp4
    02:03
  • 1. Introduction and Tips.mp4
    01:11
  • 2. Batch Data Processing Interview Questions - Part 1.mp4
    03:21
  • 3. Batch Data Processing Interview Questions - Part 2.mp4
    03:26
  • 4. Batch Processing Interview Questions - Part 3.mp4
    02:46
  • 5. Real Time Data Processing Interview Questions - Part 1.mp4
    01:37
  • 6. Real Time Data Processing Interview Questions - Part 2.mp4
    02:36


    Build Scalable Batch and Real Time Data Processing Pipelines with PySpark and Dataproc

    What You'll Learn?


    • Understand the fundamentals of Apache Spark 3, including its architecture and components
    • Develop and deploy PySpark jobs to Dataproc on GCP, including setting up a cluster and managing resources (see the sketch after this list)
    • Gain practical experience in using Spark 3 for advanced batch data processing, machine learning, and real-time analytics
    • Best practices for optimizing Spark 3 performance on GCP, including autoscaling, fine-tuning, and integration with other GCP components
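
    For a concrete taste of the second point, here is a minimal sketch of submitting a PySpark job to an existing Dataproc cluster with the google-cloud-dataproc client library. The project, region, cluster, and GCS path below are placeholders, not values from the course; the course itself also covers the equivalent spark-submit workflow:

```python
from google.cloud import dataproc_v1

# Placeholder values -- substitute your own project, region, and cluster.
project_id = "my-project"
region = "us-central1"
cluster_name = "my-dataproc-cluster"

# The Job Controller client must point at the regional Dataproc endpoint.
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# A PySpark job referencing a script already uploaded to a GCS bucket.
job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/etl_job.py"},
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
response = operation.result()  # blocks until the job finishes
print(f"Job {response.reference.job_id} finished: {response.status.state.name}")
```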

    Who is this for?


  • Data engineers or data analysts who want to learn how to use Spark 3 on the Google Cloud Platform (GCP) for large-scale data processing and analysis
  • Software developers who want to integrate Spark 3 into their applications or workflows running on GCP
  • Data scientists who want to leverage Spark 3's machine learning capabilities on GCP for building and deploying predictive models
  • Anyone who wants to start their cloud journey with Spark 3

    What You Need to Know?


  • Prior experience writing basic code in Python and SQL
  • A basic background in programming and big data


    Description

    Are you looking to dive into big data processing and analytics with Apache Spark and Google Cloud? This course is designed to help you master PySpark 3.3 and leverage its full potential to process large volumes of data in a distributed environment. You'll learn how to build efficient, scalable, and fault-tolerant data processing jobs by learning how to apply:

    • DataFrame transformations with the DataFrame API (a short sketch follows this list)

    • SparkSQL

    • Deployment of Spark jobs as done in real-world scenarios

    • Integrating Spark jobs with other components on GCP

    • Implementing real-time machine learning use cases by building a product recommendation system
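
    As an illustrative example of the first two points, here is a minimal, self-contained PySpark sketch combining typecasting, an analytical window function, and the same data queried through SparkSQL. The sample data is invented for illustration; the course uses its own datasets:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

# Tiny invented dataset standing in for the course's data.
df = spark.createDataFrame(
    [("2023-01-01", "books", 120.0),
     ("2023-01-01", "games", 80.0),
     ("2023-01-02", "books", 95.0)],
    ["sale_date", "category", "amount"],
)

# Typecasting: the date arrives as a string and is cast to a DateType column.
df = df.withColumn("sale_date", F.to_date("sale_date"))

# Analytical window function: running total per category, ordered by date.
w = Window.partitionBy("category").orderBy("sale_date")
df.withColumn("running_total", F.sum("amount").over(w)).show()

# SparkSQL: register the DataFrame as a temporary view and query it with SQL.
df.createOrReplaceTempView("sales")
spark.sql("SELECT category, SUM(amount) AS total FROM sales GROUP BY category").show()
```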

    This course is intended for data engineers, data analysts, data scientists, and anyone interested in big data processing with Apache Spark and Google Cloud. It is also suitable for students and professionals who want to enhance their skills in big data processing and analytics using PySpark and Google Cloud technologies.

    Why take this course?

    In this course, you'll gain hands-on experience in designing, building, and deploying big data processing pipelines using PySpark on Google Cloud. You'll learn how to process large data sets in parallel in the most practical way, without having to install or run anything on your local computer.
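
    The streaming sections build on Structured Streaming concepts such as tumbling windows and watermarking. As a rough sketch of those ideas, using Spark's built-in rate source as a stand-in for the Pub/Sub Lite source the course actually reads from:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# The rate source emits (timestamp, value) rows continuously;
# the course reads from Pub/Sub Lite instead.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Tumbling window (window size == slide) with a watermark bounding late data.
counts = (
    events
    .withWatermark("timestamp", "30 seconds")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

# Write windowed counts to the console; "update" emits only changed windows.
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```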

    By the end of this course, you'll have the skills and confidence to tackle real-world big data processing problems and deliver high-quality solutions using PySpark and other Google Cloud technologies.

    Whether you're a data engineer, data analyst, or aspiring data scientist, this comprehensive course will equip you with the skills and knowledge to process massive amounts of data using PySpark and Google Cloud.

    Plus, with a final section dedicated to interview questions and tips, you'll be well-prepared to ace your next data engineering or big data interview.



    Instructor

    I am a business-oriented data architect with extensive experience in software development, distributed processing, and data engineering on the cloud. I have worked on cloud platforms such as AWS and GCP, as well as with on-prem Hadoop clusters. I give seminars on distributed processing with Spark, real-time streaming and analytics, and best practices for ETL and data governance. I am also a passionate coder who loves writing and building optimal data pipelines for robust data processing and streaming solutions.
    • Language: English
    • Training sessions: 71
    • Duration: 5:36:29
    • Release date: 2023/06/23