
Spark 3 on Google Cloud Platform - Beginner to Advanced Level

  • 1. Course Introduction and Overview.mp4
    02:35
  • 2. GitHub repository for the course.html
  • 3. Setup a Trial GCP Account.mp4
    02:24
  • 4. Install and Setup the Gcloud SDK.mp4
    03:09
  • 1. Introduction to Dataproc on GCP.mp4
    02:58
  • 2. Overview of Spark's Architecture.mp4
    02:44
  • 3. Data Lake vs Data Warehouse.mp4
    02:39
  • 4. Role of Spark in Big Data Ecosystem.mp4
    04:39
  • 5. Overview of Spark APIs.mp4
    04:02
  • 6. What's new in Spark 3.mp4
    01:31
  • 7. Should I be learning Spark in 2023.mp4
    01:30
  • 1.1 DataframeAPI-Source-Code.zip
  • 1. Section Introduction.mp4
    00:59
  • 2. Lab - Create a Dataproc Cluster.mp4
    06:10
  • 3. Lab - Walkthrough of Jupyter Notebook and different components.mp4
    02:55
  • 4. Lab - Basic Dataframe Operations in PySpark.mp4
    15:32
  • 5. Lab - Typecasting & timestamp column extraction.mp4
    13:20
  • 6. Lab - Dataframe Aggregations.mp4
    10:00
  • 7. Assignment on Dataframe Aggregations.html
  • 8. Transformations and Actions in Spark.mp4
    02:40
  • 9. Lab - Advanced transformations using Window Functions.mp4
    17:40
  • 10. Lab - Rolling Window Operations.mp4
    10:41
  • 11. Lab - Write transformed data back to a sink GCS Bucket and BigQuery.mp4
    09:42
  • 12. Lab - Use Spark-Submit to submit jobs to Dataproc clusters.mp4
    06:34
  • 1.1 Data-For-Joins.zip
  • 1.2 SparkSql-Source-Code.zip
  • 1. Introduction to SparkSql.mp4
    03:07
  • 2. Different Types of Tables in Spark.mp4
    01:49
  • 3. Lab - Create Tables for SparkSql.mp4
    08:35
  • 4. Lab - Analytical Window Functions and creating permanent tables.mp4
    16:27
  • 5.1 Data-For-Joins.zip
  • 5. Lab - Perform Joins on Dataframes.mp4
    13:25
  • 6. What are Partitions in Spark Dataframes.mp4
    02:45
  • 7. Lab - Perform repartitioning of dataframes.mp4
    05:35
  • 8. Data Shuffling in Joins.mp4
    02:33
  • 9. Lab - User defined functions in Spark.mp4
    08:36
  • 1. What is the Catalyst Optimizer in Spark.mp4
    02:58
  • 2. Cache and Persist in Spark.mp4
    04:01
  • 3. What is Autoscaling in Spark and Dataproc.mp4
    01:54
  • 4. Lab - Apply Autoscaling Policies to Dataproc Clusters.mp4
    05:24
  • 5. Introduction to Dataproc Workflows.mp4
    01:39
  • 6. Lab - Execute GCP Workflows.mp4
    04:51
  • 7. Lab - Cloud Scheduler to automate Workflow Execution.mp4
    08:49
  • 8. What is Checkpointing in Spark.mp4
    01:22
  • 9. What are Broadcast Joins.mp4
    03:05
  • 10. Lab - Setup Alerting Policies for Spark Jobs.mp4
    04:21
  • 1.1 Project-Source-Code.zip
  • 1. Project Introduction.mp4
    01:28
  • 2. Lab - Setup MySql Instance and Database on GCP.mp4
    03:51
  • 3. Lab - Ingest Data into MySql.mp4
    03:46
  • 4. Lab - Setup Dataproc with initialization actions.mp4
    02:53
  • 5. Assignment Lab - Setup Connectivity from PySpark to MySql Db.mp4
    06:33
  • 6. Assignment Lab - Perform transformations using PySpark.mp4
    05:47
  • 7. Lab - Setup Workflows to execute end-to-end pipeline.mp4
    05:52
  • 1.1 Section-Source-Code.zip
  • 1. Section Introduction.mp4
    00:36
  • 2. Overview of Pub/Sub Lite.mp4
    01:52
  • 3. What are Tumbling Windows.mp4
    02:49
  • 4. What is Watermarking.mp4
    03:47
  • 5. What are Sliding Windows.mp4
    02:50
  • 6. Lab - Create PubSub Lite Reservation.mp4
    03:32
  • 7. Lab - Publish Data to PubSub and Testing using PySpark.mp4
    06:45
  • 8. Lab - Implement Tumbling Windows.mp4
    04:59
  • 9. Lab - Implement Tumbling Window with Watermarking.mp4
    03:31
  • 10. Lab - Implement Sliding Windows.mp4
    03:31
  • 1.1 Section-Source-Code.zip
  • 1. Overview of Joining Streaming Dataframes.mp4
    03:12
  • 2. Lab - Join Streaming Dataframe with Static Dataframe.mp4
    03:37
  • 3. Lab - Join 2 Streaming Dataframes.mp4
    05:24
  • 4. Lab - Use Watermarking in Streaming Joins.mp4
    03:39
  • 1.1 Project-Source-Code.zip
  • 1. Overview of the Use Case.mp4
    02:54
  • 2. Lab - Model Training using ML Library and Code Walkthrough.mp4
    06:28
  • 3. Lab - Code Walkthrough and Publish Data.mp4
    04:13
  • 4. Lab - Real Time Product Recommendation Model in Action.mp4
    02:03
  • 1. Introduction and Tips.mp4
    01:11
  • 2. Batch Data Processing Interview Questions - Part 1.mp4
    03:21
  • 3. Batch Data Processing Interview Questions - Part 2.mp4
    03:26
  • 4. Batch Processing Interview Questions - Part 3.mp4
    02:46
  • 5. Real Time Data Processing Interview Questions - Part 1.mp4
    01:37
  • 6. Real Time Data Processing Interview Questions - Part 2.mp4
    02:36


    Build Scalable Batch and Real Time Data Processing Pipelines with PySpark and Dataproc

    What You'll Learn?


    • Understand the fundamentals of Apache Spark 3, including its architecture and components
    • Develop and deploy PySpark jobs to Dataproc on GCP, including setting up a cluster and managing resources (see the sketch after this list)
    • Gain practical experience in using Spark 3 for advanced batch data processing, machine learning, and real-time analytics
    • Best practices for optimizing Spark 3 performance on GCP, including autoscaling, fine-tuning, and integration with other GCP components
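
    For a concrete taste of the second point, here is a minimal sketch of submitting a PySpark job to an existing Dataproc cluster with the google-cloud-dataproc client library. The project, region, cluster, and GCS path below are placeholders, not values from the course; the course itself also covers the equivalent spark-submit workflow:

```python
from google.cloud import dataproc_v1

# Placeholder values -- substitute your own project, region, and cluster.
project_id = "my-project"
region = "us-central1"
cluster_name = "my-dataproc-cluster"

# The Job Controller client must point at the regional Dataproc endpoint.
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# A PySpark job referencing a script already uploaded to a GCS bucket.
job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/etl_job.py"},
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
response = operation.result()  # blocks until the job finishes
print(f"Job {response.reference.job_id} finished: {response.status.state.name}")
```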

    Who is this for?


  • Data engineers or data analysts who want to learn how to use Spark 3 on the Google Cloud Platform (GCP) for large-scale data processing and analysis
  • Software developers who want to integrate Spark 3 into their applications or workflows running on GCP
  • Data scientists who want to leverage Spark 3's machine learning capabilities on GCP for building and deploying predictive models
  • Anyone who wants to start their cloud journey with Spark 3

    What You Need to Know?


  • Prior experience writing basic code in Python and SQL
  • A basic background in programming and big data


    Description

    Are you looking to dive into big data processing and analytics with Apache Spark and Google Cloud? This course is designed to help you master PySpark 3.3 and leverage its full potential to process large volumes of data in a distributed environment. You'll learn how to build efficient, scalable, and fault-tolerant data processing jobs by learning how to apply:

    • DataFrame transformations with the DataFrame API (a short sketch follows this list)

    • SparkSQL

    • Deployment of Spark jobs as done in real-world scenarios

    • Integrating Spark jobs with other components on GCP

    • Implementing real-time machine learning use cases by building a product recommendation system
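
    As an illustrative example of the first two points, here is a minimal, self-contained PySpark sketch combining typecasting, an analytical window function, and the same data queried through SparkSQL. The sample data is invented for illustration; the course uses its own datasets:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

# Tiny invented dataset standing in for the course's data.
df = spark.createDataFrame(
    [("2023-01-01", "books", 120.0),
     ("2023-01-01", "games", 80.0),
     ("2023-01-02", "books", 95.0)],
    ["sale_date", "category", "amount"],
)

# Typecasting: the date arrives as a string and is cast to a DateType column.
df = df.withColumn("sale_date", F.to_date("sale_date"))

# Analytical window function: running total per category, ordered by date.
w = Window.partitionBy("category").orderBy("sale_date")
df.withColumn("running_total", F.sum("amount").over(w)).show()

# SparkSQL: register the DataFrame as a temporary view and query it with SQL.
df.createOrReplaceTempView("sales")
spark.sql("SELECT category, SUM(amount) AS total FROM sales GROUP BY category").show()
```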

    This course is intended for data engineers, data analysts, data scientists, and anyone interested in big data processing with Apache Spark and Google Cloud. It is also suitable for students and professionals who want to enhance their skills in big data processing and analytics using PySpark and Google Cloud technologies.

    Why take this course?

    In this course, you'll gain hands-on experience in designing, building, and deploying big data processing pipelines using PySpark on Google Cloud. You'll learn how to process large data sets in parallel in the most practical way, without having to install or run anything on your local computer.
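
    The streaming sections build on Structured Streaming concepts such as tumbling windows and watermarking. As a rough sketch of those ideas, using Spark's built-in rate source as a stand-in for the Pub/Sub Lite source the course actually reads from:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# The rate source emits (timestamp, value) rows continuously;
# the course reads from Pub/Sub Lite instead.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Tumbling window (window size == slide) with a watermark bounding late data.
counts = (
    events
    .withWatermark("timestamp", "30 seconds")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

# Write windowed counts to the console; "update" emits only changed windows.
query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```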

    By the end of this course, you'll have the skills and confidence to tackle real-world big data processing problems and deliver high-quality solutions using PySpark and other Google Cloud technologies.

    Whether you're a data engineer, data analyst, or aspiring data scientist, this comprehensive course will equip you with the skills and knowledge to process massive amounts of data using PySpark and Google Cloud.

    Plus, with a final section dedicated to interview questions and tips, you'll be well-prepared to ace your next data engineering or big data interview.



    Instructor

    I am a business-oriented data architect with extensive experience in software development, distributed processing, and data engineering on the cloud. I have worked on cloud platforms such as AWS and GCP, as well as with on-prem Hadoop clusters. I give seminars on distributed processing with Spark, real-time streaming and analytics, and best practices for ETL and data governance. I am also a passionate coder who loves writing and building optimal data pipelines for robust data processing and streaming solutions.
    • Language: English
    • Training sessions: 71
    • Duration: 5:36:29
    • Release date: 2023/06/23