Apache Spark 3 Fundamentals

Mohit Batra

6:18:41

  • 1. Course Trailer.mp4
    01:46
  • 1. Introduction and Course Outline.mp4
    02:05
  • 2. Version Check.mp4
    00:35
  • 3. Need for Apache Spark.mp4
    06:19
  • 4. Understanding Spark Architecture and Ecosystem.mp4
    04:44
  • 5. How Execution Happens in Spark.mp4
    11:20
  • 6. Spark APIs RDDs, DataFrames and Datasets.mp4
    03:24
  • 7. Summary.mp4
    01:55
  • 01. Module Overview.mp4
    01:05
  • 02. Understanding Spark Environments.mp4
    05:30
  • 03. Installing Spark.mp4
    08:07
  • 04. Monitoring Spark with Web UI.mp4
    02:04
  • 05. Option 1 - Running Spark in Command Line.mp4
    03:43
  • 06. Option 2 - Running Spark with Jupyter Notebooks.mp4
    05:21
  • 07. Option 3 - Creating Project with PyCharm IDE.mp4
    03:22
  • 08. Option 4 - Running Jobs with Spark Submit.mp4
    03:17
  • 09. Setting Up Multi-Node Cluster.mp4
    04:45
  • 10. Summary.mp4
    01:31
  • 1. Module Overview.mp4
    00:50
  • 2. Understanding RDDs.mp4
    07:40
  • 3. Creating RDDs.mp4
    07:28
  • 4. Working with Pair RDDs.mp4
    06:53
  • 5. Applying Operations on RDDs.mp4
    06:36
  • 6. Using Narrow Transformations.mp4
    04:24
  • 7. Wide Transformations and Data Shuffling.mp4
    08:39
  • 8. Spark Application Concepts - Jobs, Stages and Tasks.mp4
    08:58
  • 9. Summary.mp4
    02:14
  • 1. Module Overview.mp4
    00:55
  • 2. Understanding DataFrames.mp4
    07:35
  • 3. Creating DataFrames.mp4
    06:25
  • 4. Applying Schemas.mp4
    05:23
  • 5. Analyzing and Cleaning Data.mp4
    07:04
  • 6. Applying Transformations.mp4
    11:01
  • 7. Handling Corrupt Data.mp4
    03:42
  • 8. Saving Processed Data to Files.mp4
    06:26
  • 9. Summary.mp4
    01:58
  • 1. Module Overview.mp4
    00:46
  • 2. Running SQL Queries on DataFrames.mp4
    05:21
  • 3. Working with Spark Tables.mp4
    07:03
  • 4. Working with User Defined Functions (UDFs).mp4
    03:53
  • 5. Performing Operations on Multiple Datasets.mp4
    08:08
  • 6. Performing Window Operations.mp4
    07:02
  • 7. Summary.mp4
    01:50
  • 01. Module Overview.mp4
    00:56
  • 02. Working with Spark Partitions.mp4
    08:24
  • 03. Changing DataFrame Partitions.mp4
    05:30
  • 04. Memory Management.mp4
    05:56
  • 05. Persisting Data.mp4
    06:07
  • 06. Spark Join Strategies and Broadcast Joins.mp4
    06:17
  • 07. Optimizing Shuffle Sort Join with Bucketing.mp4
    05:03
  • 08. Dynamic Resource Allocation.mp4
    07:16
  • 09. Resource Allocation Using Fair Scheduling.mp4
    02:49
  • 10. Summary.mp4
    01:56
  • 1. Introduction to Apache Spark 3.mp4
    04:10
  • 2. Adaptive Query Execution - Dynamic Coalescing.mp4
    05:51
  • 3. Adaptive Query Execution - Dynamic Join.mp4
    07:09
  • 4. Adaptive Query Execution - Handling Skew.mp4
    07:38
  • 5. Dynamic Partition Pruning.mp4
    07:35
  • 6. Summary.mp4
    02:13
  • 01. Module Overview.mp4
    00:50
  • 02. Need for Delta Lake with Spark.mp4
    08:57
  • 03. How Delta Lake Works.mp4
    07:23
  • 04. ACID Guarantees on Delta Lake.mp4
    03:49
  • 05. Creating Delta Tables.mp4
    07:24
  • 06. Inserting Data to Delta Table.mp4
    03:06
  • 07. Performing DML Operations.mp4
    06:07
  • 08. Applying Table Constraints.mp4
    02:52
  • 09. Accessing Data with Time Travel.mp4
    04:54
  • 10. Summary.mp4
    01:46
  • 1. Module Overview.mp4
    00:33
  • 2. Understanding Streaming in Spark.mp4
    05:42
  • 3. Structured Streaming Processing Model.mp4
    07:19
  • 4. Extracting Streaming Data from Source.mp4
    05:53
  • 5. Transforming and Loading Data.mp4
    05:37
  • 6. Summary.mp4
    01:22
  • 1. Module Overview.mp4
    00:30
  • 2. Using Spark in Databricks.mp4
    05:03
  • 3. Using Spark in Azure Synapse Analytics.mp4
    04:16
  • 4. Summary.mp4
    01:21
  • Description


    Learn the fundamentals of Apache Spark 3: process data, set up the environment, use RDDs and DataFrames, optimize applications, and build pipelines with Databricks and Azure Synapse. The course also introduces Spark's ecosystem.

    What You'll Learn


      Apache Spark is one of the most widely used analytics engines. It performs distributed data processing and can handle petabytes of data. Spark can work with a variety of data formats, process data at high speeds, and support multiple use cases. Version 3 of Spark brings a whole new set of features and optimizations.

      In this course, Apache Spark 3 Fundamentals, you'll learn how Apache Spark can be used to process large volumes of data, whether batch or streaming data, and about the growing ecosystem of Spark. First, you'll learn what Apache Spark is, its architecture, and its execution model. You'll then see how to set up the Spark environment. Next, you'll learn about two Spark APIs – RDDs and DataFrames – and see how to use them to extract, analyze, clean, and transform batch data. Then, you'll learn various techniques to optimize your Spark applications, as well as the new optimization features of Apache Spark 3. After that, you'll see how to reliably store data in a Data Lake using the Delta Lake format and build streaming pipelines with Spark. Finally, you'll see how to use Spark in cloud services like Databricks and Azure Synapse Analytics.

      By the end of this course, you'll have the knowledge and skills to work with Apache Spark and use its capabilities and ecosystem to build large-scale data processing pipelines. So, let's get started!

    Mohit is a Data Engineer, a Microsoft Certified Trainer (MCT), and a consultant. He has 15+ years of experience architecting large-scale Business Intelligence, Data Warehousing, and Big Data solutions with companies like Microsoft and some leading investment banks. As an expert in his field, Mohit has often shared his knowledge of Azure, Spark, SQL Server, and Power BI at various public forums and as a corporate trainer. He loves to teach and enjoys producing high-quality, engaging learning materials for his sessions. In his free time, Mohit loves to read and enjoys photography and music.
    Pluralsight, LLC is an American privately held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules.
    • Language: English
    • Training sessions: 79
    • Duration: 6:18:41
    • Level: Beginner
    • English subtitles: Yes
    • Release date: 2023/05/09