
Processing Streaming Data with Apache Spark on Databricks


Janani Ravi

2:00:53

  • 01. Course Overview.mp4
    02:11
  • 02. Prerequisites and Course Outline.mp4
    02:04
  • 03. Batch Processing vs. Stream Processing.mp4
    06:03
  • 04. Micro-batch and Continuous Processing.mp4
    05:01
  • 05. Stream Processing in Apache Spark.mp4
    03:07
  • 06. Continuous Applications in Spark.mp4
    03:51
  • 07. Demo-Reading Batch and Streaming Data.mp4
    05:19
  • 08. Demo-Running a Simple Streaming Query.mp4
    03:17
  • 09. Demo-Processing and Visualizing Streams.mp4
    04:41
  • 10. Triggers.mp4
    04:00
  • 11. Demo-Configuring Triggers.mp4
    04:18
  • 12. Module Summary.mp4
    01:19
  • 13. Streaming Sources and Sinks.mp4
    02:00
  • 14. Auto Loader.mp4
    05:22
  • 15. Demo-Auto Loader and Rescued Data.mp4
    05:40
  • 16. Demo-Writing Streams to File Sinks.mp4
    04:31
  • 17. Demo-Performing Transformations on Streams.mp4
    05:25
  • 18. Demo-Stream Processing.mp4
    01:27
  • 19. Output Modes.mp4
    04:33
  • 20. Demo-Append Mode.mp4
    03:01
  • 21. Demo-Complete Mode.mp4
    03:13
  • 22. Demo-Update Mode.mp4
    03:23
  • 23. Demo-Executing SQL Queries to Process Streams.mp4
    06:56
  • 24. Demo-Creating an AWS User and S3 Bucket.mp4
    03:59
  • 25. Demo-Mounting an S3 Bucket to DBFS.mp4
    03:53
  • 26. Demo-Auto Loader to Read from an S3 Bucket Source.mp4
    02:06
  • 27. Demo-Applying UDFs on Streaming Data.mp4
    02:09
  • 28. Checkpointing.mp4
    02:10
  • 29. Demo-Checkpointing.mp4
    07:20
  • 30. Demo-Running a Streaming Job on a Cluster.mp4
    04:42
  • 31. Demo-Viewing Job Results.mp4
    02:26
  • 32. Summary and Further Study.mp4
    01:26
  • Description


    This course will teach you how to use Spark abstractions for streaming data and how to perform transformations on streams using the Spark Structured Streaming APIs on Azure Databricks.

    What You'll Learn


      Structured Streaming in Apache Spark treats real-time data as a table that is being continuously appended. This leads to a stream processing model that uses the same APIs as the batch processing model: it is up to Spark to incrementalize our batch operations to work on the stream. The burden of stream processing shifts from the user to the system, making it very easy and intuitive to process streaming data with Spark.
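      The symmetry described above can be sketched in PySpark. This is a minimal illustration, not code from the course; the paths, the `country` column, and the function names are assumptions. Note that `readStream` requires a schema up front for file sources, whereas batch reads can infer one.

      ```python
      # Sketch: the same DataFrame transformations serve both batch and
      # streaming reads. Only the entry point (read vs. readStream) differs;
      # Spark incrementalizes the aggregation for the streaming case.

      def batch_count_by_country(spark, path):
          """Batch: read all JSON files under `path` once and aggregate."""
          df = spark.read.format("json").load(path)
          return df.groupBy("country").count()

      def streaming_count_by_country(spark, path, schema):
          """Streaming: identical transformations on an unbounded input.
          File-based streaming sources need an explicit schema."""
          df = spark.readStream.format("json").schema(schema).load(path)
          return df.groupBy("country").count()
      ```

      The returned streaming DataFrame is only a query plan; nothing runs until a `writeStream ... start()` attaches a sink to it.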

      In this course, Processing Streaming Data with Apache Spark on Databricks, you’ll learn to stream and process data using abstractions provided by Spark structured streaming. First, you’ll understand the difference between batch processing and stream processing and see the different models that can be used to process streaming data. You will also explore the structure and configurations of the Spark structured streaming APIs.

      Next, you will learn how to read from a streaming source using Auto Loader on Azure Databricks. Auto Loader automates the process of reading streaming data from a file system and takes care of file management and tracking of processed files, making it very easy to ingest data from external cloud storage sources. You will then perform transformations and aggregations on streaming data and write data out to storage using the append, complete, and update output modes.
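      An Auto Loader read followed by a write with an explicit output mode might look like the sketch below. It assumes a Databricks runtime (the `cloudFiles` source is Databricks-specific), and the directory paths, column name, and query name are illustrative, not taken from the course.

      ```python
      # Sketch: Auto Loader ingest plus a streaming write with an output mode.
      # Assumes a Databricks runtime; paths and names are hypothetical.

      def start_autoloader_query(spark, source_dir, schema_dir, checkpoint_dir):
          df = (spark.readStream
                .format("cloudFiles")                        # Databricks Auto Loader
                .option("cloudFiles.format", "json")         # format of arriving files
                .option("cloudFiles.schemaLocation", schema_dir)  # where inferred schema is stored
                .load(source_dir))
          agg = df.groupBy("country").count()
          return (agg.writeStream
                  .outputMode("complete")   # or "append" / "update", as in the demos
                  .option("checkpointLocation", checkpoint_dir)
                  .format("memory")         # in-memory sink, convenient for notebooks
                  .queryName("country_counts")
                  .start())
      ```

      Note the coupling between transformation and mode: aggregations without watermarks cannot use append mode, while non-aggregated streams to file sinks must use it.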

      Finally, you will learn how to use SQL-like abstractions on input streams. You will connect to an external cloud storage source, an Amazon S3 bucket, and read in your stream using Auto Loader. You will then run SQL queries to process your data. Along the way, you will make your stream processing resilient to failures using checkpointing and you will also implement your stream processing operation as a job on a Databricks Job Cluster.
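      The SQL-on-streams and checkpointing ideas above can be sketched as follows. The view name, column, and paths are assumptions for illustration; the mechanism (registering a streaming DataFrame as a temp view and setting `checkpointLocation`) is standard Structured Streaming.

      ```python
      # Sketch: run a SQL query over a streaming DataFrame and make the
      # query fault-tolerant with a checkpoint location (hypothetical names).

      def run_sql_on_stream(spark, stream_df, checkpoint_dir):
          # A streaming DataFrame can be registered as a temp view and
          # queried with ordinary SQL; the result is itself a streaming plan.
          stream_df.createOrReplaceTempView("events")
          result = spark.sql(
              "SELECT country, COUNT(*) AS cnt FROM events GROUP BY country")
          # The checkpoint persists offsets and aggregation state, so a
          # restarted query resumes where it left off instead of reprocessing.
          return (result.writeStream
                  .outputMode("complete")
                  .option("checkpointLocation", checkpoint_dir)
                  .format("memory")
                  .queryName("sql_counts")
                  .start())
      ```

      Running this as a scheduled job on a Databricks Job Cluster changes nothing in the code; the checkpoint directory is what lets successive job runs pick up from the last committed offsets.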

      When you’re finished with this course, you’ll have the skills and knowledge of streaming data in Spark needed to process and monitor streams and to identify use cases for transformations on streaming data.



    Janani has a Masters degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani finally decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing high-quality content for technical skill development. Loonycorn is working on developing an engine (patent filed) to automate animations for presentations and educational content.
    Pluralsight, LLC is an American privately held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules.
    • Language: English
    • Training sessions: 32
    • Duration: 2:00:53
    • Level: Intermediate
    • Release date: 2023/12/15