Conceptualizing the Processing Model for Azure Databricks Service

Mohit Batra

2:51:11

01 - Course Overview

01 - Course Overview.mp4

01:38

02 - Getting Started with Structured Streaming on Azure Databricks

02 - Module Overview.mp4

01:50

03 - Course Outline.mp4

01:52

04 - Modern Data Pipelines on Databricks.mp4

08:09

05 - Spark 101.mp4

05:49

06 - Structured Streaming Processing Model.mp4

09:08

07 - What Is Databricks.mp4

09:55

08 - What Is Azure Databricks.mp4

03:45

09 - Summary.mp4

01:37

03 - Setting up Databricks Environment

10 - Module Overview.mp4

00:47

11 - Setting up Workspace.mp4

03:38

12 - Creating Cluster.mp4

06:56

13 - Understanding Cluster Pools and Autoscaling.mp4

05:35

14 - Working with Notebook.mp4

04:11

15 - Configuring Security.mp4

03:01

16 - Scenario Walkthrough.mp4

02:14

17 - Summary.mp4

01:16

04 - Configuring Source and Sink Stores

18 - Module Overview.mp4

00:47

19 - Structured Streaming Fault Tolerance.mp4

04:02

20 - Source and Sink Options.mp4

04:07

21 - Setup Azure Event Hubs and Get Maven Coordinates.mp4

03:41

22 - Source - Configure Azure Event Hubs Using Databricks Libraries.mp4

05:19

23 - Sink - Mount Azure Storage Services to DBFS.mp4

06:55

24 - Setup Sample App to Send NYC Taxi Events.mp4

03:20

25 - Summary.mp4

01:14

05 - Building Streaming Pipeline Using Structured Streaming

26 - Module Overview.mp4

00:55

27 - Extract and Process Source Data.mp4

08:38

28 - Load Data to Files.mp4

05:19

29 - Working with Spark SQL and Visualizing Data.mp4

04:55

30 - Summary.mp4

01:32

06 - Making Streaming Pipeline Production Ready

31 - Module Overview.mp4

00:39

32 - Parameterize Streaming Pipeline.mp4

02:31

33 - Scheduling with Databricks Jobs.mp4

04:40

34 - Best Practices.mp4

04:59

35 - Summary.mp4

01:25

07 - Understanding Pricing, Workloads, and Competition

36 - Module Overview.mp4

00:35

37 - Workloads, Tiers, and Pricing.mp4

07:32

38 - Comparison with Other Streaming Services.mp4

06:14

39 - Summary.mp4

01:22

08 - Customizing the Cluster

40 - Module Overview.mp4

00:46

41 - Working with Initialization Scripts.mp4

04:47

42 - Understand Databricks Container Services.mp4

05:52

43 - Build and Deploy Custom Docker Image on Cluster.mp4

05:57

44 - Summary.mp4

01:47

Description

In this course, you will learn about the Spark-based Azure Databricks platform. You will see how the Spark Structured Streaming processing model works, and then use it to build an end-to-end, production-ready streaming pipeline on the Azure Databricks platform.

What You'll Learn

Modern data pipelines often include streaming data that needs to be processed in real time. While Apache Spark is very popular for big data processing and can help us build reliable streaming pipelines, managing the Spark environment is no cakewalk.

In this course, Conceptualizing the Processing Model for Azure Databricks Service, you will learn how to use Spark Structured Streaming on the Databricks platform running on Microsoft Azure, and leverage its features to build an end-to-end streaming pipeline quickly and reliably. Along the way, you will learn about the collaboration options and optimizations it brings, without having to worry about infrastructure management.

First, you will learn about the processing model of Spark Structured Streaming, the Databricks platform and its features, and how it runs on Microsoft Azure.

Next, you will see how to set up the environment, including the workspace, clusters, and security; configure streaming sources and sinks; and see how Structured Streaming fault tolerance works.

After this, you will learn how to build each phase of a streaming pipeline by extracting the data from a source, transforming it, and loading it into a sink. You will then make it production ready and run it using Databricks jobs.

You will also see how to customize the cluster using initialization scripts and Docker containers to suit your business requirements.

Finally, you will explore other aspects: the different workloads available and how pricing works. We will also talk about best practices in terms of development, performance, stability, and cost. Lastly, you will see how Spark Structured Streaming on Azure Databricks compares to other managed services, such as Flink on AWS, Azure Stream Analytics, and Beam on Google Cloud.

By the end of this course, you will have the skills and knowledge of the Azure Databricks platform needed to build an end-to-end streaming pipeline using Spark Structured Streaming.

Data Science

Microsoft Azure

Mohit Batra

Mohit is a Data Engineer, a Microsoft Certified Trainer (MCT), and a consultant. He has 15+ years of experience architecting large-scale Business Intelligence, Data Warehousing, and Big Data solutions with companies like Microsoft and some leading investment banks. As an expert in his field, Mohit has often shared his knowledge of Azure, Spark, SQL Server, and Power BI at various public forums and as a corporate trainer. Mohit truly loves to teach and enjoys producing high-quality, engaging learning materials for his sessions. In his free time, Mohit loves to read and enjoys photography and music.

Pluralsight

Pluralsight, LLC is an American privately held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules.