Stream processing frameworks for big data: the internals

Giselle van Dongen

3:08:49

1 - Introduction

1 - Introduction.mp4

03:02

2 - Course overview.mp4

02:21

2 - General characteristics

3 - Overview.mp4

01:40

4 - Stream processing and distributed processing.mp4

08:52

5 - Frameworks Flink.mp4

02:15

6 - Frameworks Kafka Streams.mp4

03:28

7 - Frameworks Spark Streaming and Structured Streaming.mp4

04:48

8 - Ecosystem Connectors.mp4

04:47

9 - Ecosystem Batch Processing.mp4

04:50

10 - Ecosystem ML Libraries and Other Libraries.mp4

04:17

11 - Maturity.mp4

03:16

12 - Streaming models.mp4

04:23

3 - APIs

13 - Programming languages.mp4

04:50

14 - API levels.mp4

02:20

15 - Operators.mp4

01:35

16 - Operators Sliding and Tumbling Windows.mp4

04:09

17 - Operators Session and Count Windows.mp4

02:51

18 - Operators Joining.mp4

03:03

19 - Operators Lowlevel Operators.mp4

03:18

20 - Configuration.mp4

01:43

4 - Time

21 - Time characteristics l.mp4

06:07

22 - Time characteristics II.mp4

04:03

23 - Outoforder processing.mp4

05:16

24 - Triggers.mp4

07:19

5 - Performance Latency and throughput

25 - Latency Definition and influence of streaming model.mp4

09:25

26 - Latency influence of operation.mp4

03:56

27 - Latency predictability.mp4

02:51

28 - Throughput.mp4

01:35

29 - General advice.mp4

02:30

6 - Scalability elasticity and parallelization

30 - Scalability.mp4

06:55

31 - Elasticity.mp4

04:50

32 - Parallelization.mp4

02:13

7 - State management

33 - State.mp4

03:07

34 - State backends.mp4

07:26

35 - State features.mp4

01:19

8 - Fault tolerance

36 - Message delivery guarantees.mp4

15:55

37 - Checkpointing.mp4

11:10

38 - Checkpointing savepoints.mp4

02:38

39 - Writeaheadlogs.mp4

01:07

40 - Fault tolerance in Kafka Streams.mp4

01:25

41 - Master and worker failures.mp4

08:27

9 - Summary

42 - Summary.mp4

07:27

Description

A deep dive into the internals of Flink, Spark Streaming, Structured Streaming, and Kafka Streams

What You'll Learn?

The features and internals of Flink, Spark Streaming, Structured Streaming and Kafka Streams.
How to select the right stream processing framework for a use case.
The current state of the art in distributed stream processing.
References to equivalent implementations in all frameworks.
This is not a programming course! This is a course on understanding how these systems work.

Who is this for?

Anybody who needs to get a feel for how to select the right framework for a use case.

Anybody who wants to build up firm, in-depth knowledge of the differences and characteristics of these frameworks.

Anybody who wants to build up a deep understanding of stream processing in general.

Description
Do you need to use stream processing for your next project but have no idea where to begin? Or do you want to grow into a data engineering role and start building up knowledge of stream processing?
In this course, we give a detailed explanation and comparison of several popular stream processing frameworks. By the end, you will be able to make a well-grounded selection of the right framework for your use case or to start your learning process. We will cover Flink, Kafka Streams, Spark Streaming and Structured Streaming. These four frameworks are currently the state of the art in the industry.
You will understand their features, characteristics and differences. This course gives you the perfect primer to start learning and better understand the APIs and programming languages behind these frameworks.
This course covers all relevant aspects:
- their general characteristics
- APIs
- latency and throughput performance
- scalability
- elasticity
- fault tolerance
- state management
- deployment
- ...
We will dive deeply into the workings, advantages and disadvantages of the different mechanisms and approaches.
Note: this is not a programming course; it focuses on the more theoretical aspects.
At the end, you will be provided with a concise overview of what was covered.
The content of this course is based on the results of Giselle's PhD work, in which she benchmarked and analyzed these frameworks on all these characteristics.

Giselle van Dongen did a PhD at Ghent University, teaching and benchmarking real-time distributed processing systems such as Spark Streaming, Flink and Kafka Streams. She studied their performance features (latency, throughput) and their scalability and fault tolerance. She also works as a Lead Data Scientist at Klarrio, focusing on real-time data processing and analytics.

Udemy

Students take courses primarily to improve job-related skills. Some courses generate credit toward technical certification. Udemy has made a special effort to attract corporate trainers seeking to create coursework for employees of their company.