
Stream processing frameworks for big data: the internals


Giselle van Dongen

3:08:49

  • 1 - Introduction (03:02)
  • 2 - Course overview (02:21)
  • 3 - Overview (01:40)
  • 4 - Stream processing and distributed processing (08:52)
  • 5 - Frameworks: Flink (02:15)
  • 6 - Frameworks: Kafka Streams (03:28)
  • 7 - Frameworks: Spark Streaming and Structured Streaming (04:48)
  • 8 - Ecosystem: Connectors (04:47)
  • 9 - Ecosystem: Batch Processing (04:50)
  • 10 - Ecosystem: ML Libraries and Other Libraries (04:17)
  • 11 - Maturity (03:16)
  • 12 - Streaming models (04:23)
  • 13 - Programming languages (04:50)
  • 14 - API levels (02:20)
  • 15 - Operators (01:35)
  • 16 - Operators: Sliding and Tumbling Windows (04:09)
  • 17 - Operators: Session and Count Windows (02:51)
  • 18 - Operators: Joining (03:03)
  • 19 - Operators: Low-level Operators (03:18)
  • 20 - Configuration (01:43)
  • 21 - Time characteristics I (06:07)
  • 22 - Time characteristics II (04:03)
  • 23 - Out-of-order processing (05:16)
  • 24 - Triggers (07:19)
  • 25 - Latency: definition and influence of streaming model (09:25)
  • 26 - Latency: influence of operation (03:56)
  • 27 - Latency: predictability (02:51)
  • 28 - Throughput (01:35)
  • 29 - General advice (02:30)
  • 30 - Scalability (06:55)
  • 31 - Elasticity (04:50)
  • 32 - Parallelization (02:13)
  • 33 - State (03:07)
  • 34 - State backends (07:26)
  • 35 - State features (01:19)
  • 36 - Message delivery guarantees (15:55)
  • 37 - Checkpointing (11:10)
  • 38 - Checkpointing: savepoints (02:38)
  • 39 - Write-ahead logs (01:07)
  • 40 - Fault tolerance in Kafka Streams (01:25)
  • 41 - Master and worker failures (08:27)
  • 42 - Summary (07:27)


    A deep dive into the internals of Flink, Spark Streaming, Structured Streaming, and Kafka Streams

    What you'll learn


    • The features and internals of Flink, Spark Streaming, Structured Streaming and Kafka Streams.
    • How to select the right stream processing framework for a use case.
    • The current state of the art in distributed stream processing.
    • References to equivalent implementations in all frameworks.
    • This is not a programming course! This is a course on understanding how these systems work.

    Who is this for?


  • Anybody who needs to get a feel for how to select the right framework for a use case.
  • Anybody who wants to build up firm, in-depth knowledge on the differences and characteristics of these frameworks.
  • Anybody who wants to build up a deep understanding of stream processing in general.


    Description

    Do you need to use stream processing for your next project but have no idea where to begin? Or do you want to grow into a data engineering role and start building up knowledge of stream processing?

    In this course, we give a detailed explanation and comparison of several popular stream processing frameworks. By the end, you will be able to make a well-grounded choice of the right framework for your use case, or to start your learning process. We cover Flink, Kafka Streams, Spark Streaming and Structured Streaming, the four frameworks that are currently the state of the art in industry.

    You will understand their features, characteristics and differences. This course is a solid primer for learning, and better understanding, the APIs and programming languages behind these frameworks.

    This course covers all relevant aspects:

    - their general characteristics

    - APIs

    - latency and throughput performance

    - scalability

    - elasticity

    - fault tolerance

    - state management

    - deployment

    - ...

    We dive deep into how each framework works and into the advantages and disadvantages of the different mechanisms and approaches.

    Note: this is not a programming course; it focuses on the more theoretical aspects.

    At the end, you will be given a concise overview of what was covered.

    The content of this course is based on the results of Giselle's PhD work in which she benchmarked and analyzed these frameworks on all these characteristics. 

    Giselle van Dongen
    Giselle van Dongen did a PhD at Ghent University, teaching and benchmarking real-time distributed processing systems such as Spark Streaming, Flink and Kafka Streams. She studied their performance characteristics (latency, throughput), their scalability and their fault tolerance. She also works as a Lead Data Scientist at Klarrio, focusing on real-time data processing and analytics.
    • Language: English
    • Lessons: 42
    • Duration: 3:08:49
    • Release date: 2023/03/16
