
Mathematics Behind Large Language Models and Transformers


Patrik Szepesi

4:42:14

  • 1. Introduction to Tokenization.mp4
    11:44
  • 2. Tokenization in Depth.mp4
    11:23
  • 3. Encoding Tokens.mp4
    11:27
  • 4. Programmatically Understanding Tokenizations.mp4
    08:59
  • 5. BERT vs. DistilBERT.mp4
    05:07
  • 6. Embeddings in a Continuous Vector Space.mp4
    07:50
  • 1. Introduction to Positional Encodings.mp4
    05:20
  • 2. How Positional Encodings Work.mp4
    04:18
  • 3. Understanding Even and Odd Indices with Positional Encodings.mp4
    10:31
  • 4. Why we Use Sine and Cosine Functions for Positional Encodings.mp4
    05:15
  • 5. Understanding the Nature of Sine and Cosine Functions.mp4
    10:04
  • 6. Visualizing Positional Encodings in Sine and Cosine Graphs.mp4
    09:34
  • 7. Solving the Equations to get the Positional Encodings.mp4
    18:47
  • 1. Introduction to Attention Mechanisms.mp4
    03:02
  • 2. Query, Key, and Value Matrix.mp4
    18:20
  • 3. Getting started with our Step by Step Attention Calculation.mp4
    07:09
  • 4. Calculating Key Vectors.mp4
    20:44
  • 5. Query Matrix Introduction.mp4
    10:34
  • 6. Calculating Raw Attention Scores.mp4
    21:59
  • 7. Understanding the Mathematics behind Dot products and Vector Alignment.mp4
    13:56
  • 8. Visualising Raw Attention Scores in 2 Dimensions.mp4
    05:56
  • 9. Converting Raw Attention Scores to Probability Distributions with Softmax.mp4
    09:32
  • 10. Normalisation and Scaling.mp4
    03:24
  • 11. Understanding the Value Matrix and Value Vector.mp4
    09:24
  • 12. Calculating the Final Context Aware Rich Representation for the word river.mp4
    10:59
  • 13. Understanding the Output.mp4
    01:57
  • 14. Understanding Multi Head Attention.mp4
    12:13
  • 15. Multi Head Attention Example, and Subsequent layers.mp4
    10:13
  • 16. Masked Language Modeling.mp4
    02:33
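
    The attention lessons above walk step by step from key, query, and value vectors to raw scores, softmax, and the final context-aware representation. As a rough companion to that sequence, here is a minimal NumPy sketch of a single scaled dot-product attention step; it is not code from the course, and the token embeddings and weight matrices are random stand-ins for illustration only.

    import numpy as np

    # Toy setup: 4 tokens, embedding size 8 (made-up numbers, illustration only)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))                      # token embeddings

    # Learned projection matrices (random stand-ins here)
    W_q = rng.normal(size=(8, 8))
    W_k = rng.normal(size=(8, 8))
    W_v = rng.normal(size=(8, 8))

    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # query, key, and value vectors

    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # raw attention scores, scaled by sqrt(d_k)

    # Softmax turns each row of scores into a probability distribution
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    context = weights @ V                            # context-aware representations
    print(context.shape)                             # (4, 8): one enriched vector per token

    Each row of the weight matrix says how much every token attends to every other token; multiplying by V mixes the value vectors accordingly, which is what produces the "context-aware rich representation" the lessons build toward.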


    Deep Dive into Transformer Mathematics: From Tokenization to Multi-Head Attention to Masked Language Modeling & Beyond

    What You'll Learn?


    • Mathematics Behind Large Language Models
    • Positional Encodings
    • Multi Head Attention
    • Query, Key, and Value Matrices
    • Attention Masks
    • Masked Language Modeling
    • Dot Products and Vector Alignments
    • Nature of Sine and Cosine functions in Positional Encodings
    • How models like ChatGPT work under the hood
    • Bidirectional Models
    • Context-aware word representations
    • Word Embeddings
    • How dot products work
    • Matrix multiplication
    • Programmatically create tokens

    Who is this for?


  • For ambitious learners aiming to reach the upper echelon of the programming world: This content is designed for those who aspire to be within the top 1% of data scientists and machine learning engineers. It is particularly geared towards individuals who are keen to gain a deep understanding of transformers, the advanced technology behind large language models. This course will equip you with the foundational knowledge and technical skills required to excel in the development and implementation of cutting-edge AI applications.
    What You Need to Know?


  • Basic high-school math (linear algebra)


    Description

    Welcome to the Mathematics of Transformers, an in-depth course crafted for those eager to understand the mathematical foundations of large language models like GPT, BERT, and beyond. This course delves into the mathematical algorithms that allow these sophisticated models to process, understand, and generate human-like text.

    Starting with tokenization, students will learn how raw text is converted into a format understandable by models through techniques such as the WordPiece algorithm. We'll explore the core components of transformer architectures (key matrices, query matrices, and value matrices) and their roles in encoding information. A significant focus will be on the mechanics of the attention mechanism, including detailed studies of multi-head attention and attention masks. These concepts are pivotal in enabling models to focus on relevant parts of the input data, enhancing their ability to understand context and nuance. We will also cover positional encodings, essential for maintaining the sequence of words in inputs, utilizing sine and cosine functions to embed the position information mathematically. Additionally, the course includes comprehensive insights into bidirectional and masked language models, vectors, dot products, and multi-dimensional word embeddings, crucial for creating dense representations of words.

    By the end of this course, participants will not only master the theoretical underpinnings of transformers but also gain practical insights into their functionality and application. This knowledge will prepare you to innovate and excel in the field of machine learning, placing you among the top echelons of AI engineers and researchers.
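
    For a concrete picture of the sinusoidal positional encodings mentioned above, here is a minimal NumPy sketch of the standard sine/cosine formulation. It is an illustration only, not code from the course, and the sequence length and model dimension below are arbitrary.

    import numpy as np

    def positional_encoding(seq_len, d_model):
        """Standard sinusoidal encoding:
           PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
           PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
        pos = np.arange(seq_len)[:, None]              # (seq_len, 1) token positions
        i = np.arange(0, d_model, 2)[None, :]          # even dimension indices
        angles = pos / np.power(10000.0, i / d_model)  # one frequency per pair of dimensions
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)                   # even dimensions use sine
        pe[:, 1::2] = np.cos(angles)                   # odd dimensions use cosine
        return pe

    pe = positional_encoding(seq_len=10, d_model=16)
    print(pe.shape)                                    # (10, 16): added to the word embeddings

    Because each pair of dimensions oscillates at a different frequency, nearby positions receive similar encodings while distant positions differ, which is what lets the model recover word order from otherwise order-free embeddings.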


    Patrik Szepesi
    I am an AWS-certified machine learning engineer working at Blue River Technology, a Silicon Valley company creating computer vision and machine learning solutions (such as autonomous vehicles) for John Deere. I have worked as a data scientist at companies like Morgan Stanley, and I am also participating in several artificial intelligence research projects with Óbuda University. I am here to share the most cutting-edge technologies surrounding machine learning and AWS.
    • Language: English
    • Training sessions: 29
    • Duration: 4:42:14
    • Release date: 2024/08/11