Mathematics Behind Large Language Models and Transformers
Patrik Szepesi
4:42:14
Description
Deep Dive into Transformer Mathematics: From Tokenization to Multi-Head Attention to Masked Language Modeling & Beyond
What You'll Learn?
- Mathematics Behind Large Language Models
- Positional Encodings
- Multi-Head Attention
- Query, Key, and Value Matrices
- Attention Masks
- Masked Language Modeling
- Dot Products and Vector Alignments
- Nature of Sine and Cosine Functions in Positional Encodings
- How models like ChatGPT work under the hood
- Bidirectional Models
- Context-aware word representations
- Word Embeddings
- How dot products work
- Matrix multiplication
- Programmatically create tokens (see the sketch after this list)
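To give a flavour of the tokenization material, below is a minimal sketch of greedy, longest-match-first WordPiece-style subword splitting in Python. The toy vocabulary and the `wordpiece_tokenize` helper are illustrative assumptions, not the course's actual code.

```python
# Minimal sketch of greedy (longest-match-first) WordPiece-style tokenization.
# TOY_VOCAB is illustrative only; real models such as BERT learn vocabularies
# of roughly 30,000 subwords from a large corpus.
TOY_VOCAB = {"trans", "##form", "##er", "##s", "play", "##ing", "[UNK]"}

def wordpiece_tokenize(word: str) -> list[str]:
    """Split a single word into subword tokens by greedy longest match."""
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # non-initial pieces carry the ## prefix
            if piece in TOY_VOCAB:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no subword in the vocabulary covers this span
        tokens.append(match)
        start = end
    return tokens

print(wordpiece_tokenize("transformers"))  # ['trans', '##form', '##er', '##s']
print(wordpiece_tokenize("playing"))       # ['play', '##ing']
```

Production tokenizers also handle punctuation, casing, and special tokens; the sketch only shows the core longest-match idea.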
Description
Welcome to the Mathematics of Transformers, an in-depth course crafted for those eager to understand the mathematical foundations of large language models like GPT, BERT, and beyond. This course delves into the mathematical machinery that allows these sophisticated models to process, understand, and generate human-like text.

Starting with tokenization, students will learn how raw text is converted into a format models can work with, using techniques such as the WordPiece algorithm. We'll explore the core components of transformer architectures, the query, key, and value matrices, and their roles in encoding information. A significant focus will be on the mechanics of the attention mechanism, including detailed studies of multi-head attention and attention masks. These concepts are pivotal in enabling models to focus on relevant parts of the input, enhancing their ability to understand context and nuance. We will also cover positional encodings, which preserve word order by embedding position information with sine and cosine functions.

Additionally, the course includes comprehensive coverage of bidirectional and masked language models, vectors, dot products, and multi-dimensional word embeddings, all crucial for creating dense representations of words. By the end of this course, participants will not only master the theoretical underpinnings of transformers but also gain practical insight into how they work and where they are applied. This knowledge will prepare you to innovate and excel in the field of machine learning, placing you among the top echelons of AI engineers and researchers.
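As a brief preview of the formulas the description refers to, here is a minimal NumPy sketch of sinusoidal positional encodings and single-head scaled dot-product attention. The shapes, random weight matrices, and function names are illustrative assumptions, not the course's own implementation.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings: sine on even dims, cosine on odd dims."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # token-to-token alignment
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)  # (4, 8): one context-aware vector per token
```

The softmax(QK^T / sqrt(d_k))V pattern is the standard formulation from the original Transformer paper; multi-head attention runs several such heads in parallel on lower-dimensional projections and concatenates the results.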
Who this course is for:
- For ambitious learners aiming to reach the upper echelon of the programming world: this content is designed for those who aspire to be within the top 1% of data scientists and machine learning engineers. It is particularly geared towards individuals who are keen to gain a deep understanding of transformers, the advanced technology behind large language models. This course will equip you with the foundational knowledge and technical skills required to excel in the development and implementation of cutting-edge AI applications.
Udemy
- Language: English
- Training sessions: 29
- Duration: 4:42:14
- Release Date: 2024/08/11