
CUDA Parallel Programming on NVIDIA GPUs (HW and SW)

Total duration: 13:16:49

  • 1 - 01.mp4
    21:16
  • 1 - 01-CPUs-and-GPUs.pptx
  • 1 - GPU vs CPU very important.mp4
    20:49
  • 1 - Top500.txt
  • 2 - NVidias history How Nvidia started dominating the GPU sector.mp4
    05:18
  • 3 - Architectures and Generations relationship Hopper Ampere GeForce and Tesla.mp4
    16:07
  • 4 - A100 techpowerup.txt
  • 4 - How to know the Architecture and Generation.mp4
    06:56
  • 4 - RTX 3090.txt
  • 5 - The difference between the GPU and the GPU Chip.mp4
    04:50
  • 6 - The architectures and the corresponding chips.mp4
    05:25
  • 7 - A100.txt
  • 7 - Nvidia GPU architectures From Fermi to hopper.mp4
    12:24
  • 7 - RTX 3090.txt
  • 7 - The history of Nvidia.txt
  • 7 - V100.txt
  • 8 - Main-Parameters-to-evaluate-the-GPU-performance.pdf
  • 8 - Parameters required to compare between different Architectures.mp4
    24:05
  • 9 - Half single and double precision operations.mp4
    06:27
  • 10 - Compute capability and utilizations of the GPUs.mp4
    08:30
  • 11 - Before reading any whitepapers look at this.mp4
    08:43
  • 12 - VoltaAmperePascalSIMD Dont skip.mp4
    52:08
  • 12 - research paper 01.txt
  • 12 - research paper 02.txt
  • 12 - research paper 03.txt
  • 13 - What features installed with the CUDA toolkit.mp4
    06:38
  • 14 - Installing CUDA on Windows.mp4
    04:48
  • 15 - Installing WSL to use Linux on windows OS.mp4
    06:27
  • 16 - Installing Cuda toolkits on Linux.mp4
    03:47
  • 17 - Mapping SW from CUDA to HW introducing CUDA.mp4
    12:49
  • 17 - S3-01-Introduction-to-CUDA.pdf
  • 18 - 001 Hello World program threads Blocks.mp4
    20:39
  • 18 - 4.txt
  • 18 - 64.txt
  • 18 - L2-cache-forums.txt
  • 18 - picture2.zip
  • 19 - Compiling Cuda on Linux.mp4
    09:54
  • 20 - 002 Hello World program WarpIDs.mp4
    09:03
  • 20 - picture1.zip
  • 20 - test02.txt
  • 21 - 003 Vector addition the Steps for any CUDA project.mp4
    22:30
  • 22 - 004 Vector addition blocks and thread indexing GPU performance.mp4
    18:21
  • 23 - 005 levels of parallelization Vector addition with Extralarge vectors.mp4
    18:31
  • 24 - Query the device properties using the Runtime APIs.mp4
    18:46
  • 24 - S4-01-Quering-the-device-props.pdf
  • 25 - Nvidiasmi and its configurations Linux User.mp4
    27:30
  • 25 - S4-02-nvidia-smi.pdf
  • 25 - thth.txt
  • 26 - S4-03-occupancy.pdf
  • 26 - The GPUs Occupancy and Latency hiding.mp4
    52:32
  • 27 - Allocated active blocks per SM important.mp4
    16:55
  • 27 - S4-04-Allocated-Active-Blocks-Per-SM.pdf
  • 28 - Starting with the nsight compute first issue.mp4
    09:01
  • 29 - All profiling tools from NVidia Nsight systems compute nvprof.mp4
    04:35
  • 30 - Error checking APIs look at chat GPU there is an example.mp4
    30:17
  • 31 - Nsight Compute performance using command line analysis.mp4
    39:21
  • 31 - S4-08-nsight-compute-CLI.pptx
  • 31 - The Documentation.txt
  • 31 - The Second Documentation.txt
  • 32 - Graphical Nsight Compute windows and linux.mp4
    01:00:49
  • 32 - Graphic kernel profiling.txt
  • 33 - Performance analysis.mp4
    32:41
  • 33 - S5-001-number-of-waves-and-performance-analysis.pptx
  • 33 - graph-1.zip
  • 34 - Vector addition with a size not power of 2 important.mp4
    11:58
  • 34 - sec5-002.zip
  • 35 - Matrices addition using 2D of blocks and threads.mp4
    51:02
  • 35 - S5-001-number-of-waves-and-performance-analysis.pdf
  • 36 - Why L1 Hitrate is zero.mp4
    24:38
  • 2 - Quiz 1.html
  • 37 - NVidia GTC Lecture and powerpoint.txt
  • 37 - Shared-Memory.pdf
  • 37 - The shared memory.mp4
    34:50
  • 38 - Good detailed lecture about the warp divergence.txt
  • 38 - Warp Divergence.mp4
    15:17
  • 39 - Debugging using visual studio important 1.mp4
    40:12
  • 39 - Getting Started with the CUDA Debugger.txt
  • 39 - NVIDIA Developer Tools.txt
  • 39 - NVIDIA Nsight Integration.txt
  • 39 - NVIDIA Nsight Visual Studio Code Edition.txt


    Performance Optimization and Analysis for High-Performance Computing

    What You'll Learn?


    • Comprehensive Understanding of GPU vs CPU Architecture
    • Learn the history of the graphics processing unit (GPU) up to the most recent products
    • Understand the internal structure of the GPU
    • Understand the different types of memory and how they affect performance
    • Understand the most recent technologies in GPU internal components
    • Understand the basics of CUDA programming on the GPU
    • Start programming GPUs with CUDA on both Windows and Linux
    • Understand the most efficient approaches to parallelization
    • Profiling and Performance Tuning
    • Leveraging Shared Memory

    Who is this for?


  • Anyone interested in GPUs and CUDA, such as engineering students, researchers, and other practitioners

    What You Need to Know?


  • C and C++ basics
  • Linux and Windows basics
  • Computer architecture basics


    Description

    This comprehensive course is designed for anyone looking to dive deep into CUDA programming and NVIDIA GPU architectures. Starting from the basics of GPU hardware, the course walks you through the evolution of NVIDIA's architectures, their key performance features, and the computational power of CUDA. With practical programming examples and step-by-step instruction, students will develop an in-depth understanding of GPU computing, CUDA programming, and performance optimization. Whether you're an experienced developer or new to parallel computing, this course provides the knowledge and skills necessary to harness the full potential of GPU programming.


    Here is a summary of what you will gain from this course:

    1. Comprehensive Understanding of GPU vs CPU Architecture: Students will learn the fundamental differences between GPUs and CPUs, gaining insight into how GPUs are designed for parallel processing tasks.

    2. Deep Dive into NVIDIA's GPU Architectures: The course covers the evolution of NVIDIA's GPU architectures, including Fermi, Pascal, Volta, Ampere, and Hopper, and teaches how to compare different generations based on key performance parameters.

    3. Hands-On CUDA Installation: Students will learn how to install CUDA on Windows, on native Linux, and on Windows through WSL, while exploring the essential features that ship with the CUDA toolkit.

    4. Introduction to CUDA Programming Concepts: Through practical examples, students will understand core CUDA programming principles, including thread and block management, and how to develop parallel applications like vector addition.
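
    To make the thread and block indexing concrete, here is a minimal vector-addition sketch in the spirit of the course examples; the kernel name, sizes, and use of managed memory are illustrative choices, and N is assumed to be an exact multiple of the block size (the general case is sketched under point 7):

        #include <cstdio>
        #include <cuda_runtime.h>

        // Each thread adds one element: global index = block offset + thread offset.
        __global__ void vecAdd(const float *a, const float *b, float *c)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            c[i] = a[i] + b[i];
        }

        int main()
        {
            const int N = 1 << 20;                // illustrative size, an exact multiple of 256
            size_t bytes = N * sizeof(float);

            float *a, *b, *c;
            cudaMallocManaged(&a, bytes);         // unified memory keeps the sketch short
            cudaMallocManaged(&b, bytes);
            cudaMallocManaged(&c, bytes);
            for (int i = 0; i < N; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

            vecAdd<<<N / 256, 256>>>(a, b, c);    // grid of N/256 blocks, 256 threads each
            cudaDeviceSynchronize();

            printf("c[0] = %f\n", c[0]);          // expect 3.0
            cudaFree(a); cudaFree(b); cudaFree(c);
            return 0;
        }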

    5. Profiling and Performance Tuning: The course will guide students through using NVIDIA’s powerful profiling tools like Nsight Compute and nvprof to measure GPU performance and optimize code by addressing issues like occupancy and latency hiding.
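
    Occupancy can also be estimated programmatically before turning to the profilers; the sketch below uses the CUDA runtime occupancy API with an illustrative kernel and block size:

        #include <cstdio>
        #include <cuda_runtime.h>

        __global__ void dummyKernel(float *out)
        {
            out[blockIdx.x * blockDim.x + threadIdx.x] = 0.0f;
        }

        int main()
        {
            int blockSize = 256;                  // illustrative block size
            int maxBlocksPerSM = 0;

            // Ask the runtime how many blocks of this kernel can be resident per SM.
            cudaOccupancyMaxActiveBlocksPerMultiprocessor(&maxBlocksPerSM, dummyKernel, blockSize, 0);

            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, 0);
            float occupancy = (float)(maxBlocksPerSM * blockSize) / prop.maxThreadsPerMultiProcessor;

            printf("Active blocks per SM: %d, theoretical occupancy: %.0f%%\n",
                   maxBlocksPerSM, occupancy * 100.0f);
            return 0;
        }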

    6. Mastering 2D Indexing for Matrix Operations: Students will explore 2D indexing techniques for efficient matrix computations, learning to optimize memory access patterns and enhance performance.
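
    As an illustration of the 2D indexing pattern, a sketch of a matrix-addition kernel with a 2D grid of 2D blocks (names, dimensions, and the 16x16 block shape are placeholders):

        // Each thread handles one matrix element addressed by (row, col).
        __global__ void matAdd(const float *A, const float *B, float *C, int rows, int cols)
        {
            int col = blockIdx.x * blockDim.x + threadIdx.x;
            int row = blockIdx.y * blockDim.y + threadIdx.y;
            if (row < rows && col < cols)             // guard against partial blocks at the edges
                C[row * cols + col] = A[row * cols + col] + B[row * cols + col];
        }

        // Host-side launch configuration: 16x16 threads per block is a common starting point.
        // dim3 block(16, 16);
        // dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
        // matAdd<<<grid, block>>>(dA, dB, dC, rows, cols);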

    7. Performance Optimization Techniques: Students will acquire skills to optimize GPU programs through real-world examples, including handling non-power-of-2 data sizes and fine-tuning operations for maximum efficiency.
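
    The usual way to handle a size that is not a multiple of the block size is ceiling division for the grid plus a bounds check inside the kernel, sketched below with illustrative numbers:

        __global__ void vecAddAnySize(const float *a, const float *b, float *c, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)                                // surplus threads in the last block do nothing
                c[i] = a[i] + b[i];
        }

        // Host side: round the number of blocks up so every element is covered.
        // int n = 1000003;                           // deliberately not a power of 2
        // int blockSize = 256;
        // int gridSize = (n + blockSize - 1) / blockSize;
        // vecAddAnySize<<<gridSize, blockSize>>>(dA, dB, dC, n);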

    8. Leveraging Shared Memory: The course dives into how shared memory can boost CUDA application performance by improving data locality and minimizing global memory accesses.
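
    A minimal sketch of the shared-memory pattern: each block stages its slice of the input in on-chip memory, synchronizes, and then reuses it (a per-block reversal is used here purely for illustration; the kernel must be launched with BLOCK threads per block):

        #define BLOCK 256

        // Shared memory is visible to all threads of a block and much faster than global memory.
        __global__ void blockReverse(const float *in, float *out, int n)
        {
            __shared__ float tile[BLOCK];
            int i = blockIdx.x * blockDim.x + threadIdx.x;

            if (i < n)
                tile[threadIdx.x] = in[i];            // stage one element per thread
            __syncthreads();                          // all loads must finish before any reuse

            int rev  = blockDim.x - 1 - threadIdx.x;  // mirrored index within the block
            int base = blockIdx.x * blockDim.x;
            if (i < n && base + rev < n)
                out[i] = tile[rev];                   // reuse data loaded by another thread
        }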

    9. Understanding Warp Divergence: Students will learn about warp divergence and its impact on performance, along with strategies to minimize it and ensure smooth execution of parallel threads.
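
    To see what divergence means in code: in the first illustrative kernel below, threads of the same 32-thread warp take different branches and the warp serializes both paths; in the second, the condition is constant within each warp, so every warp follows a single path:

        // Divergent: even and odd lanes of the same warp take different branches,
        // so the warp executes both paths one after the other.
        __global__ void divergent(float *out)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (threadIdx.x % 2 == 0)
                out[i] = sinf((float)i);
            else
                out[i] = cosf((float)i);
        }

        // Warp-aligned: the condition evaluates identically for all 32 lanes of a warp,
        // so no serialization occurs.
        __global__ void warpAligned(float *out)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if ((threadIdx.x / 32) % 2 == 0)
                out[i] = sinf((float)i);
            else
                out[i] = cosf((float)i);
        }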

    10. Real-World Application of Profiling and Debugging: The course emphasizes practical use cases, where students will apply debugging techniques, error-checking APIs, and advanced profiling methods to fine-tune their CUDA programs for real-world applications.
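
    A common error-checking pattern is a macro that wraps every runtime call and reports the file and line on failure; the macro name below is a convention, not part of the CUDA API:

        #include <cstdio>
        #include <cstdlib>
        #include <cuda_runtime.h>

        // Abort with a readable message if a CUDA runtime call fails.
        #define CUDA_CHECK(call)                                                      \
            do {                                                                      \
                cudaError_t err = (call);                                             \
                if (err != cudaSuccess) {                                             \
                    fprintf(stderr, "CUDA error %s at %s:%d\n",                       \
                            cudaGetErrorString(err), __FILE__, __LINE__);             \
                    exit(EXIT_FAILURE);                                               \
                }                                                                     \
            } while (0)

        // Typical usage:
        //   CUDA_CHECK(cudaMalloc(&d_ptr, bytes));
        //   myKernel<<<grid, block>>>(d_ptr);
        //   CUDA_CHECK(cudaGetLastError());        // catches launch-configuration errors
        //   CUDA_CHECK(cudaDeviceSynchronize());   // catches errors raised during execution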

    By the end of the course, students will be proficient in CUDA programming, profiling, and optimization, equipping them with the skills to develop high-performance GPU applications.


    • Language: English
    • Training sessions: 40
    • Duration: 13:16:49
    • Release date: 2025/03/09