AI021 Professional

CUDA Programming Guide

The official, comprehensive resource for developers to learn the CUDA programming model and how to write high-performance code that executes on NVIDIA GPUs. This guide covers the platform architecture, programming interface, advanced hardware features, and technical specifications.

5.0

30.0h

1762 students

1 like

Artificial Intelligence

Lessons

1 Lesson 1

This lesson introduces the fundamental shift from latency-optimized CPU architectures to throughput-oriented GPU computing. Students will learn to distinguish between these processing models and understand how the CUDA programming platform enables massive parallel execution for data-intensive tasks.

2 Lesson 2

This lesson introduces the fundamentals of CUDA kernel development, focusing on the SIMT execution model and the use of the __global__ specifier to declare kernels, which the host launches to run in parallel on GPU Streaming Multiprocessors. Students will learn how to manage asynchronous kernel execution, handle device memory, and structure code to ensure effective hardware utilization.

3 Lesson 3

This lesson explores the fundamental differences between von Neumann and Harvard architectures, focusing on how memory access pathways impact computational performance. Students will learn to identify the von Neumann bottleneck, understand the benefits of split-cache Harvard designs, and analyze how modern systems utilize a Modified Harvard Architecture to balance throughput with programming flexibility.

4 Lesson 4

This lesson (Optimization, Graphs, and Hardware Accelerators) explores the shift from CPU-bottlenecked stream execution to GPU-autonomous workflows. Students will learn to use modern primitives like CUDA Graphs, lazy loading, and asynchronous memory prefetching to minimize host-side overhead and maximize hardware efficiency.

5 Lesson 5

This lesson covers CUDA's technical reference material and language extensions, focusing on the relationship between virtual architectures (targeted through PTX) and real GPU architectures (executed as SASS). Students will learn to manage compute capabilities, use architecture-specific macros, and navigate language constraints to ensure code portability and performance.

Course Overview

📚 Content Summary

Master the art of parallel computing with the industry-standard guide to NVIDIA CUDA.

Author: NVIDIA Corporation

🎯 Learning Objectives