AI024 Professional

Introduction to ROCm and HIP Programming: A Practical Tutorial

A practical, modern guide to AMD GPU programming with ROCm and HIP. It covers the full software stack, installation, build workflows, kernel programming, memory management, performance engineering, library usage, CUDA porting, and production debugging practices.

Rating: 5.0 · Duration: 30.0h · 361 students

Artificial Intelligence

Lessons

Lesson 1

This lesson introduces the ROCm platform and the HIP programming model as a bridge for porting CUDA applications to AMD hardware. Students will learn to use automated tools such as hipify to migrate code, and why architecture-aware tuning is still needed to achieve optimal performance.
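
To make "automated migration" concrete, the C++ sketch below imitates the kind of text substitution that hipify-perl performs. It is a toy under stated assumptions: the function toy_hipify and its five-entry table are hypothetical illustrations. The real HIPIFY tools ship far larger mapping tables, and hipify-clang parses the source with Clang instead of matching strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy: mechanical CUDA-to-HIP renaming by plain string
// substitution, in the spirit of hipify-perl. Not the real tool; it knows
// only a few sample mappings and does no parsing.
std::string toy_hipify(std::string src) {
    const std::vector<std::pair<std::string, std::string>> table = {
        {"cuda_runtime.h", "hip/hip_runtime.h"},
        {"cudaMemcpyHostToDevice", "hipMemcpyHostToDevice"},
        {"cudaMemcpy", "hipMemcpy"},
        {"cudaMalloc", "hipMalloc"},
        {"cudaFree", "hipFree"},
    };
    for (const auto& [from, to] : table) {
        std::size_t pos = 0;
        // Replace every occurrence, resuming after the inserted text.
        while ((pos = src.find(from, pos)) != std::string::npos) {
            src.replace(pos, from.size(), to);
            pos += to.size();
        }
    }
    return src;
}
```

Renaming is only the mechanical part of a port; the translated code still has to be tuned for the target architecture.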

Lesson 2

This lesson covers the essential steps for installing and configuring the ROCm software stack, including dependency management, environment variable setup, and user permission requirements. Students will learn to verify their system environment and use diagnostic tools to confirm that the software stack can communicate with the GPU.

Lesson 3

This lesson explores the distinction between source portability and binary performance in the ROCm ecosystem, emphasizing that while HIP code is functionally portable, achieving peak throughput requires architecture-specific compilation. Students will learn to use the hipcc toolchain and CMake to manage build configurations that compile code for specific hardware instruction sets.

Lesson 4

This lesson introduces the HIP programming model, focusing on the transition from sequential CPU iteration to spatial GPU parallelism using the Parallel Pivot approach. Students will learn to map independent data tasks to thread grids, manage memory, and implement kernel execution with proper boundary checks and error handling.
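
The mapping of data to a thread grid, together with the boundary check, can be sketched in plain C++ that runs on the CPU. This is an emulation, not HIP code: the function names are hypothetical, and the two nested loops stand in for the hardware scheduler that, in a real kernel, supplies blockIdx.x, blockDim.x, and threadIdx.x.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed to cover n elements (ceiling division).
// In HIP this is the grid size you would pass at launch.
unsigned grid_size_for(std::size_t n, unsigned block_dim) {
    assert(block_dim > 0);
    return static_cast<unsigned>((n + block_dim - 1) / block_dim);
}

// Hypothetical CPU emulation of a 1-D vector-add kernel. Each
// (block_idx, thread_idx) pair plays the role of one GPU thread; in a real
// HIP kernel these loops disappear and each thread runs the body once.
void emulate_vector_add(const std::vector<float>& a, const std::vector<float>& b,
                        std::vector<float>& c, unsigned block_dim) {
    const std::size_t n = c.size();
    const unsigned grid_dim = grid_size_for(n, block_dim);
    for (unsigned block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (unsigned thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            // Global index: blockIdx.x * blockDim.x + threadIdx.x.
            const std::size_t i =
                static_cast<std::size_t>(block_idx) * block_dim + thread_idx;
            // Boundary check: the last block may have more threads than
            // remaining elements.
            if (i < n) {
                c[i] = a[i] + b[i];
            }
        }
    }
}
```

With 1000 elements and 256 threads per block, the grid has 4 blocks (1024 threads), and the boundary check keeps the 24 surplus threads from writing past the end of the arrays.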

Lesson 5

AI024: Memory Management and Data Patterns (Lesson 5) explores the memory-centric nature of GPU performance, focusing on the Roofline Model and on minimizing data movement between host and device. Students will learn to distinguish memory-bound from compute-bound kernels and to apply strategies that improve data residency and bandwidth utilization.
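
The Roofline Model reduces to a small formula: attainable throughput is the lower of peak compute and arithmetic intensity (FLOPs per byte moved) times peak memory bandwidth. The C++ sketch below encodes it. The function names and the hardware figures used with it (20,000 GFLOP/s peak compute, 1,000 GB/s bandwidth) are hypothetical placeholders, not the specifications of any particular AMD GPU.

```cpp
#include <algorithm>
#include <cassert>

// Arithmetic intensity: floating-point operations per byte moved to or
// from memory.
double arithmetic_intensity(double flops, double bytes) {
    assert(bytes > 0.0);
    return flops / bytes;
}

// Roofline: throughput (GFLOP/s) is capped by peak compute or by
// bandwidth (GB/s) times intensity (FLOP/byte), whichever is lower.
double attainable_gflops(double intensity, double peak_gflops, double peak_gb_per_s) {
    return std::min(peak_gflops, intensity * peak_gb_per_s);
}

// Memory-bound when intensity lies left of the ridge point
// (peak compute / peak bandwidth), where the two roofs meet.
bool is_memory_bound(double intensity, double peak_gflops, double peak_gb_per_s) {
    return intensity < peak_gflops / peak_gb_per_s;
}
```

For a single-precision vector add, each element costs 1 FLOP and moves 12 bytes (two 4-byte loads and one 4-byte store), so the intensity is about 0.083 FLOP/byte. On the hypothetical GPU above the ridge point is 20 FLOP/byte, so the kernel is memory-bound and capped near 83 GFLOP/s. For such a kernel, the lever is moving fewer bytes or using bandwidth more efficiently, not doing less arithmetic.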

Lesson 6

This lesson explores the transition from synchronous to asynchronous GPU execution, focusing on how to use HIP streams to decouple CPU and GPU tasks. Students will learn to overlap non-blocking memory transfers and kernel launches to maximize hardware utilization and reduce execution bottlenecks.

Lesson 7

This lesson introduces a systematic, data-driven approach to performance engineering on AMD GPUs, emphasizing the use of tools like rocprofv3 to identify bottlenecks rather than relying on intuition. Students will learn to follow a six-step scientific workflow to optimize memory access, instruction throughput, and hardware utilization while avoiding common performance "superstitions."

Lesson 8

This lesson introduces the Library-First Engineering Principle, which emphasizes using optimized ROCm libraries like rocBLAS and rocFFT to reduce technical debt and ensure hardware portability. Students will learn to prioritize these vendor-tuned solutions over custom kernel development to achieve better performance and easier maintenance across evolving GPU architectures.

Lesson 9

AI024: Porting CUDA Applications to HIP (Lesson 9) covers the systematic, incremental migration of CUDA code to the HIP platform using tools like HIPIFY-Clang and HIPIFY-Perl. Students will learn to distinguish between mechanical API translations and architectural optimizations, such as adjusting for warp-size differences, to ensure functional and performance parity on AMD ROCm hardware.
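
Warp size is one architectural difference that mechanical translation cannot fix. CUDA code often assumes 32 threads per warp, but many AMD GPUs, including CDNA-based Instinct accelerators, execute 64-wide wavefronts. HIP exposes the value as warpSize in device code and as the warpSize field of hipDeviceProp_t on the host. The plain C++ sketch below, with hypothetical function names, shows two places where a hard-coded 32 goes wrong: counting waves per block and building a per-lane bit mask.

```cpp
#include <cassert>
#include <cstdint>

// Waves (AMD wavefronts or NVIDIA warps) needed to hold one block of threads.
unsigned waves_per_block(unsigned block_dim, unsigned wave_size) {
    assert(wave_size > 0);
    return (block_dim + wave_size - 1) / wave_size;
}

// Mask with one bit per lane. A hard-coded 0xFFFFFFFF covers only 32 lanes
// and silently ignores half of a 64-wide wavefront, so the lane count is a
// parameter here and the result is 64 bits wide.
std::uint64_t full_lane_mask(unsigned wave_size) {
    assert(wave_size == 32 || wave_size == 64);
    return wave_size == 64 ? ~std::uint64_t{0}
                           : (std::uint64_t{1} << wave_size) - 1;
}
```

A 256-thread block holds 8 warps at width 32 but only 4 wavefronts at width 64, so any per-warp bookkeeping sized for 32 lanes needs revisiting after the port.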

Lesson 10

This lesson explores the GPU Developer’s Creed, which prioritizes functional correctness and architectural isolation over raw performance when working with ROCm and HIP. Students will learn to implement systematic debugging, testing, and CI/CD practices to ensure stable, reproducible, and accurate GPU kernel deployments.
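
A common building block for this kind of accuracy testing is comparing a kernel's output with a CPU reference under an explicit tolerance, rather than exact equality, since a GPU reduction may accumulate values in a different order. The sketch below uses the same |got - ref| <= atol + rtol * |ref| form as NumPy's allclose. The C++ function name and the default tolerances are placeholder assumptions, to be tuned per kernel and precision.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise comparison of a device result against a
// CPU reference with combined absolute and relative tolerance.
// NaNs and size mismatches count as failures.
bool allclose(const std::vector<float>& got, const std::vector<float>& ref,
              float rtol = 1e-5f, float atol = 1e-6f) {
    if (got.size() != ref.size()) return false;
    for (std::size_t i = 0; i < got.size(); ++i) {
        if (std::isnan(got[i]) || std::isnan(ref[i])) return false;
        if (std::fabs(got[i] - ref[i]) > atol + rtol * std::fabs(ref[i])) return false;
    }
    return true;
}
```

In a CI pipeline, a check like this typically runs after each kernel test, with the tolerance recorded alongside the test so that a change in numerical behavior shows up as a failure rather than a silent drift.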

Course Overview

📚 Content Summary

Master AMD GPU programming and CUDA-to-HIP portability with this technical deep dive.

Author: EvoClass

Acknowledgments: AMD's official ROCm and HIP documentation, including the ROCm, HIP, and ROCm LLVM projects.

🎯 Learning Objectives

Define HIP and its role within the ROCm ecosystem in a single concise sentence.
Distinguish between ROCm (platform), HIP (interface), and ROCm libraries (building blocks).
Identify the hierarchical layers of the ROCm architecture from hardware to application frameworks.
Define the relationship between the HIP SDK and the ROCm platform across different operating systems.
Execute a systematic installation workflow, including support matrix verification and post-installation path configuration.
Compile and run a minimal verification program to troubleshoot common driver and environment access issues.
Understand why a robust build strategy is essential for reconciling source portability with architecture-specific performance.
Implement portable kernel launches using the hipLaunchKernelGGL macro as an alternative to CUDA's triple-angle-bracket syntax.
Configure production-grade CMake projects that target specific ROCm architectures and manage external library dependencies.
Define the anatomy of a HIP kernel and apply the basic execution formula for thread indexing.