CUDA Basics

Objective

Introduce the basic concepts of CUDA programming.

Kernel Call

kernel<<<N, 1>>>(...);

N blocks will run in parallel, each with a single thread.

kernel<<<1, N>>>(...);

N threads will run in parallel within a single block.
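The two launch shapes above can be compared with a small vector addition. This is a minimal sketch, not a reference implementation: the kernel names `add_per_block` and `add_per_thread` and the host helper `add_on_gpu` are hypothetical names chosen for illustration, error checking is omitted for brevity, and both input vectors are assumed to have the same non-zero length.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Launched as <<<N, 1>>>: one thread per block, so the block index selects the element.
__global__ void add_per_block(const int *a, const int *b, int *c) {
    int i = blockIdx.x;
    c[i] = a[i] + b[i];
}

// Launched as <<<1, N>>>: one block of N threads, so the thread index selects the element.
__global__ void add_per_thread(const int *a, const int *b, int *c) {
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// Hypothetical host helper: copies inputs to the device, launches one of the
// two kernels, and copies the result back. Assumes a.size() == b.size() > 0.
std::vector<int> add_on_gpu(const std::vector<int> &a,
                            const std::vector<int> &b,
                            bool one_block_per_element) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(int);

    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    if (one_block_per_element) {
        add_per_block<<<n, 1>>>(d_a, d_b, d_c);   // N blocks, 1 thread each
    } else {
        // 1 block, N threads; a block is limited to 1024 threads on current GPUs.
        add_per_thread<<<1, n>>>(d_a, d_b, d_c);
    }

    std::vector<int> c(n);
    // This blocking copy runs after the kernel has finished on the default stream.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

Calling `add_on_gpu(a, b, true)` and `add_on_gpu(a, b, false)` from `main` should give the same sums; only the way the work is split across blocks and threads differs.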

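Inside a kernel, the following built-in variables expose the grid and block dimensions and the current block and thread indices: gridDim, blockDim, blockIdx, and threadIdx. Each has x, y, and z components.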
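As a hedged illustration of how these built-ins combine, the sketch below computes a global index from `blockIdx`, `blockDim`, and `threadIdx`, and uses `gridDim` to step through the array when there are fewer threads than elements. The kernel name `fill_global_index` and the host helper `global_indices` are hypothetical, only the `x` component is used, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread writes its element's index into out[]. If the grid has fewer
// threads than n, each thread strides forward by the total thread count.
__global__ void fill_global_index(int *out, int n) {
    int first  = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    int stride = gridDim.x * blockDim.x;                 // total threads in the grid
    for (int i = first; i < n; i += stride) {
        out[i] = i;
    }
}

// Hypothetical host helper: launches the kernel with the given shape and
// returns the filled array. Assumes n, blocks, and threads_per_block are > 0.
std::vector<int> global_indices(int n, int blocks, int threads_per_block) {
    const size_t bytes = n * sizeof(int);

    int *d_out = nullptr;
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_out, 0xFF, bytes);  // every int starts as -1, so unwritten slots are visible

    fill_global_index<<<blocks, threads_per_block>>>(d_out, n);

    std::vector<int> out(n);
    // Blocking copy; runs after the kernel completes on the default stream.
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return out;
}
```

With `global_indices(10, 2, 4)`, the grid has 8 threads for 10 elements, so the first two threads each handle a second element through the stride.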

Grid Example

~~~
~~~
~~~
~~~
~~~
~~~

The above example would have:

Bandwidth vs Throughput vs Latency

CPU vs GPU

GPU Layout

GPU Memory Layout

References