DX12 Basics

A work-in-progress reference for some DX12 basics, written as I learn. Check out Frank D. Luna’s books for an excellent and thorough introduction to the topic.

Conceptual Diagram

Executing Commands

An application submits commands to the GPU via a CommandQueue. Execution is asynchronous. The GPU idles if the queue is empty, and the CPU stalls on submission if the queue is full. A good application keeps both busy.

Commands are recorded in CommandLists. CommandLists are submitted to the CommandQueue via ExecuteCommandLists(). CommandLists are executed in order.

A CommandList must be Close()d before it can be executed.

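Once a CommandList has been submitted with ExecuteCommandLists(), it can be Reset() right away and re-used to record a new set of commands. It does not need to wait for the GPU, because the recorded commands live in the CommandAllocator rather than in the CommandList. Reset() re-initializes the CommandList and is cheaper than destroying it and creating a new one.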

Commands in a CommandList are recorded into a CommandAllocator. This is the memory backing of commands. Therefore the CommandAllocator cannot be reset until the GPU finishes executing the commands. This requires synchronization (Fence).

Once the GPU has finished executing a CommandList, the CommandList’s CommandAllocator can be Reset() to record new commands.

Multiple CommandLists can be associated with the same CommandAllocator. However, only one of them can record at the same time, and the others must be in a closed state. In essence, commands are allocated contiguously in the CommandAllocator while a CommandList is recording.

When a CommandList is created or Reset(), it defaults to an open state. It might be convenient to Close() it right away.
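The rules above are easy to mix up, so here is a minimal sketch that encodes them as a small state machine. This is a toy model in plain C++, not the D3D12 API: ToyAllocator, ToyCommandList and ToyQueue are hypothetical names, and the fence is reduced to a pair of counters. The toy list starts closed, as if Close() had been called right after creation.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: tracks just enough state to check the recording rules.
struct ToyAllocator {
    bool          hasRecordingList = false; // at most one list records at a time
    std::uint64_t lastFenceValue   = 0;     // fence value of the latest submission

    // Rule for ID3D12CommandAllocator::Reset(): no list is recording into the
    // allocator and the GPU has finished every submission that used it.
    bool CanReset(std::uint64_t completedFenceValue) const {
        return !hasRecordingList && completedFenceValue >= lastFenceValue;
    }
};

struct ToyCommandList {
    ToyAllocator* allocator = nullptr;
    bool          open      = false; // modeled as closed right after creation

    // Rule for ID3D12GraphicsCommandList::Reset(): the list must be closed and
    // no other list may be recording into the target allocator.
    bool Reset(ToyAllocator& target) {
        if (open || target.hasRecordingList) return false;
        allocator = &target;
        target.hasRecordingList = true;
        open = true;
        return true;
    }

    bool Close() {
        if (!open) return false;
        assert(allocator != nullptr);
        allocator->hasRecordingList = false;
        open = false;
        return true;
    }
};

struct ToyQueue {
    std::uint64_t nextFenceValue = 1;
    std::uint64_t completedValue = 0; // what the fence would report as completed

    // Stands for ExecuteCommandLists() followed by Signal(): only closed lists
    // may be submitted, and the allocator remembers the fence value to wait for.
    bool Execute(const ToyCommandList& list) {
        if (list.open || list.allocator == nullptr) return false;
        list.allocator->lastFenceValue = nextFenceValue++;
        return true;
    }

    // Pretend the GPU has drained everything submitted so far.
    void FinishAllWork() { completedValue = nextFenceValue - 1; }
};
```

Note that in this model a second list can start recording into the same allocator while the first submission is still in flight. Only resetting the allocator has to wait for the GPU.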

Resources and Descriptors/Views

DX12 (and Vulkan) decouples resources and descriptors. Descriptors are also known as views.

A Resource is the texture or buffer data in memory.

A Descriptor describes how the Resource is accessed in a given stage of the graphics pipeline. For example, a render target view (RTV) allows the pipeline to render into a texture, and a shader resource view (SRV) allows a shader to read from a texture. A Descriptor can also map to a subregion of the Resource and reinterpret the type of the data elements (for typeless resources).

If a Resource is typeless, then the Descriptor must specify a type. Typed Resources are best for performance; use typeless Resources only when strictly necessary.

Descriptor creation incurs some validation overhead. Create them during initialization if possible.

Types of Descriptors

CBV: constant buffer view, for reading constant buffer data.
SRV: shader resource view, for reading textures and buffers.
UAV: unordered access view, to read/write texture and buffer data.
Sampler: to sample textures via their SRVs.
RTV: render target view, to render into textures.
DSV: depth/stencil view, to describe depth/stencil buffers.

Descriptor Heaps

Descriptors are allocated from a DescriptorHeap. A DescriptorHeap is the memory backing for one heap type of Descriptor: CBVs, SRVs and UAVs share a single heap type, while Samplers, RTVs and DSVs each have their own. An application will need at least one DescriptorHeap for each heap type it uses. Multiple DescriptorHeaps of the same type can also exist.
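As a quick reference, the mapping from descriptor type to descriptor heap type can be written down as a small function. This is a sketch with hypothetical enum names (DescriptorKind, HeapKind); the comments give the D3D12_DESCRIPTOR_HEAP_TYPE value each case corresponds to.

```cpp
// Hypothetical enums for illustration; not the D3D12 types themselves.
enum class DescriptorKind { CBV, SRV, UAV, Sampler, RTV, DSV };
enum class HeapKind { CbvSrvUav, Sampler, Rtv, Dsv };

// Returns the kind of descriptor heap a descriptor must be allocated from.
constexpr HeapKind HeapFor(DescriptorKind kind) {
    switch (kind) {
        case DescriptorKind::CBV:
        case DescriptorKind::SRV:
        case DescriptorKind::UAV:
            return HeapKind::CbvSrvUav; // D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV
        case DescriptorKind::Sampler:
            return HeapKind::Sampler;   // D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER
        case DescriptorKind::RTV:
            return HeapKind::Rtv;       // D3D12_DESCRIPTOR_HEAP_TYPE_RTV
        case DescriptorKind::DSV:
            return HeapKind::Dsv;       // D3D12_DESCRIPTOR_HEAP_TYPE_DSV
    }
    return HeapKind::CbvSrvUav; // not reached for valid enum values
}
```

So an application that uses all six descriptor types needs at least four heaps, not six.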

Resource Heaps

Resources are also allocated in heaps. When creating a resource (CreateCommittedResource()), we must specify the desired heap type:

Default heap: for resources exclusively accessed by the GPU.
Upload heap: for resources that require data uploads from the CPU to the GPU.
Readback heap: for resources that need to be read back by the CPU.

Synchronization

CPU-GPU Synchronization

Fences are used for CPU-GPU synchronization.

The example above shows how to safely Reset() a CommandAllocator: the CPU waits until every command recorded into that CommandAllocator has finished executing on the GPU.

To establish a synchronization point, the CPU calls CommandQueue::Signal() with a Fence and a fence value. When the GPU reaches the synchronization point, it signals the CPU by setting the Fence to that value. Typically, the value is incremented by one every time a new synchronization point is established.

The CPU can check the Fence value in two ways. One is to call Fence::GetCompletedValue(), which is non-blocking. The other way is blocking: create a Windows event object, call SetEventOnCompletion(), then WaitForSingleObject(); the calling thread is put to sleep until the GPU signals the Fence.
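The signal-then-check pattern can be sketched without the Windows headers. In the sketch below, MockFence and MockQueue are hypothetical stand-ins for ID3D12Fence and ID3D12CommandQueue that keep only the behavior that matters here: the fence value changes when the GPU reaches the signal, not when Signal() is called. FenceTracker is also a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for ID3D12Fence: only exposes the completed value.
struct MockFence {
    std::uint64_t completed = 0;
    std::uint64_t GetCompletedValue() const { return completed; }
};

// Stand-in for ID3D12CommandQueue: Signal() only queues the fence update.
struct MockQueue {
    MockFence*    pendingFence = nullptr;
    std::uint64_t pendingValue = 0;

    void Signal(MockFence* fence, std::uint64_t value) {
        pendingFence = fence;
        pendingValue = value;
    }

    // Test helper: pretend the GPU has reached the queued signal.
    void Drain() {
        if (pendingFence != nullptr) pendingFence->completed = pendingValue;
    }
};

// Hands out monotonically increasing fence values, one per sync point.
struct FenceTracker {
    std::uint64_t lastSignaled = 0;

    template <class Queue, class Fence>
    std::uint64_t Signal(Queue& queue, Fence* fence) {
        assert(fence != nullptr);
        queue.Signal(fence, ++lastSignaled);
        return lastSignaled;
    }

    // Non-blocking check, the GetCompletedValue() path. The blocking path
    // would instead call SetEventOnCompletion() on the fence and then
    // WaitForSingleObject() on the event.
    template <class Fence>
    static bool IsComplete(const Fence& fence, std::uint64_t value) {
        return fence.GetCompletedValue() >= value;
    }
};
```

The comparison uses >= rather than == because the fence may already have moved past the value we care about by the time we look.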

GPU Workload Synchronization

Unlike OpenGL and previous versions of DirectX, applications also need to manage GPU workload synchronization. For example, if shader A writes to a texture through an RTV or UAV and shader B reads from it through an SRV or UAV, then a synchronization point must be established to prevent a resource hazard.

CommandList::ResourceBarrier() establishes a synchronization point between GPU workloads. Two common types of barriers are:

Resource Transition Barrier: declares a transition in a resource’s usage.
UAV Barrier: declares that all current UAV accesses to a resource must complete before future accesses can begin.

Every resource has a state that defines how it is currently being used. A Resource Transition Barrier declares a change in that state. The GPU then inserts synchronization points when it encounters barriers to prevent resource hazards.

For example, when beginning a new frame, the previous frame’s front buffer becomes the current frame’s back buffer. Before we can render to this resource in the current frame, we must transition it from D3D12_RESOURCE_STATE_PRESENT to D3D12_RESOURCE_STATE_RENDER_TARGET. Then, once we have rendered the current frame and are ready to Present(), we perform another transition from D3D12_RESOURCE_STATE_RENDER_TARGET back to D3D12_RESOURCE_STATE_PRESENT.

A UAV Barrier, on the other hand, synchronizes access to a UAV. This is typically needed when shader A writes to a UAV that shader B then reads from, or when two shaders write to a given UAV. In the first case, A must finish work before B can start executing. In the second, a barrier is needed to guarantee write order unless we can guarantee that the shaders write to different parts of the UAV.
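The back buffer transitions described earlier can be sketched with a tiny state tracker. This is a toy model, not D3D12 code: ResourceState, TransitionBarrier, TrackedResource and RecordFrame are hypothetical names, and a real application would fill in a D3D12_RESOURCE_BARRIER and pass it to ResourceBarrier() instead of pushing into a vector.

```cpp
#include <cassert>
#include <vector>

// Small subset of states, named after their D3D12_RESOURCE_STATE_* counterparts.
enum class ResourceState { Present, RenderTarget, ShaderResource, UnorderedAccess };

struct TransitionBarrier {
    int           resourceId;
    ResourceState before;
    ResourceState after;
};

struct TrackedResource {
    int           id;
    ResourceState state;
};

// Records a transition only if the state actually changes, then updates the
// tracked state. Returns true if a barrier was recorded.
inline bool Transition(TrackedResource& res, ResourceState after,
                       std::vector<TransitionBarrier>& out) {
    if (res.state == after) return false;
    out.push_back({res.id, res.state, after});
    res.state = after;
    return true;
}

// Usage: the barriers recorded for one frame of a swap chain back buffer.
inline std::vector<TransitionBarrier> RecordFrame(TrackedResource& backBuffer) {
    std::vector<TransitionBarrier> barriers;
    Transition(backBuffer, ResourceState::RenderTarget, barriers); // before drawing
    // ... draw calls would be recorded here ...
    Transition(backBuffer, ResourceState::Present, barriers);      // before Present()
    assert(backBuffer.state == ResourceState::Present);
    return barriers;
}
```

Tracking the current state per resource is one way to always have the correct "before" state at hand and to skip redundant transitions.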

Uploading Buffer Data

Buffers should be placed in the default heap for best performance. However, resources on the default heap are not CPU-writeable. To upload buffer data, the application must instead create a buffer on the upload heap (“upload buffer”), write the data into that buffer from the CPU, and then copy the upload buffer into the default-heap buffer with CopyResource() or CopyBufferRegion().

The upload buffer cannot be released or re-used until the GPU finishes executing the CopyResource() or CopyBufferRegion() command. This requires CPU-GPU synchronization as usual.

The target buffer must also undergo the appropriate resource transitions during the transfer.
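The lifetime rule for upload buffers is essentially a deferred-release queue keyed by fence value. Below is a minimal sketch in plain C++, with no D3D12 calls: ToyUploader and PendingUpload are hypothetical names, byte vectors stand in for the upload and default-heap buffers, and the copy happens immediately on the CPU, whereas the real CopyBufferRegion() runs later on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

struct PendingUpload {
    Bytes         staging;    // stands in for the upload-heap buffer
    std::uint64_t fenceValue; // fence value signaled after the copy was submitted
};

class ToyUploader {
public:
    // Fills a staging buffer, "copies" it into the destination, and keeps the
    // staging buffer alive until its fence value is reached.
    std::uint64_t Upload(const Bytes& src, Bytes& defaultHeapBuffer) {
        assert(defaultHeapBuffer.size() >= src.size());
        PendingUpload job{src, ++lastSignaled_};
        std::copy(job.staging.begin(), job.staging.end(), defaultHeapBuffer.begin());
        pending_.push_back(std::move(job));
        return lastSignaled_;
    }

    // Releases every staging buffer whose copy the GPU has finished, given the
    // fence's completed value. Returns how many buffers were released.
    std::size_t Retire(std::uint64_t completedFenceValue) {
        std::size_t released = 0;
        while (!pending_.empty() &&
               pending_.front().fenceValue <= completedFenceValue) {
            pending_.pop_front();
            ++released;
        }
        return released;
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    std::deque<PendingUpload> pending_;      // ordered by fence value
    std::uint64_t             lastSignaled_ = 0;
};
```

Calling Retire() once per frame with the fence’s completed value avoids blocking the CPU on every upload.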

Diagrams rendered with PlantUML.