A work-in-progress reference for some DX12 basics as I learn along. Check out Frank D. Luna’s books for an excellent and thorough introduction to the topic.
An application submits commands to the GPU via a
CommandQueue. Execution is asynchronous. The GPU idles if
the queue is empty, and the CPU stalls on submission if the queue is
full. A good application keeps both busy.
Commands are recorded in CommandLists. CommandLists are submitted to the CommandQueue and executed in order.
A CommandList must be Close()d before it can be executed.
Once a CommandList has been submitted for execution, it can be Reset() and re-used to record a new set of commands. Reset() re-initializes the CommandList and is cheaper than destroying it and creating a new one.
Commands in a
CommandList are recorded into a
CommandAllocator. This is the memory backing of commands.
A CommandAllocator cannot be reset until the GPU finishes executing the commands recorded in it. This requires CPU-GPU synchronization.
Once the GPU has finished executing a CommandList, its CommandAllocator can be Reset() to record new commands.
Multiple CommandLists can be associated with the same CommandAllocator. However, only one of them can record at a time, and the others must be in a closed state. In essence, commands are allocated contiguously in the CommandAllocator while a CommandList is recording.
When a CommandList is created or Reset(), it defaults to an open state. It might be convenient to Close() it right away.
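The rules above can be sketched as a small state machine. This is a toy model in standard C++, not the D3D12 API: ToyAllocator, ToyCommandList and Submit() are hypothetical names that only track the open/closed and in-flight states described here. In real code the corresponding calls are ID3D12GraphicsCommandList::Close() and Reset(), ID3D12CommandAllocator::Reset(), and ID3D12CommandQueue::ExecuteCommandLists().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical, NOT the D3D12 API): they only track the states
// described in the text.
struct ToyAllocator {
    std::vector<std::string> commands; // memory backing for recorded commands
    bool inFlight = false;             // true while the "GPU" may still read it
    bool hasOpenList = false;          // at most one list may record at a time

    // An allocator can only be reset once the GPU is done with its commands.
    bool Reset() {
        if (inFlight || hasOpenList) return false;
        commands.clear();
        return true;
    }
};

struct ToyCommandList {
    ToyAllocator* allocator;
    bool open;

    // A newly created list starts out open (recording).
    explicit ToyCommandList(ToyAllocator& a) : allocator(&a), open(true) {
        assert(!a.hasOpenList); // only one list may record into an allocator
        a.hasOpenList = true;
    }

    bool Record(const std::string& cmd) {
        if (!open) return false;
        allocator->commands.push_back(cmd);
        return true;
    }

    bool Close() {
        if (!open) return false;
        open = false;
        allocator->hasOpenList = false;
        return true;
    }

    // Reset re-opens a closed list; it fails if another list is recording.
    bool Reset(ToyAllocator& a) {
        if (open || a.hasOpenList) return false;
        allocator = &a;
        open = true;
        a.hasOpenList = true;
        return true;
    }
};

// Submitting requires a closed list and puts its allocator "in flight".
inline bool Submit(ToyCommandList& list) {
    if (list.open) return false;
    list.allocator->inFlight = true;
    return true;
}
```

In this model, clearing inFlight stands in for the CPU learning that the GPU has finished, which in a real application requires CPU-GPU synchronization.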
Resources and Descriptors/Views
DX12 (and Vulkan) decouples resources and descriptors. Descriptors are also known as views.
A Resource is the texture or buffer data in memory.
A Descriptor describes how the Resource is accessed in different stages of the graphics pipeline. For example, a render target view (RTV) allows the pipeline to draw into a texture. A shader resource view (SRV) allows a shader to read from a texture.
A Descriptor can also map to a subregion of the Resource and reinterpret the type of its data elements. If the Resource is typeless, then the Descriptor must specify a type. Typed Resources are best for performance; use typeless Resources only when strictly necessary.
Descriptor creation incurs some validation overhead.
Create them during initialization if possible.
Types of Descriptors
CBV: constant buffer view, for reading constant buffer data.
SRV: shader resource view, for reading textures and buffers.
UAV: unordered access view, to read/write texture and buffer data.
Sampler: to sample textures via their SRV.
RTV: render target view, to render into textures.
DSV: depth/stencil view, to describe depth/stencil buffers.
Descriptors are allocated from a DescriptorHeap. A DescriptorHeap is the memory backing for a type of Descriptor. An application will need at least one DescriptorHeap for each type of Descriptor used (CBVs, SRVs and UAVs share one heap type). Multiple DescriptorHeaps of the same type can also exist.
Resources are also allocated in heaps. When creating a resource (for example with CreateCommittedResource()), we must specify the desired heap type:
Default heap: for resources exclusively accessed by the GPU.
Upload heap: for resources that require data uploads from the CPU to the GPU.
Readback heap: for resources that need to be read back by the CPU.
Fences are used for CPU-GPU synchronization.
The example above shows how to safely Reset() a CommandAllocator by making sure that all commands in the CommandQueue backed by the CommandAllocator have been executed on the GPU.
To establish a synchronization point, the CPU calls
CommandQueue::Signal() with a Fence and a given fence value. When the GPU reaches the synchronization point, it
signals the CPU by setting the
Fence to the given value.
Typically, this value is incremented by one every time a new
synchronization point is established.
The CPU can check the Fence value in two ways. One is to call Fence::GetCompletedValue(), which is non-blocking. The other way is blocking: create a Windows event object, call Fence::SetEventOnCompletion(), and then WaitForSingleObject(); the calling thread is put to sleep until the GPU signals the Fence.
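The signal-then-wait protocol can be illustrated with a single-threaded toy model. This is a sketch in standard C++, not the D3D12 API: ToyFence, ToyQueue, GpuStep() and WaitForFence() are hypothetical names, and the "GPU" is simulated by popping entries off a queue in submission order. In real code the CPU side would use ID3D12CommandQueue::Signal(), ID3D12Fence::GetCompletedValue(), ID3D12Fence::SetEventOnCompletion() and WaitForSingleObject().

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy model (hypothetical, NOT the D3D12 API): a fence is a 64-bit counter
// that only the simulated GPU writes to.
struct ToyFence {
    std::uint64_t completed = 0;
    std::uint64_t GetCompletedValue() const { return completed; }
};

struct ToyQueue {
    struct Item {
        bool isSignal;       // false: ordinary work, true: fence signal
        ToyFence* fence;     // only used when isSignal is true
        std::uint64_t value; // value to store in the fence
    };
    std::deque<Item> pending;

    // Enqueue some GPU work, then a signal that fires once that work is done.
    void ExecuteWork() { pending.push_back({false, nullptr, 0}); }
    void Signal(ToyFence& f, std::uint64_t v) { pending.push_back({true, &f, v}); }

    // Simulates the GPU consuming one queue entry, in submission order.
    bool GpuStep() {
        if (pending.empty()) return false;
        Item it = pending.front();
        pending.pop_front();
        if (it.isSignal) it.fence->completed = it.value;
        return true;
    }
};

// Blocking-style wait, simulated by stepping the GPU until the fence reaches v.
inline void WaitForFence(ToyQueue& q, const ToyFence& f, std::uint64_t v) {
    while (f.GetCompletedValue() < v && q.GpuStep()) {
    }
}
```

Because the queue is processed in order, a completed fence value of N implies that everything submitted before the Nth signal has also finished, which is what makes it safe to reset the backing CommandAllocator at that point.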
GPU Workload Synchronization
Unlike OpenGL and previous versions of DirectX, DX12 applications also need to manage GPU workload synchronization. For example, if shader A writes to a texture through a UAV and shader B reads from it through an SRV, then a synchronization point must be established to prevent a resource hazard.
CommandList::ResourceBarrier() establishes a synchronization point between GPU workloads. Two common types of barriers are:
Resource Transition Barrier: declares a transition in a resource’s usage.
UAV Barrier: declares that all current UAV accesses to a resource must complete before future accesses can begin.
Resources are associated with a usage or state
that defines how a resource is used. A
Resource Transition Barrier declares a change in a
resource’s state. The GPU then inserts synchronization points when it
encounters barriers to prevent resource hazards.
For example, when beginning a new frame, the previous frame’s front buffer becomes the current frame’s back buffer. Before we can render to this resource in the current frame, we must transition it from D3D12_RESOURCE_STATE_PRESENT to D3D12_RESOURCE_STATE_RENDER_TARGET. Then, once we have rendered the current frame and are ready to Present(), we perform another transition from D3D12_RESOURCE_STATE_RENDER_TARGET back to D3D12_RESOURCE_STATE_PRESENT.
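The back buffer round trip can be modeled as a tiny state tracker. This is a toy sketch in standard C++, not the D3D12 API: ToyState, ToyResource, Transition(), Draw() and Present() are hypothetical names, and the two enum values stand in for D3D12_RESOURCE_STATE_PRESENT and D3D12_RESOURCE_STATE_RENDER_TARGET. A real transition is recorded with CommandList::ResourceBarrier().

```cpp
#include <cassert>

// Toy model (hypothetical, NOT the D3D12 API). The two states mirror
// D3D12_RESOURCE_STATE_PRESENT and D3D12_RESOURCE_STATE_RENDER_TARGET.
enum class ToyState { Present, RenderTarget };

struct ToyResource {
    ToyState state = ToyState::Present;
};

// A transition names the state before and after the barrier; it is only
// accepted here if 'before' matches the state the resource is actually in.
inline bool Transition(ToyResource& r, ToyState before, ToyState after) {
    if (r.state != before) return false;
    r.state = after;
    return true;
}

// Rendering is only valid while the resource is a render target.
inline bool Draw(const ToyResource& r) { return r.state == ToyState::RenderTarget; }

// Presenting is only valid once the resource is back in the present state.
inline bool Present(const ToyResource& r) { return r.state == ToyState::Present; }
```

A frame in this model is Transition(Present, RenderTarget), Draw(), Transition(RenderTarget, Present), Present(); skipping either transition makes the following step fail.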
A UAV Barrier, on the other hand, synchronizes access to a UAV. This is typically needed when shader A writes to a UAV that shader B then reads from, or when two shaders write to a given UAV. In the first case, A must finish work before B can start executing. In the latter, a barrier is needed to guarantee write order unless we can guarantee that the shaders write to different parts of the UAV.
Uploading Buffer Data
Buffers should be placed in the default heap for best performance.
However, resources on the default heap are not CPU-writeable. To upload
buffer data, the application must instead create a buffer on the upload
heap (“upload buffer”), upload data to that buffer, and then copy the
upload buffer into the original buffer with CopyBufferRegion().
The upload buffer cannot be released or re-used until the GPU
finishes executing the
CopyBufferRegion() command. This requires CPU-GPU
synchronization as usual.
The target buffer must also undergo the appropriate resource transitions during the transfer.
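The two-step upload can be sketched with plain byte vectors. This is a toy model in standard C++, not the D3D12 API: ToyBuffer, CpuWrite() and GpuCopy() are hypothetical names, the cpuWritable flag stands in for the upload heap versus the default heap, and GpuCopy() stands in for the GPU executing a CopyBufferRegion() command. Resource transitions and fences are left out of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical, NOT the D3D12 API).
struct ToyBuffer {
    std::vector<unsigned char> bytes;
    bool cpuWritable; // true for the upload heap, false for the default heap
};

// CPU-side write; only allowed on a buffer that lives on the upload heap.
inline bool CpuWrite(ToyBuffer& dst, const void* src, std::size_t n) {
    if (!dst.cpuWritable || n > dst.bytes.size()) return false;
    const unsigned char* p = static_cast<const unsigned char*>(src);
    std::copy_n(p, n, dst.bytes.begin());
    return true;
}

// Stands in for the GPU executing a CopyBufferRegion()-style command.
inline bool GpuCopy(ToyBuffer& dst, const ToyBuffer& src, std::size_t n) {
    if (n > dst.bytes.size() || n > src.bytes.size()) return false;
    std::copy_n(src.bytes.begin(), n, dst.bytes.begin());
    return true;
}
```

In a real application the copy runs asynchronously, so the upload buffer has to stay alive until a fence confirms that the copy has executed.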
Diagrams rendered with PlantUML.