Skip to content

Multistream rendering and instancing

Chuck Walbourn edited this page Aug 25, 2021 · 30 revisions

In this lesson we learn how to use multistream rendering to implement GPU instancing.

Input assembler

For these tutorial lessons, we've been providing a single stream of vertex data to the input assembler. Generally the most efficient rendering is a single vertex buffer with a stride of 16, 32, or 64 bytes, but there are times when arranging the vertex data in such a layout is expensive. The Direct3D Input Assembler can therefore pull vertex information from up to 16 (or 32 with Hardware Feature Level 11 or better) vertex buffers. This provides a lot of freedom in managing your vertex buffers.

For example, if we return to a case from Simple rendering, here is the 'stock' vertex input layout for VertexPositionNormalTexture:

const D3D11_INPUT_ELEMENT_DESC c_InputElements[] =
{
    { "SV_Position", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",      0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD",    0, DXGI_FORMAT_R32G32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};

This describes a single vertex stream with three elements. We could instead arrange this into three VBs as follows:

// Position in VB#0, NORMAL in VB#1, TEXCOORD in VB#2
const D3D11_INPUT_ELEMENT_DESC c_InputElements[] =
{
    { "SV_Position", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",      0, DXGI_FORMAT_R32G32B32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD",    0, DXGI_FORMAT_R32G32_FLOAT,    2, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};

To render, we'd need to create an InputLayout object using this with our vertex shader, and then bind the vertex buffers to each slot:

ID3D11Buffer* vbs[3] = { m_positionVB.Get(), m_normalVB.Get(), m_texcoordVB.Get() };
UINT strides[3] = { sizeof(float) * 3, sizeof(float) * 3, sizeof(float) * 2 };
UINT offsets[3] = {};
context->IASetVertexBuffers(0, 3, vbs, strides, offsets);

Note if we are using DrawIndexed, then the same index value is used to retrieve the 'ith' element from each vertex buffer (i.e. there is only one index per vertex, and all VBs must be at least as long as the highest index value).

Per-vertex vs. Per-instance

In addition to pulling vertex data from multiple streams, the input assembler can also 'loop' over some streams to implement a feature called "instancing". Here the same vertex data is drawing multiple times with some per-vertex data changing "once per instance" as it loops over the other data. This allows you to efficiently render a large number of the same object in many locations, such as grass or boulders.

The NormalMapEffect supports GPU instancing using a per-vertex XMFLOAT3X4 matrix which can include translations, rotations, scales, etc. For example if we were using VertexPositionNormalTexture vertex data with instancing, we'd create an input layout as follows:

// VertexPositionNormalTexture in VB#0, XMFLOAT3X4 in VB#1
const D3D11_INPUT_ELEMENT_DESC c_InputElements[] =
{
    { "SV_Position", 0, DXGI_FORMAT_R32G32B32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA,   0 },
    { "NORMAL",      0, DXGI_FORMAT_R32G32B32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA,   0 },
    { "TEXCOORD",    0, DXGI_FORMAT_R32G32_FLOAT,       0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA,   0 },
    { "InstMatrix",  0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "InstMatrix",  1, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "InstMatrix",  2, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
};

Here the first vertex buffer has enough VertexPositionNormalTexture vertex data for one instance, and the second vertex buffer has as many XMFLOAT3X4 entries as instances.

GPU instancing is not supported on Direct3D Hardware Feature Level 9.1 or 9.2, and is only partially supported on 9.3. The DirectX Tool Kit shaders that support GPU instancing are written for Shader Model 4.0 and require Feature Level 10 or greater.

Setup

First create a new project using the instructions from the earlier lessons: Using DeviceResources and Adding the DirectX Tool Kit which we will use for this lesson.

Instancing

Start by saving spnza_bricks_a.dds, spnza_bricks_a_normal.dds, spnza_bricks_a_specular.dds into your new project's directory, and then from the top menu select Project / Add Existing Item.... Select "spnza_bricks_a.dds" and click "OK". Repeat for the two other files.

In the Game.h file, add the following variables to the bottom of the Game class's private declarations:

DirectX::SimpleMath::Matrix m_view;
DirectX::SimpleMath::Matrix m_proj;

std::unique_ptr<DirectX::NormalMapEffect> m_effect;
std::unique_ptr<DirectX::GeometricPrimitive> m_shape;

Microsoft::WRL::ComPtr<ID3D11InputLayout> m_instanceLayout;
Microsoft::WRL::ComPtr<ID3D11Buffer> m_instancedVB;
Microsoft::WRL::ComPtr<ID3D11ShaderResourceView> m_brickDiffuse;
Microsoft::WRL::ComPtr<ID3D11ShaderResourceView> m_brickNormal;
Microsoft::WRL::ComPtr<ID3D11ShaderResourceView> m_brickSpecular;

UINT m_instanceCount;
std::unique_ptr<DirectX::XMFLOAT3X4[]> m_instanceTransforms;

In Game.cpp file, modify the Game constructor to initialize the new variable:

Game::Game() noexcept(false) :
    m_instanceCount(0)
{
    m_deviceResources = std::make_unique<DX::DeviceResources>();
    m_deviceResources->RegisterDeviceNotify(this);
}

In Game.cpp, add to the TODO of CreateDeviceDependentResources:

auto context = m_deviceResources->GetD3DDeviceContext();
m_shape = GeometricPrimitive::CreateSphere(context);

DX::ThrowIfFailed(CreateDDSTextureFromFile(device, L"spnza_bricks_a.DDS",
    nullptr, m_brickDiffuse.ReleaseAndGetAddressOf()));

DX::ThrowIfFailed(CreateDDSTextureFromFile(device, L"spnza_bricks_a_normal.DDS",
    nullptr, m_brickNormal.ReleaseAndGetAddressOf()));

DX::ThrowIfFailed(CreateDDSTextureFromFile(device, L"spnza_bricks_a_specular.DDS",
    nullptr, m_brickSpecular.ReleaseAndGetAddressOf()));

m_effect = std::make_unique<NormalMapEffect>(device);
m_effect->EnableDefaultLighting();
m_effect->SetTexture(m_brickDiffuse.Get());
m_effect->SetNormalTexture(m_brickNormal.Get());
m_effect->SetSpecularTexture(m_brickSpecular.Get());
m_effect->SetInstancingEnabled(true);

const D3D11_INPUT_ELEMENT_DESC c_InputElements[] =
{
    { "SV_Position", 0, DXGI_FORMAT_R32G32B32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA,   0 },
    { "NORMAL",      0, DXGI_FORMAT_R32G32B32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA,   0 },
    { "TEXCOORD",    0, DXGI_FORMAT_R32G32_FLOAT,       0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA,   0 },
    { "InstMatrix",  0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "InstMatrix",  1, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "InstMatrix",  2, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
};

DX::ThrowIfFailed(
    CreateInputLayoutFromEffect(device, m_effect.get(),
        c_InputElements, std::size(c_InputElements),
        m_instanceLayout.ReleaseAndGetAddressOf()));

// Create instance transforms.
{
    size_t j = 0;
    for (float y = -6.f; y < 6.f; y += 1.5f)
    {
        for (float x = -6.f; x < 6.f; x += 1.5f)
        {
            ++j;
        }
    }
    m_instanceCount = static_cast<UINT>(j);

    m_instanceTransforms = std::make_unique<XMFLOAT3X4[]>(j);

    constexpr XMFLOAT3X4 s_identity = { 1.f, 0.f, 0.f, 0.f, 0.f, 1.f, 0.f, 0.f, 0.f, 0.f, 1.f, 0.f };

    j = 0;
    for (float y = -6.f; y < 6.f; y += 1.5f)
    {
        for (float x = -6.f; x < 6.f; x += 1.5f)
        {
            m_instanceTransforms[j] = s_identity;
            m_instanceTransforms[j]._14 = x;
            m_instanceTransforms[j]._24 = y;
            ++j;
        }
    }

    auto desc = CD3D11_BUFFER_DESC(
        static_cast<UINT>(j * sizeof(XMFLOAT3X4)),
        D3D11_BIND_VERTEX_BUFFER,
        D3D11_USAGE_DYNAMIC,
        D3D11_CPU_ACCESS_WRITE);

    D3D11_SUBRESOURCE_DATA initData = { m_instanceTransforms.get(), 0, 0 };

    DX::ThrowIfFailed(
        device->CreateBuffer(&desc, &initData,
            m_instancedVB.ReleaseAndGetAddressOf())
    );
}

In Game.cpp, add to the TODO of CreateWindowSizeDependentResources:

auto size = m_deviceResources->GetOutputSize();
m_view = Matrix::CreateLookAt(Vector3(0.f, 0.f, 12.f),
    Vector3::Zero, Vector3::UnitY);
m_proj = Matrix::CreatePerspectiveFieldOfView(XM_PI / 4.f,
    float(size.right) / float(size.bottom), 0.1f, 25.f);

m_effect->SetView(m_view);
m_effect->SetProjection(m_proj);

In Game.cpp, add to the TODO of OnDeviceLost:

m_effect.reset();
m_shape.reset();
m_instanceLayout.Reset();
m_instancedVB.Reset();
m_brickDiffuse.Reset();
m_brickNormal.Reset();
m_brickSpecular.Reset();

In Game.cpp, add to the TODO of Render:

m_shape->DrawInstanced(m_effect.get(), m_instanceLayout.Get(), m_instanceCount, false, false, 0, [=]()
    {
        UINT stride = sizeof(XMFLOAT3X4);
        UINT offset = 0;
        context->IASetVertexBuffers(1, 1, m_instancedVB.GetAddressOf(), &stride, &offset);
    });

Build and run to see 64 spheres rendered.

Screenshot of many sphere

Adding animation

You can provide updated instance data per-frame to animate the locations, while leaving the original vertex data 'static' for best performance.

In Game.cpp, add to the TODO of Update:

auto time = static_cast<float>(m_timer.GetTotalSeconds());

size_t j = 0;
for (float y = -6.f; y < 6.f; y += 1.5f)
{
    for (float x = -6.f; x < 6.f; x += 1.5f)
    {
        XMMATRIX m = XMMatrixTranslation(x,
            y,
            cos(time + float(x) * XM_PIDIV4)
            * sin(time + float(y) * XM_PIDIV4)
            * 2.f);
        XMStoreFloat3x4(&m_instanceTransforms[j], m);
        ++j;
    }
}

assert(j == m_instanceCount);

In Game.cpp, add to the TODO of Render prior to the call to DrawInstanced:

{
    MapGuard map(context, m_instancedVB.Get(), 0, D3D11_MAP_WRITE_DISCARD, 0);
    memcpy(map.pData, m_instanceTransforms.get(),
        m_instanceCount * sizeof(XMFLOAT3X4));
}

It's important that the map be in it's own scope since it's an RAII helper class which needs to get destructed before the call to DrawInstanced.

Build and run to see the spheres moving individually.

Screenshot of many moving sphere

More to explore

  • GPU instancing is also supported by DebugEffect and PBREffect

  • While BasicEffect does not support instancing, you can use NormalMapEffect to emulate BasicEffect by providing a default 1x1 white texture (i.e. RGBA32 value 0xFFFFFFFF) and/or a smooth 1x1 normal map texture (i.e. RGBA32 value 0x8080FFFF).

  • To render Model with instancing, you must use custom drawing to update the effect(s) to support instancing, create the proper input layout object, as well as call DrawInstanced for each ModelMeshPart.

Next lessons: Creating custom shaders with DGSL

For Use

  • Universal Windows Platform apps
  • Windows desktop apps
  • Windows 11
  • Windows 10
  • Windows 8.1
  • Xbox One

Architecture

  • x86
  • x64
  • ARM64

For Development

  • Visual Studio 2022
  • Visual Studio 2019 (16.11)
  • clang/LLVM v12 - v18
  • MinGW 12.2, 13.2
  • CMake 3.20

Related Projects

DirectX Tool Kit for DirectX 12

DirectXMesh

DirectXTex

DirectXMath

Win2D

Tools

Test Suite

Model Viewer

Content Exporter

DxCapsViewer

Clone this wiki locally