
TechDocs AMDGpu

Michael Sartain edited this page Oct 3, 2017 · 2 revisions

XDC 2017 Talk

GPUVis uses the AMD GPU trace events added in Linux kernel v4.12 to build the GPU timeline.

"gfx" row shows the "SW queue", "HW queue", and "HW Execution" durations. "gfx hw" row shows "HW Execution" sections.

Hovering over a "HW Execution" block in the "gfx hw" row highlights the corresponding section in the "gfx" row, and vice versa. A tooltip shows the event IDs, time, and duration.

AMD GPU Events

From TraceEvents::calculate_event_durations() in gpuvis.cpp

 amdgpu_cs_ioctl   amdgpu_sched_run_job   fence_signaled
       |-----------------|---------------------|
       |user-->          |hw-->                |
                                               |
          amdgpu_cs_ioctl  amdgpu_sched_run_job|   fence_signaled
                |-----------------|------------|--------|
                |user-->          |hwqueue-->  |hw->    |  

amdgpu_cs_ioctl

Appears when a job is received from userspace.
Dictates the userspace PID for the whole unit of work.
  (The process that owns the work executing on the gpu represented by the bar.)
The only one of these events that executes in the context of the userspace process.

amdgpu_sched_run_job

Links a job to a dma_fence object; this is the point where the job is queued to the hardware.
Marks the start of the bar in the GPU timeline:
  either immediately if no job is running, or when the currently running job finishes.

*fence_signaled

Job completed: dictates the end of the bar
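
The way the three events split a job into segments can be sketched in Python. This is only an illustration of the diagram above, not the code in gpuvis.cpp; the `Job` class, the `split_durations` function, and the segment names are hypothetical, and the sketch assumes the jobs of one timeline are passed in completion order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    """Timestamps (seconds) of the three events for one job (hypothetical type)."""
    cs_ioctl: Optional[float]   # None if no userspace submission was seen
    sched_run_job: float
    fence_signaled: float


def split_durations(jobs: List[Job]) -> List[Dict[str, Optional[float]]]:
    """Split each job into SW queue, HW queue, and HW execution segments.

    Assumes `jobs` all belong to one timeline and are ordered by fence_signaled.
    """
    result = []
    prev_done = None  # fence_signaled time of the previous job on this timeline
    for job in jobs:
        # HW execution starts at sched_run_job if the hardware is idle,
        # otherwise when the currently running job finishes.
        hw_start = job.sched_run_job
        if prev_done is not None and prev_done > hw_start:
            hw_start = prev_done
        sw_queue = None
        if job.cs_ioctl is not None:
            sw_queue = job.sched_run_job - job.cs_ioctl
        result.append({
            "sw_queue": sw_queue,
            "hw_queue": hw_start - job.sched_run_job,
            "hw_exec": job.fence_signaled - hw_start,
        })
        prev_done = job.fence_signaled
    return result
```

For two overlapping jobs, the second job gets a non-zero "hw_queue" segment covering the time it waited for the first job's fence to signal.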

Example

  ; userspace submission
    SkinningApp-2837 475.1688: amdgpu_cs_ioctl:      sched_job=185904, timeline=gfx, context=249, seqno=91446, ring_name=ffff94d7a00d4694, num_ibs=3
  ; gpu starting job
            gfx-477  475.1689: amdgpu_sched_run_job: sched_job=185904, timeline=gfx, context=249, seqno=91446, ring_name=ffff94d7a00d4694, num_ibs=3
  ; job completed
         <idle>-0    475.1690: fence_signaled:       driver=amd_sched timeline=gfx context=249 seqno=91446
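
To experiment with lines like these outside GPUVis, a small Python parser is enough. The sketch below assumes the simplified `comm-pid timestamp: event: args` layout shown in the example (real trace output may carry extra columns such as the CPU number); `parse_line`, `LINE_RE`, and `FIELD_RE` are hypothetical names.

```python
import re

# Matches "comm-pid  timestamp: event_name: key=value ..." as in the example.
LINE_RE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+(?P<ts>\d+\.\d+):\s+"
    r"(?P<event>\w+):\s+(?P<args>.*)$"
)
# key=value pairs, separated by commas and/or whitespace.
FIELD_RE = re.compile(r"(\w+)=([^\s,]+)")

EXAMPLE_LINES = [
    "    SkinningApp-2837 475.1688: amdgpu_cs_ioctl:      sched_job=185904, "
    "timeline=gfx, context=249, seqno=91446, ring_name=ffff94d7a00d4694, num_ibs=3",
    "            gfx-477  475.1689: amdgpu_sched_run_job: sched_job=185904, "
    "timeline=gfx, context=249, seqno=91446, ring_name=ffff94d7a00d4694, num_ibs=3",
    "         <idle>-0    475.1690: fence_signaled:       driver=amd_sched "
    "timeline=gfx context=249 seqno=91446",
]


def parse_line(line):
    """Return a dict describing one trace line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "ts": float(m.group("ts")),
        "event": m.group("event"),
        "fields": dict(FIELD_RE.findall(m.group("args"))),  # values kept as strings
    }


if __name__ == "__main__":
    for text in EXAMPLE_LINES:
        print(parse_line(text))
```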

Notes

  • amdgpu_cs_ioctl and amdgpu_sched_run_job have a common job handle
  • Events belonging to the same job are matched on timeline, context, and seqno
  • There are separate timelines for each gpu engine
  • There are two dma timelines (one per engine)
  • 8 compute timelines (one per hw queue)
  • They are all concurrently executed
  • Many apps will only have a gfx timeline
  • Expect to see traffic on some queues that was not directly initiated by an app: the kernel submits some work itself, and that work won't be linked to any cs_ioctl
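
The matching rule in these notes can be sketched as a grouping step in Python. This is an illustrative sketch, not GPUVis's implementation: it assumes each event is already available as a dict with "event", "timeline", "context", and "seqno" keys, and `group_jobs` and `kernel_submitted` are hypothetical helper names.

```python
from collections import defaultdict


def group_jobs(events):
    """Group events into jobs keyed by (timeline, context, seqno)."""
    jobs = defaultdict(dict)
    for ev in events:
        key = (ev["timeline"], ev["context"], ev["seqno"])
        jobs[key][ev["event"]] = ev
    return dict(jobs)


def kernel_submitted(jobs):
    """Return keys of jobs with no amdgpu_cs_ioctl, i.e. no userspace submission seen."""
    return [key for key, evs in jobs.items() if "amdgpu_cs_ioctl" not in evs]
```

Because the key includes the timeline, jobs running concurrently on different engines stay separate even if their context and seqno values happen to coincide.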