Bottlenecks Bench

Philip Bedoukian edited this page Apr 9, 2021 · 3 revisions

Bottlenecks per benchmark.

Gemm/2mm/3mm

  • Vector pipeline stalls.

Fdtd2d

  • Router stalls and frame stalls. Trying to increase the fetch width from 8 to 16.

Gesummv

  • Frame stalls. Try increasing the fetch width from 8 to 16.

Conv2d

  • The scalar core has too much work. Longlines mostly resolves this, but the scalar core still seems to be the main bottleneck.

syrk/syr2k

  • Not many stalls, but performance is quite sensitive to network width. Not sure why SIMD throughput is irrelevant (+12% perf) on this one.