Inference Load Modes

Perf Analyzer has several modes for generating inference request load for a model.

Concurrency Mode

In concurrency mode, Perf Analyzer attempts to keep N inference requests outstanding (sent but not yet completed) at all times during profiling. For example, when using --concurrency-range=4, Perf Analyzer will attempt to keep 4 inference requests in flight throughout profiling.
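One way to picture "N requests always outstanding" is a closed loop: N workers each issue a new request as soon as their previous one returns. The sketch below is a minimal asyncio illustration of that idea, not Perf Analyzer's implementation; `fake_infer` and `run_concurrency` are hypothetical names, and `fake_infer` only sleeps to stand in for a real inference call.

```python
import asyncio


async def fake_infer(request_id: int) -> int:
    """Hypothetical stand-in for a real inference call; sleeps to simulate latency."""
    await asyncio.sleep(0.001 * (1 + request_id % 3))
    return request_id


async def run_concurrency(concurrency: int, total_requests: int) -> tuple[int, int]:
    """Keep up to `concurrency` requests outstanding until `total_requests` finish.

    Returns (completed_requests, peak_outstanding_requests).
    """
    next_id = 0
    outstanding = 0
    peak = 0
    completed = 0

    async def worker() -> None:
        nonlocal next_id, outstanding, peak, completed
        # Each worker issues a new request as soon as its previous one returns,
        # so `concurrency` requests stay in flight while work remains.
        while next_id < total_requests:
            request_id = next_id
            next_id += 1
            outstanding += 1
            peak = max(peak, outstanding)
            await fake_infer(request_id)
            outstanding -= 1
            completed += 1

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return completed, peak


if __name__ == "__main__":
    done, peak = asyncio.run(run_concurrency(concurrency=4, total_requests=40))
    print(done, peak)  # 40 4
```

Because the next request waits for a previous one to finish, the request rate in this sketch is governed by response latency rather than by a fixed schedule.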

Request Rate Mode

In request rate mode, Perf Analyzer attempts to send N inference requests per second to the server during profiling. For example, when using --request-rate-range=20, Perf Analyzer will attempt to send 20 requests per second during profiling.
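A fixed request rate can be sketched as an open-loop schedule: request i is due at i / N seconds, whether or not earlier responses have arrived. The helper below is a hypothetical illustration of that arithmetic under the assumption of evenly spaced requests starting at time 0; it is not Perf Analyzer's scheduler.

```python
def constant_rate_schedule(rate_per_sec: float, duration_sec: float) -> list[float]:
    """Return send times (seconds from the start of profiling) for a fixed request rate.

    In this open-loop sketch, request i is due at i / rate_per_sec regardless of
    whether earlier requests have completed.
    """
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    times = []
    i = 0
    while i / rate_per_sec < duration_sec:
        times.append(i / rate_per_sec)
        i += 1
    return times


schedule = constant_rate_schedule(rate_per_sec=20, duration_sec=1.0)
print(len(schedule), schedule[:3])  # 20 [0.0, 0.05, 0.1]
```

With a rate of 20 requests per second, consecutive requests in this sketch are due 50 ms apart.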

Custom Interval Mode

In custom interval mode, Perf Analyzer attempts to send inference requests according to a user-provided sequence of time intervals between consecutive requests, looping back to the start of the sequence when it reaches the end. The intervals are provided as a text file with one interval (in microseconds) per line. For example, when using --request-intervals=my_intervals.txt, where my_intervals.txt contains:

100000
200000
500000

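Perf Analyzer will attempt to send requests at the following times during profiling: 0.1s, 0.3s, 0.8s, 0.9s, 1.1s, 1.6s, and so on. Each send time is the running sum of the intervals, restarting from the first line of the file after the last one has been used.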
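The timing arithmetic in this example can be reproduced with a short sketch that cycles through the intervals and accumulates them. The function names below are hypothetical and only illustrate the calculation; they are not part of Perf Analyzer.

```python
from itertools import accumulate, cycle, islice


def load_intervals_us(lines) -> list[int]:
    """Parse one interval (in microseconds) per non-blank line."""
    return [int(line) for line in lines if line.strip()]


def interval_schedule(intervals_us: list[int], count: int) -> list[float]:
    """Return the first `count` send times in seconds, looping over the intervals."""
    if not intervals_us:
        raise ValueError("at least one interval is required")
    # Cycle through the intervals and take a running sum to get absolute send times.
    looped = islice(cycle(intervals_us), count)
    return [t_us / 1_000_000 for t_us in accumulate(looped)]


intervals = load_intervals_us(["100000\n", "200000\n", "500000\n"])
print(interval_schedule(intervals, 6))  # [0.1, 0.3, 0.8, 0.9, 1.1, 1.6]
```

To read a real file, `load_intervals_us` accepts any iterable of lines, so `with open("my_intervals.txt") as f: intervals = load_intervals_us(f)` works as well.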
