Inference Load Modes

Perf Analyzer has several modes for generating inference request load for a model.

Concurrency Mode

In concurrency mode, Perf Analyzer attempts to send inference requests to the server such that N requests are always outstanding during profiling. For example, when using --concurrency-range=4, Perf Analyzer will attempt to keep 4 inference requests outstanding at all times during profiling.
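The idea of keeping N requests outstanding can be illustrated with a small closed-loop simulation. This is only a sketch of the general technique, not Perf Analyzer's implementation: `run_closed_loop` and `fake_request` are hypothetical names, and the "request" is a short sleep standing in for a round trip to the server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_closed_loop(concurrency, total_requests, send_request):
    """Keep up to `concurrency` requests outstanding until `total_requests` complete.

    Hypothetical helper for illustration; returns (completed, peak_outstanding).
    """
    lock = threading.Lock()
    state = {"remaining": total_requests, "outstanding": 0, "peak": 0, "completed": 0}

    def worker():
        while True:
            with lock:
                if state["remaining"] == 0:
                    return
                state["remaining"] -= 1
                state["outstanding"] += 1
                state["peak"] = max(state["peak"], state["outstanding"])
            send_request()  # blocks until the (simulated) response arrives
            with lock:
                state["outstanding"] -= 1
                state["completed"] += 1

    # One worker per concurrency slot: each sends a new request as soon as
    # its previous one completes, so at most `concurrency` are in flight.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(concurrency):
            pool.submit(worker)
    return state["completed"], state["peak"]


def fake_request():
    """Stand-in for an inference request (assumed 10 ms round trip)."""
    time.sleep(0.01)


completed, peak = run_closed_loop(concurrency=4, total_requests=20, send_request=fake_request)
print(f"completed={completed}, peak outstanding={peak}")
```

Because a new request is only issued when an earlier one returns, the achieved throughput in this kind of loop depends on how quickly the server responds.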

Request Rate Mode

In request rate mode, Perf Analyzer attempts to send N inference requests per second to the server during profiling. For example, when using --request-rate-range=20, Perf Analyzer will attempt to send 20 requests per second during profiling.
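As a rough illustration of what a fixed request rate means for send times, the sketch below computes an evenly spaced schedule. It assumes requests are spaced uniformly, which the text above does not state, and `constant_rate_schedule` is a hypothetical helper rather than part of Perf Analyzer.

```python
def constant_rate_schedule(rate_per_sec, duration_sec):
    """Return target send times (in seconds) for evenly spaced requests.

    Assumption: uniform spacing of 1 / rate_per_sec between requests.
    """
    count = int(rate_per_sec * duration_sec)
    return [k / rate_per_sec for k in range(count)]


schedule = constant_rate_schedule(rate_per_sec=20, duration_sec=1)
print(len(schedule), schedule[:3])
```

Unlike the concurrency case, the schedule here is independent of response times: requests are issued on the clock whether or not earlier ones have completed.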

Custom Interval Mode

In custom interval mode, Perf Analyzer attempts to send inference requests at intervals provided by the user in a text file with one time interval (in microseconds) per line. Each value is the delay before the next request, and Perf Analyzer loops back to the start of the file if it needs more intervals than the file contains. For example, when using --request-intervals=my_intervals.txt, where my_intervals.txt contains:

100000
200000
500000

Perf Analyzer will attempt to send requests at the following times: 0.1s, 0.3s, 0.8s, 0.9s, 1.1s, 1.6s, and so on, during profiling.
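The send times above are the running sum of the intervals, repeated from the start of the list once it is exhausted. The following sketch reproduces that arithmetic; `send_times` is a hypothetical helper used only to show the calculation, not a Perf Analyzer API.

```python
from itertools import accumulate, cycle, islice


def send_times(intervals_us, count):
    """Return the first `count` send times in seconds, looping over the intervals."""
    looped = islice(cycle(intervals_us), count)
    # Accumulate in integer microseconds, then convert each total to seconds.
    return [total_us / 1_000_000 for total_us in accumulate(looped)]


times = send_times([100_000, 200_000, 500_000], 6)
print(times)  # [0.1, 0.3, 0.8, 0.9, 1.1, 1.6]
```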