Perf Analyzer has several modes for generating inference request load for a model.
In concurrency mode, Perf Analyzer attempts to send inference requests to the
server such that N requests are always outstanding during profiling. For
example, when using --concurrency-range=4, Perf Analyzer will attempt to
keep 4 inference requests outstanding at all times during profiling.
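The behavior described above can be illustrated with a small simulation. This is a minimal sketch, not Perf Analyzer's implementation: the function name simulate_concurrency and the fixed per-request latencies are hypothetical, and it assumes a new request is sent the instant an outstanding one completes.

```python
import heapq


def simulate_concurrency(concurrency, latencies):
    """Return the send time (seconds) of each request when at most
    `concurrency` requests are outstanding at once.

    latencies[i] is a hypothetical server latency for request i.
    """
    send_times = []
    in_flight = []  # min-heap of completion times of outstanding requests
    now = 0.0
    for latency in latencies:
        if len(in_flight) == concurrency:
            # Every slot is busy: wait for the earliest completion.
            now = heapq.heappop(in_flight)
        send_times.append(now)
        heapq.heappush(in_flight, now + latency)
    return send_times


# Concurrency 4 with eight requests that each take 1 second (assumed).
print(simulate_concurrency(4, [1.0] * 8))
```

In this sketch the first four requests go out immediately, and each later request is sent only when a slot frees up, so the load adapts to how quickly the server responds.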
In request rate mode, Perf Analyzer attempts to send N inference requests per
second to the server during profiling. For example, when using
--request-rate-range=20, Perf Analyzer will attempt to send 20 requests per
second during profiling.
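To make the arithmetic concrete, the sketch below converts a request rate into a send schedule. It assumes evenly spaced requests with the first one sent at time 0. Both are illustrative assumptions, and the function name request_schedule_us is hypothetical rather than part of Perf Analyzer.

```python
def request_schedule_us(rate_per_second, count):
    """Return `count` send times in microseconds for evenly spaced requests.

    Assumes a constant gap between requests and a first request at t = 0.
    """
    # Integer arithmetic avoids floating-point drift in the timestamps.
    return [i * 1_000_000 // rate_per_second for i in range(count)]


# 20 requests per second corresponds to one request every 50,000 microseconds.
print(request_schedule_us(20, 4))
```

Unlike concurrency mode, this schedule does not depend on server latency: requests are sent on the clock whether or not earlier ones have completed.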
In custom interval mode, Perf Analyzer attempts to send inference requests
according to intervals (between requests, looping if necessary) provided by the
user in the form of a text file with one time interval (in microseconds) per
line. For example, when using --request-intervals=my_intervals.txt, where
my_intervals.txt contains:
100000
200000
500000
Perf Analyzer will attempt to send requests at the following times: 0.1s, 0.3s, 0.8s, 0.9s, 1.1s, 1.6s, and so on, during profiling.
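The send times listed above are the running sum of the intervals, with the list repeating once it is exhausted. A minimal sketch of that calculation follows. The helper names parse_intervals and send_times_us are hypothetical, and the file contents are passed in as a string so the example is self-contained.

```python
from itertools import accumulate, cycle, islice


def parse_intervals(text):
    """Parse one integer interval (microseconds) per non-empty line."""
    return [int(line) for line in text.splitlines() if line.strip()]


def send_times_us(intervals_us, count):
    """Return the first `count` send times in microseconds.

    The intervals are cycled, so the schedule loops when the list runs out.
    """
    return list(islice(accumulate(cycle(intervals_us)), count))


intervals = parse_intervals("100000\n200000\n500000\n")
times = send_times_us(intervals, 6)
print([t / 1_000_000 for t in times])  # send times in seconds
```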