Skip to content

Commit

Permalink
Results from GH action on NVIDIA_RTX4090x2
Browse files Browse the repository at this point in the history
  • Loading branch information
arjunsuresh committed Dec 25, 2024
1 parent 4a1fa5f commit 8d3af46
Show file tree
Hide file tree
Showing 56 changed files with 20,497 additions and 20,497 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@mlperf-automations --checkout=225220c7d9bb7e66e5b9a1e1ebfc3e0180fbd094
cm pull repo mlcommons@mlperf-automations --checkout=a90475d2de72bf0622cebe8d5ca8eb8c9d872fbd

cm run script \
--tags=app,mlperf,inference,generic,_nvidia,_retinanet,_tensorrt,_cuda,_valid,_r4.1-dev_default,_multistream \
Expand Down Expand Up @@ -71,7 +71,7 @@ cm run script \
--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \
--env.CM_DOCKER_DETACHED_MODE=yes \
--env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \
--env.CM_DOCKER_CONTAINER_ID=e891b1c381b6 \
--env.CM_DOCKER_CONTAINER_ID=f77641321894 \
--env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST01 \
--add_deps_recursive.compiler.tags=gcc \
--add_deps_recursive.coco2014-original.tags=_full \
Expand Down Expand Up @@ -129,7 +129,7 @@ Platform: RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config
Model Precision: int8

### Accuracy Results
`mAP`: `37.356`, Required accuracy for closed division `>= 37.1745`
`mAP`: `37.327`, Required accuracy for closed division `>= 37.1745`

### Performance Results
`Samples per query`: `5654438.0`
`Samples per query`: `5623507.0`
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
[2024-12-23 02:36:38,287 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
[2024-12-23 02:36:38,369 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
[2024-12-23 02:36:38,369 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/retinanet/MultiStream
[2024-12-23 02:36:38,369 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/c44de86dadc4420199ad25d6fa563275.conf" --gpu_engines="./build/engines/RTX4090x2/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model retinanet --response_postprocess openimageeffnms
[2024-12-23 02:36:38,369 __init__.py:53 INFO] Overriding Environment
[2024-12-25 01:40:51,607 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2
[2024-12-25 01:40:51,689 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
[2024-12-25 01:40:51,689 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/retinanet/MultiStream
[2024-12-25 01:40:51,689 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=true --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/00cdca2a511c4f33bad6658d0557db67.conf" --gpu_engines="./build/engines/RTX4090x2/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model retinanet --response_postprocess openimageeffnms
[2024-12-25 01:40:51,689 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.Retinanet
buffer_manager_thread_count : 0
data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data
Expand All @@ -12,7 +12,7 @@ gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int8
input_format : linear
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.23-02.36.37
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.25-01.40.50
map_path : data_maps/open-images-v6-mlperf/val_map.txt
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf
multi_stream_expected_latency_ns : 0
Expand All @@ -26,7 +26,7 @@ tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet
test_mode : AccuracyOnly
use_deque_limit : True
use_graphs : True
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/c44de86dadc4420199ad25d6fa563275.conf
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/00cdca2a511c4f33bad6658d0557db67.conf
system_id : RTX4090x2
config_name : RTX4090x2_retinanet_MultiStream
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
Expand All @@ -40,81 +40,81 @@ power_limit : None
cpu_freq : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: /home/cmuser/CM/repos/local/cache/5860c00d55d14786/inference/mlperf.conf
[I] user.conf path: /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/c44de86dadc4420199ad25d6fa563275.conf
[I] user.conf path: /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/00cdca2a511c4f33bad6658d0557db67.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 72 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 125, GPU 881 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 127, GPU 891 (MiB)
[I] [TRT] Loaded engine size: 73 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 126, GPU 881 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 128, GPU 891 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +68, now: CPU 0, GPU 68 (MiB)
[I] Device:0.GPU: [0] ./build/engines/RTX4090x2/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[I] [TRT] Loaded engine size: 72 MiB
[I] [TRT] Loaded engine size: 73 MiB
[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 159, GPU 624 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 160, GPU 634 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 161, GPU 624 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 162, GPU 634 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +69, now: CPU 0, GPU 137 (MiB)
[I] Device:1.GPU: [0] ./build/engines/RTX4090x2/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 88, GPU 893 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 88, GPU 901 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 89, GPU 893 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 90, GPU 901 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +1528, now: CPU 1, GPU 1665 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 89, GPU 636 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 89, GPU 644 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1528, now: CPU 1, GPU 3193 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 90, GPU 636 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 90, GPU 644 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1527, now: CPU 1, GPU 3192 (MiB)
[I] Start creating CUDA graphs
[I] Capture 2 CUDA graphs
[I] Capture 2 CUDA graphs
[I] Finish creating CUDA graphs
[I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
Finished setting up SUT.
Starting warmup. Running for a minimum of 5 seconds.
Finished warmup. Ran for 5.14292s.
Finished warmup. Ran for 5.14343s.
Starting running actual test.

No warnings encountered during test.

No errors encountered during test.
Finished running actual test.
Device Device:0.GPU processed:
6195 batches of size 2
6196 batches of size 2
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 6195
BatchedCudaMemcpy Calls: 6196
Device Device:1.GPU processed:
6197 batches of size 2
6196 batches of size 2
Memcpy Calls: 0
PerSampleCudaMemcpy Calls: 0
BatchedCudaMemcpy Calls: 6197
BatchedCudaMemcpy Calls: 6196
&&&& PASSED Default_Harness # ./build/bin/harness_default
[2024-12-23 02:37:18,118 run_harness.py:166 INFO] Result: Accuracy run detected.
[2024-12-23 02:37:18,118 __init__.py:46 INFO] Running command: python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-openimages.py --mlperf-accuracy-file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy/mlperf_log_accuracy.json --openimages-dir /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/preprocessed_data/open-images-v6-mlperf --output-file build/retinanet-results.json
[2024-12-25 01:41:30,299 run_harness.py:166 INFO] Result: Accuracy run detected.
[2024-12-25 01:41:30,299 __init__.py:46 INFO] Running command: python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-openimages.py --mlperf-accuracy-file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy/mlperf_log_accuracy.json --openimages-dir /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/preprocessed_data/open-images-v6-mlperf --output-file build/retinanet-results.json
loading annotations into memory...
Done (t=0.52s)
Done (t=0.53s)
creating index...
index created!
Loading and preparing results...
DONE (t=17.77s)
DONE (t=17.92s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=133.59s).
DONE (t=133.92s).
Accumulating evaluation results...
DONE (t=31.87s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.374
DONE (t=32.42s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.373
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.522
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.404
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.403
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.022
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.125
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.413
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.419
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.599
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.628
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.345
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.678
mAP=37.356%
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.344
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.677
mAP=37.327%

======================== Result summaries: ========================

Loading

0 comments on commit 8d3af46

Please sign in to comment.