Results from GH action on NVIDIA_RTX4090x1
arjunsuresh committed Dec 25, 2024
1 parent 1317bae commit 9bea238
Showing 25 changed files with 890 additions and 890 deletions.
@@ -19,7 +19,7 @@ pip install -U cmind

cm rm cache -f

-cm pull repo mlcommons@mlperf-automations --checkout=225220c7d9bb7e66e5b9a1e1ebfc3e0180fbd094
+cm pull repo mlcommons@mlperf-automations --checkout=a90475d2de72bf0622cebe8d5ca8eb8c9d872fbd

cm run script \
--tags=app,mlperf,inference,generic,_nvidia,_bert-99,_tensorrt,_cuda,_valid,_r4.1-dev_default,_offline \
@@ -71,7 +71,7 @@ cm run script \
--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \
--env.CM_DOCKER_DETACHED_MODE=yes \
--env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \
---env.CM_DOCKER_CONTAINER_ID=b8ce345e7cfa \
+--env.CM_DOCKER_CONTAINER_ID=c66a229e22cf \
--env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST01 \
--add_deps_recursive.compiler.tags=gcc \
--add_deps_recursive.coco2014-original.tags=_full \
@@ -129,4 +129,4 @@ Model Precision: int8
`F1`: `90.15674`, Required accuracy for closed division `>= 89.96526`

### Performance Results
-`Samples per second`: `4119.08`
+`Samples per second`: `4125.31`
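
For context, the `>= 89.96526` bound quoted above is the closed-division accuracy target for `bert-99`, i.e. 99% of the FP32 reference F1. A minimal sketch of that check, assuming the usual MLPerf BERT/SQuAD v1.1 reference F1 of 90.874:

```python
# Minimal sketch of the bert-99 closed-division accuracy check.
# Assumption: the FP32 reference F1 for BERT on SQuAD v1.1 is 90.874,
# which makes the 99% target 0.99 * 90.874 = 89.96526 (the bound above).
REFERENCE_F1 = 90.874
TARGET_RATIO = 0.99          # the "-99" in bert-99: 99% of reference accuracy

measured_f1 = 90.15674       # F1 reported for this run

threshold = REFERENCE_F1 * TARGET_RATIO
print(f"required: F1 >= {threshold:.5f}")              # 89.96526
print("PASS" if measured_f1 >= threshold else "FAIL")  # PASS
```
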
@@ -1,7 +1,7 @@
-[2024-12-23 02:01:52,311 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x1
-[2024-12-23 02:01:52,813 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x1_TRT/bert-99/Offline
-[2024-12-23 02:01:52,813 __init__.py:46 INFO] Running command: ./build/bin/harness_bert --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=10833 --test_mode="AccuracyOnly" --gpu_batch_size=256 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/adc9dc0382544ffd/inference/mlperf.conf" --tensor_path="build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/eb9854064bde419c94548e4a59d8e6b4.conf" --gpu_inference_streams=2 --gpu_copy_streams=2 --gpu_engines="./build/engines/RTX4090x1/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan" --scenario Offline --model bert
-[2024-12-23 02:01:52,813 __init__.py:53 INFO] Overriding Environment
+[2024-12-24 23:00:52,890 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x1
+[2024-12-24 23:00:53,380 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x1_TRT/bert-99/Offline
+[2024-12-24 23:00:53,380 __init__.py:46 INFO] Running command: ./build/bin/harness_bert --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=10833 --test_mode="AccuracyOnly" --gpu_batch_size=256 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/e824f0bb4919400f/inference/mlperf.conf" --tensor_path="build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/aab71c24a07442c090544e000738c311.conf" --gpu_inference_streams=2 --gpu_copy_streams=2 --gpu_engines="./build/engines/RTX4090x1/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan" --scenario Offline --model bert
+[2024-12-24 23:00:53,380 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.BERT
buffer_manager_thread_count : 0
coalesced_tensor : True
@@ -11,8 +11,8 @@ gpu_copy_streams : 2
gpu_inference_streams : 2
input_dtype : int32
input_format : linear
-log_dir : /home/cmuser/CM/repos/local/cache/ba8d5f2a6bc546f9/repo/closed/NVIDIA/build/logs/2024.12.23-02.01.51
-mlperf_conf_path : /home/cmuser/CM/repos/local/cache/adc9dc0382544ffd/inference/mlperf.conf
+log_dir : /home/cmuser/CM/repos/local/cache/ba8d5f2a6bc546f9/repo/closed/NVIDIA/build/logs/2024.12.24-23.00.51
+mlperf_conf_path : /home/cmuser/CM/repos/local/cache/e824f0bb4919400f/inference/mlperf.conf
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/a8c152aef5494496/preprocessed_data
@@ -21,7 +21,7 @@ system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AM
tensor_path : build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy
test_mode : AccuracyOnly
use_graphs : False
-user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/eb9854064bde419c94548e4a59d8e6b4.conf
+user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/aab71c24a07442c090544e000738c311.conf
system_id : RTX4090x1
config_name : RTX4090x1_bert_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
@@ -34,40 +34,40 @@ skip_file_checks : True
power_limit : None
cpu_freq : None
&&&& RUNNING BERT_HARNESS # ./build/bin/harness_bert
-I1223 02:01:52.866398 19723 main_bert.cc:163] Found 1 GPUs
-I1223 02:01:53.259613 19723 bert_server.cc:147] Engine Path: ./build/engines/RTX4090x1/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan
+I1224 23:00:53.437846 19726 main_bert.cc:163] Found 1 GPUs
+I1224 23:00:53.830546 19726 bert_server.cc:147] Engine Path: ./build/engines/RTX4090x1/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan
[I] [TRT] Loaded engine size: 414 MiB
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +7, GPU +8, now: CPU 582, GPU 1232 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 583, GPU 1242 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 582, GPU 1232 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 584, GPU 1242 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +290, now: CPU 0, GPU 290 (MiB)
-I1223 02:01:53.596822 19723 bert_server.cc:208] Engines Creation Completed
-I1223 02:01:53.616966 19723 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
-I1223 02:01:53.616971 19723 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
-I1223 02:01:53.616976 19723 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 170, GPU 1908 (MiB)
+I1224 23:00:54.158447 19726 bert_server.cc:208] Engines Creation Completed
+I1224 23:00:54.175813 19726 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
+I1224 23:00:54.175822 19726 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
+I1224 23:00:54.175827 19726 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 170, GPU 1908 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 170, GPU 1916 (MiB)
-I1223 02:01:53.672140 19723 bert_core_vs.cc:426] Setting Opt.Prof. to 0
-I1223 02:01:53.672174 19723 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
+I1224 23:00:54.230412 19726 bert_core_vs.cc:426] Setting Opt.Prof. to 0
+I1224 23:00:54.230443 19726 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 290 (MiB)
-I1223 02:01:53.673125 19723 bert_core_vs.cc:476] Setup complete
-I1223 02:01:53.673296 19723 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
-I1223 02:01:53.673300 19723 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
-I1223 02:01:53.673302 19723 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 292, GPU 2722 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 293, GPU 2732 (MiB)
-I1223 02:01:53.727489 19723 bert_core_vs.cc:426] Setting Opt.Prof. to 1
+I1224 23:00:54.231410 19726 bert_core_vs.cc:476] Setup complete
+I1224 23:00:54.231583 19726 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608
+I1224 23:00:54.231587 19726 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2
+I1224 23:00:54.231590 19726 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 293, GPU 2722 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 293, GPU 2732 (MiB)
+I1224 23:00:54.285146 19726 bert_core_vs.cc:426] Setting Opt.Prof. to 1
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +0, now: CPU 1, GPU 290 (MiB)
-I1223 02:01:53.727843 19723 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
-I1223 02:01:53.728783 19723 bert_core_vs.cc:476] Setup complete
-I1223 02:01:53.957628 19723 main_bert.cc:184] Starting running actual test.
-I1223 02:01:56.618793 19723 main_bert.cc:190] Finished running actual test.
+I1224 23:00:54.285508 19726 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256
+I1224 23:00:54.286469 19726 bert_core_vs.cc:476] Setup complete
+I1224 23:00:54.514549 19726 main_bert.cc:184] Starting running actual test.
+I1224 23:00:57.166287 19726 main_bert.cc:190] Finished running actual test.

No warnings encountered during test.

No errors encountered during test.
-[2024-12-23 02:01:56,881 run_harness.py:166 INFO] Result: Accuracy run detected.
-[2024-12-23 02:01:56,881 __init__.py:46 INFO] Running command: PYTHONPATH=code/bert/tensorrt/helpers python3 /home/cmuser/CM/repos/local/cache/ba8d5f2a6bc546f9/repo/closed/NVIDIA/build/inference/language/bert/accuracy-squad.py --log_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/mlperf_log_accuracy.json --vocab_file build/models/bert/vocab.txt --val_data /home/cmuser/CM/repos/local/cache/a8c152aef5494496/data/squad/dev-v1.1.json --out_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/predictions.json --output_dtype float16
+[2024-12-24 23:00:57,429 run_harness.py:166 INFO] Result: Accuracy run detected.
+[2024-12-24 23:00:57,429 __init__.py:46 INFO] Running command: PYTHONPATH=code/bert/tensorrt/helpers python3 /home/cmuser/CM/repos/local/cache/ba8d5f2a6bc546f9/repo/closed/NVIDIA/build/inference/language/bert/accuracy-squad.py --log_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/mlperf_log_accuracy.json --vocab_file build/models/bert/vocab.txt --val_data /home/cmuser/CM/repos/local/cache/a8c152aef5494496/data/squad/dev-v1.1.json --out_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/predictions.json --output_dtype float16
{"exact_match": 82.81929990539263, "f1": 90.15673510616978}
Reading examples...
Loading cached features from 'eval_features.pickle'...
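
As an aside, the `F1` value in the results summary above is simply the `f1` field of the JSON line printed by accuracy-squad.py, rounded to five decimals. A hypothetical sketch of that extraction (the helper function, regex, and inlined sample string are illustrative, not part of the harness):

```python
import json
import re

def extract_f1(log_text: str) -> float:
    """Return the f1 value from the first accuracy JSON line in a harness log."""
    match = re.search(r'\{"exact_match":[^}]*\}', log_text)
    if match is None:
        raise ValueError("no accuracy JSON line found in log")
    return json.loads(match.group(0))["f1"]

# Example using the JSON line printed above by accuracy-squad.py.
sample = '{"exact_match": 82.81929990539263, "f1": 90.15673510616978}'
print(round(extract_f1(sample), 5))  # 90.15674
```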