generated from mlcommons/mlperf_inference_submissions
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Results from GH action on NVIDIA_RTX4090x2
1 parent 976dbf8, commit 9442986
Showing 41 changed files with 13,278 additions and 0 deletions.
132 changes: 132 additions & 0 deletions
...vidia_original-gpu-tensorrt-vdefault-default_config/bert-99.9/offline/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops). | ||
|
||
*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.* | ||
|
||
## Host platform | ||
|
||
* OS version: Linux-6.8.0-49-generic-x86_64-with-glibc2.29 | ||
* CPU version: x86_64 | ||
* Python version: 3.8.10 (default, Nov 7 2024, 13:10:47) | ||
[GCC 9.4.0] | ||
* MLCommons CM version: 3.5.2 | ||
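
The fields above resemble what Python's standard `platform` and `sys` modules report. The following is a minimal sketch of how comparable values could be collected on another host, assuming that is roughly how they were produced; the `host_platform_summary` helper is hypothetical and is not part of CM.

```python
import platform
import sys


def host_platform_summary() -> dict:
    """Collect host details similar to the 'Host platform' list (hypothetical helper)."""
    return {
        # Typically looks like Linux-<kernel>-<arch>-with-glibc<version> on Linux.
        "os_version": platform.platform(),
        # Machine type such as x86_64; CM may query a different field (assumption).
        "cpu_version": platform.machine(),
        # Interpreter version string; it can contain a newline before the compiler tag.
        "python_version": sys.version,
    }


for key, value in host_platform_summary().items():
    print(f"{key}: {value}")
```

Comparing this output with the list above is a quick way to see how far a local host differs from the one that produced these results.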
|
||
## CM Run Command | ||
|
||
See [CM installation guide](https://docs.mlcommons.org/inference/install/). | ||
|
||
```bash | ||
pip install -U cmind | ||
|
||
cm rm cache -f | ||
|
||
cm pull repo mlcommons@mlperf-automations --checkout=225220c7d9bb7e66e5b9a1e1ebfc3e0180fbd094 | ||
|
||
cm run script \ | ||
--tags=app,mlperf,inference,generic,_nvidia,_bert-99.9,_tensorrt,_cuda,_valid,_r4.1-dev_default,_offline \ | ||
--quiet=true \ | ||
--env.CM_QUIET=yes \ | ||
--env.CM_MLPERF_IMPLEMENTATION=nvidia \ | ||
--env.CM_MLPERF_MODEL=bert-99.9 \ | ||
--env.CM_MLPERF_RUN_STYLE=valid \ | ||
--env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False \ | ||
--env.CM_DOCKER_PRIVILEGED_MODE=True \ | ||
--env.CM_MLPERF_BACKEND=tensorrt \ | ||
--env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter,edge \ | ||
--env.CM_MLPERF_CLEAN_ALL=True \ | ||
--env.CM_MLPERF_DEVICE=cuda \ | ||
--env.CM_MLPERF_SUBMISSION_DIVISION=closed \ | ||
--env.CM_MLPERF_USE_DOCKER=True \ | ||
--env.CM_NVIDIA_GPU_NAME=rtx_4090 \ | ||
--env.CM_HW_NAME=RTX4090x2 \ | ||
--env.CM_RUN_MLPERF_SUBMISSION_PREPROCESSOR=yes \ | ||
--env.CM_MLPERF_INFERENCE_PULL_CODE_CHANGES=yes \ | ||
--env.CM_MLPERF_INFERENCE_PULL_SRC_CHANGES=yes \ | ||
--env.OUTPUT_BASE_DIR=/home/arjun/gh_action_results \ | ||
--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/arjun/gh_action_submissions \ | ||
--env.CM_MLPERF_SUBMITTER=MLCommons \ | ||
--env.CM_USE_DATASET_FROM_HOST=yes \ | ||
--env.CM_USE_MODEL_FROM_HOST=yes \ | ||
--env.CM_MLPERF_LOADGEN_ALL_SCENARIOS=yes \ | ||
--env.CM_MLPERF_LOADGEN_COMPLIANCE=yes \ | ||
--env.CM_MLPERF_SUBMISSION_RUN=yes \ | ||
--env.CM_RUN_MLPERF_ACCURACY=on \ | ||
--env.CM_RUN_SUBMISSION_CHECKER=yes \ | ||
--env.CM_TAR_SUBMISSION_DIR=yes \ | ||
--env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full \ | ||
--env.CM_MLPERF_INFERENCE_VERSION=5.0-dev \ | ||
--env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1-dev_default \ | ||
--env.CM_MLPERF_LOADGEN_ALL_MODES=yes \ | ||
--env.CM_MLPERF_INFERENCE_SOURCE_VERSION=5.0.4 \ | ||
--env.CM_MLPERF_LAST_RELEASE=v5.0 \ | ||
--env.CM_TMP_PIP_VERSION_STRING= \ | ||
--env.CM_MODEL=bert-99.9 \ | ||
--env.CM_MLPERF_CLEAN_SUBMISSION_DIR=yes \ | ||
--env.CM_RERUN=yes \ | ||
--env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= \ | ||
--env.CM_MLPERF_LOADGEN_MODE=performance \ | ||
--env.CM_MLPERF_LOADGEN_SCENARIO=Offline \ | ||
--env.CM_MLPERF_LOADGEN_SCENARIOS,=SingleStream,Offline \ | ||
--env.CM_MLPERF_LOADGEN_MODES,=performance,accuracy \ | ||
--env.CM_OUTPUT_FOLDER_NAME=valid_results \ | ||
--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \ | ||
--env.CM_DOCKER_DETACHED_MODE=yes \ | ||
--env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \ | ||
--env.CM_DOCKER_CONTAINER_ID=6f820c3674f7 \ | ||
--env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST01 \ | ||
--add_deps_recursive.compiler.tags=gcc \ | ||
--add_deps_recursive.coco2014-original.tags=_full \ | ||
--add_deps_recursive.coco2014-preprocessed.tags=_full \ | ||
--add_deps_recursive.imagenet-original.tags=_full \ | ||
--add_deps_recursive.imagenet-preprocessed.tags=_full \ | ||
--add_deps_recursive.openimages-original.tags=_full \ | ||
--add_deps_recursive.openimages-preprocessed.tags=_full \ | ||
--add_deps_recursive.openorca-original.tags=_full \ | ||
--add_deps_recursive.openorca-preprocessed.tags=_full \ | ||
--add_deps_recursive.coco2014-dataset.tags=_full \ | ||
--add_deps_recursive.igbh-dataset.tags=_full \ | ||
--add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \ | ||
--add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \ | ||
--add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \ | ||
--adr.compiler.tags=gcc \ | ||
--adr.coco2014-original.tags=_full \ | ||
--adr.coco2014-preprocessed.tags=_full \ | ||
--adr.imagenet-original.tags=_full \ | ||
--adr.imagenet-preprocessed.tags=_full \ | ||
--adr.openimages-original.tags=_full \ | ||
--adr.openimages-preprocessed.tags=_full \ | ||
--adr.openorca-original.tags=_full \ | ||
--adr.openorca-preprocessed.tags=_full \ | ||
--adr.coco2014-dataset.tags=_full \ | ||
--adr.igbh-dataset.tags=_full \ | ||
--adr.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \ | ||
--adr.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \ | ||
--adr.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \ | ||
--v=False \ | ||
--print_env=False \ | ||
--print_deps=False \ | ||
--dump_version_info=True \ | ||
--env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/gh_action_results \ | ||
--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/cm-mount/home/arjun/gh_action_submissions \ | ||
--env.MLPERF_SCRATCH_PATH=/home/cmuser/CM/repos/local/cache/4db00c74da1e44c8 | ||
``` | ||
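
The command above passes `--env.OUTPUT_BASE_DIR` and `--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR` twice, first with host paths and then with `/cm-mount/...` paths. The sketch below only makes such repeats visible; it makes no claim about how CM resolves them. The helper names `collect_env_flags` and `repeated_flags` are hypothetical, and the excerpt is shortened to three flags from the command.

```python
import shlex
from collections import defaultdict


def collect_env_flags(command: str) -> dict:
    """Map each --env.KEY to the list of values given for it, in order."""
    values = defaultdict(list)
    # Join backslash-continued lines before tokenizing.
    for token in shlex.split(command.replace("\\\n", " ")):
        if token.startswith("--env.") and "=" in token:
            key, _, value = token[len("--env."):].partition("=")
            values[key].append(value)
    return dict(values)


def repeated_flags(values: dict) -> dict:
    """Keep only the keys that were passed more than once."""
    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Shortened excerpt of the command above.
excerpt = """cm run script \\
 --env.OUTPUT_BASE_DIR=/home/arjun/gh_action_results \\
 --env.CM_MLPERF_DEVICE=cuda \\
 --env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/gh_action_results"""

print(repeated_flags(collect_env_flags(excerpt)))
```

Running the same helpers over the full command would list every repeated `--env.` key, which helps when adapting the command to a host with different paths.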
*To use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts), reload mlcommons@mlperf-automations without the `--checkout` option and clean the CM cache as follows:* | ||
|
||
```bash | ||
cm rm repo mlcommons@mlperf-automations | ||
cm pull repo mlcommons@mlperf-automations | ||
cm rm cache -f | ||
|
||
``` | ||
|
||
## Results | ||
|
||
Platform: RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config | ||
|
||
Model Precision: fp16 | ||
|
||
### Accuracy Results | ||
`F1`: `90.88324`, Required accuracy for closed division `>= 90.78313` | ||
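
As a rough check of the accuracy line above, the sketch below compares the F1 printed by `accuracy-squad.py` in the accuracy log with the closed-division threshold. It also back-computes the reference F1 implied by the 99.9% accuracy level; that reference value is inferred from the numbers here and is not stated in this file. The function names are hypothetical.

```python
import json

# Values copied from this run's README and accuracy log.
REQUIRED_F1 = 90.78313     # closed-division threshold reported above
ACCURACY_TARGET = 0.999    # "accuracy_level : 99.9%" in the harness log
RESULT_LINE = '{"exact_match": 83.67076631977294, "f1": 90.8832407068292}'


def passes_accuracy(result_line: str, required_f1: float = REQUIRED_F1) -> bool:
    """Return True if the F1 in the evaluator's JSON line meets the threshold."""
    return json.loads(result_line)["f1"] >= required_f1


def implied_reference_f1(required_f1: float = REQUIRED_F1,
                         target: float = ACCURACY_TARGET) -> float:
    """Reference F1 that would yield the threshold at the given target (inferred)."""
    return required_f1 / target


print("passes:", passes_accuracy(RESULT_LINE))
print("implied reference F1:", round(implied_reference_f1(), 3))
```

The margin over the threshold is small, roughly 0.1 F1, so changes in precision or engine build settings are worth re-validating with an accuracy run.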
|
||
### Performance Results | ||
`Samples per second`: `3347.07` |
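
To put the throughput figure in context, the sketch below derives an average per-GPU rate and cross-checks it against the accuracy run's wall-clock window from the harness log ("Starting running actual test" to "Finished running actual test"). It assumes the accuracy run processes each of the 10833 samples exactly once and that the two GPUs contribute equally; both are assumptions, so treat the output as a sanity check rather than a measurement.

```python
from datetime import datetime

SAMPLES_PER_SECOND = 3347.07   # Offline result reported above
NUM_GPUS = 2                   # "Found 2 GPUs" in the harness log
SAMPLE_COUNT = 10833           # --performance_sample_count in the harness command


def per_gpu_throughput(total: float = SAMPLES_PER_SECOND, gpus: int = NUM_GPUS) -> float:
    """Average samples per second per GPU, assuming an even split."""
    return total / gpus


def elapsed_seconds(start: str, end: str) -> float:
    """Difference between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Timestamps copied from the accuracy log's start and finish lines.
accuracy_run_s = elapsed_seconds("22:30:07.662860", "22:30:10.967218")
print(f"per-GPU average: {per_gpu_throughput():.2f} samples/s")
print(f"accuracy-run estimate: {SAMPLE_COUNT / accuracy_run_s:.0f} samples/s")
```

The accuracy-run estimate lands slightly below the reported Offline figure, which is plausible given that the window includes start-up and drain time.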
7 changes: 7 additions & 0 deletions
...fig/bert-99.9/offline/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | ||
"starting_weights_filename": "https://zenodo.org/record/3750364/files/bert_large_v1_1_fake_quant.onnx", | ||
"retraining": "no", | ||
"input_data_types": "int32", | ||
"weight_data_types": "fp16", | ||
"weight_transformations": "quantization, affine fusion" | ||
} |
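
A measurements file like the one above can be checked for empty or missing fields before submission. The sketch below is one way to do that; the field list is taken from this file, and treating all five fields as required is an assumption rather than a statement of the submission rules. The `missing_fields` helper is hypothetical.

```python
import json

# Content copied from the measurements file above.
MEASUREMENTS_JSON = """{
  "starting_weights_filename": "https://zenodo.org/record/3750364/files/bert_large_v1_1_fake_quant.onnx",
  "retraining": "no",
  "input_data_types": "int32",
  "weight_data_types": "fp16",
  "weight_transformations": "quantization, affine fusion"
}"""

# Field names observed in this file; whether each is mandatory is an assumption.
EXPECTED_FIELDS = (
    "starting_weights_filename",
    "retraining",
    "input_data_types",
    "weight_data_types",
    "weight_transformations",
)


def missing_fields(text: str, expected=EXPECTED_FIELDS) -> list:
    """Return the expected fields that are absent or blank in the JSON text."""
    data = json.loads(text)
    return [name for name in expected if not str(data.get(name, "")).strip()]


print(missing_fields(MEASUREMENTS_JSON))
```

The values can also be compared with the rest of the run: `weight_data_types` matches the `fp16` model precision in the README, and `input_data_types` matches `input_dtype : int32` in the harness log.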
104 changes: 104 additions & 0 deletions
...idia_original-gpu-tensorrt-vdefault-default_config/bert-99.9/offline/accuracy_console.out
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
[2024-12-22 22:30:04,728 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x2 | ||
[2024-12-22 22:30:05,254 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x2_TRT/bert-99.9/Offline | ||
[2024-12-22 22:30:05,254 __init__.py:46 INFO] Running command: ./build/bin/harness_bert --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99.9/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=10833 --test_mode="AccuracyOnly" --gpu_batch_size=256 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/84d2324f5fa344ed/inference/mlperf.conf" --tensor_path="build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/60a37c5ad0bc45cf8a97ac19c6979aa1.conf" --gpu_inference_streams=2 --gpu_copy_streams=2 --gpu_engines="./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-fp16_S_384_B_256_P_2_vs.custom_k_99_9_MaxP.plan" --scenario Offline --model bert | ||
[2024-12-22 22:30:05,254 __init__.py:53 INFO] Overriding Environment | ||
benchmark : Benchmark.BERT | ||
buffer_manager_thread_count : 0 | ||
coalesced_tensor : True | ||
data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data | ||
gpu_batch_size : 256 | ||
gpu_copy_streams : 2 | ||
gpu_inference_streams : 2 | ||
input_dtype : int32 | ||
input_format : linear | ||
log_dir : /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/logs/2024.12.22-22.30.03 | ||
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/84d2324f5fa344ed/inference/mlperf.conf | ||
offline_expected_qps : 0.0 | ||
precision : fp16 | ||
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/preprocessed_data | ||
scenario : Scenario.Offline | ||
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.334532, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197334532000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='RTX4090x2') | ||
tensor_path : build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy | ||
test_mode : AccuracyOnly | ||
use_graphs : False | ||
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/60a37c5ad0bc45cf8a97ac19c6979aa1.conf | ||
system_id : RTX4090x2 | ||
config_name : RTX4090x2_bert_Offline | ||
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99_9, PowerSetting.MaxP) | ||
optimization_level : plugin-enabled | ||
num_profiles : 2 | ||
config_ver : custom_k_99_9_MaxP | ||
accuracy_level : 99.9% | ||
inference_server : custom | ||
skip_file_checks : True | ||
power_limit : None | ||
cpu_freq : None | ||
&&&& RUNNING BERT_HARNESS # ./build/bin/harness_bert | ||
I1222 22:30:05.299837 20259 main_bert.cc:163] Found 2 GPUs | ||
I1222 22:30:05.419869 20259 bert_server.cc:147] Engine Path: ./build/engines/RTX4090x2/bert/Offline/bert-Offline-gpu-fp16_S_384_B_256_P_2_vs.custom_k_99_9_MaxP.plan | ||
[I] [TRT] Loaded engine size: 700 MiB | ||
[I] [TRT] Loaded engine size: 700 MiB | ||
[W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors. | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +8, GPU +10, now: CPU 1008, GPU 1511 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1010, GPU 1521 (MiB) | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1152, now: CPU 0, GPU 1152 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 1018, GPU 1254 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 1019, GPU 1264 (MiB) | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +1, GPU +576, now: CPU 1, GPU 1152 (MiB) | ||
I1222 22:30:06.060315 20259 bert_server.cc:208] Engines Creation Completed | ||
I1222 22:30:06.093281 20259 bert_core_vs.cc:385] Engine - Device Memory requirements: 1409287680 | ||
I1222 22:30:06.093286 20259 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2 | ||
I1222 22:30:06.093291 20259 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384 | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 319, GPU 2859 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 320, GPU 2867 (MiB) | ||
I1222 22:30:06.164353 20259 bert_core_vs.cc:426] Setting Opt.Prof. to 0 | ||
I1222 22:30:06.164381 20259 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256 | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 1, GPU 1152 (MiB) | ||
I1222 22:30:06.165206 20259 bert_core_vs.cc:476] Setup complete | ||
I1222 22:30:06.165371 20259 bert_core_vs.cc:385] Engine - Device Memory requirements: 1409287680 | ||
I1222 22:30:06.165374 20259 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2 | ||
I1222 22:30:06.165378 20259 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384 | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 443, GPU 2602 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 443, GPU 2610 (MiB) | ||
I1222 22:30:06.244166 20259 bert_core_vs.cc:426] Setting Opt.Prof. to 0 | ||
I1222 22:30:06.244184 20259 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256 | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +0, now: CPU 2, GPU 1152 (MiB) | ||
I1222 22:30:06.245065 20259 bert_core_vs.cc:476] Setup complete | ||
I1222 22:30:06.245271 20259 bert_core_vs.cc:385] Engine - Device Memory requirements: 1409287680 | ||
I1222 22:30:06.245275 20259 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2 | ||
I1222 22:30:06.245278 20259 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384 | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 566, GPU 4345 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 566, GPU 4355 (MiB) | ||
I1222 22:30:06.344513 20259 bert_core_vs.cc:426] Setting Opt.Prof. to 1 | ||
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly. | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +0, now: CPU 3, GPU 1152 (MiB) | ||
I1222 22:30:06.344992 20259 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256 | ||
I1222 22:30:06.346107 20259 bert_core_vs.cc:476] Setup complete | ||
I1222 22:30:06.346344 20259 bert_core_vs.cc:385] Engine - Device Memory requirements: 1409287680 | ||
I1222 22:30:06.346349 20259 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2 | ||
I1222 22:30:06.346354 20259 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384 | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 690, GPU 4088 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 690, GPU 4098 (MiB) | ||
I1222 22:30:06.460180 20259 bert_core_vs.cc:426] Setting Opt.Prof. to 1 | ||
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly. | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 3, GPU 1152 (MiB) | ||
I1222 22:30:06.460637 20259 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256 | ||
I1222 22:30:06.461728 20259 bert_core_vs.cc:476] Setup complete | ||
I1222 22:30:07.662860 20259 main_bert.cc:184] Starting running actual test. | ||
I1222 22:30:10.967218 20259 main_bert.cc:190] Finished running actual test. | ||
|
||
No warnings encountered during test. | ||
|
||
No errors encountered during test. | ||
[2024-12-22 22:30:11,194 run_harness.py:166 INFO] Result: Accuracy run detected. | ||
[2024-12-22 22:30:11,195 __init__.py:46 INFO] Running command: PYTHONPATH=code/bert/tensorrt/helpers python3 /home/cmuser/CM/repos/local/cache/94a57f78972843c6/repo/closed/NVIDIA/build/inference/language/bert/accuracy-squad.py --log_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99.9/offline/accuracy/mlperf_log_accuracy.json --vocab_file build/models/bert/vocab.txt --val_data /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data/squad/dev-v1.1.json --out_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99.9/offline/accuracy/predictions.json --output_dtype float16 | ||
{"exact_match": 83.67076631977294, "f1": 90.8832407068292} | ||
Reading examples... | ||
Loading cached features from 'eval_features.pickle'... | ||
Loading LoadGen logs... | ||
Post-processing predictions... | ||
Writing predictions to: /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99.9/offline/accuracy/predictions.json | ||
Evaluating predictions... | ||
|
||
======================== Result summaries: ======================== | ||
|
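
The engine file name in the harness log above appears to encode the build settings. The sketch below parses it and cross-checks the values against the log lines `Bmax=256 Smax=384`, `maxDims 98304`, and `Number of Optimization Profiles: 2`. Reading `S`, `B`, and `P` as sequence length, batch size, and profile count is inferred from those log lines, and the pattern is derived from this single file name, so other engines may be named differently.

```python
import re

# File name copied from the --gpu_engines argument in the log.
ENGINE_NAME = "bert-Offline-gpu-fp16_S_384_B_256_P_2_vs.custom_k_99_9_MaxP.plan"

# Pattern inferred from one example; not an official naming specification.
ENGINE_RE = re.compile(
    r"(?P<model>\w+?)-(?P<scenario>\w+?)-(?P<device>\w+?)-(?P<precision>[a-z0-9]+)"
    r"_S_(?P<seq_len>\d+)_B_(?P<batch>\d+)_P_(?P<profiles>\d+)_"
)


def parse_engine_name(name: str) -> dict:
    """Split an engine file name into its apparent build settings."""
    match = ENGINE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized engine name: {name}")
    fields = match.groupdict()
    for key in ("seq_len", "batch", "profiles"):
        fields[key] = int(fields[key])
    return fields


engine = parse_engine_name(ENGINE_NAME)
print(engine)
# maxDims in the log equals batch size times sequence length.
print("max tokens per batch:", engine["batch"] * engine["seq_len"])
```

A mismatch between the parsed values and the logged `Bmax`, `Smax`, or profile count would suggest that the harness loaded an engine built for a different configuration.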