generated from mlcommons/mlperf_inference_submissions
Results from GH action on NVIDIA_RTX4090x1
1 parent f4e8db8, commit 95ab16f
Showing 63 changed files with 15,948 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
TBD |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
TBD |
132 changes: 132 additions & 0 deletions
...-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops). | ||
|
||
*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.* | ||
|
||
## Host platform | ||
|
||
* OS version: Linux-6.8.0-49-generic-x86_64-with-glibc2.29 | ||
* CPU version: x86_64 | ||
* Python version: 3.8.10 (default, Nov 7 2024, 13:10:47) | ||
[GCC 9.4.0] | ||
* MLCommons CM version: 3.5.2 | ||
|
||
## CM Run Command | ||
|
||
See [CM installation guide](https://docs.mlcommons.org/inference/install/). | ||
|
||
```bash | ||
pip install -U cmind | ||
|
||
cm rm cache -f | ||
|
||
cm pull repo mlcommons@mlperf-automations --checkout=225220c7d9bb7e66e5b9a1e1ebfc3e0180fbd094 | ||
|
||
cm run script \ | ||
--tags=app,mlperf,inference,generic,_nvidia,_bert-99,_tensorrt,_cuda,_valid,_r4.1-dev_default,_offline \ | ||
--quiet=true \ | ||
--env.CM_QUIET=yes \ | ||
--env.CM_MLPERF_IMPLEMENTATION=nvidia \ | ||
--env.CM_MLPERF_MODEL=bert-99 \ | ||
--env.CM_MLPERF_RUN_STYLE=valid \ | ||
--env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False \ | ||
--env.CM_DOCKER_PRIVILEGED_MODE=True \ | ||
--env.CM_MLPERF_BACKEND=tensorrt \ | ||
--env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter,edge \ | ||
--env.CM_MLPERF_CLEAN_ALL=True \ | ||
--env.CM_MLPERF_DEVICE=cuda \ | ||
--env.CM_MLPERF_SUBMISSION_DIVISION=closed \ | ||
--env.CM_MLPERF_USE_DOCKER=True \ | ||
--env.CM_NVIDIA_GPU_NAME=rtx_4090 \ | ||
--env.CM_HW_NAME=RTX4090x1 \ | ||
--env.CM_RUN_MLPERF_SUBMISSION_PREPROCESSOR=yes \ | ||
--env.CM_MLPERF_INFERENCE_PULL_CODE_CHANGES=yes \ | ||
--env.CM_MLPERF_INFERENCE_PULL_SRC_CHANGES=yes \ | ||
--env.OUTPUT_BASE_DIR=/home/arjun/gh_action_results \ | ||
--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/arjun/gh_action_submissions \ | ||
--env.CM_MLPERF_SUBMITTER=MLCommons \ | ||
--env.CM_USE_DATASET_FROM_HOST=yes \ | ||
--env.CM_USE_MODEL_FROM_HOST=yes \ | ||
--env.CM_MLPERF_LOADGEN_ALL_SCENARIOS=yes \ | ||
--env.CM_MLPERF_LOADGEN_COMPLIANCE=yes \ | ||
--env.CM_MLPERF_SUBMISSION_RUN=yes \ | ||
--env.CM_RUN_MLPERF_ACCURACY=on \ | ||
--env.CM_RUN_SUBMISSION_CHECKER=yes \ | ||
--env.CM_TAR_SUBMISSION_DIR=yes \ | ||
--env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full \ | ||
--env.CM_MLPERF_INFERENCE_VERSION=4.1-dev \ | ||
--env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1-dev_default \ | ||
--env.CM_MLPERF_LOADGEN_ALL_MODES=yes \ | ||
--env.CM_MLPERF_INFERENCE_SOURCE_VERSION=4.1.23 \ | ||
--env.CM_MLPERF_LAST_RELEASE=v4.1 \ | ||
--env.CM_TMP_PIP_VERSION_STRING= \ | ||
--env.CM_MODEL=bert-99 \ | ||
--env.CM_MLPERF_CLEAN_SUBMISSION_DIR=yes \ | ||
--env.CM_RERUN=yes \ | ||
--env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= \ | ||
--env.CM_MLPERF_LOADGEN_MODE=performance \ | ||
--env.CM_MLPERF_LOADGEN_SCENARIO=Offline \ | ||
--env.CM_MLPERF_LOADGEN_SCENARIOS,=SingleStream,Offline,Server \ | ||
--env.CM_MLPERF_LOADGEN_MODES,=performance,accuracy \ | ||
--env.CM_OUTPUT_FOLDER_NAME=valid_results \ | ||
--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=yes \ | ||
--env.CM_DOCKER_DETACHED_MODE=yes \ | ||
--env.CM_MLPERF_INFERENCE_RESULTS_DIR_=/home/arjun/gh_action_results/valid_results \ | ||
--env.CM_DOCKER_CONTAINER_ID=bcea82088806 \ | ||
--env.CM_MLPERF_LOADGEN_COMPLIANCE_TEST=TEST01 \ | ||
--add_deps_recursive.compiler.tags=gcc \ | ||
--add_deps_recursive.coco2014-original.tags=_full \ | ||
--add_deps_recursive.coco2014-preprocessed.tags=_full \ | ||
--add_deps_recursive.imagenet-original.tags=_full \ | ||
--add_deps_recursive.imagenet-preprocessed.tags=_full \ | ||
--add_deps_recursive.openimages-original.tags=_full \ | ||
--add_deps_recursive.openimages-preprocessed.tags=_full \ | ||
--add_deps_recursive.openorca-original.tags=_full \ | ||
--add_deps_recursive.openorca-preprocessed.tags=_full \ | ||
--add_deps_recursive.coco2014-dataset.tags=_full \ | ||
--add_deps_recursive.igbh-dataset.tags=_full \ | ||
--add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \ | ||
--add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \ | ||
--add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \ | ||
--adr.compiler.tags=gcc \ | ||
--adr.coco2014-original.tags=_full \ | ||
--adr.coco2014-preprocessed.tags=_full \ | ||
--adr.imagenet-original.tags=_full \ | ||
--adr.imagenet-preprocessed.tags=_full \ | ||
--adr.openimages-original.tags=_full \ | ||
--adr.openimages-preprocessed.tags=_full \ | ||
--adr.openorca-original.tags=_full \ | ||
--adr.openorca-preprocessed.tags=_full \ | ||
--adr.coco2014-dataset.tags=_full \ | ||
--adr.igbh-dataset.tags=_full \ | ||
--adr.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \ | ||
--adr.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \ | ||
--adr.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \ | ||
--v=False \ | ||
--print_env=False \ | ||
--print_deps=False \ | ||
--dump_version_info=True \ | ||
--env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/gh_action_results \ | ||
--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/cm-mount/home/arjun/gh_action_submissions \ | ||
--env.MLPERF_SCRATCH_PATH=/home/cmuser/CM/repos/local/cache/5b2b0cc913a4453a | ||
``` | ||
*Note that if you want to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts), | ||
you should simply reload mlcommons@mlperf-automations without checkout and clean CM cache as follows:* | ||
|
||
```bash | ||
cm rm repo mlcommons@mlperf-automations | ||
cm pull repo mlcommons@mlperf-automations | ||
cm rm cache -f | ||
|
||
``` | ||
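The `cm run script` command above passes some `--env.` keys more than once; for example, `OUTPUT_BASE_DIR` appears first with a host path and again with a `/cm-mount` path. A minimal sketch for listing such repeated keys follows. The helper names `collect_env_flags` and `repeated_keys` are hypothetical and not part of CM, and the sketch makes no claim about which value CM actually uses when a key is repeated.

```python
import re
from collections import defaultdict

# Hypothetical helper (not part of CM): collect every value passed for each
# --env.KEY=VALUE flag in a command string, in the order they appear.
def collect_env_flags(command: str) -> dict:
    values = defaultdict(list)
    # Keys may contain a trailing comma (e.g. CM_MLPERF_LOADGEN_SCENARIOS,);
    # values may be empty (e.g. --env.CM_TMP_PIP_VERSION_STRING=).
    for key, value in re.findall(r"--env\.([A-Za-z0-9_,]+)=(\S*)", command):
        values[key].append(value)
    return dict(values)

# Hypothetical helper: keep only keys that were passed more than once.
def repeated_keys(command: str) -> dict:
    return {k: v for k, v in collect_env_flags(command).items() if len(v) > 1}

# Excerpt of the command above, shortened for illustration.
command = r"""
cm run script \
 --env.CM_MLPERF_MODEL=bert-99 \
 --env.OUTPUT_BASE_DIR=/home/arjun/gh_action_results \
 --env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/arjun/gh_action_submissions \
 --env.CM_TMP_PIP_VERSION_STRING= \
 --env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/gh_action_results \
 --env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/cm-mount/home/arjun/gh_action_submissions
"""

for key, vals in repeated_keys(command).items():
    print(key, vals)
```

Running this on the full command rather than the excerpt would list every key that is set twice, which may help when comparing host paths against the paths seen inside the container.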
|
||
## Results | ||
|
||
Platform: RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config | ||
|
||
Model Precision: int8 | ||
|
||
### Accuracy Results | ||
`F1`: `90.15674`, Required accuracy for closed division `>= 89.96526` | ||
|
||
### Performance Results | ||
`Samples per second`: `4124.17` |
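The required accuracy of `89.96526` is consistent with 99% of a reference F1 score, which matches the `-99` suffix in `bert-99`. The sketch below reproduces that arithmetic. The reference value `90.874` is an assumption here (it is not stated in this README), and the function names are hypothetical.

```python
# Assumption: 90.874 is the reference F1 for the BERT benchmark.
# It does not appear in the README above.
REFERENCE_F1 = 90.874
TARGET_FRACTION = 0.99  # the "-99" in bert-99

def required_f1(reference: float = REFERENCE_F1,
                fraction: float = TARGET_FRACTION) -> float:
    """Minimum F1 needed to meet the accuracy target."""
    return reference * fraction

def meets_target(measured_f1: float) -> bool:
    """True when the measured F1 reaches the required threshold."""
    return measured_f1 >= required_f1()

measured = 90.15674  # F1 reported under "Accuracy Results"
margin = measured - required_f1()
print(f"required >= {required_f1():.5f}, measured = {measured}, "
      f"margin = {margin:.5f}, pass = {meets_target(measured)}")
```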
7 changes: 7 additions & 0 deletions
...onfig/bert-99/offline/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | ||
"starting_weights_filename": "https://zenodo.org/record/3750364/files/bert_large_v1_1_fake_quant.onnx", | ||
"retraining": "no", | ||
"input_data_types": "int32", | ||
"weight_data_types": "int8", | ||
"weight_transformations": "quantization, affine fusion" | ||
} |
80 changes: 80 additions & 0 deletions
...nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy_console.out
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
[2024-12-22 20:21:07,756 main.py:229 INFO] Detected system ID: KnownSystem.RTX4090x1 | ||
[2024-12-22 20:21:08,172 generate_conf_files.py:107 INFO] Generated measurements/ entries for RTX4090x1_TRT/bert-99/Offline | ||
[2024-12-22 20:21:08,172 __init__.py:46 INFO] Running command: ./build/bin/harness_bert --logfile_outdir="/cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=10833 --test_mode="AccuracyOnly" --gpu_batch_size=256 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/30a4695143b84b89/inference/mlperf.conf" --tensor_path="build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/403177c9892b404eaf63b19dc57cef7e.conf" --gpu_inference_streams=2 --gpu_copy_streams=2 --gpu_engines="./build/engines/RTX4090x1/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan" --scenario Offline --model bert | ||
[2024-12-22 20:21:08,172 __init__.py:53 INFO] Overriding Environment | ||
benchmark : Benchmark.BERT | ||
buffer_manager_thread_count : 0 | ||
coalesced_tensor : True | ||
data_dir : /home/cmuser/CM/repos/local/cache/5b2b0cc913a4453a/data | ||
gpu_batch_size : 256 | ||
gpu_copy_streams : 2 | ||
gpu_inference_streams : 2 | ||
input_dtype : int32 | ||
input_format : linear | ||
log_dir : /home/cmuser/CM/repos/local/cache/dfbf240f980947f5/repo/closed/NVIDIA/build/logs/2024.12.22-20.21.06 | ||
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/30a4695143b84b89/inference/mlperf.conf | ||
offline_expected_qps : 0.0 | ||
precision : int8 | ||
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/5b2b0cc913a4453a/preprocessed_data | ||
scenario : Scenario.Offline | ||
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='13th Gen Intel(R) Core(TM) i9-13900K', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=1): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=131.634476, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=131634476000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=None, system_id='RTX4090x1') | ||
tensor_path : build/preprocessed_data/squad_tokenized/input_ids.npy,build/preprocessed_data/squad_tokenized/segment_ids.npy,build/preprocessed_data/squad_tokenized/input_mask.npy | ||
test_mode : AccuracyOnly | ||
use_graphs : False | ||
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/403177c9892b404eaf63b19dc57cef7e.conf | ||
system_id : RTX4090x1 | ||
config_name : RTX4090x1_bert_Offline | ||
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP) | ||
optimization_level : plugin-enabled | ||
num_profiles : 2 | ||
config_ver : custom_k_99_MaxP | ||
accuracy_level : 99% | ||
inference_server : custom | ||
skip_file_checks : True | ||
power_limit : None | ||
cpu_freq : None | ||
&&&& RUNNING BERT_HARNESS # ./build/bin/harness_bert | ||
I1222 20:21:08.208585 19723 main_bert.cc:163] Found 1 GPUs | ||
I1222 20:21:08.294823 19723 bert_server.cc:147] Engine Path: ./build/engines/RTX4090x1/bert/Offline/bert-Offline-gpu-int8_S_384_B_256_P_2_vs.custom_k_99_MaxP.plan | ||
[I] [TRT] Loaded engine size: 414 MiB | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 578, GPU 1225 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 580, GPU 1235 (MiB) | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +290, now: CPU 0, GPU 290 (MiB) | ||
I1222 20:21:08.548007 19723 bert_server.cc:208] Engines Creation Completed | ||
I1222 20:21:08.560138 19723 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608 | ||
I1222 20:21:08.560142 19723 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2 | ||
I1222 20:21:08.560149 19723 bert_core_vs.cc:415] Engine - Profile 0 maxDims 98304 Bmax=256 Smax=384 | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 166, GPU 1901 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 166, GPU 1909 (MiB) | ||
I1222 20:21:08.603008 19723 bert_core_vs.cc:426] Setting Opt.Prof. to 0 | ||
I1222 20:21:08.603025 19723 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256 | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 290 (MiB) | ||
I1222 20:21:08.603590 19723 bert_core_vs.cc:476] Setup complete | ||
I1222 20:21:08.603727 19723 bert_core_vs.cc:385] Engine - Device Memory requirements: 704644608 | ||
I1222 20:21:08.603729 19723 bert_core_vs.cc:393] Engine - Number of Optimization Profiles: 2 | ||
I1222 20:21:08.603731 19723 bert_core_vs.cc:415] Engine - Profile 1 maxDims 98304 Bmax=256 Smax=384 | ||
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 289, GPU 2715 (MiB) | ||
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 289, GPU 2725 (MiB) | ||
I1222 20:21:08.645998 19723 bert_core_vs.cc:426] Setting Opt.Prof. to 1 | ||
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly. | ||
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +0, now: CPU 1, GPU 290 (MiB) | ||
I1222 20:21:08.646291 19723 bert_core_vs.cc:444] Context creation complete. Max supported batchSize: 256 | ||
I1222 20:21:08.646836 19723 bert_core_vs.cc:476] Setup complete | ||
I1222 20:21:08.876005 19723 main_bert.cc:184] Starting running actual test. | ||
I1222 20:21:11.529424 19723 main_bert.cc:190] Finished running actual test. | ||
|
||
No warnings encountered during test. | ||
|
||
No errors encountered during test. | ||
[2024-12-22 20:21:11,652 run_harness.py:166 INFO] Result: Accuracy run detected. | ||
[2024-12-22 20:21:11,652 __init__.py:46 INFO] Running command: PYTHONPATH=code/bert/tensorrt/helpers python3 /home/cmuser/CM/repos/local/cache/dfbf240f980947f5/repo/closed/NVIDIA/build/inference/language/bert/accuracy-squad.py --log_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/mlperf_log_accuracy.json --vocab_file build/models/bert/vocab.txt --val_data /home/cmuser/CM/repos/local/cache/5b2b0cc913a4453a/data/squad/dev-v1.1.json --out_file /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/predictions.json --output_dtype float16 | ||
{"exact_match": 82.81929990539263, "f1": 90.15673510616978} | ||
Reading examples... | ||
Loading cached features from 'eval_features.pickle'... | ||
Loading LoadGen logs... | ||
Post-processing predictions... | ||
Writing predictions to: /cm-mount/home/arjun/gh_action_results/valid_results/RTX4090x1-nvidia_original-gpu-tensorrt-vdefault-default_config/bert-99/offline/accuracy/predictions.json | ||
Evaluating predictions... | ||
|
||
======================== Result summaries: ======================== | ||
|