Results from self hosted Github actions - NVIDIARTX4090
arjunsuresh committed Dec 25, 2024
1 parent 078833e commit a80faec
Showing 35 changed files with 12,259 additions and 0 deletions.
@@ -0,0 +1,3 @@
| Model | Scenario | Accuracy | Throughput | Latency (in ms) |
|---------------------|------------|------------|--------------|-------------------|
| stable-diffusion-xl | offline | () | 1.314 | - |
@@ -0,0 +1,101 @@
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).

*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*

## Host platform

* OS version: Linux-6.8.0-49-generic-x86_64-with-glibc2.29
* CPU version: x86_64
* Python version: 3.8.10 (default, Nov 7 2024, 13:10:47) [GCC 9.4.0]
* MLCommons CM version: 3.5.2

## CM Run Command

See [CM installation guide](https://docs.mlcommons.org/inference/install/).

```bash
pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@mlperf-automations --checkout=3551660b68ffcff303ae7539ae9a62d34b19bc7e

cm run script \
--tags=app,mlperf,inference,generic,_nvidia,_sdxl,_tensorrt,_test,_r4.1-dev_default,_float16,_offline \
--quiet=true \
--env.CM_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST=yes \
--env.CM_QUIET=yes \
--env.CM_MLPERF_IMPLEMENTATION=nvidia \
--env.CM_MLPERF_MODEL=sdxl \
--env.CM_MLPERF_RUN_STYLE=test \
--env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False \
--env.CM_DOCKER_PRIVILEGED_MODE=True \
--env.CM_MLPERF_BACKEND=tensorrt \
--env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter \
--env.CM_MLPERF_CLEAN_ALL=True \
--env.CM_MLPERF_DEVICE= \
--env.CM_MLPERF_USE_DOCKER=True \
--env.CM_MLPERF_MODEL_PRECISION=float16 \
--env.OUTPUT_BASE_DIR=/home/arjun/scc_gh_action_results \
--env.CM_MLPERF_LOADGEN_SCENARIO=Offline \
--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/arjun/scc_gh_action_submissions \
--env.CM_MLPERF_INFERENCE_VERSION=5.0-dev \
--env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1-dev_default \
--env.CM_MLPERF_SUBMISSION_DIVISION=open \
--env.CM_RUN_MLPERF_SUBMISSION_PREPROCESSOR=False \
--env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=short \
--env.CM_MLPERF_SUT_NAME_RUN_CONFIG_SUFFIX4=scc24-base \
--env.CM_DOCKER_IMAGE_NAME=scc24-nvidia \
--env.CM_MLPERF_INFERENCE_MIN_QUERY_COUNT=50 \
--env.CM_MLPERF_LOADGEN_ALL_MODES=yes \
--env.CM_MLPERF_INFERENCE_SOURCE_VERSION=5.0.4 \
--env.CM_MLPERF_LAST_RELEASE=v5.0 \
--env.CM_TMP_PIP_VERSION_STRING= \
--env.CM_MODEL=sdxl \
--env.CM_MLPERF_LOADGEN_COMPLIANCE=no \
--env.CM_MLPERF_CLEAN_SUBMISSION_DIR=yes \
--env.CM_RERUN=yes \
--env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= \
--env.CM_MLPERF_LOADGEN_MODE=performance \
--env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline \
--env.CM_MLPERF_LOADGEN_MODES,=performance,accuracy \
--env.CM_OUTPUT_FOLDER_NAME=test_results \
--env.CM_DOCKER_REUSE_EXISTING_CONTAINER=no \
--env.CM_DOCKER_DETACHED_MODE=yes \
--add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1-dev \
--add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1-dev \
--add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1-dev \
--add_deps_recursive.submission-checker.tags=_short-run \
--add_deps_recursive.coco2014-preprocessed.tags=_size.50,_with-sample-ids \
--add_deps_recursive.coco2014-dataset.tags=_size.50,_with-sample-ids \
--add_deps_recursive.nvidia-preprocess-data.extra_cache_tags=scc24-base \
--v=False \
--print_env=False \
--print_deps=False \
--dump_version_info=True \
--env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/scc_gh_action_results \
--env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/cm-mount/home/arjun/scc_gh_action_submissions \
--env.SDXL_CHECKPOINT_PATH=/home/cmuser/CM/repos/local/cache/762e6805370c44eb/stable_diffusion_fp16 \
--env.MLPERF_SCRATCH_PATH=/home/cmuser/CM/repos/local/cache/4db00c74da1e44c8
```
*Note: to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts),
reload mlcommons@mlperf-automations without the `--checkout` option and clean the CM cache as follows:*

```bash
cm rm repo mlcommons@mlperf-automations
cm pull repo mlcommons@mlperf-automations
cm rm cache -f
```
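
The `cm run script` command above passes a long list of `--env.KEY=VALUE` flags, and some keys (for example `OUTPUT_BASE_DIR`) appear more than once with different values. The following is a minimal sketch for spotting such repeats before reusing the command. The helper names `env_flags` and `repeated` are hypothetical and not part of CM, and the sketch makes no assumption about which of the repeated values CM ends up using.

```python
import shlex
from collections import defaultdict


def env_flags(command: str) -> dict:
    """Map each --env.KEY in a shell command to the list of values given for it."""
    # Join backslash-continued lines so the command tokenizes as one line.
    flat = command.replace("\\\n", " ")
    values = defaultdict(list)
    for token in shlex.split(flat):
        if token.startswith("--env.") and "=" in token:
            key, _, value = token[len("--env."):].partition("=")
            values[key].append(value)
    return dict(values)


def repeated(flags: dict) -> dict:
    """Keep only the keys that were passed more than once."""
    return {key: vals for key, vals in flags.items() if len(vals) > 1}


# Shortened excerpt of the command above, for illustration only.
sample = r"""cm run script \
 --env.OUTPUT_BASE_DIR=/home/arjun/scc_gh_action_results \
 --env.CM_MODEL=sdxl \
 --env.CM_MLPERF_DEVICE= \
 --env.OUTPUT_BASE_DIR=/cm-mount/home/arjun/scc_gh_action_results"""

for key, vals in repeated(env_flags(sample)).items():
    print(key, vals)
```

Running the same helper on the full command would list every repeated key, which makes it easier to see which paths differ between the host and the container mount.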

## Results

Platform: f7fe5bc93dd5-nvidia_original-gpu-tensorrt-vdefault-scc24-base

Model Precision: int8

### Accuracy Results

### Performance Results
`Samples per second`: `1.3141`
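
To put the throughput figure in more familiar terms, it can be inverted into an average time per generated image. This is only a rough sketch: it assumes the reported value is the aggregate rate for the whole system, and the function name `seconds_per_sample` is hypothetical.

```python
def seconds_per_sample(samples_per_second: float) -> float:
    """Average wall-clock seconds per sample at a given aggregate throughput."""
    if samples_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / samples_per_second


throughput = 1.3141  # `Samples per second` reported above
per_sample = seconds_per_sample(throughput)
print(f"{per_sample:.3f} s per sample on average")
print(f"{5000 * per_sample:.0f} s for 5000 samples at this rate")
```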
@@ -0,0 +1,73 @@
[2024-12-25 12:27:53,730 main.py:229 INFO] Detected system ID: KnownSystem.f7fe5bc93dd5
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
[2024-12-25 12:27:55,048 generate_conf_files.py:107 INFO] Generated measurements/ entries for f7fe5bc93dd5_TRT/stable-diffusion-xl/Offline
[2024-12-25 12:27:55,048 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/cm-mount/home/arjun/scc_gh_action_results/test_results/f7fe5bc93dd5-nvidia_original-gpu-tensorrt-vdefault-scc24-base/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=2 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/7f314a33540f461d/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/2bf31fee686c457eb2bc9b93856f7440.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan,./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan,./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
[2024-12-25 12:27:55,048 __init__.py:53 INFO] Overriding Environment
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
[2024-12-25 12:27:56,903 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
[2024-12-25 12:27:57,035 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
[2024-12-25 12:27:57,681 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
[2024-12-25 12:27:59,038 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
[2024-12-25 12:28:00,378 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
[2024-12-25 12:28:00,503 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan.
[2024-12-25 12:28:01,147 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan.
[2024-12-25 12:28:02,517 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan.
[2024-12-25 12:28:03,690 harness.py:207 INFO] Start Warm Up!
[2024-12-25 12:28:15,465 harness.py:209 INFO] Warm Up Done!
[2024-12-25 12:28:15,465 harness.py:211 INFO] Start Test!
[2024-12-25 13:30:02,311 backend.py:801 INFO] [Server] Received 5000 total samples
[2024-12-25 13:30:02,312 backend.py:809 INFO] [Device 0] Reported 2496 samples
[2024-12-25 13:30:02,312 backend.py:809 INFO] [Device 1] Reported 2504 samples
[2024-12-25 13:30:02,312 harness.py:214 INFO] Test Done!
[2024-12-25 13:30:02,312 harness.py:216 INFO] Destroying SUT...
[2024-12-25 13:30:02,312 harness.py:219 INFO] Destroying QSL...
benchmark : Benchmark.SDXL
buffer_manager_thread_count : 0
data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/data
gpu_batch_size : 2
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /home/cmuser/CM/repos/local/cache/7c0c2e4c9cc3421e/repo/closed/NVIDIA/build/logs/2024.12.25-12.27.52
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/7f314a33540f461d/inference/mlperf.conf
model_path : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/models/SDXL/
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/4db00c74da1e44c8/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.334532, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197334532000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='f7fe5bc93dd5')
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
test_mode : AccuracyOnly
use_graphs : False
user_conf_path : /home/cmuser/CM/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/2bf31fee686c457eb2bc9b93856f7440.conf
system_id : f7fe5bc93dd5
config_name : f7fe5bc93dd5_stable-diffusion-xl_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
[I] Loading bytes from ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
[W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[I] Loading bytes from ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b2-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b2-int8.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/f7fe5bc93dd5/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b2-fp32.custom_k_99_MaxP.plan
[2024-12-25 13:30:02,832 run_harness.py:166 INFO] Result: Accuracy run detected.

======================== Result summaries: ========================
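
The timestamps in the log above are enough to estimate how long the accuracy run took and how evenly the samples were split across the two devices. The sketch below is an informal cross-check, not the LoadGen metric: it measures wall-clock time between the `Start Test!` and `Test Done!` lines of this `AccuracyOnly` run, so it need not match the throughput reported for the performance run. The helper name `stamp_of` is hypothetical.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS,mmm " of a harness log line.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def stamp_of(line: str) -> datetime:
    """Parse the timestamp at the start of a harness log line."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


# Lines copied from the log above.
start = "[2024-12-25 12:28:15,465 harness.py:211 INFO] Start Test!"
done = "[2024-12-25 13:30:02,312 harness.py:214 INFO] Test Done!"
per_device = {0: 2496, 1: 2504}  # "[Device N] Reported ... samples"

elapsed = (stamp_of(done) - stamp_of(start)).total_seconds()
total = sum(per_device.values())
rate = total / elapsed

print(f"elapsed: {elapsed:.3f} s for {total} samples")
print(f"wall-clock rate: {rate:.4f} samples/s")
for device, count in per_device.items():
    print(f"device {device}: {100 * count / total:.2f}% of samples")
```

The near-even split between the two devices suggests the harness balanced the work across both GPUs, although the log alone does not show how the scheduling was done.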
