Around line 261 in dlio_benchmark/dlio_benchmark/main.py there is the main loop that reads batches and simulates the computation time:

```python
loader = self.framework.get_loader(dataset_type=DatasetType.TRAIN)
t0 = time()
for batch in dlp.iter(loader.next()):
    self.stats.batch_loaded(epoch, overall_step, block, t0)
```
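To make the timing concrete, here is a minimal, self-contained sketch of that measurement pattern with a read-on-demand loader. The names (`synchronous_loader`, `io_time`, `compute_time`) are hypothetical stand-ins for the real reader and the simulated compute, not DLIO's actual code:

```python
import time

def synchronous_loader(n_batches, io_time):
    # Read-on-demand: each batch blocks the consumer for the full IO time.
    for b in range(n_batches):
        time.sleep(io_time)  # simulated storage read
        yield b

io_time, compute_time = 0.05, 0.1
loaded_times = []
t0 = time.time()
for batch in synchronous_loader(3, io_time):
    loaded_times.append(time.time() - t0)  # what batch_loaded() would record
    time.sleep(compute_time)               # simulated compute phase
    t0 = time.time()

# Every step's loaded time is the full io_time, and each step takes
# io_time + compute_time end to end: IO and compute are serialized.
```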
When I run with native_dali (I understand this isn't fully supported yet), the first step reports a reasonable processed time, but all subsequent processed times are much larger.
I have been adjusting my .yaml file for the resnet50_h100 case. My current computation_time is 0.1 seconds. If I set the computation_time to zero, the actual time reported is ~0.055 seconds, which is the fastest my storage can do for this configuration.
The loaded time, which comes from the batch_loaded() function, is the actual IO time for all steps > 1. So the processed time is the IO time plus the computation time. It's as if IO is not being done asynchronously.
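For contrast, this is what the timings would look like if IO were overlapped with compute. The sketch below uses a background prefetch thread (a hypothetical illustration, not DLIO's or DALI's actual implementation): after the first step pays the initial read, `io_time < compute_time` means the next batch is always already queued, so loaded times drop to near zero:

```python
import queue
import threading
import time

def prefetching_loader(n_batches, io_time, depth=2):
    # A background thread reads ahead while the consumer computes.
    q = queue.Queue(maxsize=depth)
    def reader():
        for b in range(n_batches):
            time.sleep(io_time)  # simulated storage read
            q.put(b)
        q.put(None)              # sentinel: no more batches
    threading.Thread(target=reader, daemon=True).start()
    while (b := q.get()) is not None:
        yield b

io_time, compute_time = 0.05, 0.1
loaded_times = []
t0 = time.time()
for batch in prefetching_loader(4, io_time):
    loaded_times.append(time.time() - t0)  # "loaded" time per step
    time.sleep(compute_time)               # simulated compute phase
    t0 = time.time()

# Step 1 pays the first read (~io_time); later steps find the next batch
# already queued, so loaded ~ 0 and each step costs roughly compute_time
# alone instead of io_time + compute_time.
```

This matches the pattern I would expect from an asynchronous reader, and is the opposite of what the logs below show.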
Until the dali reader is fixed I can't test that path, but I want to get this written down in case it affects the dali reader as well.
```
[INFO] 2024-04-02T17:40:01.359388 Rank 0 step 1: loaded 400 samples in 9.298324584960938e-05 s
[INFO] 2024-04-02T17:40:01.479351 Rank 0 step 1 processed 400 samples in 0.12005329132080078 s
[INFO] 2024-04-02T17:40:01.534976 Rank 0 step 2: loaded 400 samples in 0.05459904670715332 s
[INFO] 2024-04-02T17:40:01.636629 Rank 0 step 2 processed 400 samples in 0.15625238418579102 s
```
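The numbers above are consistent with fully serial IO: step 2's processed time is almost exactly its loaded time plus the configured 0.1 s compute time, rather than the maximum of the two that overlap would give:

```python
# Values copied from the log lines above
loaded_step2 = 0.05459904670715332
computation_time = 0.1
processed_step2 = 0.15625238418579102

# Serial IO predicts loaded + compute; overlapped IO would predict
# roughly max(loaded, compute) once the pipeline is warmed up.
serial_prediction = loaded_step2 + computation_time       # ~0.1546 s
overlap_prediction = max(loaded_step2, computation_time)  # 0.1 s

assert abs(processed_step2 - serial_prediction) < 0.005
```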
crtierney changed the title from "Is dlp." to "IO does not appear to be asynchronous with the compute phase" on Apr 3, 2024