IO does not appear to be asynchronous with the compute phase #178

Open
crtierney opened this issue Apr 3, 2024 · 1 comment

Comments

@crtierney

Around line 261 in dlio_benchmark/dlio_benchmark/main.py, there is the main loop to read and simulate the computation time.

    loader = self.framework.get_loader(dataset_type=DatasetType.TRAIN)
    t0 = time()
    for batch in dlp.iter(loader.next()):
        self.stats.batch_loaded(epoch, overall_step, block, t0)

When I run my code with native_dali (I understand this isn't fully supported yet), the first step reports a reasonable processed time, but all subsequent steps report much larger times.

I have been adjusting my .yaml file for the resnet50_h100 case. My current computation_time is 0.1 seconds. If I set computation_time to zero, the actual time reported is ~0.055 seconds, which is the fastest my storage can do for this configuration.

The time reported as loaded, which comes from the batch_loaded() function, is the actual IO time for all steps > 1. The processed time is therefore the IO time plus the computation time. It's as if IO is not being done asynchronously with the compute phase.

Until the dali reader is fixed, I can't test that. I want to get this written down in case it also affects the dali reader.
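To make the expectation concrete, here is a minimal, self-contained sketch of the difference between a blocking loader and a prefetching one. It is not DLIO or DALI code: `read_batch`, `sync_batches`, `prefetched_batches`, and the timing constants are all hypothetical, and the placement of the `t0` reset after the compute phase is an assumption about how the per-step wait is measured.

```python
import queue
import threading
import time

IO_TIME = 0.02       # hypothetical per-batch read latency in seconds
COMPUTE_TIME = 0.05  # hypothetical emulated compute time in seconds
NUM_STEPS = 5

def read_batch(step):
    """Stand-in for a storage read; just sleeps for IO_TIME."""
    time.sleep(IO_TIME)
    return step

def sync_batches():
    """Blocking loader: each read starts only when the consumer asks for it."""
    for step in range(NUM_STEPS):
        yield read_batch(step)

def prefetched_batches(depth=2):
    """Prefetching loader: a background thread reads ahead into a bounded queue."""
    q = queue.Queue(maxsize=depth)

    def producer():
        for step in range(NUM_STEPS):
            q.put(read_batch(step))
        q.put(None)  # sentinel marking the end of the data

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is None:
            return
        yield batch

def run(batches):
    """Return the time spent waiting for each batch, excluding compute."""
    waits = []
    t0 = time.perf_counter()
    for _ in batches:
        waits.append(time.perf_counter() - t0)  # analogous to the "loaded" time
        time.sleep(COMPUTE_TIME)                # emulated compute phase
        t0 = time.perf_counter()                # assumed: timer restarts after compute
    return waits

sync_waits = run(sync_batches())
async_waits = run(prefetched_batches())
print("blocking   waits:", [round(w, 3) for w in sync_waits])
print("prefetched waits:", [round(w, 3) for w in async_waits])
```

With the blocking loader, every step waits roughly `IO_TIME`. With prefetching and a compute phase longer than the read, only the first step should pay the read latency, and later waits should be close to zero. The behavior reported in this issue resembles the blocking case.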

[INFO] 2024-04-02T17:40:01.359388 Rank 0 step 1: loaded 400 samples in 9.298324584960938e-05 s
[INFO] 2024-04-02T17:40:01.479351 Rank 0 step 1 processed 400 samples in 0.12005329132080078 s

[INFO] 2024-04-02T17:40:01.534976 Rank 0 step 2: loaded 400 samples in 0.05459904670715332 s
[INFO] 2024-04-02T17:40:01.636629 Rank 0 step 2 processed 400 samples in 0.15625238418579102 s
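
As a quick check on the log lines above, the short script below subtracts the loaded time from the processed time for each step. The numbers are copied from the log; the 0.1 s value is the computation_time mentioned earlier, and the variable names are only illustrative.

```python
# Values copied from the log lines above: step -> (loaded_s, processed_s)
steps = {
    1: (9.298324584960938e-05, 0.12005329132080078),
    2: (0.05459904670715332, 0.15625238418579102),
}
computation_time = 0.1  # seconds, as configured in the .yaml file

residuals = {}
for step, (loaded, processed) in steps.items():
    # What remains after removing the IO wait should be close to the
    # configured compute time if IO and compute run back to back.
    residuals[step] = processed - loaded
    print(f"step {step}: processed - loaded = {residuals[step]:.4f} s "
          f"(configured compute: {computation_time} s)")
```

For step 2 the difference is about 0.102 s, so the processed time is essentially the 0.055 s read plus the 0.1 s compute, which is what serialized IO and compute would produce.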

@crtierney changed the title from "Is dlp." to "IO does not appear to be asynchronous with the compute phase" on Apr 3, 2024
@zhenghh04
Member

Yes, we are aware of this issue. There is some CPU function blocking the I/O call in native_dali. We are working on a PR to address that.
