Adding Native Dali Data Loader support for TFRecord, Images, and NPZ files #118

Merged
merged 39 commits into from
Dec 15, 2023

Conversation

zhenghh04
Member

This PR is a partial merge of PR #81 by @hariharan-devarajan.

@zhenghh04 zhenghh04 marked this pull request as draft December 1, 2023 20:32
@zhenghh04 zhenghh04 added the enhancement New feature or request label Dec 2, 2023
@zhenghh04 zhenghh04 marked this pull request as ready for review December 6, 2023 22:23
@zhenghh04
Member Author

@hariharan-devarajan, I pulled part of your PR #81.
This pulls only the native_dali part. I have not pulled native_torch and native_tensorflow; more study is needed to see whether native_torch and native_tensorflow are necessary.

For formats, I keep only npz, npy, jpeg, png, and hdf5. I don't have dlio_npz, etc. We can always determine whether it is dlio_npz or native_npz based on the data_loader selected.
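
The idea of deriving the reader variant from the selected data_loader, rather than from a separate format name, can be sketched as a lookup keyed on both values. This is only an illustration and not the code in this PR: the function `select_reader`, the table `READERS`, the reader class names, the `pytorch` loader value, and the particular pairs listed are all hypothetical.

```python
# Hypothetical sketch: a single format name per file type, with the
# reader variant chosen from the (format, data_loader) pair.
# Every name here is an illustrative assumption, not DLIO's actual API.

READERS = {
    ("npz", "pytorch"): "NPZReader",
    ("npz", "native_dali"): "DaliNPZReader",
    ("npy", "native_dali"): "DaliNPYReader",
    ("jpeg", "native_dali"): "DaliImageReader",
    ("png", "native_dali"): "DaliImageReader",
    ("tfrecord", "native_dali"): "DaliTFRecordReader",
}


def select_reader(fmt: str, data_loader: str) -> str:
    """Return the reader name registered for this format/loader pair."""
    try:
        return READERS[(fmt, data_loader)]
    except KeyError:
        raise ValueError(
            f"no reader for format={fmt!r} with data_loader={data_loader!r}"
        ) from None


# The same format name resolves to different readers per data loader.
print(select_reader("npz", "pytorch"))
print(select_reader("npz", "native_dali"))
```

With a table like this, unsupported combinations fail early with a clear error instead of needing extra format names such as dlio_npz or native_npz.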

Collaborator

@hariharan-devarajan hariharan-devarajan left a comment


Some potential improvements.

dlio_benchmark/reader/dali_base_reader.py (outdated)
dlio_benchmark/reader/dali_base_reader.py (outdated)
dlio_benchmark/reader/dali_image_reader.py
dlio_benchmark/reader/dali_npy_reader.py
@zhenghh04
Member Author

@hariharan-devarajan I also had to remove the cached installation of DLIO, because code changes were not being pushed to the DLIO installation, which caused tests to fail.

I think it is not good to keep a cache for the DLIO installation.

@zhenghh04
Member Author

@hariharan-devarajan Please review it again and check whether the changes look good to you.

@hariharan-devarajan
Collaborator

> @hariharan-devarajan I also had to remove the cached installation of DLIO, because code changes were not being pushed to the DLIO installation, which caused tests to fail.
>
> I think it is not good to keep a cache for the DLIO installation.

What if we split dependency installation from the actual code installation?

The pip way to do this is:

python setup.py egg_info
pip install -r *.egg-info/requires.txt
rm -rf *.egg-info/

What do you think?
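
One caveat that may be worth checking with this approach: the `requires.txt` that setuptools writes can contain `[extra]` section headers when the package declares extras or conditional dependencies, and those header lines are not valid requirements-file syntax for `pip install -r`. A minimal sketch, assuming that layout, that keeps only the base requirements listed before the first section header. The helper name `base_requirements` and the sample package list are made up for illustration.

```python
from typing import List


def base_requirements(requires_txt: str) -> List[str]:
    """Return requirement lines that appear before the first [section] header."""
    reqs = []
    for raw in requires_txt.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Extras / conditional sections start here; stop collecting.
            break
        if line:
            reqs.append(line)
    return reqs


# Illustrative content only, not DLIO's real dependency list.
sample = """numpy>=1.20
pyyaml

[test]
pytest
"""

print(base_requirements(sample))
```

The filtered list could then be written to a temporary file and passed to `pip install -r`, so that a cached dependency layer stays separate from the freshly installed project code.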

Collaborator

@hariharan-devarajan hariharan-devarajan left a comment


Some more minor comments.

dlio_benchmark/reader/dali_image_reader.py
dlio_benchmark/reader/dali_image_reader.py (outdated)
dlio_benchmark/reader/dali_image_reader.py
dlio_benchmark/reader/dali_npy_reader.py
dlio_benchmark/reader/dali_npy_reader.py
dlio_benchmark/data_loader/native_dali_data_loader.py (outdated)
dlio_benchmark/reader/dali_tfrecord_reader.py
dlio_benchmark/reader/dali_tfrecord_reader.py
dlio_benchmark/reader/reader_factory.py
@hariharan-devarajan
Collaborator

> @hariharan-devarajan Please review it again and check whether the changes look good to you.

Can you use the re-request review button? Commenting doesn't notify me correctly.

@zhenghh04 zhenghh04 removed the request for review from hariharan-devarajan December 12, 2023 16:28
@zhenghh04
Member Author

@hariharan-devarajan Please see all the comments again.

Collaborator

@hariharan-devarajan hariharan-devarajan left a comment


Almost there :)

dlio_benchmark/reader/dali_npy_reader.py
dlio_benchmark/reader/reader_factory.py
@zhenghh04
Member Author

zhenghh04 commented Dec 14, 2023

> > @hariharan-devarajan I also had to remove the cached installation of DLIO, because code changes were not being pushed to the DLIO installation, which caused tests to fail.
> > I think it is not good to keep a cache for the DLIO installation.
>
> What if we split dependency installation from the actual code installation?
>
> The pip way to do this is:
>
>     python setup.py egg_info
>     pip install -r *.egg-info/requires.txt
>     rm -rf *.egg-info/
>
> What do you think?

I think splitting dependency installation and actual code installation will be good. New commits added this: commit 1f03159 and commit 5dd6ebf.

Collaborator

@hariharan-devarajan hariharan-devarajan left a comment


The changes look good to me.

@zhenghh04 zhenghh04 merged commit 657d4b9 into main Dec 15, 2023
24 checks passed
@zhenghh04 zhenghh04 deleted the dali branch March 12, 2024 04:01
Labels
enhancement New feature or request
2 participants