
DataLoader Performance improvement #58

Closed
gitttt-1234 opened this issue Jun 23, 2024 · 2 comments · May be fixed by #72

Comments

@gitttt-1234
Contributor

The primary bottleneck in our training pipeline is dataloader performance: training time per epoch is currently very high due to the IterDataPipe-based dataloader.

We should implement a PreLoader module (see SLEAP) in Sleap-NN and benchmark the training time. The goal is to match or improve on current SLEAP training performance.
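The PreLoader idea, roughly, is to run the (slow) processing pipeline once and cache the results in memory, so later epochs iterate over cached samples instead of re-executing the pipeline. A minimal stdlib sketch of that pattern (all names here are hypothetical, not SLEAP's actual implementation):

```python
class PreLoader:
    """Cache the first pass over an iterable pipeline; subsequent
    epochs read from memory instead of re-running the pipeline."""

    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.cache = None

    def __iter__(self):
        if self.cache is None:
            self.cache = list(self.pipeline)  # pay the pipeline cost once
        return iter(self.cache)

# Toy "expensive" generator standing in for an IterDataPipe.
def slow_pipeline():
    for i in range(5):
        yield i * i

loader = PreLoader(slow_pipeline())
epoch1 = list(loader)  # first epoch materializes the cache
epoch2 = list(loader)  # second epoch is served from memory
```

Note that even though the underlying generator is exhausted after the first epoch, the second epoch still sees the same samples because it reads from the cache.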

Ref (current Torch issue): pytorch/data#1196

@talmo
Contributor

talmo commented Jun 26, 2024

Profile!! --> https://lightning.ai/docs/pytorch/stable/tuning/profiler_basic.html

Break down the performance by steps, adding one block of the preloader at a time.
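As a minimal illustration of that step-by-step timing approach (a plain-Python sketch of the idea, not the Lightning profiler linked above; all stage names are hypothetical):

```python
import time

def timed(name, fn, timings):
    """Run fn() and accumulate its wall-clock time under `name`."""
    start = time.perf_counter()
    result = fn()
    timings[name] = timings.get(name, 0.0) + time.perf_counter() - start
    return result

# Toy pipeline stages standing in for load / augment / batch steps.
def load():
    return list(range(1000))

def augment(xs):
    return [x * 2 for x in xs]

def batch(xs, size=100):
    return [xs[i:i + size] for i in range(0, len(xs), size)]

timings = {}
data = timed("load", load, timings)
data = timed("augment", lambda: augment(data), timings)
batches = timed("batch", lambda: batch(data), timings)

for name, t in timings.items():
    print(f"{name}: {t * 1e3:.3f} ms")
```

Adding one pipeline block at a time to a harness like this (or enabling `Trainer(profiler="simple")` in Lightning) shows which stage dominates the per-epoch time.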

@gitttt-1234 gitttt-1234 linked a pull request Aug 15, 2024 that will close this issue
@gitttt-1234
Contributor Author

gitttt-1234 commented Sep 5, 2024

We compared and benchmarked the performance of IterDataPipes against LitData and found that LitData is much faster and more efficient for data processing. #80 lays out the plan for refactoring our current data pipeline.
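A back-of-the-envelope way to compare two loader implementations is to measure end-to-end throughput over a few full passes. A stdlib sketch of such a harness (not the actual benchmark used here; the two loaders are toy stand-ins):

```python
import time

def throughput(make_loader, n_epochs=3):
    """Average samples/second over n_epochs full passes of the loader."""
    n, start = 0, time.perf_counter()
    for _ in range(n_epochs):
        for _ in make_loader():  # fresh iterator each epoch
            n += 1
    return n / (time.perf_counter() - start)

# Toy stand-ins: a cheap loader vs. one with per-sample overhead.
fast = lambda: range(10_000)
slow = lambda: (time.sleep(1e-6) or x for x in range(10_000))

r_fast = throughput(fast)
r_slow = throughput(slow)
print(f"fast: {r_fast:,.0f} samples/s, slow: {r_slow:,.0f} samples/s")
```

Measuring full epochs (rather than single batches) captures per-epoch setup costs, which matter for iterator-style pipelines that rebuild state every pass.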


2 participants