Support setup for torch DataPipes #16603
Comments
I am not super familiar with Lightning, but what happens if users do:
mp_rs = MultiProcessingReadingService(num_workers=2)
dist_rs = DistributedReadingService()
rs = SequentialReadingService(dist_rs, mp_rs) # Execute both distributed and multiprocessing
dataloader = DataLoader2(datapipe, reading_service=rs)
This would normally work with standalone Would it still work if |
In Lightning Trainer yes, because it supports arbitrary iterables. But we don't have any correctness tests for DataLoader2 specifically, so no guarantees. In Fabric, the |
Is there any update or plan to support DataLoader2? Thanks! |
The future of torchdata is very unclear. Development has paused; here is the official statement: pytorch/data#1196. I suggest we wait until we know more about future plans. For now, users can use datapipes and DataLoader2 with Lightning by configuring them manually. |
Description & Motivation
When working with torchdata's DataPipes in distributed settings, the user can use DataLoader2 to run the pipe with a specific reading service that applies sharding of the data pipe correctly. For example:
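To make "sharding of the data pipe" concrete, here is a minimal plain-Python sketch that does not depend on torchdata. The `shard` helper is hypothetical (it is not a torchdata API), and it assumes round-robin assignment, where each rank keeps every `world_size`-th element starting at its own rank.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard(items: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Hypothetical helper: yield only the elements that belong to `rank`.

    Assumes round-robin sharding: element i goes to rank i % world_size.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world_size {world_size}")
    # Start at `rank`, then step by `world_size` through the stream.
    return islice(items, rank, None, world_size)


data = list(range(10))
# Simulate two ranks reading the same source.
shards = [list(shard(data, rank, world_size=2)) for rank in range(2)]
```

Without this step, every rank would iterate the full dataset, which is the duplication a distributed reading service is meant to prevent.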
However, this does not mix well with Lightning as the user would have to change the reading service when switching from one strategy to another.
Pitch
Similar to what we do with the injection of the DistributedSampler for the regular DataLoader, add the reading service for the datapipe automatically for the user.
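A rough sketch of the selection logic this pitch implies, assuming the decision depends only on the world size and the number of workers. `choose_reading_services` is a hypothetical helper, not part of Lightning or torchdata; it returns only the names of the reading services mentioned in this thread, so it runs without torchdata installed.

```python
from typing import List


def choose_reading_services(world_size: int, num_workers: int) -> List[str]:
    """Hypothetical helper: name the reading services to compose.

    Assumption: a distributed service is needed only when more than one
    process participates, and a multiprocessing service only when worker
    processes are requested.
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")

    services: List[str] = []
    if world_size > 1:
        services.append("DistributedReadingService")
    if num_workers > 0:
        services.append("MultiProcessingReadingService")
    return services


single_process = choose_reading_services(world_size=1, num_workers=0)
ddp_with_workers = choose_reading_services(world_size=4, num_workers=2)
```

When both names are returned, they would be combined in that order, for instance through a `SequentialReadingService` as in the comment above. An empty list means the datapipe can be iterated without a reading service.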
Alternatives
No response
Additional context
If you are interested in learning more about torchdata, here is a good YouTube video by PyTorch that introduces the main concepts and values.
cc @Borda @justusschock @awaelchli @carmocca