Pickling the accelerator after preparing data loader no longer possible #3070
Comments
Ping @byi8220. I already discussed this issue internally with Zach, who might take a look later today.
Fully agree on that.
Makes sense, IIUC it would be preferred to adjust Alternatively, we could just abandon the dynamic class solution entirely and incur a lot of code duplication (which we discussed in the original PR, but decided against).
Ack, I'll hold off on writing a PR then?
I think you can give it a try, I'm sure Zach will write here when he starts working on this.
@byi8220 let's go with the getstate/setstate version rather than do code duplication. I haven't gotten there yet, so feel free to have at it!
Working on a possible fix for this in #3074 (currently marked as draft, but it might be worth looking at in its current state, since it might be enough?). Honestly, even before breaking the class, pickling DataLoaders seems rather fragile. There are more details in the PR, but I'm not sure if StatefulDataLoader could (or should) even be pickled. Also, my current multigpu tests are failing with an error message of At the very least, I ran However, I can't run their integration tests, so there's still a chance this is not sufficient. I still think there are some rough edges with pickling, especially in a distributed context.
Yet again, this was just me not updating to the latest pytorch version. Pickling torch generators is only available in torch version
System Info
Information
Tasks
no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
The error is:
Expected behavior
Pickling and unpickling the Accelerator instance used to work, but now it fails if a data loader has been prepared. I traced the issue to this PR: #2895. Although I haven't investigated further, I'm fairly certain that messing with the __class__ attribute is the reason:
accelerate/src/accelerate/data_loader.py
Line 436 in b5235f2
If that's so, it may be necessary to adjust __getstate__ or __reduce__ on DataLoaderAdapter to account for the fact that the class is dynamically changed.
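To make the suspected failure mode concrete, here is a minimal, self-contained sketch of how reassigning __class__ to a runtime-built class can break default pickling, and how a __reduce__ hook might work around it. All names (BaseLoader, FragileAdapter, PicklableAdapter, _mixed_class, _rebuild) are hypothetical and invented for illustration; this is not the accelerate code, and it is not necessarily how the eventual fix is implemented.

```python
import pickle


class BaseLoader:
    """Hypothetical stand-in for a data loader class (not torch's DataLoader)."""

    def __init__(self, data, batch_size=2):
        self.data = list(data)
        self.batch_size = batch_size

    def __iter__(self):
        for start in range(0, len(self.data), self.batch_size):
            yield self.data[start:start + self.batch_size]


def _mixed_class(adapter_cls, base_cls):
    # Build the dynamic class at runtime. It reuses the base class name,
    # so pickle's lookup by name finds a different object and refuses it.
    return type(base_cls.__name__, (adapter_cls, base_cls), {})


class FragileAdapter:
    """Swaps its own __class__ in __init__, which defeats default pickling."""

    def __init__(self, base_cls, *args, **kwargs):
        adapter_cls = type(self)
        self.__class__ = _mixed_class(adapter_cls, base_cls)
        base_cls.__init__(self, *args, **kwargs)
        # Remember both ingredients so the class can be rebuilt later.
        self._adapter_cls = adapter_cls
        self._base_cls = base_cls


def _rebuild(adapter_cls, base_cls):
    # Module-level function, so pickle can reference it by name.
    return object.__new__(_mixed_class(adapter_cls, base_cls))


class PicklableAdapter(FragileAdapter):
    def __reduce__(self):
        # (callable, args, state): on load, pickle calls _rebuild(*args)
        # and then restores the state into the new object's __dict__.
        return (_rebuild, (self._adapter_cls, self._base_cls), dict(self.__dict__))


loader = PicklableAdapter(BaseLoader, range(5), batch_size=2)
restored = pickle.loads(pickle.dumps(loader))
print(type(restored).__mro__)
print(list(restored))
```

The design choice in this sketch is to pickle the recipe for the dynamic class (the two importable parent classes) rather than the dynamic class itself, since pickle stores classes by importable name only. Whether the same approach is sufficient for real data loaders, which also carry samplers, generators and worker state, is a separate question.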