PersistentDataset and CacheDataset hybrid #6753
Comments
Thanks for the feature request, there's recently a … Lines 787 to 801 in 4addc5d
This is a nice addition, thanks for pointing it out! I was suggesting something a bit different. Maybe I should frame it this way: a … Or, a …
Thanks @ibro45, that's a good idea; adding the feature request label here.
Is your feature request related to a problem? Please describe.
`CacheDataset` preprocesses the non-random transforms and loads the data into RAM. `PersistentDataset` preprocesses the non-random transforms into pickled files on its first run; any subsequent run reads them on the fly and applies only the random transforms.

However, when prototyping, you often rerun a setup with different hyper-parameters, and each time you end up waiting for `CacheDataset` to preprocess the non-random transforms all over again. `PersistentDataset`, on the other hand, doesn't need to preprocess them again at each run, but it can still be slower than `CacheDataset` because it reads objects from the drive instead of RAM.

Describe the solution you'd like
I propose a combination of the two, which could also be framed as an extension to `PersistentDataset` that allows loading the pickled files into RAM. This way, the non-random transforms are only ever computed once, instead of being redone every time the data is loaded into RAM.
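A minimal sketch of the proposed hybrid, written in plain Python so it runs without MONAI installed. `HybridCacheDataset`, its parameters, and the MD5-of-JSON cache key are hypothetical illustrations, not MONAI's API and not how `PersistentDataset` actually names its files; the sketch assumes each data item is a JSON-serialisable dict. Deterministic results are pickled to `cache_dir` the first time they are computed (as `PersistentDataset` does), and every item is loaded into RAM at construction (as `CacheDataset` does), so a rerun with new hyper-parameters only reads the pickles and each access runs just the random transform.

```python
import copy
import hashlib
import json
import pickle
import random
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List, Sequence


class HybridCacheDataset:
    """Hypothetical sketch: deterministic results are pickled to disk once
    (PersistentDataset-style) and then held in RAM (CacheDataset-style)."""

    def __init__(
        self,
        data: Sequence[Dict[str, Any]],
        deterministic: Callable[[Dict[str, Any]], Any],
        random_tf: Callable[[Any], Any],
        cache_dir: str,
    ) -> None:
        self.random_tf = random_tf
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.preprocessed = 0  # items whose deterministic transforms actually ran
        # Fill the RAM cache up front, reusing pickles written by earlier runs.
        self.cache: List[Any] = [self._load_or_compute(item, deterministic) for item in data]

    def _cache_path(self, item: Dict[str, Any]) -> Path:
        # Assumption: items are JSON-serialisable; their hash names the pickle file.
        digest = hashlib.md5(json.dumps(item, sort_keys=True).encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.pkl"

    def _load_or_compute(
        self, item: Dict[str, Any], deterministic: Callable[[Dict[str, Any]], Any]
    ) -> Any:
        path = self._cache_path(item)
        if path.exists():  # cached by a previous run: read from disk, skip preprocessing
            with path.open("rb") as f:
                return pickle.load(f)
        result = deterministic(item)
        self.preprocessed += 1
        with path.open("wb") as f:
            pickle.dump(result, f)
        return result

    def __len__(self) -> int:
        return len(self.cache)

    def __getitem__(self, index: int) -> Any:
        # Only the random transform runs per access; deep-copy first so an
        # in-place random transform cannot modify the cached item.
        return self.random_tf(copy.deepcopy(self.cache[index]))


if __name__ == "__main__":
    items = [{"image": i} for i in range(4)]

    def deterministic(item):
        return {"image": item["image"] * 10}

    def random_tf(item):
        item["image"] += random.random()
        return item

    with tempfile.TemporaryDirectory() as d:
        first = HybridCacheDataset(items, deterministic, random_tf, d)   # e.g. hyper-parameter run 1
        second = HybridCacheDataset(items, deterministic, random_tf, d)  # run 2 reuses the pickles
        print(first.preprocessed, second.preprocessed)  # prints: 4 0
```

Because the cache key in this sketch depends only on the input item, changing the deterministic transforms would require pointing `cache_dir` at a fresh directory; a fuller version would also fold the transform configuration into the hash.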