[BUG] MemoryMappedTensor Loading #1051

Open
3 tasks done
suessmann opened this issue Oct 21, 2024 · 0 comments
Labels: bug (Something isn't working)

suessmann commented Oct 21, 2024

Describe the bug

I collected a memmap tensordict on the cluster in a Jupyter notebook, following the guide in [1]. When I load the same memmap on my local machine with TensorDict.load_memmap(path), I get RuntimeError: Could not find name <class '__main__.ImageNetData'>. I suspect the issue is in the memmap's meta.json file, where the type is recorded as <class '__main__.ImageNetData'>. That name pointed to the notebook's __main__ module, but I call load_memmap(path) from a script whose __main__ is a different module that does not define ImageNetData.
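
To illustrate why the lookup fails, here is a minimal sketch of how a type string in the '<class 'module.Name'>' format could be resolved back to a class. The resolver resolve_type_string is hypothetical and is not tensordict's actual loading code; it only assumes the string format from the error message above. The key point is that str(cls) embeds cls.__module__, so a class defined in a notebook is recorded under "__main__".

```python
import importlib
import re

# Matches strings such as "<class 'data.ImageNetData'>", the format reported
# in meta.json's "_type" field.
_TYPE_RE = re.compile(r"<class '(?P<path>[\w.]+)'>")


def resolve_type_string(type_str: str) -> type:
    """Hypothetical resolver: import the module named in type_str and fetch the class.

    Only top-level classes are handled (no nested qualnames). This illustrates
    the failure mode; it is not tensordict's actual loading code.
    """
    match = _TYPE_RE.fullmatch(type_str)
    if match is None:
        raise ValueError(f"Unrecognized type string: {type_str!r}")
    module_name, _, class_name = match.group("path").rpartition(".")
    # Classes without a module prefix (e.g. "<class 'int'>") live in builtins.
    module = importlib.import_module(module_name or "builtins")
    try:
        return getattr(module, class_name)
    except AttributeError:
        # When the loader runs from main.py, "__main__" is main.py, which does
        # not define ImageNetData, so the lookup fails here.
        raise RuntimeError(f"Could not find name {type_str}") from None


# str(cls) embeds cls.__module__: a class defined in a notebook records
# "__main__", while one imported from data.py records "data".
NotebookClass = type("ImageNetData", (), {"__module__": "__main__"})
ModuleClass = type("ImageNetData", (), {"__module__": "data"})
print(str(NotebookClass))  # <class '__main__.ImageNetData'>
print(str(ModuleClass))    # <class 'data.ImageNetData'>
```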

To Reproduce

Follow [1] and note the path of the saved memmap. Then create main.py:

from data import Dataset

def main(path):
    data = Dataset(path)

if __name__ == '__main__':
    main('path/to/memmap')

and data.py:

import torch
from tensordict import MemoryMappedTensor, tensorclass, TensorDict

@tensorclass
class ImageNetData:
    images: torch.Tensor
    targets: torch.Tensor

class Dataset:
    def __init__(self, path):
        # Fails: meta.json records the type as __main__.ImageNetData
        self.data = TensorDict.load_memmap(path)

Running python main.py then fails with:

RuntimeError: Could not find name <class '__main__.ImageNetData'>

Expected behavior

TensorDict.load_memmap(path) loads the memmap without errors, regardless of which module it is called from.

System info

import tensordict, numpy, sys, torch
print(tensordict.__version__, numpy.__version__, sys.version, sys.platform, torch.__version__)

0.5.0 1.26.4 3.9.19 (main, May 6 2024, 19:43:03)
[GCC 11.2.0] linux 2.4.1+cu121

Reason and Possible fixes

As a workaround, I manually edited meta.json to:

{"_type":"<class 'data.ImageNetData'>"}

This works, but it is fragile. Snapshots are another option, but the example in [2] suggests that loading a snapshot requires re-initializing the memmap every time, which is very time-consuming in my case (my data is over 500 GB).
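
The manual edit can be scripted so it is at least repeatable. This is a minimal sketch under the assumption, based only on the edit shown above, that each meta.json under the memmap directory is a JSON object whose "_type" field holds the class string. The helper name retarget_memmap_type is hypothetical and not part of tensordict. It only rewrites the small JSON files and never touches the tensor data, so it does not re-create the memmap.

```python
import json
from pathlib import Path


def retarget_memmap_type(root: str, old_type: str, new_type: str) -> list[Path]:
    """Hypothetical helper: replace "_type" == old_type with new_type in every
    meta.json under root, preserving all other keys.

    Assumes meta.json files are JSON objects (per the edit shown in this issue).
    Returns the files that were modified. Back up the directory first.
    """
    changed = []
    for meta_path in Path(root).rglob("meta.json"):
        meta = json.loads(meta_path.read_text())
        if isinstance(meta, dict) and meta.get("_type") == old_type:
            meta["_type"] = new_type
            meta_path.write_text(json.dumps(meta))
            changed.append(meta_path)
    return changed
```

For the case in this issue, the call would be retarget_memmap_type("path/to/memmap", "<class '__main__.ImageNetData'>", "<class 'data.ImageNetData'>"). Defining ImageNetData in data.py and importing it in the notebook before saving should avoid the patch entirely, presumably because the recorded type would then be data.ImageNetData, though that depends on how tensordict writes the "_type" field.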

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

[1] https://pytorch.org/tensordict/main/tutorials/tensorclass_imagenet.html
[2] def load(cls, dataset, path):

suessmann added the bug (Something isn't working) label on Oct 21, 2024
suessmann changed the title from "[BUG]" to "[BUG] MemoryMappedTensor Loading" on Oct 21, 2024