This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
@lou-k The dataloader should be able to load multiple .rec files, or at least the DALI dataloader should.
For anyone who finds this via Google, this is my current workaround:

    from bisect import bisect_right

    import numpy as np
    from mxnet.gluon.data.dataset import Dataset


    class ConcatenatedDataset(Dataset):
        """
        Combines multiple gluon datasets into one. This is useful if, for example, you need to
        combine multiple ImageRecordDatasets after using the --chunks option in im2rec.
        """

        def __init__(self, datasets):
            self.datasets = datasets
            # offsets[j] is the global index of the first item in datasets[j];
            # offsets[-1] is the total number of items.
            self.offsets = [0] + np.cumsum([len(d) for d in datasets]).tolist()

        def __len__(self):
            return self.offsets[-1]

        def __getitem__(self, idx):
            # reject negative or too-large indices instead of silently returning the wrong item
            if not 0 <= idx < len(self):
                raise IndexError("index %d out of range" % idx)
            # figure out which dataset this index falls into
            j = bisect_right(self.offsets, idx) - 1
            # fetch the item using its index local to that dataset
            return self.datasets[j][idx - self.offsets[j]]
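
To see how the offset lookup in the workaround behaves at chunk boundaries, here is a minimal sketch of the same index mapping. It uses plain Python lists as hypothetical stand-ins for the chunked datasets so that it runs without MXNet; the names `chunks` and `locate` are made up for illustration and are not part of any library.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical stand-ins for three chunked datasets; plain lists are used
# here only so the sketch runs without MXNet installed.
chunks = [["a0", "a1", "a2"], ["b0", "b1"], ["c0", "c1", "c2", "c3"]]

# offsets[j] is the global index of the first item of chunk j;
# offsets[-1] is the total number of items.
offsets = [0] + list(accumulate(len(c) for c in chunks))


def locate(idx):
    """Map a global index to (chunk number, index within that chunk)."""
    if not 0 <= idx < offsets[-1]:
        raise IndexError(idx)
    j = bisect_right(offsets, idx) - 1
    return j, idx - offsets[j]


# Walk across both chunk boundaries (global indices 3 and 5).
for idx in (0, 2, 3, 4, 5, 8):
    j, local = locate(idx)
    print(idx, "->", (j, local), chunks[j][local])
```

Because `bisect_right` returns the position after any entries equal to `idx`, an index that lands exactly on a boundary is assigned to the chunk that starts there, not the one that just ended.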

I ran into some performance issues packaging about 2 million images into .rec files: after about 200k, the time to pack each image got really high. As a workaround, I used the --chunks option for im2rec, which resulted in about 10 .rec files with around 200k images each.

That was nice, but I don't see an easy way to combine them. I'm using gluon's ImageRecordDataset, which only accepts a single .rec and idx file.

Is there an easy way to combine these .rec files I've generated? I'd be OK combining them into one file for passing to ImageRecordDataset, or having a dataset support multiple input files.