A Julia package implementing performant data loading for deep learning on out-of-memory datasets. It works like PyTorch's DataLoader.
- Uses multi-threading to load data in parallel while keeping the primary thread free for the training loop
- Handles batching and collating
- Is simple to extend for custom datasets
- Integrates well with other packages in the ecosystem
- Allows for in-place loading to reduce memory usage

Consider using it when:

- You have a dataset that does not fit into memory
- You want to reduce the time your training loop spends waiting for the next batch of data
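
To make the idea of background loading concrete, here is a minimal sketch in plain Julia of worker tasks filling a buffered channel while the consumer iterates. It is not the DataLoaders.jl implementation; `load_batch` and `prefetch_batches` are hypothetical names used only for illustration, and `load_batch` stands in for whatever expensive I/O a real dataset needs.

```julia
# Hypothetical sketch, NOT how DataLoaders.jl is implemented internally.
# `load_batch` stands in for expensive work such as reading and decoding files.
load_batch(i) = fill(Float64(i), 4, 2)

# Return a buffered channel that worker tasks fill with batches.
# The consumer can iterate over it while the workers keep loading.
function prefetch_batches(nbatches; buffersize = 4, nworkers = Threads.nthreads())
    # Queue of batch indices shared by all workers.
    indices = Channel{Int}(nbatches)
    foreach(i -> put!(indices, i), 1:nbatches)
    close(indices)  # workers stop once the queue is drained

    return Channel{Matrix{Float64}}(buffersize; spawn = true) do results
        @sync for _ in 1:nworkers
            Threads.@spawn for i in indices
                # Blocks when the buffer is full, which bounds memory use.
                put!(results, load_batch(i))
            end
        end
    end
end

# With several workers the batches may arrive out of order.
batches = collect(prefetch_batches(8))
```

Start Julia with more than one thread (for example `julia -t 4`) for the workers to actually run in parallel; with a single thread the sketch still works but loads sequentially.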
Install it like any other Julia package, using the package manager (see setup):
]add DataLoaders
After installation, import it, create a DataLoader from a dataset and batch size, and iterate over it:
using DataLoaders
# 10,000 observations of inputs with 128 features and one target feature
data = (rand(128, 10000), rand(1, 10000))
dataloader = DataLoader(data, 16)
for (xs, ys) in dataloader
    @assert size(xs) == (128, 16)
    @assert size(ys) == (1, 16)
end
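
The batch shapes asserted above come from collating: individual observations are stacked along the last dimension. The following plain-Julia sketch illustrates that step under the assumption that each observation is a tuple of two vectors; `collate_sketch` is a hypothetical helper, not a function exported by DataLoaders.jl.

```julia
# Hypothetical helper for illustration only; DataLoaders.jl has its own collating logic.
# Each observation is a tuple (x, y) of vectors; the batch stacks them as columns.
function collate_sketch(observations)
    xs = reduce(hcat, [x for (x, _) in observations])
    ys = reduce(hcat, [y for (_, y) in observations])
    return xs, ys
end

# 16 observations, each with 128 input features and one target feature
observations = [(rand(128), rand(1)) for _ in 1:16]
xs, ys = collate_sketch(observations)
```

Here `xs` has size `(128, 16)` and `ys` has size `(1, 16)`, matching the shapes checked in the loop above.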