GitHub - fosterrath-mila/Benzina: Benzina is an image-loader package that greatly accelerates image loading onto GPUs using their built-in hardware codecs.

Бензина / Benzina

Description of the project

Benzina is an image loading library that accelerates image loading and preprocessing by making use of the hardware decoder in NVIDIA's GPUs.

Because it minimizes the use of the CPU and of the GPU's compute units, it is easier to saturate the GPU's (or CPU's) computing power. In our tests using ResNet18 models in PyTorch on the ImageNet 2012 dataset, we observed a 2.4x increase in the number of images loaded, preprocessed, and then processed by the model when using a single CPU and GPU:

Data loader	CPU	CPU Workers	GPU	GPU compute speed	Pipeline effective speed
PyTorch ImageFolder	Intel Xeon E5-2623*	2	Tesla V100*	1050 img/s	400 img/s
Benzina	Intel Xeon E5-2623*	1	Tesla V100*	1050 img/s	960 img/s

Note

Intel Xeon E5-2623 is the Xeon E5-2623 v3 @ 3.00 GHz version
Tesla V100 is the Tesla V100 PCIE 16GB version
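
The 2.4x figure follows directly from the table. As a small worked check, the snippet below recomputes it from the reported throughputs. Reading the ratio of pipeline speed to GPU compute speed as "how close the pipeline gets to the GPU's compute throughput" is our interpretation, and the variable names are illustrative only.

```python
# Throughput figures copied from the benchmark table above (images per second).
gpu_compute_speed = 1050   # GPU compute speed reported for the Tesla V100
imagefolder_speed = 400    # PyTorch ImageFolder pipeline, 2 CPU workers
benzina_speed = 960        # Benzina pipeline, 1 CPU worker

# Relative gain of Benzina over ImageFolder.
speedup = benzina_speed / imagefolder_speed

# Fraction of the GPU compute speed that each pipeline reaches.
imagefolder_utilization = imagefolder_speed / gpu_compute_speed
benzina_utilization = benzina_speed / gpu_compute_speed

print(f"speedup: {speedup:.1f}x")                                  # speedup: 2.4x
print(f"ImageFolder reaches {imagefolder_utilization:.0%} of GPU compute speed")  # 38%
print(f"Benzina reaches {benzina_utilization:.0%} of GPU compute speed")          # 91%
```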

The name "Benzina" is a phonetic transliteration of the Ukrainian word "Бензина", meaning "gasoline" (or "petrol").

ImageNet loading in PyTorch

Once your dataset has been converted into Benzina's data format, you can load it to train a PyTorch model in a few lines of code. The example below demonstrates this with an ImageNet dataset. It is based on the ImageNet example from PyTorch.

import torch
import benzina.torch as bz
import benzina.torch.operations as ops

seed = 1234
torch.manual_seed(seed)

# Dataset
dataset = bz.ImageNet("path/to/data")

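# The first n_train indices are used for training and the next n_valid for
# validation; the last n_test indices are not used in this example
indices = list(range(len(dataset)))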
n_valid = 50000
n_test = 100000
n_train = len(dataset) - n_valid - n_test
train_sampler = torch.utils.data.SubsetRandomSampler(indices[:n_train])
valid_sampler = torch.utils.data.SubsetRandomSampler(indices[n_train:-n_test])

# Dataloaders
# ImageNet per-channel mean and standard deviation, scaled to the 0-255 pixel range
bias = ops.ConstantBiasTransform(bias=(123.675, 116.28, 103.53))
std = ops.ConstantNormTransform(norm=(58.395, 57.12, 57.375))

train_dataloader = bz.DataLoader(
    dataset,
    batch_size=256,
    sampler=train_sampler,
    seed=seed,
    shape=(224, 224),
    bias_transform=bias,
    norm_transform=std,
    warp_transform=ops.SimilarityTransform(flip_h=0.5))
valid_dataloader = bz.DataLoader(
    dataset,
    batch_size=512,
    sampler=valid_sampler,
    seed=seed,
    shape=(224, 224),
    bias_transform=bias,
    norm_transform=std,
    warp_transform=ops.SimilarityTransform())

for epoch in range(1, 10):
    # train for one epoch
    train(train_dataloader, ...)

    # evaluate on validation set
    accuracy = validate(valid_dataloader, ...)
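
The index split used in the example can be sketched with plain Python lists, independently of Benzina or PyTorch. The helper `split_indices` below is hypothetical (it is not part of Benzina). It reproduces the same contiguous train / validation / test layout, and also returns the test indices, which the example above leaves unused.

```python
def split_indices(n_total, n_valid, n_test):
    """Split range(n_total) into contiguous train / valid / test index lists."""
    n_train = n_total - n_valid - n_test
    if n_train < 0:
        raise ValueError("n_valid + n_test exceeds the dataset size")

    indices = list(range(n_total))
    train = indices[:n_train]
    # Explicit end bounds are used instead of a negative slice such as
    # indices[n_train:-n_test], which would be empty when n_test == 0.
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


# Toy example: 10 samples, 2 for validation, 3 for testing.
train, valid, test = split_indices(10, n_valid=2, n_test=3)
print(train)  # [0, 1, 2, 3, 4]
print(valid)  # [5, 6]
print(test)   # [7, 8, 9]
```

Each returned list could then be wrapped in its own `torch.utils.data.SubsetRandomSampler`, as is done for the training and validation samplers in the example.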

