Device assignment does not work in PyTorch #131
Even with #132, the device assignment still doesn't work for list columns:

import os

import pandas as pd

from merlin.dataloader.torch import Loader
from merlin.io.dataset import Dataset

# dataset = Dataset(pd.DataFrame({"a": list(range(10))}))
dataset = Dataset(pd.DataFrame({"a": [[1], [2, 3]] * 5}))
dataset = dataset.repartition(npartitions=2)

rank = int(os.environ["LOCAL_RANK"])

with Loader(
    dataset,
    batch_size=1,
    global_rank=rank,
    global_size=2,
    device=rank,
) as loader:
    for idx, batch in enumerate(loader):
        x, y = batch
        values_device = x["a__values"].device
        offsets_device = x["a__offsets"].device
        print(f"rank: {rank}, values_device: {values_device}, offsets_device: {offsets_device}")
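For reference, this snippet reads LOCAL_RANK from the environment, so it would typically be launched with torchrun so that each process gets its own rank, e.g. torchrun --nproc_per_node=2 repro.py (repro.py being a hypothetical filename for the code above).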
Wondering if this may be related to which GPU the cuDF dataframe or series is coming from. Selecting the device with cuDF requires dropping down to CuPy AFAIK, which I think requires running
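As a minimal sketch of what that drop-down looks like (assuming cuDF allocations follow the CUDA device made current via CuPy, which is the premise of the comment above; the rank value is illustrative only):

import cudf
import cupy

rank = 1  # hypothetical local rank, for illustration only
with cupy.cuda.Device(rank):
    # Allocations made while this context is active should land on GPU `rank`.
    series = cudf.Series([1, 2, 3])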
#135 partially fixes the issue, but users have to set the device themselves by using:

import os

import cupy
import pandas as pd

from merlin.dataloader.torch import Loader
from merlin.io.dataset import Dataset

rank = int(os.environ["LOCAL_RANK"])

with cupy.cuda.Device(rank):
    dataset = Dataset(pd.DataFrame(
        {"a": [[1], [2, 3], [4, 5], [6, 7, 8], [9], [10, 11], [12], [13, 14]]}
    ))
    dataset = dataset.repartition(npartitions=2)

    with Loader(
        dataset,
        batch_size=2,
        global_rank=rank,
        global_size=2,
        device=rank,
    ) as loader:
        for idx, batch in enumerate(loader):
            x, y = batch
            device = x["a__values"].device
            print(f"rank: {rank}, device: {device}")

We should move this to Merlin Core by implementing something like a
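The comment above is cut off before naming the helper, so purely as a hypothetical illustration (the name device_context and its placement in Merlin Core are assumptions, not the actual proposal), such a utility might be a small context manager that the dataloader could apply internally:

import contextlib

@contextlib.contextmanager
def device_context(device_id):
    # Hypothetical helper: name, signature, and location in Merlin Core are
    # assumptions. Makes `device_id` the current GPU for CuPy/cuDF allocations
    # inside the block, and degrades to a no-op on CPU-only installs.
    try:
        import cupy
    except ImportError:
        yield
        return
    with cupy.cuda.Device(device_id):
        yield

The dataloader (or Merlin Core's Dataset) could then wrap dataset creation and batch loading in device_context(rank) instead of requiring users to call cupy.cuda.Device themselves.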
@edknv Could you create an issue for the proposed change to Merlin Core?
As of 12372f4, device assignment in the PyTorch dataloader does not work correctly with multiple GPUs.
When I run the above, I get:
But for rank 1, tensors are expected to be placed on cuda:1, not cuda:0.
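One way to turn that expectation into a check inside the repro loop above (a sketch; it assumes torch is importable and that x and rank come from the earlier snippet):

import torch

expected = torch.device(f"cuda:{rank}")
assert x["a__values"].device == expected, (x["a__values"].device, expected)
assert x["a__offsets"].device == expected, (x["a__offsets"].device, expected)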