Shuffle doesn't work #85

Open
Ilyushin opened this issue Dec 15, 2022 · 3 comments

@Ilyushin

Hi all!

Below is an example of the code:

from merlin.loader.torch import Loader
from merlin.io import Dataset


train_ds = Dataset('train.parquet')
train_loader = Loader(train_ds, batch_size=65536, shuffle=True)

for batch in train_loader:
    print(batch)

After running it, I got the following error:

TypeError: sample() got an unexpected keyword argument 'keep_index'
@edknv
Contributor

edknv commented Dec 15, 2022

@Ilyushin Thanks for reporting the issue. Can you provide more details so we can reproduce the issue on our end?

  • Did you use one of our Merlin containers (e.g., nvcr.io/nvidia/merlin/merlin-pytorch:22.11), or did you install the package with conda or pip?
  • What are the package versions that you see when you run
    python -c 'import merlin.core; print(merlin.core.__version__)'
    and
    python -c 'import merlin.dataloader; print(merlin.dataloader.__version__)'
  • Is it possible to provide us with the dataset schema (train_ds.schema)?
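
The version checks above can also be gathered in one pass. The following is a minimal sketch, not part of the Merlin tooling: the helper name collect_versions is hypothetical, and the distribution names ("merlin-core", "merlin-dataloader", "cudf") are assumptions that may differ depending on how the packages were installed.

```python
from importlib import metadata


def collect_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match your environment.
    wanted = ["merlin-core", "merlin-dataloader", "cudf"]
    for name, version in collect_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into the issue gives the same information as the two one-liners, plus the cudf version if it is installed under that name.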

@Ilyushin
Author

Ilyushin commented Dec 23, 2022

@edknv Thank you for helping.

@rnyak rnyak added bug Something isn't working P1 labels Jan 18, 2023
@rnyak rnyak added this to the Merlin 23.02 milestone Jan 18, 2023
@edknv
Contributor

edknv commented Jan 18, 2023

This seems to be due to the version of cudf in the nvcr.io/nvidia/pytorch:22.06-py3 container. In older versions of cudf (prior to 22.04), the keep_index parameter was not available in df.sample().

@Ilyushin Is upgrading your container an option (e.g., to nvcr.io/nvidia/pytorch:22.07-py3 or even the latest 22.12-py3, rather than 22.06)? Please also note that nvcr.io/nvidia/merlin/merlin-pytorch comes with merlin-dataloader pre-installed, so you don't have to install it yourself.
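
One way to confirm whether an installed sample() accepts keep_index, without triggering the TypeError, is to inspect its signature. This is a rough sketch under stated assumptions: accepts_keyword is a hypothetical helper, and the two sample_* functions are stand-ins written for illustration, not the real cudf signatures.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` declares keyword `name` or has a **kwargs catch-all."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if param is not None and param.kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword at call time, although the
    # function body may still reject it later.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for two differing sample() signatures.
def sample_with_keep_index(n=None, keep_index=True):
    return n


def sample_without_keep_index(n=None, ignore_index=False):
    return n


print(accepts_keyword(sample_with_keep_index, "keep_index"))
print(accepts_keyword(sample_without_keep_index, "keep_index"))
```

With cudf installed, the same check could be pointed at cudf.DataFrame.sample in place of the stand-ins to see which keyword the local version expects.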

@karlhigley karlhigley modified the milestones: Merlin 23.02, Merlin 23.04 Apr 4, 2023