Describe the bug
Training on MovieLens-100K with the Random, ADMMSLIM, or SLIMElastic algorithm crashes with the exception "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"
CUDA available: True
command line args [--data_set_name MovieLens-100K --model_name Random] will not be used in RecBole
24 Jan 15:52 INFO
General Hyper Parameters:
gpu_id = 0
use_gpu = True
seed = 42
state = INFO
reproducibility = True
data_path = ./data_sets/MovieLens-100K
checkpoint_dir = ./data_sets/MovieLens-100K/recbole_checkpoints/
show_progress = True
save_dataset = False
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False
Training Hyper Parameters:
epochs = 50
train_batch_size = 2048
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 5
stopping_step = 10
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4
Evaluation Hyper Parameters:
eval_args = {'split': {'LS': 'valid_and_test'}, 'order': 'RO', 'group_by': 'user', 'mode': {'valid': 'uni100', 'test': 'uni100'}}
repeatable = False
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'MAP', 'Precision', 'GAUC', 'ItemCoverage', 'AveragePopularity', 'GiniIndex', 'ShannonEntropy', 'TailPercentage']
topk = [1, 3, 5, 10, 20]
valid_metric = NDCG@10
valid_metric_bigger = True
eval_batch_size = 4096
metric_decimal_place = 4
Dataset Hyper Parameters:
field_separator =
seq_separator =
USER_ID_FIELD = user_id
ITEM_ID_FIELD = item_id
RATING_FIELD = rating
TIME_FIELD = timestamp
seq_len = {}
LABEL_FIELD = label
threshold = None
NEG_PREFIX = neg_
load_col = {'inter': ['user_id', 'item_id', 'rating']}
unload_col = {}
unused_col = {}
additional_feat_suffix = []
rm_dup_inter = None
val_interval = {}
filter_inter_by_user_or_item = True
user_inter_num_interval = [0, inf)
item_inter_num_interval = [0, inf)
alias_of_user_id = None
alias_of_item_id = None
alias_of_entity_id = None
alias_of_relation_id = None
preload_weight = {}
normalize_field = []
normalize_all = False
ITEM_LIST_LENGTH_FIELD = item_length
LIST_SUFFIX = _list
MAX_ITEM_LIST_LENGTH = 50
POSITION_FIELD = position_id
HEAD_ENTITY_ID_FIELD = head_id
TAIL_ENTITY_ID_FIELD = tail_id
RELATION_ID_FIELD = relation_id
ENTITY_ID_FIELD = entity_id
benchmark_filename = None
Other Hyper Parameters:
worker = 0
wandb_project = recbole
shuffle = True
require_pow = False
enable_amp = False
enable_scaler = False
transform = None
numerical_features = []
discretization = None
kg_reverse_r = False
entity_kg_num_interval = [0, inf)
relation_kg_num_interval = [0, inf)
MODEL_TYPE = ModelType.GENERAL
encoding = utf-8
training_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'dynamic': False, 'candidate_num': 0}
MODEL_INPUT_TYPE = InputType.POINTWISE
eval_type = EvaluatorType.RANKING
single_spec = True
local_rank = 0
device = cuda
valid_neg_sample_args = {'distribution': 'uniform', 'sample_num': 100}
test_neg_sample_args = {'distribution': 'uniform', 'sample_num': 100}
24 Jan 15:52 INFO MovieLens-100K
The number of users: 944
Average actions of users: 106.04453870625663
The number of items: 1683
Average actions of items: 59.45303210463734
The number of inters: 100000
The sparsity of the dataset: 93.70575143257098%
Remain Fields: ['user_id', 'item_id', 'rating']
24 Jan 15:52 INFO [Training]: train_batch_size = [2048] train_neg_sample_args: [{'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}]
24 Jan 15:52 INFO [Evaluation]: eval_batch_size = [4096] eval_args: [{'split': {'LS': 'valid_and_test'}, 'order': 'RO', 'group_by': 'user', 'mode': {'valid': 'uni100', 'test': 'uni100'}}]
24 Jan 15:52 INFO Random()
Trainable parameters: 1
24 Jan 15:52 INFO epoch 0 training [time: 0.22s, train loss: 0.0000]
24 Jan 15:52 INFO epoch 1 training [time: 0.19s, train loss: 0.0000]
24 Jan 15:52 INFO epoch 2 training [time: 0.19s, train loss: 0.0000]
24 Jan 15:52 INFO epoch 3 training [time: 0.19s, train loss: 0.0000]
24 Jan 15:52 INFO epoch 4 training [time: 0.19s, train loss: 0.0000]
Traceback (most recent call last):
File "/mnt/./run_recbole_test.py", line 158, in <module>
best_valid_score, best_valid_result = trainer.fit(train_data, valid_data)
File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 464, in fit
valid_score, valid_result = self._valid_epoch(
File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 283, in _valid_epoch
valid_result = self.evaluate(
File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 616, in evaluate
interaction, scores, positive_u, positive_i = eval_func(batched_data)
File "/usr/local/lib/python3.10/site-packages/recbole/trainer/trainer.py", line 558, in _neg_sample_batch_eval
scores[row_idx, col_idx] = origin_scores
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Expected behavior
Models for the Random, ADMMSLIM, and SLIMElastic algorithms should train and evaluate on the MovieLens-100K data set without crashing.
Desktop (please complete the following information):
OS: Linux
RecBole Version: 1.2.0
Python Version: 3.10
PyTorch Version: 2.1.1
cudatoolkit Version: 12.1
I believe this happens during validation (the traceback ends in _neg_sample_batch_eval, reached from _valid_epoch) and that the same bug was fixed for other models in #1873.
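For illustration, here is a minimal sketch of the pattern that fails in the traceback (`scores[row_idx, col_idx] = origin_scores`) and one way to make it device-safe. It assumes, without having confirmed it in the RecBole source, that the affected models return their predictions as CPU tensors while the score matrix lives on the GPU. The helper name `scatter_scores` and its parameters are hypothetical and not part of RecBole.

```python
import torch


def scatter_scores(origin_scores, row_idx, col_idx, n_rows, n_cols, device):
    """Hypothetical helper: build a dense score matrix on `device`.

    All inputs are moved to `device` before the indexed assignment, so the
    assignment cannot mix tensors from different devices.
    """
    device = torch.device(device)
    scores = torch.full((n_rows, n_cols), -float("inf"), device=device)
    scores[row_idx.to(device), col_idx.to(device)] = origin_scores.to(device)
    return scores


# Falls back to the CPU when no GPU is present, so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: the model's predictions arrive as a CPU tensor.
origin_scores = torch.rand(3)
row_idx = torch.tensor([0, 0, 1])
col_idx = torch.tensor([1, 2, 0])

scores = scatter_scores(origin_scores, row_idx, col_idx, 2, 3, device)
```

Calling `.to(device)` on a tensor that is already on the target device returns it unchanged, so the extra calls are cheap for models that already predict on the GPU. An alternative would be to have each affected model create its prediction tensor on the configured device in the first place.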