NotImplementedError: Data has duplicate values #18

99sbr · 2022-03-25T09:00:03Z

data_model = ItemColdStartData(
    training_data,
    *training_data.columns,  # userid, itemid
    item_features=content_feature_df,
    seed=seed)

print(data_model)

This is where I get the error: NotImplementedError: Data has duplicate values

My dataframe has multiple entries per user, and I can't drop them. Any help here?

evfro · 2022-03-26T04:56:24Z

Hi!

The problem is not that your data contains multiple entries for a user, but that your data contains multiple entries of the same user-item pair. It's like having multiple ratings for the same movie from the same user. This is not a standard collaborative filtering scenario.

You need to deduplicate such entries, e.g., like this:

dedup_data = data.drop_duplicates(subset=['userid', 'movieid'])
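To illustrate the difference between "multiple entries for a user" and "multiple entries for the same user-item pair", here is a minimal pandas sketch. The toy data, the `timestamp` column, and the choice to keep the most recent row per pair are assumptions made for illustration; they are not taken from the issue or from the library.

```python
import pandas as pd

# Hypothetical interaction log. The userid/itemid names follow the comment
# in the original snippet; the timestamp column is an assumption.
training_data = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2],
    "itemid": [10, 10, 20, 10, 10],
    "timestamp": [100, 200, 150, 300, 250],
})

key = ["userid", "itemid"]

# Count the rows that repeat an earlier (userid, itemid) pair.
n_repeats = int(training_data.duplicated(subset=key).sum())

# Keep only the most recent row for each pair. Users with several
# different items still keep one row per item.
dedup_data = (
    training_data
    .sort_values("timestamp")
    .drop_duplicates(subset=key, keep="last")
    .reset_index(drop=True)
)

print(n_repeats)
print(dedup_data)
```

Without a meaningful ordering column, the plain `drop_duplicates(subset=...)` call above keeps the first occurrence of each pair, which may be just as good.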

99sbr · 2022-03-26T09:23:54Z

Understood, thanks for the help.

I'm facing one more blocker: data_model.prepare() takes a very long time and appears to freeze when I run that step. Any idea why? I know my dataset is big, but is there any optimisation I can apply?
