- Download the raw atomic files from RecBole's dataset hub. [link]
- Sample 6,040 (all) users and the corresponding candidate items for experiments.

      # generate ml-1m-full.random
      cd llmrank/
      python sample_candidates.py -d ml-1m-full -u 6040 -s random

  Note that ml-1m-full.random has the format user_id<\t>candidate_item_1 candidate_item_2 ...:

      5077 1116 884 1081 ...
      5075 3913 749 445 ...

  For each user in ml-1m-full.random, there are 100 randomly selected candidate items. We store these candidates for fair comparison between different variants.

  When we run experiments using python evaluate.py -m Rank -d ml-1m-full, the number of items used to construct the candidate set depends on the hyperparameter recall_budget. For example, when recall_budget=20, the first 20 items of each line in ml-1m-full.random will be used.

  Then, if has_gt=True (ground-truth items are guaranteed to appear in the candidate set), the ground-truth item will be appended to the candidate set (implemented in trainer.py, lines 70-90).
- For different candidate retrieval strategies, replace -s random with -s bm25, -s bert, -s pop, -s bpr, -s gru4rec, or -s sasrec in step 2.
- To generate candidates retrieved by multiple strategies (.rabmbepobpgrsa_3, where the name concatenates the first two letters of each strategy name):

      python mix_sources.py
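The candidate-set rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in trainer.py: the function names parse_candidate_line, build_candidate_set, and mixed_source_tag are hypothetical, and skipping the append when the ground-truth item is already among the first recall_budget candidates is an assumption of this sketch rather than documented behavior.

```python
from typing import List, Tuple


def parse_candidate_line(line: str) -> Tuple[str, List[str]]:
    """Split one line of a candidate file: user_id<TAB>item_1 item_2 ..."""
    user_id, _, items = line.rstrip("\n").partition("\t")
    return user_id, items.split()


def build_candidate_set(
    candidates: List[str],
    ground_truth: str,
    recall_budget: int = 20,
    has_gt: bool = True,
) -> List[str]:
    """Keep the first `recall_budget` candidates, then optionally add the target."""
    selected = list(candidates[:recall_budget])
    # Assumption of this sketch: do not duplicate the ground-truth item
    # if it is already among the selected candidates.
    if has_gt and ground_truth not in selected:
        selected.append(ground_truth)
    return selected


def mixed_source_tag(strategies: List[str]) -> str:
    """Concatenate the first two letters of each strategy name."""
    return "".join(name[:2] for name in strategies)


if __name__ == "__main__":
    user, cands = parse_candidate_line("5077\t1116 884 1081 445 749\n")
    # Item "3913" is a made-up ground-truth item for illustration.
    print(user, build_candidate_set(cands, ground_truth="3913", recall_budget=3))
    print(mixed_source_tag(
        ["random", "bm25", "bert", "pop", "bpr", "gru4rec", "sasrec"]
    ))
```

With the seven strategy names listed in this section, mixed_source_tag yields rabmbepobpgrsa, which matches the prefix of the file suffix mentioned above. The exact handling of the ground-truth item should be checked against trainer.py.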
- Download the raw datasets from Amazon review data 2018, including the metadata and ratings-only data of each category. [link] Here is an example:

      dataset/
        raw/
          Metadata/
            meta_Video_Games.json.gz
          Ratings/
            Video_Games.csv
- Process downstream datasets.

      cd llmrank/
      python data_process_amazon.py --dataset Games

  Note that, following [UniSRec], we split the data into separate files for training, validation, and evaluation: .train.inter, .valid.inter, and .test.inter.
- Repeat the second step of processing the ML-1M dataset, for example:

      # generate Games-6k.random
      cd llmrank/
      python sample_candidates.py -d Games-6k -u 6000 -s random
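Before running data_process_amazon.py, it may help to confirm that the raw Amazon files sit where the example layout in this section places them. The helper below is a hypothetical sketch, not part of the repository: the name missing_raw_files is made up, and the naming pattern meta_<Category>.json.gz and <Category>.csv is extrapolated from the single Video_Games example, so it is an assumption for other categories.

```python
from pathlib import Path
from typing import List


def missing_raw_files(root: str, category: str) -> List[Path]:
    """Return the expected raw files that are absent under `root`/raw/."""
    raw = Path(root) / "raw"
    # Assumed naming pattern, taken from the Video_Games example layout.
    expected = [
        raw / "Metadata" / f"meta_{category}.json.gz",
        raw / "Ratings" / f"{category}.csv",
    ]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    for path in missing_raw_files("dataset", "Video_Games"):
        print(f"missing: {path}")
```

An empty result means both files were found. The check says nothing about whether their contents are valid.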
- Generate item embeddings from item-related text (title).

      cd llmrank/
      python dataset/unisrec_auxiliary_files_process.py
- Run step 1 of UniSRec to obtain *.feat1CLS.
- Generate the item index file.

      python dataset/vqrec_auxiliary_files_process.py --dataset ml-1m