Data Preparation Guideline

Movielens 1M (ML-1M)

  1. Download the raw atomic files from RecBole's dataset hub. [link]

  2. Sample all 6,040 users and their corresponding candidate items for experiments.

    # generate ml-1m-full.random
    cd llmrank/
    python sample_candidates.py -d ml-1m-full -u 6040 -s random

    Note that ml-1m-full.random has the format user_id<\t>candidate_item_1 candidate_item_2 ...:

    5077	1116 884 1081 ...
    5075	3913 749 445 ...
    

    For each user in ml-1m-full.random, there are 100 randomly selected candidate items. We store these candidates for fair comparison between different variants.

    When we do experiments using python evaluate.py -m Rank -d ml-1m-full, the number of items used to construct the candidate set depends on the hyperparameter recall_budget. For example, when recall_budget=20, the first 20 items of each line in ml-1m-full.random will be used.

    Then, if has_gt=True (i.e., ground-truth items are guaranteed to appear in the candidate set), the ground-truth item will be appended to the candidate set (implemented in trainer.py, lines 70-90).
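The candidate-set construction described above can be sketched in Python. This is a minimal illustration of the described behavior, not the repository's actual trainer.py code; the function names and in-memory layout are our own assumptions:

```python
def load_candidates(path):
    """Parse a candidate file with lines of the form:
    user_id<TAB>candidate_item_1 candidate_item_2 ..."""
    candidates = {}
    with open(path) as f:
        for line in f:
            user_id, items = line.rstrip("\n").split("\t")
            candidates[user_id] = items.split()
    return candidates


def build_candidate_set(all_candidates, recall_budget, ground_truth=None, has_gt=False):
    """Take the first `recall_budget` candidates for one user.
    When has_gt is True, append the ground-truth item if it is
    not already among the truncated candidates."""
    candidate_set = all_candidates[:recall_budget]
    if has_gt and ground_truth is not None and ground_truth not in candidate_set:
        candidate_set.append(ground_truth)
    return candidate_set
```

With recall_budget=20, for example, only the first 20 items of each line would be kept before the optional ground-truth append.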

  3. For different candidate retrieval strategies, replace -s random with -s bm25, -s bert, -s pop, -s bpr, -s gru4rec, or -s sasrec in step 2.

  4. To generate candidates retrieved by a mix of strategies (producing a suffix such as .rabmbepobpgrsa_3, formed by concatenating the first two letters of each strategy name), run:

    python mix_sources.py
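The naming and mixing step can be sketched as follows. This is a simplified illustration based on the description above, not the actual mix_sources.py; the helper names and the de-duplicating interleave are assumptions:

```python
def mixed_suffix(strategies, n_per_strategy):
    """Build a suffix like 'rabmbepobpgrsa_3' from the first two
    letters of each strategy name plus a per-strategy count."""
    return "".join(s[:2] for s in strategies) + f"_{n_per_strategy}"


def mix_candidates(per_strategy_candidates, n_per_strategy):
    """Merge candidate lists from several strategies, taking up to
    `n_per_strategy` items from each and dropping duplicates while
    preserving order."""
    mixed = []
    for items in per_strategy_candidates:
        for item in items[:n_per_strategy]:
            if item not in mixed:
                mixed.append(item)
    return mixed
```

For instance, mixing the seven strategies from step 3 with three candidates each would yield the .rabmbepobpgrsa_3 suffix mentioned above.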

Amazon Review - Games (Games)

  1. Download the raw datasets from Amazon review data 2018, including the metadata and ratings-only data of each category. [link] Here is an example of the expected directory layout:

    dataset/
      raw/
        Metadata/
          meta_Video_Games.json.gz
        Ratings/
          Video_Games.csv
    
  2. Process downstream datasets.

    cd llmrank/
    python data_process_amazon.py --dataset Games
    

    Note that, following [UniSRec], we split the data into separate files for training, validation, and evaluation: .train.inter, .valid.inter, and .test.inter.
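The split into train/valid/test files can be sketched roughly as below, assuming the leave-one-out protocol common in this line of work (last interaction for testing, second-to-last for validation). This is an illustrative sketch only; the actual data_process_amazon.py may filter users or split differently:

```python
def split_leave_one_out(user_interactions):
    """Split each user's chronologically ordered item list:
    last item -> test, second-to-last -> validation, rest -> train.
    Users with fewer than three interactions go entirely to train."""
    train, valid, test = {}, {}, {}
    for user, items in user_interactions.items():
        if len(items) < 3:
            train[user] = items
            continue
        train[user] = items[:-2]
        valid[user] = items[-2]
        test[user] = items[-1]
    return train, valid, test
```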

  3. Repeat step 2 of the ML-1M processing above, for example:

    # generate Games-6k.random
    cd llmrank/
    python sample_candidates.py -d Games-6k -u 6000 -s random
    

Auxiliary Files of UniSRec / VQ-Rec Preparation Guideline

UniSRec

  1. Generate item embeddings from item-related text (titles).
    cd llmrank/
    python dataset/unisrec_auxiliary_files_process.py
    

VQ-Rec

  1. Run step 1 of UniSRec to obtain *.feat1CLS.
  2. Generate the item index file.
    python dataset/vqrec_auxiliary_files_process.py --dataset ml-1m