Observation 2. LLMs suffer from position bias and popularity bias

LLMs suffer from position bias and popularity bias while ranking, which can be alleviated by specially designed prompting or bootstrapping strategies.

The order of candidates affects the ranking results of LLMs

Figure 3 (a)

We place the ground-truth item at positions {0, 5, 10, 15, 19} of the candidate list and present the ranking results in Figure 3 (a).

  • First, save the following as a bash script, pos_bias.sh, inside llmrank/.

    # ML-1M
    for pos in 0 5 10 15 19 ; do
      python evaluate.py -m Rank -d ml-1m-full --fix_pos=$pos
    done
    # Games
    for pos in 0 5 10 15 19 ; do
      python evaluate.py -m Rank -d Games-6k --fix_pos=$pos
    done
  • Then we execute the bash script file.

    cd llmrank/
    
    bash pos_bias.sh
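
To make the setup concrete, here is a minimal sketch of pinning the ground-truth item at a fixed position in the candidate list. It is not the repository's implementation: the function name `build_candidates`, the 0-based indexing, and the random shuffling of the remaining candidates are assumptions made for illustration.

```python
import random


def build_candidates(ground_truth, negatives, fix_pos, seed=0):
    """Return a candidate list with ground_truth at index fix_pos (0-based).

    Hypothetical helper: the negatives are shuffled so that only the
    position of the ground-truth item is controlled.
    """
    if not 0 <= fix_pos <= len(negatives):
        raise ValueError("fix_pos must lie within the candidate list")
    rng = random.Random(seed)
    candidates = list(negatives)
    rng.shuffle(candidates)
    candidates.insert(fix_pos, ground_truth)
    return candidates


# 19 negatives + 1 ground-truth item = 20 candidates (positions 0..19).
negatives = [f"item_{i}" for i in range(19)]
for pos in (0, 5, 10, 15, 19):
    cands = build_candidates("target", negatives, pos)
    print(pos, cands.index("target"), len(cands))
```

Each loop iteration corresponds to one value passed to --fix_pos in the script above.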

Alleviating position bias via bootstrapping

Figure 3 (b)

We rank the candidate set $B$ times, randomly shuffling the candidates in each round. The commands below use $B = 3$ (--boots=3).

  • Ours

    cd llmrank/
    
    # ML-1M
    python evaluate.py -m Rank -d ml-1m-full
    
    # Games
    python evaluate.py -m Rank -d Games-6k
  • Ours + bootstrapping

    cd llmrank/
    
    # ML-1M
    python evaluate.py -m Rank -d ml-1m-full --boots=3
    
    # Games
    python evaluate.py -m Rank -d Games-6k --boots=3
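
The bootstrapping idea can be sketched as follows. This is an illustrative sketch, not the code behind --boots: the function `bootstrap_rank`, the `rank_fn` callback standing in for one LLM ranking call, and the Borda-style score used to merge the rounds are all assumptions.

```python
import random
from collections import defaultdict


def bootstrap_rank(candidates, rank_fn, boots=3, seed=0):
    """Rank `candidates` `boots` times in shuffled order and merge the rounds.

    rank_fn: hypothetical callable that takes the candidates in presentation
    order and returns them ranked best-first (one LLM call in practice).
    Merging rule (an assumption): an item at position p in a round earns
    len(candidates) - p points; items are sorted by total points.
    """
    rng = random.Random(seed)
    total_score = defaultdict(float)
    n = len(candidates)
    for _ in range(boots):
        shuffled = list(candidates)
        rng.shuffle(shuffled)  # new presentation order in every round
        for position, item in enumerate(rank_fn(shuffled)):
            total_score[item] += n - position
    # sorted() is stable, so ties keep their original relative order.
    return sorted(candidates, key=lambda item: -total_score[item])


def first_slot_biased_ranker(shown):
    """Toy stand-in for a position-biased ranker: favors whatever is shown first."""
    return [shown[0]] + sorted(shown[1:])


items = ["c", "a", "e", "b", "d"]
print(bootstrap_rank(items, first_slot_biased_ranker, boots=3))
```

Because the presentation order changes every round, no single item benefits from the first slot in all rounds, which is the intuition behind averaging out position bias.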

Popularity degrees of candidates affect ranking results of LLMs

Figure 3 (c)

We report the item popularity score at each position of the ranked item lists (parsed from logs).

```bash
cd llmrank/

# ML-1M
python parse_pop.py -m Rank -lp Rank-ml-1m-Jun-07-2023_13-09-03-3c1e76.log

# Games
python parse_pop.py -m Rank -d Games -lp Rank-Games-Jun-07-2023_13-09-05-55eec9.log
```

-lp is the path to the log file. Replace the example file names above with the log files generated by your own runs.
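
As a rough illustration of the quantity being reported, the sketch below averages a popularity score at each position of the ranked lists. It does not reproduce parse_pop.py or its log parsing: the function name `popularity_by_position` and the definition of popularity as an item's share of all training interactions are assumptions.

```python
from collections import Counter


def popularity_by_position(ranked_lists, interactions):
    """Average item popularity at each rank position.

    ranked_lists: list of ranked item lists (best-first), e.g. parsed from logs.
    interactions: flat list of item ids from the training interactions.
    Popularity (an assumption): item frequency / total number of interactions.
    """
    counts = Counter(interactions)
    total = sum(counts.values())
    sums, seen = [], []
    for ranked in ranked_lists:
        for pos, item in enumerate(ranked):
            if pos == len(sums):  # first time this position is seen
                sums.append(0.0)
                seen.append(0)
            sums[pos] += counts[item] / total
            seen[pos] += 1
    return [s / n for s, n in zip(sums, seen)]


interactions = ["a", "a", "a", "b", "b", "c"]
ranked_lists = [["a", "b", "c"], ["a", "c", "b"]]
print(popularity_by_position(ranked_lists, interactions))
```

A curve that falls from the first position to the last would indicate that popular items tend to be ranked higher.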

Making LLMs focus on historical interactions helps reduce popularity bias

Figure 3 (d)

We instruct LLMs to focus on historical interactions, thereby reducing the effects of popularity bias.

  • First, save the following as a bash script, alleviate_pop_bias.sh, inside llmrank/.

    # ML-1M
    for his_len in 5 10 20 30 40 50 ; do
      python evaluate.py -m Rank -d ml-1m-full --max_his_len=$his_len
    done
    # Games
    for his_len in 5 10 20 30 40 50 ; do
      python evaluate.py -m Rank -d Games-6k --max_his_len=$his_len
    done
  • Then we execute the bash script file.

    cd llmrank/
    
    bash alleviate_pop_bias.sh
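
For intuition about what the loop varies, here is a minimal sketch of capping a user's history at a maximum length. It assumes that --max_his_len keeps only the most recent interactions and that histories are stored oldest-first; the helper `truncate_history` is hypothetical and not taken from the repository.

```python
def truncate_history(history, max_his_len):
    """Keep at most the `max_his_len` most recent interactions.

    Assumes `history` is ordered oldest-first (an assumption for illustration).
    """
    if max_his_len <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-max_his_len:]


history = [f"item_{i}" for i in range(60)]
for his_len in (5, 10, 20, 30, 40, 50):
    kept = truncate_history(history, his_len)
    print(his_len, len(kept), kept[0], kept[-1])
```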