Skip to content

cgreer/seqsort

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 

Repository files navigation

Updates

  • Tue Sep 10 20:14:30 PDT 2024: Reran experiment + figure after bug fix

SeqSort

Sequential Halving is a simple yet powerful multi-armed bandit algorithm used in SOTA RL algorithms such as Gumbel MuZero and EfficientZero V2.

And it turns out SH can be slightly improved with a simple 1-line code change!

Results

Sequential Halving works by allocating a fixed budget of samples across a sequence of sampling rounds. In round one, every arm is sampled. After each round, the worst half of the arms are eliminated. This process continues until you have only one arm (the best arm estimate)[1].

Results

SeqHalving is great because it performs well across many scenarios[1]. Also, it's easy to use: there aren't any hyperparameters to tune when using it.

Results

It turns out that Sequential Halving can be improved with a simple change: Rerank all the arms between rounds without doing any elimination. The sampling round size is still halved at the end of each round, but all arms are eligible to participate in subsequent rounds.

Results

Results

Doing so increases the chances of finding the true best arm. I doubt this is new/novel, and it's only a slight improvement. But still, it's useful to know!

Refs:
[1] Almost Optimal Exploration in Multi-Armed Bandits

About

Slight Improvement to Sequential Halving

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages