SeqSort

Updates

Tue Sep 10 20:14:30 PDT 2024: Reran experiment + figure after bug fix

SeqSort

Sequential Halving is a simple yet powerful multi-armed bandit algorithm used in SOTA RL algorithms such as Gumbel MuZero and EfficientZero V2.

And it turns out SH can be slightly improved with a simple 1-line code change!

Sequential Halving works by allocating a fixed budget of samples across a sequence of sampling rounds. In round one, every arm is sampled. After each round, the worst half of the arms are eliminated. This process continues until you have only one arm (the best arm estimate)[1].

SeqHalving is great because it performs well across many scenarios[1]. Also, it's easy to use: there aren't any hyperparameters to tune when using it.

It turns out that Sequential Halving can be improved with a simple change: Rerank all the arms between rounds without doing any elimination. The sampling round size is still halved at the end of each round, but all arms are eligible to participate in subsequent rounds.

Doing so increases the chances of finding the true best arm. I doubt this is new/novel, and it's only a slight improvement. But still, it's useful to know!

Refs:
[1] Almost Optimal Exploration in Multi-Armed Bandits

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
README.md		README.md
seqsort.py		seqsort.py
tree_search.py		tree_search.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SeqSort

About

Releases

Packages

Languages

cgreer/seqsort

Folders and files

Latest commit

History

Repository files navigation

SeqSort

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages