Commit

Merge branch 'main' of https://github.com/yaboidav3/ORL into main
davidzhu27 committed Sep 18, 2024
2 parents 21590e3 + 372a61b commit d1b373d
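For readers who want to inspect this merge locally, the commit and its two parents can be viewed with standard git commands. This is a minimal sketch, assuming a local clone of the repository; the abbreviated hashes are the ones shown above.

```bash
# Summarize the merge commit and the files it touched
git show --stat d1b373d

# List the two parent commits without walking their ancestors
git log --oneline --no-walk 21590e3 372a61b
```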
Showing 3 changed files with 16 additions and 6 deletions.
README.md (20 changes: 14 additions & 6 deletions)
@@ -9,7 +9,7 @@ Set up and install the d4rl environments by following the instructions provided
Clone the GitHub repository and install required packages:

```bash
-git clone https://github.com/yaboidav3/ORL.git && cd ORL
+git clone https://github.com/uiuc-focal-lab/ORL.git && cd ORL
pip install -r requirements/requirements_dev.txt
```

@@ -25,14 +25,22 @@ Follow the prompts to create a new project or connect to an existing one. Make s

For more information on how to use Wandb, refer to the [Wandb documentation](https://docs.wandb.ai/).
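As a quick reference, authenticating and setting up a project from the command line looks roughly like this (a minimal sketch, assuming the standard Wandb CLI is installed with the requirements above):

```bash
# Authenticate with your Wandb API key (prompted interactively)
wandb login

# Optionally configure the current directory and choose a project
wandb init
```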

## Generate Preference Datasets

Run the shell scripts below; the generated preference datasets will be written to the `saved` folder.

```bash
. generate_pbrl_datasets.sh
. generate_pbrl_datasets_no_overlap.sh

```

## Run Example

Run the sample command below. Make sure the necessary dependencies are installed and the Python environment is properly configured.

```bash
. example.sh
```

## Full Experiment and Ablation Study Scripts

To run the full experiment and ablation study, use the following scripts:
@@ -56,20 +64,20 @@ Execute these scripts in your terminal:
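The script names themselves sit in the collapsed portion of the diff above; purely to illustrate the invocation pattern, running them follows the same sourcing style as the earlier examples. The names below are hypothetical, not the repository's actual files.

```bash
# Hypothetical script names for illustration only; substitute the actual
# experiment and ablation scripts listed in the README.
. full_experiment.sh
. ablation_study.sh
```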

### Main Experiments

-This graph shows the comparison between different reward labeling methods: Oracle True Reward, ORL, Latent Reward Model, and IPL with True Reward.
+Training logs of learning with different methods on different datasets: Oracle True Reward, ORL, Latent Reward Model, and IPL with True Reward.

![Graph 1](results/graphs/main_exp.png)

### Ablation Studies

-This graph demonstrates the impact of using datasets of different sizes on the performance of the reward labeling method.
+Training logs of learning with a single method on datasets of different sizes.

![Graph 2](results/graphs/size.png)

-This graph illustrates the performance of the reward labeling method when different Offline RL algorithms are applied.
+Comparison of the learning efficiency of ORL combined with different standard offline RL algorithms.

![Graph 3](results/graphs/algo.png)

-This graph showcases the effect of performing multiple Bernoulli samples to generate preference labels on the performance of the reward labeling method.
+Comparison between the cases where single or multiple preference labels are given to each pair of trajectories.

![Graph 4](results/graphs/bernoulli.png)
saved/pbrl_datasets/placeholder.txt (1 change: 1 addition & 0 deletions)
@@ -0,0 +1 @@

saved/pbrl_datasets_no_overlap/placeholder.txt (1 change: 1 addition & 0 deletions)
@@ -0,0 +1 @@
