diff --git a/README.md b/README.md
index 9a3bb07..ae2101c 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ Set up and install the d4rl environments by following the instructions provided
 Clone the GitHub repository and install required packages:
 
 ```bash
-git clone https://github.com/yaboidav3/ORL.git && cd ORL
+git clone https://github.com/uiuc-focal-lab/ORL.git && cd ORL
 pip install -r requirements/requirements_dev.txt
 ```
 
@@ -25,6 +25,15 @@ Follow the prompts to create a new project or connect to an existing one. Make s
 
 For more information on how to use Wandb, refer to the [Wandb documentation](https://docs.wandb.ai/).
 
+## Generate Preference Datasets
+
+Run the shell scripts below; the generated preference datasets will be written into the `saved` folder.
+
+```bash
+. generate_pbrl_datasets.sh
+. generate_pbrl_datasets_no_overlap.sh
+```
+
 ## Run Example
 
 Run the sample Python command. Make sure you have the necessary dependencies installed and the Python environment properly configured.
@@ -32,7 +41,6 @@ Run the sample Python command. Make sure you have the necessary dependencies ins
 ```bash
 . example.sh
 ```
-
 ## Full Experiment and Ablation Study Scripts
 
 To run the full experiment and ablation study, use the following scripts:
@@ -56,20 +64,20 @@ Execute these scripts in your terminal:
 
 ### Main Experiments
 
-This graph shows the comparison between different reward labeling methods: Oracle True Reward, ORL, Latent Reward Model, and IPL with True Reward.
+Training curves of the different methods on different datasets: Oracle True Reward, ORL, Latent Reward Model, and IPL with True Reward.
 
 ![Graph 1](results/graphs/main_exp.png)
 
 ### Ablation Studies
 
-This graph demonstrates the impact of using datasets of different sizes on the performance of the reward labeling method.
+Training curves of a single method on datasets of different sizes.
 
 ![Graph 2](results/graphs/size.png)
 
-This graph illustrates the performance of the reward labeling method when different Offline RL algorithms are applied.
+Comparison of the learning efficiency of ORL combined with different standard offline RL algorithms.
 
 ![Graph 3](results/graphs/algo.png)
 
-This graph showcases the effect of performing multiple Bernoulli samples to generate preference labels on the performance of the reward labeling method.
+Comparison between giving a single preference label and giving multiple preference labels to each pair of trajectories.
 
 ![Graph 4](results/graphs/bernoulli.png)
diff --git a/saved/pbrl_datasets/placeholder.txt b/saved/pbrl_datasets/placeholder.txt
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/saved/pbrl_datasets/placeholder.txt
@@ -0,0 +1 @@
+
diff --git a/saved/pbrl_datasets_no_overlap/placeholder.txt b/saved/pbrl_datasets_no_overlap/placeholder.txt
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/saved/pbrl_datasets_no_overlap/placeholder.txt
@@ -0,0 +1 @@
+