Commit

Merge branch 'main' of https://github.com/yaboidav3/ORL into main
davidzhu27 committed Sep 18, 2024
2 parents 21590e3 + 372a61b commit d1b373d
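For readers who want to inspect this merge locally, the commit and its two parents can be viewed with standard git commands. This is a minimal sketch, assuming a local clone of the repository; the abbreviated hashes are the ones shown above.

```bash
# Summarize the merge commit and the files it touched
git show --stat d1b373d

# List the two parent commits without walking their ancestors
git log --oneline --no-walk 21590e3 372a61b
```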
Showing 3 changed files with 16 additions and 6 deletions.
README.md (20 changes: 14 additions & 6 deletions)
@@ -9,7 +9,7 @@ Set up and install the d4rl environments by following the instructions provided
Clone the GitHub repository and install required packages:

```bash
-git clone https://github.com/yaboidav3/ORL.git && cd ORL
+git clone https://github.com/uiuc-focal-lab/ORL.git && cd ORL
pip install -r requirements/requirements_dev.txt
```

@@ -25,14 +25,22 @@ Follow the prompts to create a new project or connect to an existing one. Make s

For more information on how to use Wandb, refer to the [Wandb documentation](https://docs.wandb.ai/).
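As a quick reference, authenticating and setting up a project from the command line looks roughly like this (a minimal sketch, assuming the standard Wandb CLI is installed with the requirements above):

```bash
# Authenticate with your Wandb API key (prompted interactively)
wandb login

# Optionally configure the current directory and choose a project
wandb init
```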

## Generate Preference Datasets

Run the shell scripts below; the generated preference datasets will be written to the `saved` folder.

```bash
. generate_pbrl_datasets.sh
. generate_pbrl_datasets_no_overlap.sh

```

## Run Example

Run the sample command below. Make sure the necessary dependencies are installed and the Python environment is properly configured.

```bash
. example.sh
```

## Full Experiment and Ablation Study Scripts

To run the full experiment and ablation study, use the following scripts:
@@ -56,20 +64,20 @@ Execute these scripts in your terminal:
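The script names themselves sit in the collapsed portion of the diff above; purely to illustrate the invocation pattern, running them follows the same sourcing style as the earlier examples. The names below are hypothetical, not the repository's actual files.

```bash
# Hypothetical script names for illustration only; substitute the actual
# experiment and ablation scripts listed in the README.
. full_experiment.sh
. ablation_study.sh
```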

### Main Experiments

-This graph shows the comparison between different reward labeling methods: Oracle True Reward, ORL, Latent Reward Model, and IPL with True Reward.
+Training logs of learning with different methods on different datasets: Oracle True Reward, ORL, Latent Reward Model, and IPL with True Reward.

![Graph 1](results/graphs/main_exp.png)

### Ablation Studies

-This graph demonstrates the impact of using datasets of different sizes on the performance of the reward labeling method.
+Training logs of learning with a single method on datasets of different sizes.

![Graph 2](results/graphs/size.png)

-This graph illustrates the performance of the reward labeling method when different Offline RL algorithms are applied.
+Comparison of the learning efficiency of ORL combined with different standard offline RL algorithms.

![Graph 3](results/graphs/algo.png)

-This graph showcases the effect of performing multiple Bernoulli samples to generate preference labels on the performance of the reward labeling method.
+Comparison between the cases where single or multiple preference labels are given to each pair of trajectories.

![Graph 4](results/graphs/bernoulli.png)
saved/pbrl_datasets/placeholder.txt (1 change: 1 addition & 0 deletions)
@@ -0,0 +1 @@

saved/pbrl_datasets_no_overlap/placeholder.txt (1 change: 1 addition & 0 deletions)
@@ -0,0 +1 @@
