These instructions will guide you through the process of setting up the project environment. Please follow each step carefully.
- Clone the Repository: Start by cloning the repository to your local machine.
git clone [email protected]:YerevaNN/incontext_spurious.git cd incontext_spurious
- Configure Environment Variables:
- Copy the sample environment file:
```
cp .env.sample .env
```
- Open the `.env` file and fill in the necessary environment variables as specified in the file.
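The required variable names are project-specific and listed in `.env.sample` itself. Purely as an illustration of how files in this format are usually consumed (the project may well rely on a library such as python-dotenv instead), a minimal `KEY=VALUE` parser looks like:

```python
import os

def load_env(path=".env"):
    """Minimal .env reader: KEY=VALUE lines; blank lines and '#' comments skipped.
    Existing environment variables are not overwritten."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

This is a sketch only; the actual loading mechanism used by the project may differ.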
- Create a Conda Environment:
- Create a new Conda environment using the provided environment file:
```
conda env create
```
- Activate the new environment:
```
conda activate incontext_spurious
```
- Initialize Aim logger:
```
aim init
```
Overview:
The dataset required for this process will be automatically downloaded during the encoding extraction run, eliminating the need for manual downloading!
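This download-on-first-use behavior follows a common pattern: check whether the data is present and fetch it only if it is not. As a generic sketch (the function name `ensure_dataset` and the sentinel file are illustrative, not the project's actual code):

```python
from pathlib import Path

def ensure_dataset(root, fetch):
    """Download the dataset into `root` only if it is not already there.
    `fetch` is whatever callable actually retrieves the files."""
    root = Path(root)
    marker = root / ".downloaded"  # sentinel marking a completed download
    if marker.exists():
        return False  # already present, nothing to do
    root.mkdir(parents=True, exist_ok=True)
    fetch(root)
    marker.touch()
    return True
```

On the first call the fetch runs once; subsequent calls are no-ops, which is why no manual download step is needed.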
- Extracting and Saving Encodings: To begin, run the script for extracting and saving encodings. By default, this uses the `dinov2_vitb14` configuration.
```
python run.py --config-name=extract_encodings
```
- Computing Average Norm and Generating Tokens: This step computes the average norm of the encoding vectors and also generates fixed tokens. During training, you can either use these fixed tokens or generate new ones for each instance.
```
python run.py --config-name=compute_encodings_avg_norm_and_generate_tokens
```
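For intuition about what this step computes: the average norm of a set of encoding vectors is simply the mean of their L2 norms. A minimal sketch (illustrative only, not the project's implementation):

```python
import math

def average_norm(encodings):
    """Mean L2 norm over a collection of encoding vectors (lists of floats)."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in encodings]
    return sum(norms) / len(norms)
```

For the vectors `[3, 4]` and `[0, 0]` the norms are 5 and 0, so the average is 2.5. The real script presumably stores this value (the `avg_norms` files mentioned below) so that tokens can be kept on a consistent scale.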
- Generating and Saving Validation Sets: In this step, the script generates and saves validation sets.
```
python run.py --config-name=generate_and_save_val_sets datamodule.inner_train_len=null datamodule.saved_val_sets_path=null
```
Note: The command-line arguments are essential for overriding the default configurations of the `datamodule`. These defaults are tailored for training and may cause errors if not adjusted for this script.
- Points 2 and 3 should only be executed by one individual to ensure consistency in the validation sets.
- These steps have already been completed. The generated files (`avg_norms` and `context_val_sets`) are available. Team members can access their location via the Notion documentation.
This section provides instructions for running the training script. Configuration is managed with Hydra, a Python library that enables flexible and powerful configuration and lets you modify settings directly from the command line.
```
python run.py
```
Hydra configurations provide a flexible way to adjust training parameters. You can modify these configurations either directly in the configuration files or via the command line.
- Reviewing Configurations in Files:
- Configuration files are located in the `configs` folder.
- The `train.yaml` file is the root configuration file for the training script.
- Command Line Configuration Overrides (Recommended):
- Hydra allows you to override configurations directly from the command line; this is the recommended approach because it is quick and does not require modifying the configuration files (though you can still edit them if you prefer).
Example Command:
```
python run.py optimizer=adam optimizer.lr=0.01
```
In this example, the `optimizer` is set to `adam`, and the learning rate (`lr`) is set to `0.01`.
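For intuition, dotted overrides such as `optimizer.lr=0.01` address keys in a nested configuration tree. The sketch below illustrates that idea with the standard library only; it is not Hydra's actual implementation. In real Hydra, a bare `optimizer=adam` selects a whole config group rather than setting a plain string, so the example uses a hypothetical `optimizer.name` key instead.

```python
def apply_overrides(config, overrides):
    """Apply Hydra-style dotted key=value overrides to a nested dict."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # walk/create intermediate dicts
        # crude value parsing: ints, floats, 'null' -> None, else string
        if raw == "null":
            value = None
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw
        node[leaf] = value
    return config
```

Calling `apply_overrides({}, ["optimizer.name=adam", "optimizer.lr=0.01"])` yields `{"optimizer": {"name": "adam", "lr": 0.01}}`. This also shows why the `null` overrides in the validation-set step work: Hydra interprets `null` as "no value", replacing the training-oriented defaults.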