regmix/mixture_config at main · sail-sg/regmix

config_1b
config_1m
README.md
synthesize_mixture.py
visualize_mixture.py

README.md

Data Mixture Configuration

This directory contains scripts for synthesizing and visualizing data mixtures used in the experiments described in the paper.

synthesize_mixture.py

This script is used to synthesize data mixtures for training the proxy models (1M models in the paper). You can use the following command to generate the data mixtures:

python synthesize_mixture.py --num_configs 512 --output_folder /path/to/configs

By default, the script generates 512 configurations following the settings specified within it and saves them in the config_1m directory.
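
As a rough illustration of what synthesizing a set of mixtures can look like, the sketch below draws each weight vector from a symmetric Dirichlet distribution and writes one file per configuration. It is not the code in synthesize_mixture.py: the domain names, the flat `name: weight` file layout, the `alpha` parameter, the `n<i>.yaml` file names, and both function names are assumptions made for the example.

```python
import random
from pathlib import Path


def sample_mixture(domains, alpha=1.0, rng=random):
    """Draw one weight vector from a symmetric Dirichlet(alpha) distribution.

    Normalizing independent Gamma(alpha, 1) draws yields a Dirichlet sample.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in domains]
    total = sum(draws)
    return {name: value / total for name, value in zip(domains, draws)}


def write_configs(domains, num_configs, output_folder, seed=0):
    """Write num_configs mixture files (hypothetical names n1.yaml, n2.yaml, ...)."""
    rng = random.Random(seed)  # fixed seed so the same mixtures can be regenerated
    out = Path(output_folder)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, num_configs + 1):
        weights = sample_mixture(domains, rng=rng)
        # Assumed layout: one "domain: weight" pair per line (valid flat YAML).
        lines = [f"{name}: {weight:.6f}" for name, weight in weights.items()]
        path = out / f"n{i}.yaml"
        path.write_text("\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

For example, `write_configs(["web", "code", "books"], 512, "/path/to/configs")` would produce 512 files, each holding non-negative weights that sum to 1 up to rounding.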

visualize_mixture.py

This script is used to visualize the data mixtures generated for training the proxy models. By default, it visualizes the configurations stored in the config_1m directory. The visualizations are saved in weight_distributions.png.

If you want to visualize a different folder, you can use the following command:

python visualize_mixture.py --config_folder <path_to_config_dir>

Note that the folder must contain YAML files whose names start with n and end with .yaml.
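
The naming requirement can be checked before running the visualization. The following is a minimal sketch, assuming the requirement amounts to the glob pattern `n*.yaml`; the helper name `find_config_files` is hypothetical and is not part of visualize_mixture.py.

```python
from pathlib import Path


def find_config_files(config_folder):
    """Return the files in config_folder whose names match n*.yaml, sorted by name."""
    folder = Path(config_folder)
    matches = sorted(p for p in folder.glob("n*.yaml") if p.is_file())
    if not matches:
        # Fail early with a clear message instead of producing an empty plot.
        raise FileNotFoundError(f"no files matching n*.yaml in {folder}")
    return matches
```

Files with other names or extensions in the same folder are ignored by this check.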

Weight Distribution

The following image illustrates a possible weight distribution for the data mixtures:
