# TailCalibX : Feature Generation for Long-tail Classification
by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi
[arXiv] [Code] [pip Package] [Video]
- 🐣 Easy Usage (Recommended way to use our method)
- 🧪 Advanced Usage
- 🏋️‍♂️ Trained weights
- 🪀 Results on a Toy Dataset
- 🌴 Directory Tree
- 📃 Citation
- 👁 Contributing
- ❤ About me
- ✨ Extras
- 📝 License
## 🐣 Easy Usage (Recommended way to use our method)

⚠ Caution: TailCalibX is just TailCalib employed multiple times. Specifically, we generate a set of features once every epoch and use them to train the classifier. To mimic that, three things must be done at every epoch, in the following order (sketched below):
- Collect all the features from your dataloader.
- Use the `tailcalib` package to balance the features by generating samples.
- Train the classifier.
- Repeat.
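Once the package is installed (installation is described just below), that per-epoch loop looks roughly like the following minimal sketch. The random features and the scikit-learn logistic-regression head are stand-ins for your own backbone and classifier, not part of our method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from tailcalib import tailcalib

balancer = tailcalib(base_engine="numpy")

# Stand-in for features collected from your dataloader + backbone.
X = np.random.rand(200, 100)
y = np.random.randint(0, 10, (200,))

for epoch in range(3):
    # 1. Collect all the features from your dataloader (faked above).
    # 2. Balance them by generating samples for the tail classes.
    feat, lab, gen = balancer.generate(X=X, y=y)
    # 3. Train the classifier on the balanced features.
    clf = LogisticRegression(max_iter=200).fit(feat, lab)
    # 4. Repeat every epoch.
```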
Use the package manager pip to install tailcalib.

```bash
pip install tailcalib
```

Check the instructions here for much more detailed information about the python package.
```python
# Import
from tailcalib import tailcalib

# Initialize
a = tailcalib(base_engine="numpy")  # Options: "numpy", "pytorch"

# Imbalanced random fake data
import numpy as np
X = np.random.rand(200, 100)
y = np.random.randint(0, 10, (200,))

# Balancing the data using "tailcalib"
feat, lab, gen = a.generate(X=X, y=y)

# Output comparison
print(f"Before: {np.unique(y, return_counts=True)}")
print(f"After: {np.unique(lab, return_counts=True)}")
```
## 🧪 Advanced Usage

- Change the `data_root` for your dataset in `main.py`.
- If you are using wandb logging (Weights & Biases), make sure to change the `wandb.init` in `main.py` accordingly (a sketch of both edits follows below).
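For orientation, those two edits might look like this. The variable layout is a hypothetical sketch; only `wandb.init` is a real API call, and the actual names in `main.py` may differ:

```python
import wandb

# Hypothetical layout: point the dataset roots at your local copies.
data_root = {
    "CIFAR100": "/path/to/your/CIFAR100",
    "mini-imagenet": "/path/to/your/mini-imagenet",
}

# Log to your own Weights & Biases project and entity.
wandb.init(project="TailCalibX", entity="your-username")
```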
To reproduce the experiments, use the provided scripts:
- For just the methods proposed in this paper:
  - For CIFAR100-LT: `run_TailCalibX_CIFAR100-LT.sh`
  - For mini-ImageNet-LT: `run_TailCalibX_mini-ImageNet-LT.sh`
- For all the results shown in the paper:
  - For CIFAR100-LT: `run_all_CIFAR100-LT.sh`
  - For mini-ImageNet-LT: `run_all_mini-ImageNet-LT.sh`
Check `Notebooks/Create_mini-ImageNet-LT.ipynb` for the script that generates the mini-ImageNet-LT dataset with varying imbalance ratios and train-test-val splits.
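As a mental model of what such a script does, the usual long-tail recipe decays the per-class sample count exponentially from the largest class down to `largest / imbalance_factor`. The sketch below shows that standard profile; it is not the notebook's exact code:

```python
def long_tail_counts(n_max: int, num_classes: int, imb_factor: float):
    """Per-class sample counts decaying exponentially from n_max down to
    n_max / imb_factor (the standard CIFAR-LT-style long-tail profile)."""
    return [
        int(n_max * (1.0 / imb_factor) ** (i / (num_classes - 1)))
        for i in range(num_classes)
    ]

# mini-ImageNet has 100 classes with 600 images each when balanced.
counts = long_tail_counts(n_max=600, num_classes=100, imb_factor=100)
print(counts[0], counts[-1])  # -> 600 6
```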
`main.py` accepts the following arguments:

- `--seed` : Seed to fix for reproducibility.
  - Default: `1`
- `--gpu` : Select the GPUs to be used.
  - Default: `"0,1,2,3"`
- `--experiment` : Experiment number (check `libs/utils/experiment_maker.py`).
  - Default: `0.1`
- `--dataset` : Dataset number.
  - Choices: `0 - CIFAR100, 1 - mini-imagenet`
  - Default: `0`
- `--imbalance` : Select the imbalance factor.
  - Choices: `0: 1, 1: 100, 2: 50, 3: 10`
  - Default: `1`
- `--type_of_val` : Choose which dataset split to use.
  - Choices: `"vt": val_from_test, "vtr": val_from_train, "vit": val_is_test`
  - Default: `"vit"`
- `--cv1` to `--cv9` : Custom variables to use in experiments; their purpose changes according to the experiment.
  - Default: `"1"`
- `--train` : Run the training sequence.
  - Default: `False`
- `--generate` : Run the generation sequence.
  - Default: `False`
- `--retraining` : Run the retraining sequence.
  - Default: `False`
- `--resume` : Resume from `latest_model_checkpoint.pth` and wandb if applicable.
  - Default: `False`
- `--save_features` : Collect feature representations.
  - Default: `False`
- `--save_features_phase` : Dataset split of representations to collect.
  - Choices: `"train", "val", "test"`
  - Default: `"train"`
- `--config` : If you have a yaml file with an appropriate config, provide the path here. Will override `experiment_maker`.
  - Default: `None`
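For orientation only, here is a sketch of how these flags could map onto `argparse`; the real parser lives in `main.py` and may differ, in particular in how the boolean flags are parsed:

```python
import argparse

parser = argparse.ArgumentParser(description="TailCalibX experiments (sketch)")
parser.add_argument("--seed", type=int, default=1)
parser.add_argument("--gpu", type=str, default="0,1,2,3")
parser.add_argument("--experiment", type=float, default=0.1)
parser.add_argument("--dataset", type=int, default=0, choices=[0, 1])          # 0: CIFAR100, 1: mini-imagenet
parser.add_argument("--imbalance", type=int, default=1, choices=[0, 1, 2, 3])  # 0: 1, 1: 100, 2: 50, 3: 10
parser.add_argument("--type_of_val", type=str, default="vit", choices=["vt", "vtr", "vit"])
for i in range(1, 10):  # --cv1 ... --cv9, experiment-specific custom variables
    parser.add_argument(f"--cv{i}", type=str, default="1")
parser.add_argument("--train", action="store_true")  # booleans may be parsed differently in main.py
parser.add_argument("--generate", action="store_true")
parser.add_argument("--retraining", action="store_true")
parser.add_argument("--resume", action="store_true")
parser.add_argument("--save_features", action="store_true")
parser.add_argument("--save_features_phase", type=str, default="train", choices=["train", "val", "test"])
parser.add_argument("--config", type=str, default=None)

args = parser.parse_args()
```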
## 🏋️‍♂️ Trained weights

| Experiment | CIFAR100-LT (ResNet32, seed 1, Imb 100) | mini-ImageNet-LT (ResNeXt50) |
|---|---|---|
| TailCalib | Git-LFS | Git-LFS |
| TailCalibX | Git-LFS | Git-LFS |
| CBD + TailCalibX | Git-LFS | Git-LFS |
The higher the `Imb ratio`, the more imbalanced the dataset is: `Imb ratio = maximum_sample_count / minimum_sample_count`. For example, at an `Imb ratio` of 100, the most frequent class has 100× as many training samples as the rarest class.
## 🪀 Results on a Toy Dataset

Check this notebook to play with the toy example from which the plot below was generated.
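If you want a quick feel without opening the notebook, here is a minimal 2D toy along the same lines; this is an assumed setup for illustration, not the notebook's exact code:

```python
import numpy as np
import matplotlib.pyplot as plt
from tailcalib import tailcalib

rng = np.random.default_rng(0)

# A 2D head class (200 points) and a tail class (10 points), kept non-negative.
head = rng.random((200, 2)) * 0.5
tail = 0.5 + rng.random((10, 2)) * 0.5
X = np.vstack([head, tail]).astype(np.float32)
y = np.array([0] * 200 + [1] * 10)

# Generate features to balance the tail class.
feat, lab, gen = tailcalib(base_engine="numpy").generate(X=X, y=y)

# Original vs. balanced features, side by side.
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].scatter(X[:, 0], X[:, 1], c=y, s=8)
axes[0].set_title("Original (imbalanced)")
axes[1].scatter(feat[:, 0], feat[:, 1], c=lab, s=8)
axes[1].set_title("After tailcalib (balanced)")
plt.show()
```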
## 🌴 Directory Tree

```
TailCalibX
├── libs
│   ├── core
│   │   ├── ce.py
│   │   ├── core_base.py
│   │   ├── ecbd.py
│   │   ├── modals.py
│   │   ├── TailCalib.py
│   │   └── TailCalibX.py
│   ├── data
│   │   ├── dataloader.py
│   │   ├── ImbalanceCIFAR.py
│   │   └── mini-imagenet
│   │       ├── 0.01_test.txt
│   │       ├── 0.01_train.txt
│   │       └── 0.01_val.txt
│   ├── loss
│   │   ├── CosineDistill.py
│   │   └── SoftmaxLoss.py
│   ├── models
│   │   ├── CosineDotProductClassifier.py
│   │   ├── DotProductClassifier.py
│   │   ├── ecbd_converter.py
│   │   ├── ResNet32Feature.py
│   │   ├── ResNext50Feature.py
│   │   └── ResNextFeature.py
│   ├── samplers
│   │   └── ClassAwareSampler.py
│   └── utils
│       ├── Default_config.yaml
│       ├── experiments_maker.py
│       ├── globals.py
│       ├── logger.py
│       └── utils.py
├── LICENSE
├── main.py
├── Notebooks
│   ├── Create_mini-ImageNet-LT.ipynb
│   └── toy_example.ipynb
├── readme_assets
│   ├── method.svg
│   └── toy_example_output.svg
├── README.md
├── run_all_CIFAR100-LT.sh
├── run_all_mini-ImageNet-LT.sh
├── run_TailCalibX_CIFAR100-LT.sh
└── run_TailCalibX_mini-imagenet-LT.sh
```
`tailcalib_pip` is ignored, as it is for the `tailcalib` pip package.
## 📃 Citation

```bibtex
@inproceedings{rahul2021tailcalibX,
  title     = {{Feature Generation for Long-tail Classification}},
  author    = {Rahul Vigneswaran and Marc T. Law and Vineeth N. Balasubramanian and Makarand Tapaswi},
  booktitle = {ICVGIP},
  year      = {2021}
}
```
## 👁 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
## ✨ Extras

🐝 Long-tail buzz: If you are interested in deep learning research that involves long-tailed / imbalanced datasets, take a look at Long-tail buzz to learn about the recent trending papers in this field.