

VLTVG-PyTorch with DDP, Horovod, and DeepSpeed

This repository contains PyTorch implementations of Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning using Distributed Data Parallel (DDP), Horovod, and DeepSpeed. These implementations are intended for use on ALCF systems. Follow the instructions below to get started.

Common Setup

1. Dataset Preparation

Prepare the datasets as instructed in the VLTVG repository.

2. Conda Environment Setup

Load the appropriate Conda module and activate the environment:

# For PyTorch and Horovod
module load conda 
conda activate

# For DeepSpeed
module load conda/2023-01-10-unstable
conda activate

3. Python Virtual Environment Setup

First Time Setup:

Create and activate the Python virtual environment, then install the required packages:

# Create Python virtual environment
python -m venv --system-site-packages vltvg
source vltvg/bin/activate

# Install required packages
pip install -r requirements.txt

Activation:

For subsequent sessions, activate the virtual environment with:

source vltvg/bin/activate

Running with Different Implementations

PyTorch DDP

aprun -n 8 -N 4 python train_ddp.py --config configs/VLTVG_R101_referit_ddp.py --checkpoint_latest --checkpoint_best
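Under the hood, train_ddp.py presumably follows the standard PyTorch DDP pattern. A minimal sketch of that pattern (illustrative only, not the repo's exact code; the model is a stand-in, and the rank environment variables depend on the launcher and site configuration):

# Minimal DDP setup sketch; not the exact code in train_ddp.py.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# aprun starts one Python process per rank; rank and world size are
# typically derived from launcher-provided environment variables.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 2).cuda()        # stand-in for the VLTVG model
model = DDP(model, device_ids=[local_rank])  # gradients sync during backward()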

Horovod

aprun -n 8 -N 4 python train_hvd.py --config configs/VLTVG_R101_referit_ddp.py --checkpoint_latest --checkpoint_best
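train_hvd.py presumably uses Horovod's standard initialization sequence; a minimal sketch (illustrative only; the model and optimizer are stand-ins):

# Minimal Horovod setup sketch; not the exact code in train_hvd.py.
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())      # pin each rank to one GPU

model = torch.nn.Linear(10, 2).cuda()        # stand-in for the VLTVG model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Average gradients across all ranks during backward().
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Keep every rank consistent with rank 0 at startup.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)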

DeepSpeed

mpiexec --verbose \
  --envall -n 8 \
  --ppn 4 \
  --hostfile "${PBS_NODEFILE}" python train_ds.py \
  --config configs/VLTVG_R101_referit_ddp.py --polaris_nodes 2 \
  --checkpoint_latest --checkpoint_best \
  --deepspeed_config scripts/deepspeed/ds_config.json
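Here, --deepspeed_config points at the JSON file that defines batch size, optimizer, precision, and related settings. train_ds.py presumably passes that config to deepspeed.initialize; a minimal sketch of the pattern (illustrative only; the model and loss are stand-ins):

# Minimal DeepSpeed setup sketch; not the exact code in train_ds.py.
import torch
import deepspeed

model = torch.nn.Linear(10, 2)               # stand-in for the VLTVG model

# deepspeed.initialize reads batch size, optimizer, precision, etc.
# from the JSON config and returns a wrapped engine.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="scripts/deepspeed/ds_config.json",
)

# The engine owns backward() and step().
x = torch.randn(4, 10).to(model_engine.device)
loss = model_engine(x).sum()
model_engine.backward(loss)
model_engine.step()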

Additional Information

For additional examples, see the scripts folder. Update directories and configurations to match your specific setup.

Below is VLTVG's original README:


Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning

This is the official implementation of Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

Introduction

Our proposed framework for visual grounding. Given features from the two modalities as input, the visual-linguistic verification module and the language-guided context encoder establish discriminative features for the referred object. The multi-stage cross-modal decoder then iteratively reasons over all the visual and linguistic features to identify and localize the object.
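To make the data flow concrete, here is a toy, runnable sketch of that pipeline; the shapes, modules, and weighting scheme below are invented for illustration and are not the paper's actual implementation:

# Toy sketch of the described pipeline; not the actual VLTVG modules.
import torch
import torch.nn as nn

d = 256
vis = torch.randn(1, 400, d)   # visual tokens, e.g. a flattened 20x20 feature map
txt = torch.randn(1, 20, d)    # linguistic tokens of the referring expression

# Visual-linguistic verification: score each visual token against the
# sentence and suppress regions unrelated to the text.
sent = txt.mean(dim=1, keepdim=True)                # crude sentence embedding
scores = torch.sigmoid(vis @ sent.transpose(1, 2))  # (1, 400, 1)
vis = vis * scores

# Multi-stage cross-modal decoding: one object query alternately attends to
# the visual and linguistic features, refined over several stages.
query = torch.zeros(1, 1, d)
vis_attn = nn.MultiheadAttention(d, 8, batch_first=True)
txt_attn = nn.MultiheadAttention(d, 8, batch_first=True)
for _ in range(6):
    query = query + vis_attn(query, vis, vis)[0]
    query = query + txt_attn(query, txt, txt)[0]
box = nn.Linear(d, 4)(query).sigmoid()              # normalized (cx, cy, w, h)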

Visualization

For different input images and texts, we visualize the verification scores, the iterative attention maps of the multi-stage decoder, and the final localization results.

Model Zoo

The trained models are available on Google Drive.

        RefCOCO              RefCOCO+             RefCOCOg                 ReferItGame  Flickr30k
        val    testA  testB  val    testA  testB  val-g  val-u  test-u    test         test
R50     84.53  87.69  79.22  73.60  78.37  64.53  72.53  74.90  73.88     71.60        79.18
R101    84.77  87.24  80.49  74.19  78.93  65.17  72.98  76.04  74.18     71.98        79.84

Installation

  1. Clone the repository.

    git clone https://github.com/yangli18/VLTVG
  2. Install PyTorch 1.5+ and torchvision 0.6+.

    conda install -c pytorch pytorch torchvision
  3. Install the other dependencies.

    pip install -r requirements.txt

Preparation

Please refer to get_started.md for the preparation of the datasets and pretrained checkpoints.

Training

The following is an example of model training on the RefCOCOg dataset.

python -m torch.distributed.launch --nproc_per_node=4 --use_env train.py --config configs/VLTVG_R50_gref.py

We train the model on 4 GPUs with a total batch size of 64 (i.e., 16 per GPU) for 90 epochs. The model and training hyper-parameters are defined in the configuration file VLTVG_R50_gref.py. Configuration files for the other datasets are provided in the configs/ folder.

Evaluation

Run the following command to evaluate the trained model on a single GPU.

python test.py --config configs/VLTVG_R50_gref.py --checkpoint VLTVG_R50_gref.pth --batch_size_test 16 --test_split val

Or evaluate the trained model with 4 GPUs:

python -m torch.distributed.launch --nproc_per_node=4 --use_env test.py --config configs/VLTVG_R50_gref.py --checkpoint VLTVG_R50_gref.pth --batch_size_test 16 --test_split val

Citation

If you find our code useful, please cite our paper.

@inproceedings{yang2022vgvl,
  title={Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning},
  author={Yang, Li and Xu, Yan and Yuan, Chunfeng and Liu, Wei and Li, Bing and Hu, Weiming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Acknowledgement

Part of our code is based on the previous works DETR and ReSC.