
Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification

Yuhao Wang · Pingping Zhang* · Xuehu Liu · Zhengzheng Tu · Huchuan Lu

2024 Paper

Overview

In this paper, we propose a flexible fusion framework named FusionReID for image-based person ReID. It comprises a Dual-branch Feature Extraction (DFE) and a Dual-attention Mutual Fusion (DMF). In DFE, we employ CNNs and Transformers to extract deep features from a single image. Meanwhile, DMF consists of a Local Refinement Unit (LRU) and a Heterogenous Transmission Module (HTM). Through the continuous stacking of HTMs, we unify the heterogenous deep features from CNNs and Transformers. Experiments on three large-scale ReID benchmarks demonstrate that our method achieves performance superior to most state-of-the-art methods. Since the computational cost is still high, we will explore more lightweight fusion methods for the framework in future work.
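To give a rough intuition for the mutual-fusion idea, the toy NumPy sketch below shows one bidirectional cross-attention step between CNN feature-map tokens and Transformer tokens, where each branch queries the other. This is only an illustration, not the paper's implementation: all shapes, projection matrices, and the single-head attention form here are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value, d):
    # query: (Nq, d) tokens from one branch; key_value: (Nk, d) from the other.
    scores = query @ key_value.T / np.sqrt(d)      # (Nq, Nk) affinities
    return softmax(scores, axis=-1) @ key_value    # (Nq, d) fused tokens

# Toy shapes (assumed): flatten a 2048-channel CNN map into tokens and
# project both branches into a shared d=256 space with random projections.
rng = np.random.default_rng(0)
d = 256
cnn_map = rng.standard_normal((2048, 16, 8))                       # (C, H, W)
cnn_tokens = cnn_map.reshape(2048, -1).T @ rng.standard_normal((2048, d)) / np.sqrt(2048)
vit_tokens = rng.standard_normal((129, 768)) @ rng.standard_normal((768, d)) / np.sqrt(768)

# One fusion step in each direction: each branch attends to the other,
# so local CNN detail and global Transformer context inform one another.
cnn_fused = cross_attention(cnn_tokens, vit_tokens, d)  # (128, 256)
vit_fused = cross_attention(vit_tokens, cnn_tokens, d)  # (129, 256)
print(cnn_fused.shape, vit_fused.shape)
```

In the actual framework this kind of step would be stacked repeatedly (the role the HTMs play), with learned projections rather than the random ones used here.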

News

Table of Contents

Introduction

Existing methods for person re-identification (ReID) face challenges due to variations in viewpoints, lighting, and postures, leading to significant differences in appearance. CNN-based methods excel at capturing local features but lack a global perspective, while Transformer-based methods capture global representations but struggle with fine-grained details. To address these issues, we propose FusionReID, a novel framework that unifies the strengths of CNNs and Transformers. Our approach involves a Dual-branch Feature Extraction (DFE) and a Dual-attention Mutual Fusion (DMF) module, enabling effective feature representation by combining local and global information.

Contributions

  • We propose a new fusion framework called FusionReID to unify the strengths of CNNs and Transformers for image-based person ReID.
  • We design a novel Dual-attention Mutual Fusion (DMF), which can generate more discriminative features by stacking Heterogenous Transmission Modules (HTMs).
  • Our proposed framework achieves performance superior to most state-of-the-art methods on three public person ReID benchmarks.

Results

Overall Performance

Combination of CNN and Transformer

Visualizations

Grad-CAM

Attention Weights

Please refer to our paper for more details.

Reproduction

Datasets

Market1501, DukeMTMC, MSMT17
link: https://pan.baidu.com/s/1LT2au658lHPF6qovQJa8hA
code: gnwg

Pretrained

ViT-B, DeiT-B, T2T-ViT-24, T2T-ViT-14, ResNet50, ResNet101, ResNet152
link: https://pan.baidu.com/s/1lc0MPKWcDw4r1wT_d7MN7A
code: vdii

Bash

```bash
#!/bin/bash
# python = 3.8
# cuda = 11.7
source activate (your env)
cd ../(your path)
pip install -r requirements.txt
python train_net.py --config_file ../configs/MSMT17/msmt_vitb12_res50_layer2.yml
```

Star History

Star History Chart

Citation

If you find FusionReID useful in your research, please consider citing:
