- Supports Multi-head Self-Attention (MHSA); a minimal sketch of the block design follows the results table below.
| Model | heads | Params (M) | Acc (%) |
| --- | --- | --- | --- |
| ResNet50 baseline (ref) | - | 23.5 | 93.62 |
| BoTNet-50 | 1 | 18.8 | 95.11 |
| BoTNet-50 | 4 | 18.8 | 95.78 |
| BoTNet-S1-50 | 1 | 18.8 | 95.67 |
| BoTNet-S1-59 | 1 | 27.5 | 95.98 |
| BoTNet-S1-77 | 1 | 44.9 | wip |
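The BoTNet rows above differ from the ResNet50 baseline only inside the bottleneck block: the 3x3 spatial convolution is replaced by an MHSA layer. The snippet below is a minimal sketch of that idea, not the repository's exact `Bottleneck` implementation; it assumes the repo's `MHSA` module maps a `(batch, planes, H, W)` feature map to the same shape, used exactly as in the Module example further below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from model import MHSA  # called as in the "Module" example below


class BoTBlockSketch(nn.Module):
    """Sketch of a bottleneck block whose 3x3 conv is swapped for MHSA."""

    def __init__(self, in_planes, planes, resolution=14, expansion=4):
        super().__init__()
        # 1x1 conv to reduce channels, as in a standard ResNet bottleneck
        self.reduce = nn.Sequential(
            nn.Conv2d(in_planes, planes, kernel_size=1, bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU(inplace=True),
        )
        # MHSA stands in for the usual 3x3 convolution
        self.mhsa = MHSA(planes, width=resolution, height=resolution)
        # 1x1 conv to expand channels back
        self.expand = nn.Sequential(
            nn.Conv2d(planes, planes * expansion, kernel_size=1, bias=False),
            nn.BatchNorm2d(planes * expansion),
        )
        self.shortcut = nn.Sequential()
        if in_planes != planes * expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes * expansion, kernel_size=1, bias=False),
                nn.BatchNorm2d(planes * expansion),
            )

    def forward(self, x):
        out = self.reduce(x)
        out = self.mhsa(out)   # assumed shape-preserving: (B, planes, H, W)
        out = self.expand(out)
        return F.relu(out + self.shortcut(x))
```

For example, `BoTBlockSketch(1024, 512)(torch.randn(2, 1024, 14, 14))` yields a `(2, 2048, 14, 14)` tensor.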
- Model
```python
import torch
from model import ResNet50

model = ResNet50(num_classes=1000, resolution=(224, 224))
x = torch.randn([2, 3, 224, 224])
print(model(x).size())
```
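For reference, the call above prints `torch.Size([2, 1000])` (a batch of 2 with 1000 class logits). The table lists 1-head and 4-head variants; assuming the `ResNet50` constructor also exposes a `heads` argument (an assumption here, check `model.py`), a 4-head model would be built the same way:

```python
import torch
from model import ResNet50

# hypothetical keyword: assumes ResNet50 forwards a `heads` argument to its blocks
model = ResNet50(num_classes=1000, resolution=(224, 224), heads=4)
x = torch.randn([2, 3, 224, 224])
print(model(x).size())  # expected: torch.Size([2, 1000])
```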
- Module
```python
from model import MHSA

planes = 512      # channel dimension of the incoming feature map (example value)
resolution = 14   # spatial width/height of the feature map
mhsa = MHSA(planes, width=resolution, height=resolution)
```
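A quick shape check, under the assumption that `MHSA` consumes a `(batch, planes, height, width)` tensor and preserves its shape (which is what lets it stand in for a 3x3 convolution):

```python
import torch

x = torch.randn(2, planes, resolution, resolution)  # (batch, channels, H, W)
print(mhsa(x).size())  # expected: torch.Size([2, 512, 14, 14])
```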
- Paper: [Bottleneck Transformers for Visual Recognition](https://arxiv.org/abs/2101.11605)
- Authors: Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
- Organization: UC Berkeley, Google Research