Openclip-ait

Tested on an RTX 3080 (pt = PyTorch, ait = AITemplate).

Model: ViT-L-14::laion2b-s32b-b82k

shape	pt (ms)	ait (ms)	ait without `flash_attn` (ms)	mean diff	max diff
(1, 77)	8.6888	0.8523	1.6269	0.00335	0.01758
(2, 77)	8.7543	0.9854	2.0161	0.00333	0.01782
(4, 77)	8.7231	1.2459	2.8970	0.00358	0.04297
(8, 77)	9.4466	2.0201	4.8552	0.00355	0.03906
(16, 77)	10.0222	3.4399	8.7880	0.00333	0.03906
(1, 224, 224, 3)	18.0799	3.7753	8.4608	-	-
(2, 224, 224, 3)	17.9421	-	8.4604	-	-
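The repository's benchmark.py is not reproduced here. The following is a minimal sketch, assuming `mean diff` and `max diff` are the mean and maximum element-wise absolute difference between the PyTorch and AITemplate outputs, and that speedup is simply the ratio of the two latencies. The function names `diff_stats` and `speedup` are hypothetical.

```python
import numpy as np


def diff_stats(ref, out):
    """Return (mean, max) of the element-wise absolute difference (assumed metric)."""
    err = np.abs(np.asarray(ref, dtype=np.float32) - np.asarray(out, dtype=np.float32))
    return float(err.mean()), float(err.max())


def speedup(pt_ms, ait_ms):
    """How many times faster the ait run is than the pt run."""
    return pt_ms / ait_ms


# (pt ms, ait ms) pairs copied from the ViT-L-14 table
vit_l_14 = {
    "(1, 77)": (8.6888, 0.8523),
    "(16, 77)": (10.0222, 3.4399),
    "(1, 224, 224, 3)": (18.0799, 3.7753),
}

for shape, (pt_ms, ait_ms) in vit_l_14.items():
    print(f"{shape}: {speedup(pt_ms, ait_ms):.2f}x")
```

With these numbers the speedup is roughly 10x for a single text sequence, shrinking to about 2.9x at batch size 16, and about 4.8x for a single image.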

Model: ViT-g-14::laion2b-s12b-b42k

shape	pt (ms)	ait (ms)	ait without `flash_attn` (ms)	mean diff	max diff
(1, 77)	-	-	-	-	-
(1, 224, 224, 3)	30.3925	-	13.8009	-	-
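For context, the per-head size is the transformer width divided by the number of attention heads. Assuming OpenCLIP's model configs (vision width 1024 with head width 64 for ViT-L-14, vision width 1408 with head width 88 for ViT-g-14, i.e. 16 heads in both), the vision towers work out as:

$$
d_{\text{head}} = \frac{d_{\text{model}}}{n_{\text{heads}}}, \qquad
\text{ViT-L-14: } \frac{1024}{16} = 64, \qquad
\text{ViT-g-14: } \frac{1408}{16} = 88
$$

So the ViT-g-14 vision tower, unlike ViT-L-14, falls under the head_size=88 limitation listed under Known Issues, which is presumably why its ait column is empty.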

Known Issues:

Indexing a tensor with another tensor is not supported, which affects `encode_text` (see: facebookincubator/AITemplate#49)
ViT-g-14, with head_size=88, is not supported by flash attention (see: facebookincubator/AITemplate#53)
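For reference, OpenCLIP's `encode_text` (at least in versions from this period, following the original CLIP code) pools text features by taking, for each sequence, the hidden state at the end-of-text token, which has the largest token id: `x[torch.arange(x.shape[0]), text.argmax(dim=-1)]`. That line is the tensor-with-tensor indexing in question. Below is a minimal NumPy sketch of one common workaround: replace the gather with a one-hot mask, an element-wise multiply and a sum over the sequence axis. It is not the repository's code; the function names are hypothetical, 49407 is assumed to be the CLIP tokenizer's end-of-text id, and whether these ops map cleanly onto AITemplate is not verified here.

```python
import numpy as np

EOT_ID = 49407  # assumed CLIP end-of-text token id


def pool_eot_indexing(x, text):
    """Reference: advanced indexing, as in CLIP's encode_text.

    x: (batch, seq_len, width) hidden states; text: (batch, seq_len) token ids.
    """
    return x[np.arange(x.shape[0]), text.argmax(axis=-1)]


def pool_eot_one_hot(x, text):
    """Same result using only a comparison, an element-wise multiply and a sum."""
    seq_len = x.shape[1]
    eot = text.argmax(axis=-1)  # (batch,) position of the EOT token
    mask = (np.arange(seq_len)[None, :] == eot[:, None]).astype(x.dtype)  # (batch, seq_len)
    return (mask[:, :, None] * x).sum(axis=1)  # (batch, width)


# Small demo: random features, with the EOT token placed at different positions
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 77, 8)).astype(np.float32)
text = rng.integers(1, 1000, size=(2, 77))
text[0, 5] = EOT_ID
text[1, 10] = EOT_ID

assert np.allclose(pool_eot_indexing(x, text), pool_eot_one_hot(x, text))
```

Because the mask depends only on the token ids, it could also be computed on the host and passed to the compiled model as an extra input, which would sidestep the `argmax` as well.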
