
A collection of papers on Transformers for vision. Awesome Transformer with Computer Vision (CV)


Andylau-BIT/Awesome-Visual-Transformer

 
 


Awesome Visual-Transformer

A collection of Transformer papers for computer vision (CV). If you find any missing papers, please open an issue or a pull request.

Papers

Original Transformer paper

Technical blog

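  • [Chinese Blog] An easy introduction to vision Transformers in a 30,000-character long read [Link]
  • [Chinese Blog] Vision Transformer: a super-detailed walkthrough (principle analysis + code walkthrough) [Link]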
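For readers starting with the blogs above: the core operation in most of the Transformers listed here is scaled dot-product attention, softmax(Q Kᵀ / sqrt(d_k)) V, introduced in the original Transformer paper. Below is a minimal pure-Python sketch, assuming a single head, no masking, and no learned Q/K/V projections; these are simplifications for illustration, not any listed paper's implementation.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V on plain lists of row vectors.

    q: n_q x d_k, k: n_k x d_k, v: n_k x d_v.
    Returns an n_q x d_v list of output vectors.
    """
    d_k = len(k[0])
    d_v = len(v[0])
    out = []
    for qi in q:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d_v)])
    return out


# Self-attention over two toy tokens: queries, keys, and values are the same rows.
tokens = [[1.0, 0.0], [0.0, 1.0]]
print(scaled_dot_product_attention(tokens, tokens, tokens))
```

Each output row is a convex combination of the value rows, so a token attends most strongly to the keys most similar to its query.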

Survey

  • Transformers in Vision: A Survey [paper] - 2021.01.04
  • A Survey on Visual Transformer [paper] - 2020.12.24

arXiv papers

  • [CvT] CvT: Introducing Convolutions to Vision Transformers [paper] [code]
  • Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding [paper]
  • [TFPose] TFPose: Direct Human Pose Estimation with Transformers [paper]
  • [TransCenter] TransCenter: Transformers with Dense Queries for Multiple-Object Tracking [paper]
  • [ViViT] ViViT: A Video Vision Transformer [paper]
  • [CrossViT] CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification [paper]
  • [TS-CAM] TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization [paper]
  • Face Transformer for Recognition [paper]
  • On the Adversarial Robustness of Visual Transformers [paper]
  • Understanding Robustness of Transformers for Image Classification [paper]
  • Lifting Transformer for 3D Human Pose Estimation in Video [paper]
  • [GSA-Net] Global Self-Attention Networks for Image Recognition [paper]
  • High-Fidelity Pluralistic Image Completion with Transformers [paper] [code]
  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [paper] [code]
  • [DPT] Vision Transformers for Dense Prediction [paper] [code]
  • [TransFG] TransFG: A Transformer Architecture for Fine-grained Recognition [paper]
  • [TimeSformer] Is Space-Time Attention All You Need for Video Understanding? [paper]
  • Multi-view 3D Reconstruction with Transformer [paper]
  • Can Vision Transformers Learn without Natural Images? [paper] [code]
  • Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [paper]
  • End-to-End Trainable Multi-Instance Pose Estimation with Transformers [paper]
  • Instance-level Image Retrieval using Reranking Transformers [paper] [code]
  • [BossNAS] BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search [paper] [code]
  • [CeiT] Incorporating Convolution Designs into Visual Transformers [paper]
  • [DeepViT] DeepViT: Towards Deeper Vision Transformer [paper]
  • [TNT] Transformer in Transformer [paper] [code]
  • Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training [paper]
  • 3D Human Pose Estimation with Spatial and Temporal Transformers [paper] [code]
  • [UNETR] UNETR: Transformers for 3D Medical Image Segmentation [paper]
  • Scalable Visual Transformers with Hierarchical Pooling [paper]
  • [ConViT] ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases [paper]
  • [TransMed] TransMed: Transformers Advance Multi-modal Medical Image Classification [paper]
  • [U-Transformer] U-Net Transformer: Self and Cross Attention for Medical Image Segmentation [paper]
  • [SpecTr] SpecTr: Spectral Transformer for Hyperspectral Pathology Image Segmentation [paper] [code]
  • [TransBTS] TransBTS: Multimodal Brain Tumor Segmentation Using Transformer [paper] [code]
  • [SSTN] SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for Autonomous Driving [paper]
  • [GANsformer] Generative Adversarial Transformers [paper] [code]
  • [PVT] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions [paper] [code]
  • Transformer is All You Need: Multimodal Multitask Learning with a Unified Transformer [paper] [code]
  • [MedT] Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [paper] [code]
  • [CPVT] Do We Really Need Explicit Position Encodings for Vision Transformers? [paper] [code]
  • Deepfake Video Detection Using Convolutional Vision Transformer [paper]
  • Training Vision Transformers for Image Retrieval [paper]
  • [TransReID] TransReID: Transformer-based Object Re-Identification [paper]
  • [VTN] Video Transformer Network [paper]
  • [T2T-ViT] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [paper] [code]
  • [BoTNet] Bottleneck Transformers for Visual Recognition [paper]
  • [CPTR] CPTR: Full Transformer Network for Image Captioning [paper]
  • Learn to Dance with AIST++: Music Conditioned 3D Dance Generation [paper] [code]
  • [Trans2Seg] Segmenting Transparent Object in the Wild with Transformer [paper] [code]
  • [SMCA] Fast Convergence of DETR with Spatially Modulated Co-Attention [paper]
  • Investigating the Vision Transformer Model for Image Retrieval Tasks [paper]
  • [Trear] Trear: Transformer-based RGB-D Egocentric Action Recognition [paper]
  • [VisualSparta] VisualSparta: Sparse Transformer Fragment-level Matching for Large-scale Text-to-Image Search [paper]
  • [TrackFormer] TrackFormer: Multi-Object Tracking with Transformers [paper]
  • [LETR] Line Segment Detection Using Transformers without Edges [paper]
  • [TAPE] Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry [paper]
  • [TRIQ] Transformer for Image Quality Assessment [paper] [code]
  • [TransTrack] TransTrack: Multiple-Object Tracking with Transformer [paper] [code]
  • [SETR] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [paper] [code]
  • [TransPose] TransPose: Towards Explainable Human Pose Estimation by Transformer [paper]
  • [DeiT] Training data-efficient image transformers & distillation through attention [paper] [code]
  • [Pointformer] 3D Object Detection with Pointformer [paper]
  • [ViT-FRCNN] Toward Transformer-Based Object Detection [paper]
  • [Taming-transformers] Taming Transformers for High-Resolution Image Synthesis [paper] [code]
  • [SceneFormer] SceneFormer: Indoor Scene Generation with Transformers [paper]
  • [PCT] PCT: Point Cloud Transformer [paper]
  • [METRO] End-to-End Human Pose and Mesh Reconstruction with Transformers [paper]
  • [PointTransformer] Point Transformer [paper]
  • [PED] DETR for Pedestrian Detection [paper]
  • [C-Tran] General Multi-label Image Classification with Transformers [paper]
  • [TSP-FCOS] Rethinking Transformer-based Set Prediction for Object Detection [paper]
  • [ACT] End-to-End Object Detection with Adaptive Clustering Transformer [paper]
  • [STTR] Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers [paper] [code]
  • [VTs] Visual Transformers: Token-based Image Representation and Processing for Computer Vision [paper]

2021

  • [NDT-Transformer] NDT-Transformer: Large-Scale 3D Point Cloud Localisation using the Normal Distribution Transform Representation (ICRA) [paper]
  • [TransT] Transformer Tracking (CVPR) [paper] [code]
  • Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking (CVPR oral) [paper]
  • [VisTR] End-to-End Video Instance Segmentation with Transformers (CVPR) [paper]
  • Transformer Interpretability Beyond Attention Visualization (CVPR) [paper] [code]
  • [IPT] Pre-Trained Image Processing Transformer (CVPR) [paper]
  • [UP-DETR] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers (CVPR) [paper]
  • [Vision Transformer] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR) [paper] [code]
  • [Deformable DETR] Deformable DETR: Deformable Transformers for End-to-End Object Detection (ICLR) [paper] [code]
  • [LambdaNetworks] LambdaNetworks: Modeling Long-Range Interactions Without Attention (ICLR) [paper] [code]
  • [LSTR] End-to-end Lane Shape Prediction with Transformers(WACV) [paper] [code]
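ViT ("An Image is Worth 16x16 Words", listed above) turns an image into a token sequence by cutting it into non-overlapping 16x16 patches and flattening each one. Below is a minimal sketch of only that patching step, assuming the image is given as nested H x W x C Python lists; the linear patch projection, class token, and position embeddings that the paper applies afterwards are omitted, and `patchify` is a hypothetical helper name.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Returns (H // P) * (W // P) vectors of length P * P * C, in row-major
    patch order. Assumes H and W are both divisible by patch_size (P).
    """
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            flat = []
            # Walk the pixels of this patch row by row.
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    flat.extend(image[r][c])  # append all C channels of this pixel
            patches.append(flat)
    return patches


# A 224 x 224 RGB image of zeros, cut into 16 x 16 patches.
img = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = patchify(img, 16)
print(len(tokens), len(tokens[0]))
```

For a 224 x 224 RGB input this yields (224 / 16)² = 196 patches of 16 · 16 · 3 = 768 values each, which is why ViT sees such an image as a sequence of 196 "words".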

2020

  • [DETR] End-to-End Object Detection with Transformers (ECCV) [paper] [code]
  • [FPT] Feature Pyramid Transformer (CVPR) [paper] [code]
  • [TTSR] Learning Texture Transformer Network for Image Super-Resolution (CVPR) [paper] [code]
  • [STTN] Learning Joint Spatial-Temporal Transformations for Video Inpainting (ECCV) [paper] [code]
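DETR (listed above) treats detection as set prediction: each ground-truth object is matched one-to-one with a prediction by minimizing a total matching cost, and the paper solves this with the Hungarian algorithm. The sketch below swaps in a brute-force search over permutations, which finds the same optimal assignment but is only practical for tiny sets. The cost matrix is a hypothetical stand-in for DETR's class-and-box matching cost, and `optimal_matching` is an illustrative name.

```python
from itertools import permutations


def optimal_matching(cost):
    """Return (assignment, total) minimizing sum(cost[i][assignment[i]]).

    cost is an n_targets x n_preds matrix with 1 <= n_targets <= n_preds;
    assignment[i] is the index of the prediction matched to target i.
    Brute force: the search grows factorially with n_preds.
    """
    n_targets, n_preds = len(cost), len(cost[0])
    best, best_total = None, float("inf")
    # Try every injective mapping from targets to distinct predictions.
    for perm in permutations(range(n_preds), n_targets):
        total = sum(cost[i][p] for i, p in enumerate(perm))
        if total < best_total:
            best, best_total = list(perm), total
    return best, best_total


# Hypothetical costs: rows are ground-truth objects, columns are predictions.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(optimal_matching(cost))
```

Predictions left unmatched (when there are more predictions than objects) are trained toward a "no object" class in DETR; that part of the loss is not shown here.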

Acknowledgement

Thanks to Awesome-Crowd-Counting for the template.
