CVPR 2021: Latest Information and Accepted Papers/Code (Continuously Updated)

Official website: http://cvpr2021.thecvf.com
Conference dates: June 19-25, 2021
Acceptance announcement date: February 28, 2021

Accepted paper IDs:

CVPR 2021 accepted paper list! 27% acceptance rate!

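❗❗❗🌟🌟🌟 All CVPR 2021 accepted papers have been released, 1660 in total. To download them, send the message "CVPR2021" to the official account 【我爱计算机视觉】.

❗❗❗🌟🌟🌟 All papers have been roughly categorized; please take a look.

❗❗❗Note: finer-grained categorized summaries of the papers will be published on the official account 【OpenCV中文网】; stay tuned.

For a categorized collection of survey papers from past years, see ↘️CV-Surveys (under construction)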

Categorized paper collections for 2022:

↘️CVPR-2022-Papers ↘️WACV-2022-Papers

Categorized paper collections for 2021:

↘️ICCV-2021-Papers ↘️CVPR-2021-Papers

Categorized paper collections for 2020:

↘️CVPR-2020-Papers ↘️ECCV-2020-Papers

🐶	🐭	🐹	🐯
73.Object Re-identification	72.Gaze Estimation	71.Image-to-Image Translation	70.NLP (Natural Language Processing)
68.Crowd Counting	67.Defect Detection	66.Optical Flow Estimation	65.Style Transfer
64.Speech processing	63.Image Processing	62.Free-Hand Sketches	61.Algorithms
60. SLAM/AR/Robotics	59.Deep Learning Models	58.Metric Learning (incl. similarity learning)	57.Sign Language Recognition
56.Computational Photography (optics, geometry, light field imaging)	55.Graph Matching	54.Emotion Perception (emotion prediction)	53.Dataset
52. Image Generation/Synthesis	51.Contrastive Learning	50.OCR	49.Adversarial Learning
48.Image Representation	47.Vision-Language	46.Human-Object Interaction	45.Camera Localization
44. Image/video Captioning	43.Active Learning	42.Scene Flow Estimation	41. Representation Learning (images + captions)
40.Superpixel	39.Debiasing	38.Class-Incremental learning	37.Continual Learning
36.Action Detection and Recognition	35.Image Clustering	34.Image/Fine-Grained Classification	33.6D Pose Estimation
32.View Synthesis	31.Open-Set Recognition	30.Neural rendering	29.Human Pose Estimation
28.Dense prediction	27.Semantic Line Detection	26.Video Processing	25.3D (3D Vision)
24.Reinforcement Learning	23.Autonomous Driving	22.Medical Imaging	21.Transformer/Self-attention
20.Person Re-Identification	19.Quantization/Pruning/Knowledge Distillation/Model Compression (incl. scaling and optimization)	18.Aerial/Drones/Satellite/RS Image	17.Super-Resolution
16.Visual Question Answering	15.GAN	14.Few-Shot/Zero-Shot Learning, Domain Generalization/Adaptation	13.Image/Video Retrieval
12.Image Quality Assessment	11. Face	10.Neural Architecture Search	9.Object Tracking
8.Image Segmentation	7.Object Detection	6.Data Augmentation	5.Anomaly Detection
4.Weakly Supervised/Semi-Supervised/Self-supervised/Unsupervised Learning	3.Point Cloud	2.Graph Neural Networks (GNN)	1.Unknown (uncategorized)

74.Place Recognition

SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud Based Place Recognition
😮oral⭐code

73.Object Re-identification

Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification

72.Gaze Estimation

Weakly-Supervised Physically Unconstrained Gaze Estimation
😮oral⭐code
Gaze target detection
- Dual Attention Guided Gaze Target Detection in the Wild
  ⭐code

71.Image-to-Image Translation

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
⭐code
CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation
😮oral🏠project
Explainer: CoCosNet v2 unlocks a "premium" version of image translation
Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation
Saliency-Guided Image Translation
Not Just Compete, but Collaborate: Local Image-to-Image Translation via Cooperative Mask Prediction
Unpaired Image-to-Image Translation via Latent Energy Transport
⭐code
Image translation
- Unbalanced Feature Transport for Exemplar-Based Image Translation
- The Spatially-Correlative Loss for Various Image Translation Tasks
  ⭐code🏠project📺video

70.NLP (Natural Language Processing)

Learning Graphs for Knowledge Transfer With Limited Labels
⭐code🏠project

69.Transfer Learning

Domain transfer
- Visualizing Adapted Knowledge in Domain Transfer
  ⭐code

68.Crowd Counting

Learning To Count Everything
⭐code

67.Defect Detection

CutPaste: Self-Supervised Learning for Anomaly Detection and Localization

66.Optical Flow Estimation

UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning
Brief explainer: 8
Learning Optical Flow from a Few Matches
⭐code
Learning optical flow from still images
⭐code🏠project
AutoFlow: Learning a Better Training Set for Optical Flow
🏠project
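AutoFlow: CVPR 2021 Oral. The authors propose a data rendering method designed specifically for training optical flow algorithms; PWC-Net and RAFT models trained with it reach SOTA. Code and data will be open-sourced.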
UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning
⭐code

65.Style Transfer

Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes
⭐code
ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows
⭐code
Lipstick ain't enough: Beyond Color Matching for In-the-Wild Makeup Transfer
⭐code
Rethinking and Improving the Robustness of Image Style Transfer
😮oral
Explainer: CVPR 2021 best paper candidate: improving the robustness of image style transfer
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
⭐code
Style-Aware Normalized Loss for Improving Arbitrary Style Transfer
😮oral
In the Light of Feature Distributions: Moment Matching for Neural Style Transfer
⭐code🏠project
ArtCoder: An End-to-End Method for Generating Scanning-Robust Stylized QR Codes
Adaptive Convolutions for Structure-Aware Style Transfer
Learning To Warp for Style Transfer
⭐code
Single-Shot Freestyle Dance Reenactment
CT-Net: Complementary Transfering Network for Garment Transfer With Arbitrary Geometric Changes
DualAST: Dual Style-Learning Networks for Artistic Style Transfer
⭐code
What Can Style Transfer and Paintings Do For Model Robustness?
⭐code
Motion transfer
- Autoregressive Stylized Motion Synthesis with Generative Flow

64.Speech processing

Can audio-visual integration strengthen robustness under multimodal attacks?
⭐code
Robust Audio-Visual Instance Discrimination
Stereo audio generation
- Visually Informed Binaural Audio Generation without Binaural Audios
  ⭐code🏠project📺video
Audio-visual separation
Audio-visual video parsing
- Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
A-V
- Positive Sample Propagation Along the Audio-Visual Event Line
  ⭐code
Voice-face association
- Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association

63.Image Processing

Image signal processing
- Invertible Image Signal Processing
  ⭐code🏠project
Spectral reconstruction
- Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB
  😮oral

62.Free-Hand Sketches

Cloud2Curve: Generation and Vectorization of Parametric Sketches

61.Algorithms

Causal reasoning algorithms
- ACRE: Abstract Causal REasoning Beyond Covariation
  ⭐code🏠project
Abstract spatial-temporal reasoning algorithms
- Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
  ⭐code🏠project

60. SLAM/AR/Robotics

Tangent Space Backpropagation for 3D Transformation Groups
⭐code
Visual odometry
- Generalizing to the Open World: Deep Visual Odometry with Online Adaptation
Robotics
- Visual Room Rearrangement
  😮oral🏠project📺video
- GATSBI: Generative Agent-centric Spatio-temporal Object Interaction
  😮oral⭐code📺video
- DexYCB: A Benchmark for Capturing Hand Grasping of Objects
  ⭐code🏠project📺video
- ContactOpt: Optimizing Contact to Improve Grasps
  ⭐code
  Robotic hand grasping
- ManipulaTHOR: A Framework for Visual Object Manipulation
  😮oral⭐code🏠project📺video
- Visual navigation
  - Pushing it out of the Way: Interactive Visual Navigation
    🏠project📺video
  - Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation
    🏠project📺video
AR

59.Deep Learning Models

Dynamic Slimmable Network
😮oral⭐code
Towards Evaluating and Training Verifiably Robust Neural Networks
😮oral⭐code
Activate or Not: Learning Customized Activation
⭐code
Brief explainer: 4
Explainer: CVPR 2021 | ACON, an adaptive activation function: a new paradigm unifying ReLU and Swish
DISCO: Dynamic and Invariant Sensitive Channel Obfuscation for Deep Neural Networks
⭐code
Capsule Network
- Capsule Network Is Not More Robust Than Convolutional Network

58.Metric Learning (incl. similarity learning)

Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales
⭐code
Embedding Transfer with Label Relaxation for Improved Metric Learning
Noise-resistant Deep Metric Learning with Ranking-based Instance Selection
⭐code
Unsupervised Hyperbolic Metric Learning
Deep Compositional Metric Learning
⭐code
SLADE: A Self-Training Framework for Distance Metric Learning
Asymmetric Metric Learning for Knowledge Transfer
⭐code
Relative Order Analysis and Optimization for Unsupervised Deep Metric Learning

57.Sign Language Recognition

Skeleton Based Sign Language Recognition Using Whole-body Keypoints
⭐code
Read and Attend: Temporal Localisation in Sign Language Videos
🏠project
Fingerspelling Detection in American Sign Language
Sign language translation
- Improving Sign Language Translation with Monolingual Data by Sign Back-Translation
  🌻dataset

56.Computational Photography (optics, geometry, light field imaging)

Deep Gaussian Scale Mixture Prior for Spectral Compressive Imaging
⭐code🏠project
Mask-ToF: Learning Microlens Masks for Flying Pixel Correction in Time-of-Flight Imaging
🏠project
Passive Inter-Photon Imaging
😮oral
Shape and Material Capture at Home
⭐code🏠project
Event-based Synthetic Aperture Imaging with a Hybrid Network
Sharing session
High-Speed Image Reconstruction Through Short-Term Plasticity for Spiking Cameras
Leveraging the Availability of Two Cameras for Illuminant Estimation
Camera pose
Indoor lighting estimation
- Indoor Lighting Estimation Using an Event Camera
Phase retrieval algorithms
- Physics-Based Iterative Projection Complex Neural Network for Phase Retrieval in Lensless Microscopy Imaging

55.Graph Matching

Deep Graph Matching under Quadratic Constraint
⭐code

54.Emotion Perception (emotion prediction)

Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality
🏠project
Human Multimodal Emotion Recognition
- Progressive Modality Reinforcement for Human Multimodal Emotion Recognition From Unaligned Multimodal Sequences

53.Dataset

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
🌻dataset
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
🏠project
Benchmarking Representation Learning for Natural World Image Collections
🌻dataset
SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data
😮oral🌻dataset
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
🌻dataset
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
🌻dataset📺video
Portrait photo retouching dataset
PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency
⭐code
Indoor scenes
- OpenRooms: An Open Framework for Photorealistic Indoor Scene Datasets
  😮oral🌻dataset🏠project
Visual art
- ArtEmis: Affective Language for Visual Art
  🏠project (the project page includes everything: dataset, code, video, etc.)
UGC video quality assessment
- Rich Features for Perceptual Quality Assessment of UGC Videos
  🌻dataset
Indoor localization datasets
- Large-Scale Localization Datasets in Crowded Indoor Spaces
  🌻dataset
- Zillow Indoor Dataset: Annotated Floor Plans With 360deg Panoramas and 3D Room Layouts
Dataset (human intent understanding)
- Intentonomy: A Dataset and Study Towards Human Intent Understanding
  😮oral⭐code
Face recognition dataset
- Virtual Fully-Connected Layer: Training a Large-Scale Face Recognition Dataset With Limited Computational Resources
  ⭐code
Visual attribute prediction dataset
- Learning To Predict Visual Attributes in the Wild
  🌻dataset
Dataset (object-centric videos)
- Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild With Pose Annotations
  🌻dataset
Video scene parsing
- VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
  🌻dataset🏠project
Dataset (sign language)
- How2Sign: A Large-Scale Multimodal Dataset for Continuous American Sign Language
  🌻dataset

52. Image Generation/Synthesis

Spatially-Adaptive Pixelwise Networks for Fast Image Translation
🏠project
Uses a hypernetwork and implicit functions to achieve very fast image-to-image translation (18x faster than the baseline).
Image Generators with Conditionally-Independent Pixel Synthesis
😮oral⭐code

Im2Vec: Synthesizing Vector Graphics without Vector Supervision
😮oral⭐code🏠project
Context-Aware Layout to Image Generation with Enhanced Object Appearance
⭐code
Adversarial Generation of Continuous Images
⭐code
StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis
IMAGINE: Image Synthesis by Image-Guided Model Inversion
SSN: Soft Shadow Network for Image Compositing
Mask-Embedded Discriminator With Region-Based Semantic Regularization for Semi-Supervised Class-Conditional Image Synthesis
Learning Semantic Person Image Generation by Region-Adaptive Normalization
⭐code
MUST-GAN: Multi-Level Statistics Transfer for Self-Driven Person Image Generation
Combining Semantic Guidance and Deep Reinforcement Learning for Generating Human Level Paintings
⭐code
Diverse Semantic Image Synthesis via Probability Distribution Modeling
⭐code
Mol2Image: Improved Conditional Flow Models for Molecule to Image Synthesis

51.Contrastive Learning

AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries
⭐code
Explainer: CVPR 2021 accepted paper: AdCo, adversarial contrastive learning
LAFEAT: Piercing Through Adversarial Defenses with Latent Features
😮oral⭐code
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
⭐code
Mining Better Samples for Contrastive Learning of Temporal Correspondence
Jo-SRC: A Contrastive Approach for Combating Noisy Labels
⭐code
Neighborhood Contrastive Learning for Novel Class Discovery

50.OCR

Fourier Contour Embedding for Arbitrary-Shaped Text Detection
Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter
Sequence-to-Sequence Contrastive Learning for Text Recognition
A Multiplexed Network for End-to-End, Multilingual OCR
TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption
Scene text detection
Handwritten text recognition
- MetaHTR: Towards Writer-Adaptive Handwritten Text Recognition
Text segmentation
- Rethinking Text Segmentation: A Novel Dataset and a Text-Specific Refinement Approach
  ⭐code
Video text detection
- Semantic-Aware Video Text Detection
Text detection
- Self-Attention Based Text Knowledge Mining for Text Detection
  ⭐code

49.Adversarial Learning

Simulating Unknown Target Models for Query-Efficient Black-box Attacks
⭐code
Black-box adversarial attack
Delving into Data: Effectively Substitute Training for Black-box Attack
A black-box attack method based on efficiently training a substitute model
Explainer: 8
LiBRe: A Practical Bayesian Approach to Adversarial Detection
⭐code

Invisible Perturbations: Physical Adversarial Examples Exploiting the Rolling Shutter Effect
Enhancing the Transferability of Adversarial Attacks Through Variance Tuning
⭐code
Natural Adversarial Examples
⭐code
SurFree: A Fast Surrogate-Free Black-Box Attack
⭐code
Regularizing Neural Networks via Adversarial Model Perturbation
⭐code
Adversarial Imaging Pipelines
MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation
Universal Spectral Adversarial Attacks for Deformable Shapes
Adversarial Robustness Across Representation Spaces
⭐code
Protecting Intellectual Property of Generative Adversarial Networks From Ambiguity Attacks
⭐code
Dual Attention Suppression Attack: Generate Adversarial Camouflage in Physical World
😮oral⭐code
Learning Compositional Representation for 4D Captures with Neural ODE
Adversarial attacks
- Adversarial Laser Beam: Effective Physical-World Attack to DNNs in a Blink

48.Image Representation

Learning Continuous Image Representation with Local Implicit Image Function
😮oral⭐code🏠project📺video

47.Vision-Language

Structured Scene Memory for Vision-Language Navigation
⭐code

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
⭐code
UC2: Universal Cross-Lingual Cross-Modal Vision-and-Language Pre-Training
VinVL: Revisiting Visual Representations in Vision-Language Models
⭐code
Connecting What To Say With Where To Look by Modeling Human Attention Traces
⭐code🏠project
Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval
VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
😮oral⭐code
Transitional Adaptation of Pretrained Models for Visual Storytelling
Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation
⭐code
Causal Attention for Vision-Language Tasks
⭐code

46.Human-Object Interaction

Learning Asynchronous and Sparse Human-Object Interaction in Videos
QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
⭐code
Reformulating HOI Detection as Adaptive Set Prediction
⭐code

Detecting Human-Object Interaction via Fabricated Compositional Learning
⭐code
Affordance Transfer Learning for Human-Object Interaction Detection
⭐code
Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection
⭐code
Hierarchical Video Prediction Using Relational Layouts for Human-Object Interactions

45.Camera Localization

Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments
😮oral
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
⭐code
Learning Camera Localization via Dense Scene Matching
⭐code
Privacy Preserving Localization and Mapping From Uncalibrated Cameras

Visual localization
- VS-Net: Voting with Segmentation for Visual Localization
  ⭐code🏠project📺video

44. Image/video Captioning

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
⭐code🏠project📺video
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
A multimodal framework for video captioning, video question answering, and video dialog tasks.
Open-book Video Captioning with Retrieve-Copy-Generate Network

Image captioning

43.Active Learning

Vab-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning

Task-Aware Variational Adversarial Active Learning
⭐code

42.Scene Flow Estimation

Scene flow estimation

41. Representation Learning (images + captions)

VirTex: Learning Visual Representations from Textual Annotations
⭐code
Exploring Simple Siamese Representation Learning
😮oral⭐code
Representation Learning via Global Temporal Alignment and Cycle-Consistency
⭐code

SelfDoc: Self-Supervised Document Representation Learning
CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models
Unsupervised Hyperbolic Representation Learning via Message Passing Auto-Encoders
⭐code
Boosting Video Representation Learning With Multi-Faceted Integration

40.Superpixel

Learning the Superpixel in a Non-iterative and Lifelong Manner
⭐code

39.Debiasing

Fair Attribute Classification through Latent Space De-biasing
⭐code🏠project
Reducing Domain Gap by Reducing Style Bias
⭐code

Bias correction
- EnD: Entangling and Disentangling deep representations for bias correction
  ⭐code

38.Class-Incremental learning

IIRC: Incremental Implicitly-Refined Classification
🏠project
Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning
⭐code
DER: Dynamically Expandable Representation for Class Incremental Learning
⭐code
Distilling Causal Effect of Data in Class-Incremental Learning
⭐code
Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning

Adaptive Aggregation Networks for Class-Incremental Learning
⭐code
Incremental learning

37. Continual Learning

Training Networks in Null Space for Continual Learning
😮oral⭐code

Efficient Feature Transformations for Discriminative and Generative Continual Learning
Rainbow Memory: Continual Learning with a Memory of Diverse Samples
Rectification-based Knowledge Retention for Continual Learning
Layerwise Optimization by Gradient Decomposition for Continual Learning
Continual Learning via Bit-Level Information Preserving
⭐code
Training Networks in Null Space of Feature Covariance for Continual Learning
😮oral
ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-Supervised Continual Learning

36.Action Detection and Recognition

Coarse-Fine Networks for Temporal Activity Detection in Videos
⭐code
3D CNNs with Adaptive Temporal Feature Resolutions
⭐code🏠project
Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack
📺video
BASAR:Black-box Attack on Skeletal Action Recognition
🏠project📺video
Explainer: A new direction in adversarial attack and defense: action recognition algorithms are easy to attack!
TDN: Temporal Difference Networks for Efficient Action Recognition
⭐code
ACTION-Net: Multipath Excitation for Action Recognition
⭐code
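Explainer: CVPR 2021 | The ACTION module: plug-and-play mixed attention for action recognition
Explainer: CVPR 2021 | The ACTION module: plug-and-play mixed attention for strong temporal dependencies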
No frame left behind: Full Video Action Recognition

Recognizing Actions in Videos from Unseen Viewpoints
Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories
Motion Representations for Articulated Animation
⭐code🏠project📺video
Home Action Genome: Cooperative Compositional Action Understanding
Anticipating human actions by correlating past with the future with Jaccard similarity measures
Graph-Based High-Order Relation Modeling for Long-Term Action Recognition
Representing Videos As Discriminative Sub-Graphs for Action Recognition
Three Birds with One Stone: Multi-Task Temporal Action Detection via Recycling Temporal Annotations
Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization
⭐code
Spatio-temporal Contrastive Domain Adaptation for Action Recognition
Deep Analysis of CNN-Based Spatio-Temporal Representations for Action Recognition
⭐code
Semi-Supervised Action Recognition With Temporal Contrastive Learning
⭐code🏠project📺video
WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos
BABEL: Bodies, Action and Behavior With English Labels
⭐code🏠project📺video
Action localization
- Few-Shot Transformation of Common Actions into Time and Space
  ⭐code
Temporal action localization
- Modeling Multi-Label Action Dependencies for Temporal Action Localization
  😮oral⭐code
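  Proposes an attention-based network architecture that learns action dependencies in videos to tackle multi-label temporal action localization.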
- The Blessings of Unlabeled Background in Untrimmed Videos
  ⭐code
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
  Anchor-free temporal action localization based on learning salient boundary features
  Explainer: 10
- CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
- Action Unit Memory Network for Weakly Supervised Temporal Action Localization
- Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization
  ⭐code
- Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection
Video Actor Segmentation
- Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation
Action segmentation
- Learning To Segment Actions From Visual and Language Instructions via Differentiable Weak Sequence Alignment
- Temporal action segmentation
  - Temporal Action Segmentation from Timestamp Supervision
    ⭐code
  - Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation
    ⭐code
- Unsupervised action segmentation
  - Action Shuffle Alternating Learning for Unsupervised Action Segmentation
- Supervised action segmentation
  - Anchor-Constrained Viterbi for Set-Supervised Action Segmentation
- Video action segmentation
  - Global2Local: Efficient Structure Search for Video Action Segmentation
    ⭐code
    From global to local: efficient network structure search for video action segmentation
    Explainer: 19
Video Moment Localization
- Structured Multi-Level Interaction Network for Video Moment Localization via Language Query
Spatio-temporal event localization
- Multi-Shot Temporal Event Localization: A Benchmark
  ⭐code🏠project

35.Image Clustering

Improving Unsupervised Image Clustering With Robust Learning
⭐code
Improves unsupervised image clustering with robust learning.
Jigsaw Clustering for Unsupervised Visual Representation Learning
😮oral⭐code
COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction
⭐code

34.Image Classification

Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
⭐code
Differentiable Patch Selection for Image Recognition
⭐code
Achieving Robustness in Classification Using Optimal Transport With Hinge Regularization
Are Labels Always Necessary for Classifier Accuracy Evaluation?

Fine-grained classification
- Fine-grained Angular Contrastive Learning with Coarse Labels
  😮oral
  ⭐code
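  Work on fine-grained classification from coarse labels using self-supervision. Coarse labels are easier and cheaper to obtain than fine-grained labels, which usually require domain experts.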
- Graph-based High-Order Relation Discovery for Fine-grained Recognition
  A fine-grained recognition method based on mining high-order relations among features
  Explainer: 20
- Few-Shot Classification with Feature Map Reconstruction Networks
  ⭐code📺video
- A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification
  😮oral
- GLAVNet: Global-Local Audio-Visual Cues for Fine-Grained Material Recognition
- Learning Deep Classifiers Consistent With Fine-Grained Novelty Detection
- Your "Flamingo" is My "Bird": Fine-Grained, or Not
  😮oral⭐code
- Discrimination-Aware Mechanism for Fine-Grained Representation Learning
- Neural Prototype Trees for Interpretable Fine-grained Image Recognition
  ⭐code
Image classification
Semi-supervised image classification
- SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification
  ⭐code
Visual recognition
- Fair Feature Distillation for Visual Recognition
- Long-tailed visual recognition
Object classification
- Object Classification From Randomized EEG Trials
Nearest Neighbor Matching
- Nearest Neighbor Matching for Deep Clustering
  ⭐code
OOD detection

33.6D Pose Estimation

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
😮oral⭐code
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
⭐code
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
😮oral⭐code
Wide-Depth-Range 6D Object Pose Estimation in Space
⭐code
DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency

Single-view robot pose and joint angle estimation via render & compare
😮oral⭐code🏠project📺video
Keypoint-Graph-Driven Learning Framework for Object Pose Estimation
StablePose: Learning 6D Object Poses From Geometrically Stable Patches

32.View Synthesis

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis
😮oral⭐code
NeX: Real-time View Synthesis with Neural Basis Expansion
😮oral🏠project📺video
Real-time view synthesis using neural basis expansion.
Layout-Guided Novel View Synthesis from a Single Indoor Panorama
⭐code
Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
🏠project
Stable View Synthesis
⭐code

Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
🏠project📺video

31.Open-Set Recognition

Counterfactual Zero-Shot and Open-Set Visual Recognition
⭐code
Few-shot Open-set Recognition by Transformation Consistency
Learning Placeholders for Open-Set Recognition
😮oral

30.Neural rendering

DeRF: Decomposed Radiance Fields
🏠project
D-NeRF: Neural Radiance Fields for Dynamic Scenes
🏠project

Neural Lumigraph Rendering
🌻dataset🏠project📺video
Stanford University
AutoInt: Automatic Integration for Fast Neural Volume Rendering
😮oral🏠project📺video
Stanford University
pixelNeRF: Neural Radiance Fields from One or Few Images
⭐code🏠project📺video
IBRNet: Learning Multi-View Image-Based Rendering
🏠project
Note: some researchers have commented that pixelNeRF and IBRNet share similar ideas, but that IBRNet appears to be more mature.
Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans
⭐code🏠project📺video
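The Neural Body algorithm, developed by researchers from Zhejiang University and other institutions, takes multi-view video as input and outputs a 3D human body and novel-view images.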
NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis
🏠project📺video
Generates a complete 3D scene under arbitrary lighting conditions from a set of input images.
Self-Supervised Visibility Learning for Novel View Synthesis
⭐code
STaR: Self-Supervised Tracking and Reconstruction of Rigid Objects in Motion With Neural Rendering
⭐code🏠project📺video
Pulsar: Efficient Sphere-Based Neural Rendering
Learning Compositional Radiance Fields of Dynamic Human Heads
😮oral🏠project
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
Neural Geometric Level of Detail: Real-Time Rendering With Implicit 3D Shapes
⭐code🏠project
Space-Time Neural Irradiance Fields for Free-Viewpoint Video
🏠project📺video
Neural Scene Graphs for Dynamic Scenes
😮oral🏠project📺video
NeuTex: Neural Texture Mapping for Volumetric Neural Rendering

29.Human Pose Estimation

Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
⭐code
Monocular Real-time Full Body Capture with Inter-part Correlations
📺video
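Human motion capture is a key technology for action effects in film. High-quality capture usually requires special equipment, so motion capture with an ordinary RGB camera would let anyone become a visual effects artist. The video shows results from this CVPR 2021 paper by researchers from Tsinghua University, the Max Planck Institute, and other institutions: motion capture with a monocular RGB camera.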
Behavior-Driven Synthesis of Human Dynamics
⭐code🏠project
Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation
⭐code
Brief explainer: 2
Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression
⭐code
SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks
😮oral🏠project
On Self-Contact and Human Pose
🏠project
Lite-HRNet: A Lightweight High-Resolution Network
⭐code
Explainer: Lite-HRNet: a lightweight HRNet with greatly reduced FLOPs
Deep Dual Consecutive Network for Human Pose Estimation
⭐code
3D Human Action Representation Learning via Cross-View Consistency Pursuit
⭐code
Body Meshes as Points
⭐code
Unsupervised Human Pose Estimation through Transforming Shape Templates
🏠project
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking

3D hand reconstruction
- Model-based 3D Hand Reconstruction via Self-Supervised Learning
  ⭐code📺video
Human motion transfer
- Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling
  ⭐code📺video
Human Volumetric Capture
- POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
  😮oral🏠project
- High-Fidelity Neural Human Motion Transfer from Monocular Video
3D human pose estimation
- CanonPose: Self-supervised Monocular 3D Human Pose Estimation in the Wild
  ⭐code
- Context Modeling in 3D Human Pose Estimation: A Unified Perspective
- PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers
  ⭐code📺video
  Improves 3D human pose estimation by removing location-dependent perspective effects.
- Graph Stacked Hourglass Networks for 3D Human Pose Estimation
- Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors
  😮oral🏠project
- SimPoE: Simulated Character Control for 3D Human Pose Estimation
  😮oral🏠project
- Reconstructing 3D Human Pose by Watching Humans in the Mirror
  😮oral⭐code🏠project
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo
  ⭐code
- PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation
  😮oral⭐code
- AGORA: Avatars in Geography Optimized for Regression Analysis
  🏠project
- Intelligent Carpet: Inferring 3D Human Pose From Tactile Signals
- HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation
  ⭐code
- Neural Descent for Visual 3D Human Pose and Shape
- Probabilistic 3D Human Shape and Pose Estimation from Multiple Unconstrained Images in the Wild
Animal pose estimation
- From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation
  😮oral⭐code📺video
3D human mesh registration
- Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration
  ⭐code🏠project📺video
Multi-person human reconstruction
- Multi-person Implicit Reconstruction from a Single Image
3D human motion
- We are More than Our Joints: Predicting how 3D Bodies Move
  🏠project📺video
  Sharing session
Human motion capture
- Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors
  😮oral🏠project📺video
- ChallenCap: Monocular 3D Capture of Challenging Human Performances Using Multi-Modal References
Multi-person pose estimation
- FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions
  ⭐code
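  FCPose is an end-to-end trainable human pose estimator without ROIs or grouping that achieves better accuracy and speed. On the COCO dataset, the real-time version of FCPose with a DLA-34 backbone is 4.5x faster than Mask R-CNN (ResNet-101) (41.67 FPS vs. 9.26 FPS) while also improving performance. Compared with recent top-down and bottom-up methods, FCPose also achieves a better speed/accuracy trade-off.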
- Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks
  ⭐code
Hand-object interaction pose estimation
- Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
  ⭐code🏠project📺video
Human keypoint detection
- Regressive Domain Adaptation for Unsupervised Keypoint Detection
  ⭐code
3D human shape
- LEAP: Learning Articulated Occupancy of People
  ⭐code🏠project📺video
- Beyond Static Features for Temporally Consistent 3D Human Pose and Shape From a Video
  ⭐code📺video
Human animation (pose transfer)
- Pose-Guided Human Animation From a Single Image in the Wild
Automatic 3D fitness training system based on human sensing
- AIFit: Automatic 3D Human-Interpretable Feedback Models for Fitness Training
  🏠project
3D human motion
- Synthesizing Long-Term 3D Human Motion and Interaction in 3D Scenes
  ⭐code🏠project📺video
3D human reconstruction
- StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision
  ⭐code🏠project
Gesture-to-gesture translation
- Model-Aware Gesture-to-Gesture Translation
3D human motion prediction
- Towards Accurate 3D Human Motion Prediction From Incomplete Observations
Gesture recognition
- Body2Hands: Learning To Infer 3D Hands From Conversational Gesture Body Dynamics
  ⭐code🏠project📺video
3D human mesh reconstruction
- Holistic 3D Human and Scene Mesh Estimation From Single View Images
Micro-gesture emotion analysis
- iMiGUE: An Identity-Free Video Dataset for Micro-Gesture Understanding and Emotion Analysis
  ⭐code
Dense Human Correspondences
- HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences
  ⭐code🏠project📺video

28.Dense prediction

Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks
⭐code
The proposed D3Net outperforms SOTA networks on semantic segmentation and music source separation.
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
😮oral⭐code

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
⭐code

27.Semantic Line Detection

Harmonious Semantic Line Detection via Maximal Weight Clique Selection
⭐code

26.Video Processing

Skip-Convolutions for Efficient Video Processing
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
⭐code
Learning by Aligning Videos in Time
Hierarchical Motion Understanding via Motion Programs
🏠project📺video
Stochastic Image-to-Video Synthesis using cINNs
⭐code🏠project
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
🏠project
Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Learning To Reconstruct High Speed and High Dynamic Range Videos From Events
Video summarization
- Learning Discriminative Prototypes with Dynamic Time Warping
  ⭐code
- Learning Triadic Belief Dynamics in Nonverbal Communication from Videos
  😮oral⭐code
Video coding and decoding
- MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing
  ⭐code
- FVC: A New Framework towards Deep Video Compression in Feature Space
  😮oral
- Memory-Efficient Network for Large-Scale Video Compressive Sensing
  ⭐code
- Deep Learning in Latent Space for Video Prediction and Compression
  ⭐code
Video frame interpolation
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
  ⭐code🏠project
- Deep Animation Video Interpolation in the Wild
  ⭐code
- Time Lens: Event-based Video Frame Interpolation
  ⭐code🏠project🌻dataset📺video
Video-and-language learning
- Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling
  😮oral⭐code
Video prediction
- Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
  🏠project📺video
- Learning Semantic-Aware Dynamics for Video Prediction
- Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
  ⭐code
  Commentary: introduces a memory module to break the performance bottleneck of video prediction with long-range dependencies
- Learning Goals from Failure
  ⭐code🏠project
- MotionRNN: A Flexible Model for Video Prediction With Spacetime-Varying Motions
Video understanding
- Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
  ⭐code
- Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
  🏠project
- Visual Semantic Role Labeling for Video Understanding
  🏠project
- Temporal Query Networks for Fine-grained Video Understanding
  😮oral🏠project
- Shot Contrastive Self-Supervised Learning for Scene Boundary Detection
- FrameExit: Conditional Early Exiting for Efficient Video Recognition
  😮oral
- Towards Long-Form Video Understanding
Video rescaling
- Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling
  ⭐code🏠project
Video anomaly detection
- MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection
- Learning Normal Dynamics in Videos With Meta Prototype Network
  ⭐code
  Fast and accurate video anomaly detection that introduces a meta-learned dynamic prototype learning component
- Anomaly Detection in Video via Self-Supervised and Multi-Task Learning
Video sound source localization
- Localizing Visual Sounds the Hard Way
  ⭐code🏠project
Video analysis
- Self-Supervised Learning for Semi-Supervised Temporal Action Proposal
  ⭐code
Video generation
- Playable Video Generation
  😮oral⭐code🏠project📺video
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing
  😮oral⭐code🏠project📺video
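  Commentary: what upends video compression may not be a new compression algorithm but a GAN. NVIDIA's new algorithm cuts bandwidth by up to 90%.
  New research from NVIDIA uses facial keypoints plus a GAN to reconstruct video calls, saving 90% of the bandwidth compared with conventional H.264. The code is not open-sourced, but NVIDIA's GAN framework is.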
Video viewpoint transfer
- Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos
Action Selection Learning
- Weakly Supervised Action Selection Learning in Video
  ⭐code
Video description
- Towards Diverse Paragraph Captioning for Untrimmed Videos
  ⭐code
Video classification
- Over-the-Air Adversarial Flickering Attacks Against Video Recognition Networks
  ⭐code
Video captioning
- Sketch, Ground, and Refine: Top-Down Dense Video Captioning
  ⭐code
Video Grounding
- Cascaded Prediction Network via Segment Tree for Temporal Video Grounding
- Interventional Video Grounding With Dual Contrastive Learning
Video inpainting/restoration
- Progressive Temporal Feature Alignment Network for Video Inpainting
  ⭐code
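  The authors propose the Progressive Temporal Feature Alignment Network, which uses optical flow to progressively enrich the current frame's features with features extracted from neighboring frames. This corrects spatial misalignment in the spatio-temporal feature propagation stage and greatly improves the visual quality and spatio-temporal consistency of inpainted videos. It achieves state-of-the-art performance among existing deep learning methods on the DAVIS and FVI datasets.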
- Restore From Restored: Video Restoration With Pseudo Clean Video
  ⭐code
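The flow-based feature alignment mentioned in the Progressive Temporal Feature Alignment Network description can be illustrated with a generic backward-warping step. This is a minimal sketch, not the paper's network: it assumes a single-channel feature map stored as nested lists, a precomputed per-pixel flow, nearest-neighbour sampling, and a simple averaging rule for fusion. All names (`warp_to_current`, `fuse`) are hypothetical.

```python
def warp_to_current(neighbor_feat, flow):
    """Backward-warp a neighbouring frame's feature map onto the current frame.

    flow[y][x] = (dx, dy) says that the content at pixel (x, y) of the current
    frame is found at (x + dx, y + dy) in the neighbouring frame.
    Nearest-neighbour sampling keeps the sketch dependency-free.
    """
    height, width = len(neighbor_feat), len(neighbor_feat[0])
    warped = [[0.0] * width for _ in range(height)]
    valid = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            src_x, src_y = int(round(x + dx)), int(round(y + dy))
            if 0 <= src_x < width and 0 <= src_y < height:
                warped[y][x] = neighbor_feat[src_y][src_x]
                valid[y][x] = True
    return warped, valid


def fuse(current_feat, warped, valid):
    """Average current and aligned features wherever the warp found a source."""
    return [
        [(c + w) / 2.0 if ok else c for c, w, ok in zip(c_row, w_row, v_row)]
        for c_row, w_row, v_row in zip(current_feat, warped, valid)
    ]


current = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]
# Hypothetical neighbour frame: same content shifted one pixel to the right.
neighbor = [[0.0, 1.0, 2.0],
            [0.0, 4.0, 5.0]]
# Every current pixel therefore looks one pixel to the right in the neighbour.
flow = [[(1, 0)] * 3 for _ in range(2)]

aligned, mask = warp_to_current(neighbor, flow)
fused = fuse(current, aligned, mask)
print(aligned)
print(fused)
```

Without the warp, averaging `current` and `neighbor` directly would blend mismatched pixels, which is the kind of spatial misalignment the description refers to.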
Video deblurring
- Gated Spatio-Temporal Attention-Guided Video Deblurring
Video denoising
- Efficient Multi-Stage Video Denoising With Recurrent Spatio-Temporal Fusion
Video quality assessment
- Patch-VQ: 'Patching Up' the Video Quality Problem
  🏠project
Video action counting
- Repetitive Activity Counting by Sight and Sound
  ⭐code📺video
Video stabilization
- 3D Video Stabilization With Depth Estimation by CNN-Based Optimization
  📺video
- Real-Time Selfie Video Stabilization
  ⭐code
Video deraining
- Self-Aligned Video Deraining With Transmission-Depth Consistency
- Semi-Supervised Video Deraining With Dynamical Rain Generator
  ⭐code
video looping technique
- Animating Pictures with Eulerian Motion Fields
  🏠project📺video
Video recognition
- 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition
- MoViNets: Mobile Video Networks for Efficient Video Recognition
  ⭐code
Activity recognition
- Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations
Video representation learning
- Spatiotemporal Contrastive Video Representation Learning
  ⭐code
- Removing the Background by Adding the Background: Towards Background Robust Self-Supervised Video Representation Learning
Video coding
- Deep Perceptual Preprocessing for Video Coding

25.3D Vision

A Deep Emulator for Secondary Motion of 3D Characters
😮oral🏠project
Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction
😮oral🏠project📺video
Deep Implicit Templates for 3D Shape Representation
😮oral⭐code🏠project📺video
CVPR 2021 Oral: Tsinghua researchers propose Deep Implicit Templates, greatly extending the capabilities of DIF
SMPLicit: Topology-aware Generative Model for Clothed People
🏠project
Picasso: A CUDA-based Library for Deep Learning over 3D Meshes
⭐code
Semi-supervised Synthesis of High-Resolution Editable Textures for 3D Humans

RGB-D Local Implicit Function for Depth Completion of Transparent Objects
🏠project
Deep Two-View Structure-from-Motion Revisited
Deformed Implicit Field: Modeling 3D Shapes with Learned Dense Correspondence
S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling
Deep Polarization Imaging for 3D Shape and SVBRDF Acquisition
😮oral🏠project📺video
Learning Feature Aggregation for Deep 3D Morphable Models
⭐code
Plan2Scene: Converting Floorplans to 3D Scenes
⭐code🏠project📺video
View Generalization for Single Image Textured 3D Models
🏠project📺video
Mirror3D: Depth Refinement for Mirror Surfaces
⭐code🏠project
Learning To Recover 3D Scene Shape From a Single Image
⭐code
Normal Integration via Inverse Plane Fitting With Minimum Point-to-Plane Distance
⭐code
Shelf-Supervised Mesh Prediction in the Wild
🏠project
Unsupervised Learning of 3D Object Categories From Videos in the Wild
DeepVideoMVS: Multi-View Stereo on Video With Recurrent Spatio-Temporal Fusion
⭐code📺video
NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go
Learning Monocular 3D Reconstruction of Articulated Categories From Motion
⭐code🏠project
Deep Active Surface Models
Neural Splines: Fitting 3D Surfaces With Infinitely-Wide Neural Networks
😮oral⭐code
Learning View Selection for 3D Scenes
StruMonoNet: Structure-Aware Monocular 3D Prediction
Physically-Aware Generative Network for 3D Shape Modeling
Hybrid Rotation Averaging: A Fast and Robust Rotation Averaging Approach
DeepSurfels: Learning Online Appearance Fusion
⭐code🏠project📺video
Depth estimation
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss
- Beyond Image to Depth: Improving Depth Prediction using Echoes
  ⭐code🏠project
- Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
  😮oral⭐code🏠project📺video
- LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering
  😮oral⭐code🏠project
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
  😮oral
- Depth Completion with Twin Surface Extrapolation at Occlusion Boundaries
  ⭐code
- Self-supervised Learning of Depth Inference for Multi-view Stereo
  ⭐code
- SMD-Nets: Stereo Mixture Density Networks
  ⭐code
- The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth
  ⭐code
- Single Image Depth Estimation using Wavelet Decomposition
  ⭐code
- Differentiable Diffusion for Dense Depth Estimation from Multi-view Images
  ⭐code🏠project📺video
- SliceNet: Deep Dense Depth Estimation From a Single Indoor Panorama Using a Slice-Based Representation
- AdaBins: Depth Estimation Using Adaptive Bins
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion
  ⭐code
- S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks
- Robust Consistent Video Depth Estimation
  🏠project📺video
- Monocular depth estimation
  - Monocular Depth Estimation via Listwise Ranking Using the Plackett-Luce Model
  - Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging
    ⭐code🏠project📺video
  - 3D Packing for Self-Supervised Monocular Depth Estimation
    😮oral⭐code
- Depth prediction
  - Single Image Depth Prediction With Wavelet Decomposition
    ⭐code
3D reconstruction
- Deep Implicit Moving Least-Squares Functions for 3D Reconstruction
  ⭐code
- Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction
  🏠project
- Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction
  ⭐code
- Fostering Generalization in Single-view 3D Reconstruction by Learning a Hierarchy of Local and Global Shape Priors
- NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
  😮oral⭐code🏠project
- Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
  ⭐code🏠project📺video
- CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo
  😮oral
- SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements
  🏠project📺video
- LASR: Learning Articulated Shape Reconstruction from a Monocular Video
  🏠project
- Sketch2Model: View-Aware 3D Modeling from Single Free-Hand Sketches
- Birds of a Feather: Capturing Avian Shape Models from Images
  🏠project📺video
- Multi-view 3D Reconstruction of a Texture-less Smooth Surface of Unknown Generic Reflectance
  ⭐code
- Generative PointNet: Deep Energy-Based Learning on Unordered Point Sets for 3D Generation, Reconstruction and Classification
  ⭐code🏠project
- From Points to Multi-Object 3D Reconstruction
- DI-Fusion: Online Implicit 3D Reconstruction With Deep Priors
  ⭐code
- D2IM-Net: Learning Detail Disentangled Implicit Fields From Single Images
- Residential Floor Plan Recognition and Reconstruction
- Indoor Panorama Planar 3D Reconstruction via Divide and Conquer
- Single-View 3D Object Reconstruction from Shape Priors in Memory
- Deep Optimized Priors for 3D Shape Modeling and Reconstruction
- MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera
  ⭐code🏠project📺video
- PluckerNet: Learn to Register 3D Line Reconstructions
- 3D mesh reconstruction
  - Self-Supervised 3D Mesh Reconstruction From Single Images
Semantic scene completion
- Semantic Scene Completion via Integrating Instances and Scene in-the-Loop
  ⭐code
3D keypoints
- KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
  😮oral⭐code🏠project📺video
3D shape completion
- Unsupervised 3D Shape Completion through GAN Inversion
  ⭐code🏠project
3D shape fitting
- Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images
  ⭐code
3D compression
- Neural 3D Scene Compression via Model Compression
Stereo Matching
- A Decomposition Model for Stereo Matching
Depth Completion
- Depth Completion using Plane-Residual Representation
- Radar-Camera Pixel Depth Association for Depth Completion
  ⭐code
3D mesh
- DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes With Biharmonic Coordinates
  😮oral⭐code
3D shape
- DECOR-GAN: 3D Shape Detailization by Conditional Refinement
  ⭐code📺oral video📺demo
depth map fusion
- NeuralFusion: Online Depth Fusion in Latent Space
Mesh reconstruction
- Learning Delaunay Surface Elements for Mesh Reconstruction
  ⭐code
3D morphable model
- i3DMM: Deep Implicit 3D Morphable Model of Human Heads
  🏠project📺video

24.Reinforcement Learning

Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph
⭐code🏠project
Unsupervised Visual Attention and Invariance for Reinforcement Learning

Unsupervised Learning for Robust Fitting: A Reinforcement Learning Approach
⭐code
Iterative Shrinking for Referring Expression Grounding Using Deep Reinforcement Learning
⭐code

23.Autonomous Driving

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
⭐code
Winning solution of the ECCV 2020 Facebook Mapillary Visual Place Recognition Challenge
AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles
Self-Supervised Pillar Motion Learning for Autonomous Driving
⭐code
Learning by Watching
Binary TTC: A Temporal Geofence for Autonomous Navigation
⭐code📺video

GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving
😮oral🏠project📺video
Lane line prediction
Trajectory prediction
Human trajectory prediction
Traffic scenes
- SceneGen: Learning To Generate Realistic Traffic Scenes
Vehicle re-identification
- PhD Learning: Learning With Pompeiu-Hausdorff Distances for Video-Based Vehicle Re-Identification
HD map reconstruction
- Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-View Transformation
HD map generation
- HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
Vehicle detection
- Robust Multimodal Vehicle Detection in Foggy Weather Using Complementary Lidar and Radar Signals
  ⭐code
Vehicle pose estimation
- Exploring intermediate representation for monocular vehicle pose estimation
  ⭐code

22.Medical Imaging

3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management
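Multimodal CT imaging alone can replace JHMI's current comprehensive multimodal diagnostic workflow, which requires tumor chemistry tests and DNA sequencing in addition to medical imaging. Diagnostic accuracy is comparable, and quantitative diagnostic precision is better.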
Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies
⭐code
An important function of intelligent PACS in oncology imaging that assists physicians in reading scans
Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization
A CT-based fracture/osteoporosis system
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
⭐code
DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images
😮oral⭐code
Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles

XProtoNet: Diagnosis in Chest Radiography with Global and Local Explanations
Medical image segmentation
Medical image synthesis
- Brain Image Synthesis with Unsupervised Multivariate Canonical CSCℓ4Net
  😮oral
Surgical skill assessment
- Towards Unified Surgical Skill Assessment
  ⭐code
Minimally invasive surgery
- Minimally Invasive Surgery for Sparse Neural Networks in Contrastive Manner
Radiology report generation
- A Self-Boosting Framework for Automated Radiographic Report Generation
- Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation
MR image reconstruction
- MR Image Super-Resolution With Squeeze and Excitation Reasoning Attention Network
- Joint Deep Model-Based MR Image and Coil Sensitivity Reconstruction Network (Joint-ICNet) for Fast MRI
Landmark detection and tracking
- Reciprocal Landmark Detection and Tracking With Extremely Few Annotations
X-ray detection
- Leveraging Large-Scale Weakly Labeled Data for Semi-Supervised Mass Detection in Mammograms

21.Transformer

Transformer Interpretability Beyond Attention Visualization
⭐code
MIST: Multiple Instance Spatial Transformer Network
⭐code
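Attempts differentiable top-K selection from a heatmap (MIST), with some results now on natural images as well. It enables detection and classification without any localization supervision (and that is not the only thing it can do).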
Variational Transformer Networks for Layout Generation
Lesion-Aware Transformers for Diabetic Retinopathy Grading
Gaussian Context Transformer

Few-shot action recognition
- Temporal-Relational CrossTransformers for Few-Shot Action Recognition
  ⭐code
Object detection
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
  😮oral⭐code
- One-shot object detection
  - Adaptive Image Transformer for One-Shot Object Detection
Image processing
- Pre-Trained Image Processing Transformer
  ⭐code⭐gitee
Human-object interaction
- End-to-End Human Object Interaction Detection with HOI Transformer
  ⭐code
- HOTR: End-to-End Human-Object Interaction Detection with Transformers
  😮oral
Image segmentation
- Semantic segmentation
  - Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
    ⭐code🏠project
    Commentary: Transformers applied to semantic segmentation; once ranked first on the ADE20K leaderboard (44.42% mIoU)
  - Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation
- Video instance segmentation
  - VisTR: End-to-End Video Instance Segmentation with Transformers
    😮oral⭐code
- Panoptic segmentation
  - MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
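The segmentation entries in this list report quality as mIoU (for example, the 44.42% ADE20K figure quoted for the sequence-to-sequence segmentation paper). As a reminder of what that metric measures, here is a minimal sketch that computes mean intersection-over-union from flattened label lists. The toy labels are made up for illustration, and classes that appear in neither prediction nor ground truth are skipped, which is one common convention rather than the exact ADE20K evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for cls in range(num_classes):
        # Pixels where both agree on this class.
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels where either one predicts or contains this class.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Toy example: six pixels, three classes (values are illustrative only).
pred_labels = [0, 0, 1, 1, 2, 2]
true_labels = [0, 1, 1, 1, 2, 0]

miou = mean_iou(pred_labels, true_labels, num_classes=3)
print(f"mIoU = {miou:.4f}")
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.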
Tracking
- Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
  😮oral⭐code
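  more: Transformers gain further momentum, setting new highs in tracking by bridging independent frames and passing temporal information across frames (CVPR 2021 Oral)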
- Transformer Tracking
  ⭐code
Motion prediction
- Multimodal Motion Prediction with Stacked Transformers
  ⭐code🏠project📺video
Self-attention
- Scaling Local Self-Attention For Parameter Efficient Visual Backbones
  😮oral
  Commentary: a self-attention model that surpasses convolution; Google and UC Berkeley propose HaloNet
Retrieval
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning
  ⭐code
Feature matching
- LoFTR: Detector-Free Local Feature Matching with Transformers
  ⭐code🏠project
Pose recognition
- Pose Recognition with Cascade Transformers
  ⭐code
Autonomous driving
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
  ⭐code
Visual recognition
- Bottleneck Transformers for Visual Recognition
Video Hashing
- Self-Supervised Video Hashing via Bidirectional Transformers
  ⭐code
Vision-and-language navigation
- Topological Planning With Transformers for Vision-and-Language Navigation
Human pose and mesh reconstruction
- End-to-End Human Pose and Mesh Reconstruction with Transformers
  ⭐code
Line segment detection
- Line Segment Detection Using Transformers Without Edges
  😮oral⭐code
Image classification
- General Multi-Label Image Classification With Transformers
Temporal language localization
- Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos
Scene layout
- LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
  ⭐code
Facial action unit detection
- Facial Action Unit Detection With Transformers
High-resolution image synthesis
- Taming Transformers for High-Resolution Image Synthesis
  😮oral⭐code

20.Person Re-Identification

Meta Batch-Instance Normalization for Generalizable Person Re-Identification
⭐code
Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification
⭐code
Intra-Inter Camera Similarity for Unsupervised Person Re-Identification
⭐code
Paper publicly available
Anchor-Free Person Search
⭐code

Lifelong Person Re-Identification via Adaptive Knowledge Accumulation
⭐code
Group-aware Label Transfer for Domain Adaptive Person Re-identification
⭐code|code
Neural Feature Search for RGB-Infrared Person Re-Identification
Combined Depth Space based Architecture Search For Person Re-identification
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
😮oral
Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos
😮oral
BiCnet-TKS: Learning Efficient Spatial-Temporal Representation for Video Person Re-Identification
⭐code
Generalizable Person Re-identification with Relevance-aware Mixture of Experts
Person30K: A Dual-Meta Generalization Network for Person Re-Identification
Prototype-Guided Saliency Feature Learning for Person Search
UnrealPerson: An Adaptive Pipeline Towards Costless Person Re-Identification
⭐code
Learning to Generalize Unseen Domains via Memory-based Multi-Source Meta-Learning for Person Re-Identification
⭐code
Farewell to Mutual Information: Variational Distillation for Cross-Modal Person Re-Identification
Learning 3D Shape Feature for Texture-Insensitive Person Re-Identification
Partial Person Re-Identification With Part-Part Correspondence Learning
Coarse-To-Fine Person Re-Identification With Auxiliary-Domain Classification and Second-Order Information Bottleneck
Unsupervised Pre-Training for Person Re-Identification
Joint Generative and Contrastive Learning for Unsupervised Person Re-Identification
⭐code📺video
Wide-Baseline Multi-Camera Calibration Using Person Re-Identification
Watching You: Global-Guided Reciprocal Learning for Video-Based Person Re-Identification
⭐code
Discover Cross-Modality Nuances for Visible-Infrared Person Re-Identification
Person Re-identification using Heterogeneous Local Graph Attention Networks
Fine-Grained Shape-Appearance Mutual Learning for Cloth-Changing Person Re-Identification
Crowd counting
- Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting
  ⭐code🏠project
- Cross-View Cross-Scene Multi-View Crowd Counting
- A Generalized Loss Function for Crowd Counting and Localization
Transformer-based
- Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer
Pedestrian detection
- Variational Pedestrian Detection
- Generalizable Pedestrian Detection: The Elephant in the Room
  ⭐code
Pedestrian tracking
- Tracking Pedestrian Heads in Dense Crowd
  ⭐code🏠project
Gait recognition
- Cross-View Gait Recognition With Deep Universal Linear Embeddings

19.Quantization/Pruning/Knowledge Distillation/Model Compression/Scaling and Optimization

Learning Student Networks in the Wild
⭐code
ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
⭐code
RepVGG: Making VGG-style ConvNets Great Again
⭐code
Coordinate Attention for Efficient Mobile Network Design
⭐code

Pruning
Model scaling
- Fast and Accurate Model Scaling
  ⭐code
Quantization
Knowledge distillation
Invertible neural networks
- Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
  🏠project
Model compression
- CDFI: Compression-Driven Network Design for Frame Interpolation
  ⭐code
- Towards Efficient Tensor Decomposition-Based DNN Model Compression With Optimization Framework
Model optimization
- Rethinking Channel Dimensions for Efficient Model Design
  ⭐code

18.Aerial/Drones/Satellite/RS Image

UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
⭐code
SIPSA-Net: Shift-Invariant Pan Sharpening with Moving Object Alignment for Satellite Imagery
⭐code

Aerial image segmentation
- PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
  ⭐code
Aerial image detection
- ReDet: A Rotation-equivariant Detector for Aerial Object Detection
  ⭐code
Drone detection
- Dogfight: Detecting Drones from Drones Videos
Multi-view satellite photogrammetry
- Shadow Neural Radiance Fields for Multi-view Satellite Photogrammetry

17.Super-Resolution

Data-Free Knowledge Distillation For Image Super-Resolution
⭐code
AdderSR: Towards Energy Efficient Image Super-Resolution
⭐code
Cross-MPI: Cross-scale Stereo for Image Super-Resolution using Multiplane Images
🏠project📺video
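CVPR 2021: Cross-MPI is an end-to-end network that uses the underlying scene structure as a cue and achieves high-fidelity super-resolution even across a large (x8) resolution gap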
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
⭐code

Robust Reference-based Super-Resolution via C²-Matching
⭐code
GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution
😮oral🏠project
Commentary: CVPR 2021 Oral | GLEAN: large-factor image super-resolution based on a generative latent bank
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond
⭐code🏠project
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
⭐code author homepage
A video super-resolution network based on controllable interpolation of spatio-temporal features
Unsupervised Degradation Representation Learning for Blind Super-Resolution
⭐code
SRWarp: Generalized Image Super-Resolution under Arbitrary Transformation
⭐code
MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Resolution
⭐code
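The authors propose MASA, a new network for RefSR with two newly designed modules. The Match and Extraction module greatly reduces computational cost. The Spatial Adaptation module learns the distribution difference between the LR and Ref images and remaps the distribution of the reference features to that of the LR features in a spatially adaptive way, making the network more robust to different reference images.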
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
Exploring Sparsity in Image Super-Resolution for Efficient Inference
⭐code
Neural Side-by-Side: Predicting Human Preferences for No-Reference Super-Resolution Evaluation
⭐code
Tackling the Ill-Posedness of Super-Resolution Through Adaptive Target Generation
⭐code
LAU-Net: Latitude Adaptive Upscaling Network for Omnidirectional Image Super-Resolution
⭐code
Image Super-Resolution With Non-Local Sparse Attention
Unsupervised Real-World Image Super Resolution via Domain-Distance Aware Training
⭐code
Single Pair Cross-Modality Super Resolution
End-to-End Learning for Joint Image Demosaicing, Denoising and Super-Resolution
Learning Scene Structure Guidance via Cross-Task Knowledge Transfer for Single Depth Super-Resolution
Deep Burst Super-Resolution
Light Field Super-Resolution With Zero-Shot Learning
Fast Bayesian Uncertainty Estimation and Reduction of Batch Normalized Single Image Super-Resolution Network
⭐code🏠project
Practical Single-Image Super-Resolution Using Look-Up Table
⭐code
Interpreting Super-Resolution Networks With Local Attribution Maps
Scene Text Telescope: Text-Focused Scene Image Super-Resolution
Blind super-resolution
- Learning the Non-Differentiable Optimization for Blind Super-Resolution
- Flow-based Kernel Prior with Application to Blind Super-Resolution
  ⭐code
- KOALAnet: Blind Super-Resolution Using Kernel-Oriented Adaptive Local Adjustment
Video super-resolution
- Space-Time Distillation for Video Super-Resolution
- Turning Frequency to Resolution: Video Super-Resolution via Event Cameras
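The MASA-SR description earlier in this section mentions remapping the distribution of reference features to the distribution of LR features. A much simplified way to picture that idea is global mean/standard-deviation matching, sketched below. This is an assumption-laden illustration, not the paper's Spatial Adaptation module: it works on a flat list of feature values, uses one global statistic instead of spatially varying ones, and the function name `remap_statistics` is hypothetical.

```python
import statistics


def remap_statistics(ref_feat, lr_feat, eps=1e-5):
    """Shift and scale ref_feat so its mean/std match those of lr_feat.

    Global (not spatially adaptive) statistic matching, used here only to
    illustrate what 'remapping a feature distribution' means.
    """
    ref_mean = statistics.fmean(ref_feat)
    ref_std = statistics.pstdev(ref_feat)
    lr_mean = statistics.fmean(lr_feat)
    lr_std = statistics.pstdev(lr_feat)
    # Normalise the reference features, then re-colour with LR statistics.
    return [(v - ref_mean) / (ref_std + eps) * lr_std + lr_mean for v in ref_feat]


# Illustrative values: the reference features are brighter and more spread out.
ref_features = [10.0, 12.0, 14.0, 16.0]
lr_features = [0.0, 1.0, 2.0, 3.0]

remapped = remap_statistics(ref_features, lr_features)
print([round(v, 3) for v in remapped])
```

The remapped values keep the ordering of the reference features while taking on the LR mean and spread.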

16.Visual Question Answering

Counterfactual VQA: A Cause-Effect Look at Language Bias
⭐code
AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
🏠project📺video
Domain-robust VQA with diverse datasets and methods but no target labels
🏠project
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
⭐code
Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing
Roses Are Red, Violets Are Blue... but Should VQA Expect Them To?
🌻dataset
Predicting Human Scanpaths in Visual Question Answering
Separating Skills and Concepts for Novel Visual Question Answering
⭐code
How Transferable Are Reasoning Patterns in VQA?
⭐code🏠project📺video
Explicit Knowledge Incorporation for Visual Reasoning
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Image-Text Matching
- Discrete-Continuous Action Space Policy Gradient-Based Attention for Image-Text Matching
Video question answering
Traffic-related VQA
- SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning Over Traffic Events
  ⭐code

15.GAN

Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
⭐code
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
⭐code
Efficient Conditional GAN Transfer with Knowledge Propagation across Classes
⭐code
Anycost GANs for Interactive Image Synthesis and Editing
⭐code🏠project📺video
Anycost GAN adapts to a wide range of hardware and latency requirements and enables interactive image editing
TediGAN: Text-Guided Diverse Image Generation and Manipulation
⭐code🏠project📺video
Generative Hierarchical Features from Synthesizing Images
😮oral⭐code🏠project
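The authors argue that a pre-trained GAN generator can be regarded as a learned multi-scale loss. Training with it yields highly competitive hierarchical and disentangled visual features, termed Generative Hierarchical Features (GH-Feat). They further show that GH-Feat benefits not only generative tasks but, more importantly, discriminative tasks, including face verification, landmark detection, layout prediction, transfer learning, style mixing, and image editing.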
Teachers Do More Than Teach: Compressing Image-to-Image Models
⭐code
PISE: Person Image Synthesis and Editing with Decoupled GAN
⭐code
LOHO: Latent Optimization of Hairstyles via Orthogonalization
⭐code
HumanGAN: A Generative Model of Humans Images
HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms
⭐code
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
⭐code

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
😮oral🏠project📺video
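More: Stanford researchers propose periodic implicit generative adversarial networks (π-GAN or pi-GAN) for high-quality 3D-aware image synthesis
Stanford University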
ReMix: Towards Image-to-Image Translation with Limited Data
Unsupervised Disentanglement of Linear-Encoded Facial Semantics
Content-Aware GAN Compression
Regularizing Generative Adversarial Networks under Limited Data
⭐code🏠project
Where and What? Examining Interpretable Disentangled Representations
⭐code
Few-shot Image Generation via Cross-domain Correspondence
🏠project
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
😮oral🏠project
Surrogate Gradient Field for Latent Space Manipulation
StylePeople: A Generative Model of Fullbody Human Avatars
🏠project
Ensembling with Deep Generative Views
⭐code🏠project
Continuous Face Aging via Self-estimated Residual Age Embedding
Blur, Noise, and Compression Robust Generative Adversarial Networks
Adaptive Weighted Discriminator for Training Generative Adversarial Networks
⭐code
House-GAN++: Generative Adversarial Layout Refinement Network towards Intelligent Computational Agent for Professional Architects
⭐code🏠project
Roof-GAN: Learning To Generate Roof Geometry and Relations for Residential Houses
⭐code
Exploring Adversarial Fake Images on Face Manifold
Hyper-LifelongGAN: Scalable Lifelong Learning for Image Conditioned Generation
GANmut: Learning Interpretable Conditional Space for Gamut of Emotions
⭐code
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
Positional Encoding As Spatial Inductive Bias in GANs
⭐code🏠project
Partition-Guided GANs
3D Shape Generation With Grid-Based Implicit Functions
Linear Semantics in Generative Adversarial Networks
⭐code🏠project📺video
Cross-Modal Contrastive Learning for Text-to-Image Generation
Lifting 2D StyleGAN for 3D-Aware Face Generation
Unsupervised Learning of Depth and Depth-of-Field Effect From Natural Images With Aperture Rendering Generative Adversarial Networks
😮oral🏠project
Training Generative Adversarial Networks in One Stage
⭐code
Self-Supervised Video GANs: Learning for Appearance Consistency and Motion Coherency
Closed-Form Factorization of Latent Semantics in GANs
😮oral⭐code🏠project📺video
Discovering Interpretable Latent Space Directions of GANs Beyond Binary Attributes
⭐code
Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement
L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing
Spatially-invariant Style-codes Controlled Makeup Transfer
⭐code
Unsupervised Image Synthesis
- Posterior Promoted GAN With Distribution Discriminator for Unsupervised Image Synthesis
  ⭐code
Image-to-Image Translation
- Memory-guided Unsupervised Image-to-image Translation
- Image-to-image Translation via Hierarchical Style Disentanglement
  😮oral⭐code
  Achieves hierarchical style disentanglement for image-to-image translation.
- CoMoGAN: continuous model-guided image-to-image translation
  😮oral⭐code
- Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
  ⭐code🏠project
Image Editing
- StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
  ⭐code📺video
- Navigating the GAN Parameter Space for Semantic Image Editing
  ⭐code
Face Image Synthesis
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
  ⭐code📺video

14. Few-Shot/Zero-Shot Learning, Domain Generalization/Adaptation

Few-Shot Learning
Domain Generalization
- FSDR: Frequency Space Domain Randomization for Domain Generalization
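  Inspired by how JPEG converts a spatial image into multiple frequency components (FCs), the paper proposes Frequency Space Domain Randomization (FSDR), which randomizes images in frequency space by keeping the domain-invariant FCs (DIFs) and randomizing only the domain-variant FCs (DVFs).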
- Domain Generalization via Inference-time Label-Preserving Target Projections
- Adaptive Methods for Real-World Domain Generalization
  😮oral
- Progressive Domain Expansion Network for Single Domain Generalization
  ⭐code
- A Fourier-based Framework for Domain Generalization
  😮oral⭐code
- Adversarially Adaptive Normalization for Single Domain Generalization
- Generalization on Unseen Domains via Inference-Time Label-Preserving Target Projections
- Uncertainty-Guided Model Generalization to Unseen Domains
  ⭐code
- Open Domain Generalization with Domain-Augmented Meta-Learning
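The FSDR entry above describes randomizing only the domain-variant frequency components while keeping the domain-invariant ones. A minimal sketch of that idea on a single grayscale block, assuming an orthonormal 2D DCT as the frequency transform. The function names, the `strength` parameter, and the rule `u + v >= 4` for picking domain-variant bands are hypothetical illustrations, not the paper's band selection or implementation:

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix C of size n x n, so that C @ C^T = I."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def randomize_block(block, is_variant, strength=0.5, rng=random):
    """Randomize only the frequency components flagged by is_variant(u, v).

    block      -- square list of lists of floats (one grayscale block)
    is_variant -- hypothetical predicate marking domain-variant bands
    strength   -- hypothetical maximum relative perturbation per coefficient
    """
    n = len(block)
    c = dct_matrix(n)
    ct = transpose(c)
    coeffs = matmul(matmul(c, block), ct)      # forward 2D DCT
    for u in range(n):
        for v in range(n):
            if is_variant(u, v):               # leave invariant bands untouched
                coeffs[u][v] *= 1.0 + rng.uniform(-strength, strength)
    return matmul(matmul(ct, coeffs), c)       # inverse 2D DCT


if __name__ == "__main__":
    block = [[float((3 * i + 5 * j) % 7) for j in range(8)] for i in range(8)]
    # Assumption for illustration only: treat higher frequencies as domain-variant.
    augmented = randomize_block(block, lambda u, v: u + v >= 4)
    print(augmented[0][:4])
```

Because the DC coefficient at `(0, 0)` is never flagged by this rule, the block's mean intensity is preserved while the flagged bands vary from call to call.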
Zero-Shot Learning
Domain Adaptation

13. Image/Video Retrieval

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Convolutional Hough Matching
😮oral🏠project
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
⭐code
Image Retrieval
Video Retrieval
- On Semantic Similarity in Video Retrieval
  ⭐code🏠project📺video
Visual Search
- Compatibility-aware Heterogeneous Visual Search
Cross-Modal Retrieval
3D Shape Retrieval (joint learning of 3D shape retrieval and deformation)
- Joint Learning of 3D Shape Retrieval and Deformation

12. Image Quality Assessment

Image Restoration
- Multi-Stage Progressive Image Restoration
  ⭐code
- See through Gradients: Image Batch Recovery via GradInversion
- Controllable Image Restoration for Under-Display Camera in Smartphones
- Zero-Shot Single Image Restoration Through Controlled Perturbation of Koschmieder's Model
  🏠project
- High-Quality Stereo Image Restoration From Double Refraction
- Image Restoration for Under-Display Camera
- Manga Restoration
  - Exploiting Aliasing for Manga Restoration
Shadow Removal
- Auto-Exposure Fusion for Single-Image Shadow Removal
  ⭐code
- From Shadow Generation to Shadow Removal
  ⭐code
- No Shadow Left Behind: Removing Objects and Their Shadows Using Approximate Lighting and Geometry
Deblurring
- DeFMO: Deblurring and Shape Recovery of Fast Moving Objects
  ⭐code📺video
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring
- Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes
  ⭐code
- Learning a Non-Blind Deblurring Network for Night Blurry Images
- Ultra-High-Definition Image Dehazing via Multi-Guided Bilateral Learning
- Test-Time Fast Adaptation for Dynamic Scene Deblurring via Meta-Auxiliary Learning
- Blind Deblurring for Saturated Images
- Explore Image Deblurring via Encoded Blur Kernel Space
  ⭐code
- Learning Spatially-Variant MAP Models for Non-Blind Image Deblurring
Reflection Removal
- Robust Reflection Removal with Reflection-free Flash-only Cues
  ⭐code
- Single Image Reflection Removal With Absorption Effect
  ⭐code
- Panoramic Image Reflection Removal
Dehazing
- Learning to Restore Hazy Video: A New Real-World Dataset and A New Method
  Explainer: 9
- Contrastive Learning for Compact Single Image Dehazing
  ⭐code
  Explainer: 5
- PSD: Principled Synthetic-to-Real Dehazing Guided by Physical Priors
  ⭐code
Denoising
- Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images
  ⭐code
  Explainer: CVPR 2021 | Neighbor2Neighbor: a method for training any denoising network using only noisy images
- NBNet: Noise Basis Learning for Image Denoising with Subspace Projection
  ⭐code
  Brief explainer: 9
- Invertible Denoising Network: A Light Solution for Real Noise Removal
  ⭐code
- FBI-Denoiser: Fast Blind Image Denoiser for Poisson-Gaussian Noise
  ⭐code
- Recorrupted-to-Recorrupted: Unsupervised Deep Learning for Image Denoising
- The Neural Tangent Link Between CNN Denoisers and Non-Local Filters
  ⭐code
- Deep Denoising of Flash and No-Flash Pairs for Photography in Low-Light Environments
  🏠project
- Adaptive Consistency Prior Based Deep Network for Image Denoising
- EventZoom: Learning To Denoise and Super Resolve Neuromorphic Events
  🏠project📺video
- Extreme Low-Light Environment-Driven Image Denoising Over Permanently Shadowed Lunar Regions With a Physical Noise Model
- Guided Integrated Gradients: An Adaptive Path Method for Removing Noise
- Effective Snapshot Compressive-Spectral Imaging via Deep Denoising and Total Variation Priors
  ⭐code
- Deep Convolutional Dictionary Learning for Image Denoising
  ⭐code
- Learning An Explicit Weighting Scheme for Adapting Complex HSI Noise
- Pseudo 3D Auto-Correlation Network for Real Image Denoising
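The Neighbor2Neighbor entry above mentions training a denoiser from noisy images alone. One way to obtain a training pair from a single noisy image is to sub-sample it into two half-resolution images whose pixels are neighbors in the original. A minimal sketch of that sub-sampling step, assuming an image stored as a list of rows with even height and width. The function name and the pure-Python representation are illustrative assumptions, and the training loop and any regularization term are omitted:

```python
import random

# Ordered pairs of adjacent positions inside a 2x2 cell, indexed row-major:
#   0 1
#   2 3
ADJACENT_PAIRS = [(0, 1), (1, 0), (0, 2), (2, 0),
                  (1, 3), (3, 1), (2, 3), (3, 2)]


def neighbor_subsample(image, rng=random):
    """Split one noisy image into two half-resolution neighbor images.

    For every 2x2 cell, two adjacent pixels are picked at random; the first
    goes to sub-image A and the second to sub-image B. A denoiser could then
    be trained with A as input and B as target (training not shown here).
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")
    sub_a, sub_b = [], []
    for r in range(0, height, 2):
        row_a, row_b = [], []
        for c in range(0, width, 2):
            cell = [image[r][c], image[r][c + 1],
                    image[r + 1][c], image[r + 1][c + 1]]
            i, j = rng.choice(ADJACENT_PAIRS)
            row_a.append(cell[i])
            row_b.append(cell[j])
        sub_a.append(row_a)
        sub_b.append(row_b)
    return sub_a, sub_b


if __name__ == "__main__":
    noisy = [[r * 4 + c for c in range(4)] for r in range(4)]
    a, b = neighbor_subsample(noisy)
    print(a, b)
```

Both sub-images see nearly the same underlying scene but different noise samples, which is what lets one act as the target for the other.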
Deraining
- Semi-Supervised Video Deraining with Dynamic Rain Generator
- Closing the Loop: Joint Rain Generation and Removal via Disentangled Image Translation
- Robust Representation Learning With Feedback for Single Image Deraining
  ⭐code
- Multi-Decoding Deraining Network and Quasi-Sparsity Based Training
- Image De-Raining via Continual Learning
- From Rain Generation to Rain Removal
  ⭐code
- Memory Oriented Transfer Learning for Semi-Supervised Image Deraining
- Removing Raindrops and Rain Streaks in One Go
- Rain Control
  - Controlling the Rain: From Removal to Rendering
Exposure Correction
- Learning Multi-Scale Photo Exposure Correction
  ⭐code
Image Inpainting
- Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE
  ⭐code
- TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
  🏠project
- Image Inpainting with External-internal Learning and Monochromic Bottleneck
  ⭐code
- PD-GAN: Probabilistic Diverse GAN for Image Inpainting
  ⭐code
- Image Inpainting Guided by Coherence Priors of Semantics and Textures
Image Editing
- DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
  ⭐code
Image Compression
- Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton
- Slimmable Compressive Autoencoders for Practical Neural Image Compression
  ⭐code
- Checkerboard Context Model for Efficient Learned Image Compression
- Learning Scalable ℓ∞-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression
  ⭐code
- Deep Homography for Efficient Stereo Image Compression
  ⭐code
  Sharing session
- iVPF: Numerical Invertible Volume Preserving Flow for Efficient Lossless Compression
- What's in the Image? Explorable Decoding of Compressed Images
- Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation
- How To Exploit the Transferability of Learned Image Compression to Conventional Codecs
De-rendering
- De-rendering the World's Revolutionary Artefacts
  🏠project📺video
Image Artifact Removal
- Removing Diffraction Image Artifacts in Under-Display Camera via Dynamic Skip Connection Network
  ⭐code🏠project
Image Alignment
- Deep Lucas-Kanade Homography for Multimodal Image Alignment
  ⭐code
Image Harmonization
- Region-aware Adaptive Instance Normalization for Image Harmonization
  ⭐code
- Intrinsic Image Harmonization
  ⭐code
Image Enhancement
- CAMERAS: Enhanced Resolution and Sanity Preserving Class Activation Mapping for Image Saliency
  ⭐code
- Retinex-Inspired Unrolling With Cooperative Prior Architecture Search for Low-Light Image Enhancement
  ⭐code🏠project
- Debiased Subjective Assessment of Real-World Image Enhancement
- Learning Temporal Consistency for Low Light Video Enhancement From Single Images
  ⭐code
Image Stabilization
- Digital Gimbal: End-to-end Deep Image Stabilization with Learnable Exposure Times
Defocus Deblurring
- Iterative Filter Adaptive Network for Single Image Defocus Deblurring
De-Occlusion
- Human De-Occlusion: Invisible Perception and Recovery for Humans
  🏠project
Nighttime Visibility Enhancement
- Nighttime Visibility Enhancement by Increasing the Dynamic Range and Suppression of Light Effects
Image Completion
- Prior Based Human Completion
Image Steganography
- Large-Capacity Image Steganography Based on Invertible Neural Networks
Image Blending
- Bridging the Visual Gap: Wide-Range Image Blending
  ⭐code
Image Rectification
- Progressively Complementary Network for Fisheye Image Rectification Using Appearance Flow
  ⭐code
Defocus Blur Detection (detecting regions blurred by defocus)
- Self-Generated Defocus Blur Detection via Dual Adversarial Discriminators
  ⭐code
Scene Recovery (different weather and imaging conditions)
- Rank-One Prior: Toward Real-Time Scene Recovery
- ZeroScatter: Domain Transfer for Long Distance Imaging and Vision Through Scattering Media
  🏠project📺video
Image Cropping
- Composing Photos Like a Photographer
Image Stitching
- Leveraging Line-Point Consistence To Preserve Structures for Wide Parallax Image Stitching
  ⭐code
Depth Estimation + Image Restoration
- Dual Pixel Exploration: Simultaneous Depth Estimation and Image Restoration
Image Extrapolation
- OCONet: Image Extrapolation by Object Completion
Image Editing
- Learning by Planning: Language-Guided Global Image Editing
Image Quality
- Quality-Agnostic Image Recognition via Invertible Decoder
- Troubleshooting Blind Image Quality Models in the Wild
  ⭐code
HDR Deghosting
- Labeled From Unlabeled: Exploiting Unlabeled Data for Few-Shot Deep HDR Deghosting
Image Brightening
- Restoring Extremely Dark Images in Real Time
  ⭐code
Image Degradation
- DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows
  😮oral⭐code
Specular Highlight Detection and Removal
- A Multi-Task Network for Joint Specular Highlight Detection and Removal

11. Face

Towards High Fidelity Face Relighting with Realistic Shadows
⭐code
IronMask: Modular Architecture for Protecting Deep Face Template
Everything's Talkin': Pareidolia Face Reenactment
⭐code🏠project📺video
Face Recognition
- A 3D GAN for Improved Large-pose Facial Recognition
- When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
  😮oral⭐code
- MagFace: A Universal Representation for Face Recognition and Quality Assessment
  😮oral⭐code
  Face recognition plus quality assessment; an oral presentation this year. The code is still being tidied up.
- WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
  🏠project
- ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
  😮oral🏠project📺video
- Spherical Confidence Learning for Face Recognition
  😮oral⭐code
- CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
- Cross-Domain Similarity Learning for Face Recognition in Unseen Domains
- HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
  🏠project
- FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems
- Dynamic Class Queue for Large Scale Face Recognition In the Wild
  ⭐code
- Consistent Instance False Positive Improves Fairness in Face Recognition
  ⭐code
  Explainer: 7
- VirFace: Enhancing Face Recognition via Unlabeled Shallow Data
- Variational Prototype Learning for Deep Face Recognition
- Mitigating Face Recognition Bias via Group Adaptive Classifier
  ⭐code
- Pseudo Facial Generation With Extreme Poses for Face Recognition
- Improving Transferability of Adversarial Patches on Face Recognition With Generative Models
- Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition
Deepfake/Face Forgery Detection
- Multi-attentional Deepfake Detection
  ⭐code
- Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection
- MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes
- Face Forensics in the Wild
  😮oral⭐code
- Improving the Efficiency and Robustness of Deepfakes Detection through Precise Geometric Features
  ⭐code
- Lips Don't Lie: A Generalisable and Robust Approach To Face Forgery Detection
- Representative Forgery Mining for Fake Face Detection
  ⭐code
- Exploring Adversarial Fake Images on Face Manifold
- Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain
- Generalizing Face Forgery Detection With High-Frequency Features
- Face Forgery Detection by 3D Decomposition
Face Image Quality Assessment
- SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance
  ⭐code
  Explainer: 6
3D Face Reconstruction
- 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
  ⭐code🏠project
- Riggable 3D Face Reconstruction via In-Network Optimization
  ⭐code
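  This paper tackles riggable 3D face reconstruction from monocular RGB images with an end-to-end trainable network that embeds in-network optimization. It reaches state-of-the-art reconstruction accuracy with reasonable robustness and generalization, and it can be used in standard face rig applications such as retargeting.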
- Pixel Codec Avatars
  😮oral
- Inverting Generative Adversarial Renderer for Face Reconstruction
  ⭐code
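  Explainer: SenseTime and CUHK advance monocular face reconstruction with a renderer based on a generative network, giving more accurate geometry and more realistic rendering.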
- Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection
  😮oral⭐code
- Monocular Reconstruction of Neural Face Reflectance Fields
  🏠project
- Learning Complete 3D Morphable Face Models From Images and Videos
  🏠project
Facial Expression Recognition
- Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
- Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition
- Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition
- Learning a Facial Expression Embedding Disentangled from Identity
Face Clustering
- Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes
  ⭐code🏠project
Face Editing
- High-Fidelity and Arbitrary Face Editing
Face Tracking
- High-fidelity Face Tracking for AR/VR via Deep Lighting Adaptation
  🏠project📺video
Wide-Angle Portrait Correction
- Practical Wide-Angle Portraits Correction with Deep Structured Models
  ⭐code
  Brief explainer: 7
Face Anti-Spoofing
- Cross Modal Focal Loss for RGBD Face Anti-Spoofing
  ⭐code
Audio-Driven Emotional Face Synthesis
- Audio-Driven Emotional Video Portraits
  ⭐code🏠project
Face Swapping
- Information Bottleneck Disentanglement for Identity Swapping
  Sharing session
- One Shot Face Swapping on Megapixels
  🌻dataset
Face Restoration
- FaceInpainter: High Fidelity Face Adaptation to Heterogeneous Domains
  Sharing session
- Progressive Semantic-Aware Style Transformation for Blind Face Restoration
  ⭐code
- GAN Prior Embedded Network for Blind Face Restoration in the Wild
  ⭐code
- Towards Real-World Blind Face Restoration With Generative Facial Prior
  ⭐code
Face Animation
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
  ⭐code🏠project📺video
  Explainer: pose-controllable, audio-driven talking faces
3D Talking Faces
- LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization
  📺video
Face Authentication
- Privacy-Preserving Image Features via Adversarial Affine Subspace Embeddings
Face Texture Completion
- OSTeC: One-Shot Texture Completion
  ⭐code
Face Alignment
- img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation
  ⭐code
Face Aging
- Continuous Face Aging via Self-Estimated Residual Age Embedding
Facial Action Unit Detection
- Hybrid Message Passing With Performance-Driven Structures for Facial Action Unit Detection
- Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection
- Dynamic Probabilistic Graph Convolution for Facial Action Unit Intensity Estimation
Face Reconstruction
- Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction
  😮oral⭐code🏠project📺video
Facial Attribute Recognition
- Learning Spatial-Semantic Relationship for Facial Attribute Recognition With Limited Labeled Data
Face Obfuscation
- Perceptual Indistinguishability-Net (PI-Net): Facial Image Obfuscation with Manipulable Semantics
Face Generation
- Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset
  ⭐code

10. Neural Architecture Search

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens
⭐code
ReNAS: Relativistic Evaluation of Neural Architecture Search
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object
Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search
Center for Machine Learning Research, Institute for Artificial Intelligence, Peking University
Contrastive Neural Architecture Search with Neural Architecture Comparators
⭐code
Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
⭐code
Prioritized Architecture Sampling with Monto-Carlo Tree Search
⭐code

One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking
⭐code
NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization
🏠project
Neural Architecture Search with Random Labels
Brief explainer: 1
Explainer: neural architecture search based on random labels
Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search
⭐code
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search
⭐code🌻dataset
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers
😮oral⭐code
DOTS: Decoupling Operation and Topology in Differentiable Architecture Search
⭐code
NPAS: A Compiler-Aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration
DSRNA: Differentiable Search of Robust Neural Architectures
Rethinking Graph Neural Architecture Search From Message-Passing
⭐code
FP-NAS: Fast Probabilistic Neural Architecture Search
FBNetV3: Joint Architecture-Recipe Search Using Predictor Pretraining
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
⭐code

9. Object Tracking

Rotation Equivariant Siamese Networks for Tracking
⭐code
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
⭐code
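LightTrack: a lightweight tracking network found with neural architecture search. It is more accurate than SiamRPN++ and Ocean while running 12x faster, with only 1/13 of the parameters and 1/38 of the FLOPs. The code will be open-sourced.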
Track, Check, Repeat: An EM Approach to Unsupervised Tracking
🏠project📺video
Learning To Filter: Siamese Relation Network for Robust Tracking
⭐code
Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation
⭐code

CapsuleRRT: Relationships-Aware Regression Tracking via Capsules
Siamese Natural Language Tracker: Tracking by Natural Language Descriptions With Siamese Trackers
⭐code
MeanShift++: Extremely Fast Mode-Seeking With Applications to Segmentation and Object Tracking
Learning To Fuse Asymmetric Feature Maps in Siamese Trackers
⭐code
Multi-Object Tracking
- Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
  ⭐code
- Track to Detect and Segment: An Online Multi-Object Tracker
  ⭐code🏠project📺video
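  TraDeS: a CVPR 2021 multi-object tracking algorithm that improves on current online joint detection and tracking methods by using tracking cues to assist detection, with large accuracy gains on several datasets. The authors are from the State University of New York. The code is open source.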
- Multiple Object Tracking with Correlation Learning
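  Proposes CorrTracker, a unified correlation tracker that densely models correlations between targets and propagates information through those correlations. It obtains a state-of-the-art 76.5% MOTA and 73.6% IDF1 on MOT17.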
- Learning a Proposal Classifier for Multiple Object Tracking
  ⭐code
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking
  ⭐code
- Online Multiple Object Tracking with Cross-Task Synergy
  ⭐code
- SiamMOT: Siamese Multi-Object Tracking
  ⭐code
- DyGLIP: A Dynamic Graph Model with Link Prediction for Accurate Multi-Camera Multiple Object Tracking
  ⭐code
- Quasi-Dense Similarity Learning for Multiple Object Tracking
  😮oral⭐code
- Discriminative Appearance Modeling With Multi-Track Pooling for Real-Time Multi-Object Tracking
  ⭐code
- GMOT-40: A Benchmark for Generic Multiple Object Tracking
  ⭐code
- Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy
  ⭐code
- Improving Multiple Object Tracking With Single Object Tracking
- 3D Multi-Object Tracking
  - Seeing Behind Objects for 3D Multi-Object Tracking in RGB-D Sequences
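The CorrTracker entry above reports MOTA and IDF1 on MOT17. For readers unfamiliar with these metrics, a minimal sketch of their standard definitions follows. The function names and the example counts are illustrative assumptions and are not CorrTracker's numbers:

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy.

    MOTA = 1 - (FN + FP + IDSW) / GT, where GT is the total number of
    ground-truth boxes over all frames. The value can be negative when
    the tracker makes more errors than there are ground-truth boxes.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(id_true_positives, id_false_positives, id_false_negatives):
    """Identity F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denominator = 2 * id_true_positives + id_false_positives + id_false_negatives
    if denominator == 0:
        return 0.0
    return 2 * id_true_positives / denominator


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    print(f"MOTA = {mota(10, 5, 5, 100):.3f}")
    print(f"IDF1 = {idf1(80, 10, 30):.3f}")
```

MOTA mostly reflects detection quality (misses and false alarms dominate the count), while IDF1 rewards keeping the same identity on the same object over time, which is why the two are usually reported together.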
Visual Object Tracking
Single-Object Tracking
- Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
  🏠project📺video
- SiamGAT: Graph Attention Tracking
  ⭐code
Visual Tracking
- STMTrack: Template-Free Visual Tracking With Space-Time Memory Networks
  ⭐code
Pose Tracking
- TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking
  🏠project
Pedestrian Tracking
- Improving Multiple Pedestrian Tracking by Track Management and Occlusion Handling

8. Image Segmentation

Information-Theoretic Segmentation by Inpainting Error Maximization
Capturing Omni-Range Context for Omnidirectional Segmentation
⭐code
Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
⭐code🏠project

Locate then Segment: A Strong Pipeline for Referring Image Segmentation
InverseForm: A Loss Function for Structured Boundary-Aware Segmentation
😮oral
Omnimatte: Associating Objects and Their Effects in Video
😮oral🏠project
Unsupervised Part Segmentation through Disentangling Appearance and Shape
Encoder Fusion Network With Co-Attention Embedding for Referring Image Segmentation
Bottom-Up Shift and Reasoning for Referring Image Segmentation
⭐code
DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation
ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Image Segmentation
DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping
Instance Segmentation
- Zero-Shot Instance Segmentation
  ⭐code
  AInnovation is the first to propose zero-shot instance segmentation, aimed at easing data bottlenecks in industrial scenarios.
- Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
  ⭐code
  Explainer: bilayer instance segmentation that substantially improves occlusion handling
- Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency
- FAPIS: A Few-shot Anchor-free Part-based Instance Segmenter
- Weakly-supervised Instance Segmentation via Class-agnostic Learning with Salient Images
  ⭐code
- Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
  ⭐code
- RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
  ⭐code
- A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
- Incremental Few-Shot Instance Segmentation
  ⭐code
- Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation
  😮oral⭐code
- Point Cloud Instance Segmentation Using Probabilistic Embeddings
  🏠project
- DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
- Robust Instance Segmentation Through Reasoning About Multi-Object Occlusion
  ⭐code
- Deeply Shape-Guided Cascade for Instance Segmentation
  ⭐code
- ColorRL: Reinforced Coloring for End-to-End Instance Segmentation
  ⭐code
- Unsupervised Discovery of the Long-Tail in Instance Segmentation Using Hierarchical Self-Supervision
- DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution
  ⭐code
- BoxInst: High-Performance Instance Segmentation With Box Annotations
  ⭐code
Panoptic Segmentation
- 4D Panoptic LiDAR Segmentation
  ⭐code
- Cross-View Regularization for Domain Adaptive Panoptic Segmentation
  😮oral
- Part-aware Panoptic Segmentation
  ⭐code
- Toward Joint Thing-and-Stuff Mining for Weakly Supervised Panoptic Segmentation
  Explainer: 15
- Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation
  ⭐code
- Fully Convolutional Networks for Panoptic Segmentation
  😮oral⭐code
  Brief explainer: 11
- Panoptic Segmentation Forecasting
  ⭐code
- Exemplar-Based Open-Set Panoptic Segmentation Network
  ⭐code🏠project
- Hierarchical Lovasz Embeddings for Proposal-free Panoptic Segmentation
- VIP-DeepLab: Learning Visual Perception With Depth-Aware Video Panoptic Segmentation
  ⭐code
- Learning To Associate Every Segment for Video Panoptic Segmentation
- LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network
  ⭐code
- LPSNet: A Lightweight Solution for Fast Panoptic Segmentation
- Improving Panoptic Segmentation at All Scales
Semantic Segmentation
- PLOP: Learning without Forgetting for Continual Semantic Segmentation
  ⭐code
- Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
- Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
- Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
  😮oral⭐code
- Learning Statistical Texture for Semantic Segmentation
- MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
  ⭐code
- Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
  ⭐code📺video
- Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
  ⭐code
- Rethinking BiSeNet For Real-time Semantic Segmentation
  ⭐code
- BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
  ⭐code
- Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation
  ⭐code
- Cross-Dataset Collaborative Learning for Semantic Segmentation
- Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
- Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
  ⭐code
- Source-Free Domain Adaptation for Semantic Segmentation
- PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering
  ⭐code
- Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
  🏠project
- Progressive Semantic Segmentation
  ⭐code
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
  🏠project
- DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation
  😮oral⭐code
  Achieves state-of-the-art performance on nighttime semantic segmentation; the code is open source.
- Self-supervised Augmentation Consistency for Adapting Semantic Segmentation
  ⭐code
- Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation
  ⭐code
- Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
  ⭐code
- Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation
- Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency
- Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation
- Uncertainty Reduction for Model Adaptation in Semantic Segmentation
  ⭐code
- HyperSeg: Patch-Wise Hypernetwork for Real-Time Semantic Segmentation
  ⭐code🏠project
- Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds
- Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
  ⭐code
- Few-Shot 3D Point Cloud Semantic Segmentation
  ⭐code
- Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation
  ⭐code
- Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation
  ⭐code
- (AF)2-S3Net: Attentive Feature Fusion With Adaptive Feature Selection for Sparse Semantic Segmentation Network
- One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation
- Exploit Visual Dependency Relations for Semantic Segmentation
- Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs
- ABMDRNet: Adaptive-Weighted Bi-Directional Modality Difference Reduction Network for RGB-T Semantic Segmentation
- CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation
Scene Understanding/Scene Parsing
- Bidirectional Projection Network for Cross Dimension Scene Understanding
  😮oral⭐code
- RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
  😮oral⭐code
- CoCoNets: Continuous Contrastive 3D Scene Representations
  🏠project📺video
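  Researchers from CMU propose a 3D scene representation learned with self-supervised contrastive learning from RGB and RGB-D scene inputs. The resulting features perform well on downstream tasks such as object tracking and detection.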
- RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction
- 3D-to-2D Distillation for Indoor Scene Parsing
- Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts
- Scene Graph Generation/Analysis
  - SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
    🏠project
  - Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation
    Scene graph generation (scene parsing)
  - Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis
    🏠project
    Edge-oriented reasoning for 3D point-based scene graph analysis (scene understanding)
  - Fully Convolutional Scene Graph Generation
    😮oral
  - Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
    ⭐code
  - Linguistic Structures as Weak Supervision for Visual Scene Graph Generation
    ⭐code
  - Energy-Based Learning for Scene Graph Generation
    ⭐code
- 3D Scene Understanding
Matting
- Real-Time High Resolution Background Matting
  ⭐code🏠project📺video
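  Newly open-sourced matting method: real-time and high-resolution, running at 30 fps in 4K and 60 fps in HD on a modern GPU.
  Explainer: 4K at 30 fps on a single GPU; the University of Washington upgrades its real-time video matting, with fine hair detail preserved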
- Mask Guided Matting via Progressive Refinement Network
  ⭐code
- Semantic Image Matting
  ⭐code
- Improved Image Matting via Real-Time User Clicks and Uncertainty Estimation
  📺video
- Learning Affinity-Aware Upsampling for Deep Image Matting
LiDAR Segmentation
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
  😮oral⭐code
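  Ranked first on the SemanticKITTI leaderboard (as of the CVPR deadline) and achieved state-of-the-art results on nuScenes. It also generalizes well to other LiDAR-based tasks, including LiDAR panoptic segmentation and LiDAR 3D detection; work built on it also ranked first on the SemanticKITTI panoptic segmentation leaderboard.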
Video Object Segmentation
- Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
  ⭐code
- Efficient Regional Memory Network for Video Object Segmentation
  ⭐code🏠project
- Learning Position and Target Consistency for Memory-based Video Object Segmentation
  Achieves state-of-the-art performance on both the DAVIS and YouTube-VOS benchmarks and ranked first in the semi-supervised VOS task of the DAVIS 2020 Challenge.
- Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps
  😮oral⭐code
- Reciprocal Transformations for Unsupervised Video Object Segmentation
  ⭐code
- Delving Deep Into Many-to-Many Attention for Few-Shot Video Object Segmentation
  ⭐code
- Video Object Segmentation Using Global and Instance Embedding Learning
- SwiftNet: Real-Time Video Object Segmentation
  ⭐code
- SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
  ⭐code
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
  ⭐code🏠project📺video
- Learning Dynamic Network Using a Reuse Gate Function in Semi-Supervised Video Object Segmentation
  ⭐code
- Point Set Tracking
  - Polygonal Point Set Tracking
- Video Multi-Object Segmentation
  - Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation
Video Instance Segmentation
- SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation
  ⭐code📺video
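  Introduces SG-Net, a simple and effective one-stage framework that improves both mask quality and inference speed compared with conventional two-stage frameworks.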
- Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
  ⭐code
Few-Shot Segmentation
- Self-Guided and Cross-Guided Learning for Few-Shot Segmentation
  ⭐code
- Adaptive Prototype Learning and Allocation for Few-Shot Segmentation
  ⭐code
- Few-Shot Segmentation Without Meta-Learning: A Good Transductive Inference Is All You Need?
  ⭐code
Camouflaged Object Segmentation
- Camouflaged Object Segmentation with Distraction Mining
  🏠project
Video Matting
- Deep Video Matting via Spatio-Temporal Alignment and Aggregation
  🌻dataset
Point Cloud Segmentation
- Omni-supervised Point Cloud Segmentation via Gradual Receptive Field Component Reasoning
  ⭐code
Semantic Part Segmentation
- Repurposing GANs for One-Shot Semantic Part Segmentation
  😮oral🏠project
Mirror segmentation
- Depth-Aware Mirror Segmentation
  🏠project📺video
Motion segmentation
- Learning To Segment Rigid Motions From Two Frames
  ⭐code
Fine-grained segmentation
- Learning Fine-Grained Segmentation of 3D Shapes without Part Labels

7.Object Detection

Multiple Instance Active Learning for Object Detection
⭐code
Positive-Unlabeled Data Purification in the Wild for Object Detection
Depth from Camera Motion and Object Detection
⭐github📺video
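Using ordinary phone-camera motion together with object-detection bounding boxes as input, the authors design an RNN that estimates object depth with state-of-the-art accuracy.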
Towards Open World Object Detection
😮oral⭐code
General Instance Distillation for Object Detection
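Knowledge distillation has proven effective for model compression in recent years: a lightweight student model acquires knowledge extracted from a cumbersome teacher model. Previous distillation methods for detection, however, generalize poorly across detection frameworks, rely heavily on ground truth (GT), and ignore valuable relational information between instances. The authors therefore propose a distillation method for detection based on discriminative instances, selected without regard to the positive/negative split defined by the GT, and call it General Instance Distillation (GID). It includes a General Instance Selection Module (GISM) so that feature-based, relation-based, and response-based knowledge can all be used for distillation. Experiments show that student models gain significant AP across various detection frameworks and can even outperform their teachers. Specifically, RetinaNet with ResNet-50 reaches 39.1% mAP on COCO with GID, 2.9 points above the 36.2% baseline and better than the 38.1% AP of the ResNet-101-based teacher.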
MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection
😮oral

You Only Look One-level Feature
⭐code
YOLOF is open source; it needs no FPN and runs 13% faster than YOLOv4.
Explainer: The YOLOF object detection algorithm: You Only Look One-level Feature

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
⭐code
End-to-End Object Detection with Fully Convolutional Network
⭐code
Explainer: Dropping the Transformer, an FCN can also achieve E2E detection
Robust and Accurate Object Detection via Adversarial Learning

Distilling Object Detectors via Decoupled Features
⭐code
OTA: Optimal Transport Assignment for Object Detection
⭐code
Scale-aware Automatic Augmentation for Object Detection
⭐code
A Closer Look at Fourier Spectrum Discrepancies for CNN-generated Images Detection
😮oral🏠project
IQDet: Instance-wise Quality Distribution Sampling for Object Detection
Brief explainer: 20
Domain-Specific Suppression for Adaptive Object Detection
PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
Dynamic Head: Unifying Object Detection Heads with Attentions
⭐code📺video
Open-Vocabulary Object Detection Using Captions
😮oral⭐code
MobileDets: Searching for Object Detection Architectures for Mobile Accelerators
⭐code
Layer-Wise Searching for 1-Bit Detectors
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
⭐code
GAIA: A Transfer Learning System of Object Detection That Fits Your Needs
⭐code
DetectoRS: Detecting Objects With Recursive Feature Pyramid and Switchable Atrous Convolution
⭐code
RankDetNet: Delving Into Ranking Constraints for Object Detection
AQD: Towards Accurate Quantized Object Detection
😮oral⭐code
Class-Aware Robust Adversarial Training for Object Detection
Scaled-YOLOv4: Scaling Cross Stage Partial Network
Improved Handling of Motion Blur in Online Object Detection
🏠project
The Translucent Patch: A Physical and Universal Attack on Object Detectors
📺video
Unbiased Mean Teacher for Cross-Domain Object Detection
⭐code
Interpolation-Based Semi-Supervised Learning for Object Detection
⭐code
Neural Auto-Exposure for High-Dynamic Range Object Detection
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
Black-Box Explanation of Object Detectors via Saliency Maps
😮oral🏠project📺video
Few-shot object detection
- Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
  The first work to study semantic relation reasoning for few-shot detection, showing its potential to improve strong baselines.
- Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
  ⭐code
  Center for Machine Learning Research, Institute for Artificial Intelligence, Peking University
- FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding
  ⭐code
- Generalized Few-Shot Object Detection without Forgetting
  Brief explainer: 16
- Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss
- Hallucination Improves Few-Shot Object Detection
  ⭐code
- Few-Shot Object Detection via Classification Refinement and Distractor Retreatment
- Transformation Invariant Few-Shot Object Detection
- Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection
  ⭐code
Multi-object detection
- There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
  🏠project
3D object detection
- Categorical Depth Distribution Network for Monocular 3D Object Detection
  😮oral⭐code
- 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection
  ⭐code🏠project📺video
  More: CVPR 2021 | Semi-supervised 3D object detection using IoU prediction
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection
  ⭐code
- M3DSSD: Monocular 3D Single Stage Object Detector
  ⭐code
- GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection
  ⭐code📺video
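  Proposes and integrates GrooMeD-NMS for monocular 3D object detection. It resolves the mismatch between the training and inference pipelines and achieves state-of-the-art monocular 3D object detection results on the KITTI benchmark, performing comparably to monocular video-based methods.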
- LiDAR R-CNN: An Efficient and Universal 3D Object Detector
  ⭐code
- Delving into Localization Errors for Monocular 3D Object Detection
  ⭐code
- HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection
  🏠project
- Objects are Different: Flexible Monocular 3D Object Detection
  ⭐code
- Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds
  ⭐code
- PointAugmenting: Cross-Modal Augmentation for 3D Object Detection
  Sharing session
- SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud
  ⭐code
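  Proposes the Self-Ensembling Single-Stage object Detector (SE-SSD) for accurate and efficient 3D object detection in outdoor point clouds. The key idea is to optimize the model jointly with soft and hard targets under the formulated constraints, without adding computation at inference time. SE-SSD achieves top performance compared with all prior work. It also attains the highest precision for car detection on the KITTI benchmark (ranked first on the BEV leaderboard and second on the 3D leaderboard) with very fast inference.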
- Offboard 3D Object Detection From Point Cloud Sequences
- Monocular 3D Object Detection: An Extrinsic Parameter Free Approach
- SRDAN: Scale-Aware and Range-Aware Domain Adaptation Network for Cross-Dataset 3D Object Detection
- PVGNet: A Bottom-Up One-Stage 3D Object Detector With Integrated Multi-Level Features
- MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation
  ⭐code
- LiDAR-Aug: A General Rendering-Based Augmentation Framework for 3D Object Detection
- ST3D: Self-Training for Unsupervised Domain Adaptation on 3D Object Detection
  ⭐code
- RangeIoUDet: Range Image Based Real-Time 3D Object Detector Optimized by Intersection Over Union
- Center-Based 3D Object Detection and Tracking
  ⭐code
- 3D Object Detection with Pointformer
  ⭐code
- To the Point: Efficient 3D Object Detection in the Range Image With Graph Convolution Kernels
- RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection
- 3D-MAN: 3D Multi-Frame Attention Network for Object Detection
Rotated object detection
- Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
  ⭐code
Weakly supervised object localization
- Shallow Feature Matters for Weakly Supervised Object Localization
- Unveiling the Potential of Structure Preserving for Weakly Supervised Object Localization
  ⭐code
  Weakly supervised object localization based on structure-information preservation
  Explainer: 13
- Strengthen Learning Tolerance for Weakly Supervised Object Localization
  🏠project
Dense object detection
- Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
  ⭐code
  Explainer: Painless accuracy gains for object detection with Generalized Focal Loss V2
- VarifocalNet: An IoU-Aware Dense Object Detector
  😮oral⭐code
- Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection
Salient object detection
- Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
  😮oral
- Weakly Supervised Video Salient Object Detection
  ⭐code
- Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
  ⭐code
- Calibrated RGB-D Salient Object Detection
  ⭐code
- From Semantic Categories to Fixations: A Novel Weakly-Supervised Visual-Auditory Saliency Detection Approach
  ⭐code
- Co-saliency detection
  - DeepACG: Co-Saliency Detection via Semantic-Aware Contrast Gromov-Wasserstein Distance
  - Group Collaborative Learning for Co-Salient Object Detection
    ⭐code
Semi-supervised object detection
- Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection
- Points As Queries: Weakly Semi-Supervised Object Detection by Points
  Brief explainer: 6
- Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection
- Humble Teachers Teach Better Students for Semi-Supervised Object Detection
Long-tail object detection
- Adaptive Class Suppression Loss for Long-Tail Object Detection
  ⭐code
- Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection
  ⭐code
- Seesaw Loss for Long-Tailed Instance Segmentation
  ⭐code
One-stage object detection
- I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
Shadow detection
- Triple-Cooperative Video Shadow Detection
  ⭐code
- Single-Stage Instance Shadow Detection with Bidirectional Relation Learning
  😮oral⭐code
Unsupervised object detection
- Unsupervised Object Detection With LIDAR Clues
Domain-adaptive object detection
- RPN Prototype Alignment for Domain Adaptive Object Detector
Glass surface detection
- Rich Context Aggregation With Reflection Prior for Glass Surface Detection
Camouflaged object detection
- Mutual Graph Learning for Camouflaged Object Detection
  ⭐code
- Simultaneously Localize, Segment and Rank the Camouflaged Objects
  ⭐code
Any-shot object detection
- UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation
  ⭐code

6.Data Augmentation

KeepAugment: A Simple Information-Preserving Data Augmentation

SuperMix: Supervising the Mixing Data Augmentation
⭐code
On Feature Normalization and Data Augmentation
⭐code
StyleMix: Separating Content and Style for Enhanced Data Augmentation
⭐code

5.Anomaly Detection

Multiresolution Knowledge Distillation for Anomaly Detection
⭐code
PANDA: Adapting Pretrained Features for Anomaly Detection and Segmentation
⭐code
Glancing at the Patch: Anomaly Localization with Global and Local Feature Comparison

Pixel-wise anomaly detection in driving scenes
- Pixel-Wise Anomaly Detection in Complex Driving Scenes
  ⭐code

4.Weakly Supervised/Semi-Supervised/Self-supervised/Unsupervised Learning

Weakly supervised
- Weakly Supervised Learning of Rigid 3D Scene Flow
  😮oral⭐code🏠project
- Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
  ⭐code
Semi-supervised
Self-supervised
- Self-supervised Geometric Perception
  😮oral⭐code
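  The authors describe SGP as the first general framework for feature learning in geometric perception that needs no supervision from ground-truth geometric labels. SGP runs in an EM-like fashion, alternating between robust estimation of geometric models to generate pseudo-labels and feature learning supervised by those noisy pseudo-labels. Applied to camera pose estimation and point cloud registration, SGP is shown to perform on par with, or better than, supervised oracles on large-scale real datasets.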
- Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting
  ⭐code
- Self-supervised Motion Learning from Static Images
- SOLD2: Self-supervised Occlusion-aware Line Description and Detection
  😮oral⭐code
- All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training
  ⭐code
- Global Transport for Fluid Reconstruction with Learned Self-Supervision
  😮oral⭐code
- Task Programming: Learning Data Efficient Behavior Representations
  😮oral⭐code🏠project
- Audio-Visual Instance Discrimination with Cross-Modal Agreement
- Safe Local Motion Planning With Self-Supervised Freespace Forecasting
  ⭐code
- Back to Event Basics: Self-Supervised Learning of Image Reconstruction for Event Cameras via Photometric Constancy
  ⭐code🏠project
- Exponential Moving Average Normalization for Self-Supervised and Semi-Supervised Learning
- How Well Do Self-Supervised Models Transfer?
  ⭐code
- The Lottery Tickets Hypothesis for Supervised and Self-Supervised Pre-Training in Computer Vision Models
  ⭐code
- OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning
  ⭐code
- SSLayout360: Semi-Supervised Indoor Layout Estimation From 360° Panorama
- Instance Localization for Self-supervised Detection Pretraining
  ⭐code
- CASTing Your Model: Learning to Localize Improves Self-Supervised Representations
  ⭐code
- SPSG: Self-Supervised Photometric Scene Generation From RGB-D Scans
- SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning
Unsupervised
- A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
  ⭐code
- Unsupervised Visual Representation Learning by Tracking Patches in Video
  ⭐code
- SMURF: Self-Teaching Multi-Frame Unsupervised RAFT with Full-Image Warping
  ⭐code
- PAUL: Procrustean Autoencoder for Unsupervised Lifting
- Progressive Stage-Wise Learning for Unsupervised Feature Representation Enhancement
- VDSM: Unsupervised Video Disentanglement With State-Space Modeling and Deep Mixtures of Experts
- Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination
  ⭐code
- Recurrent Multi-View Alignment Network for Unsupervised Surface Registration
  ⭐code
- Feature-Level Collaboration: Joint Unsupervised Learning of Optical Flow, Stereo Depth and Camera Motion

3.Point Cloud

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
😮oral⭐code
TPCN: Temporal Point Cloud Networks for Motion Forecasting
Spatio-temporal point cloud networks for motion forecasting
How Privacy-Preserving are Line Clouds? Recovering Scene Details from 3D Lines
⭐code
PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
⭐code
Point2Skeleton: Learning Skeletal Representations from Point Clouds
😮oral⭐code🏠project
FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
RPSRNet: End-to-End Trainable Rigid Point Set Registration Network using Barnes-Hut 2D-Tree Representation
Point Cloud Upsampling via Disentangled Refinement
⭐code
Regularization Strategy for Point Cloud via Rigidly Mixed Sample
⭐code
Verifiability and Predictability: Interpreting Utilities of Network Architectures for Point Cloud Processing

Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
PointNetLK Revisited
😮oral⭐code
PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds
⭐code
Point cloud registration
Point cloud completion
Point cloud keypoint detection
- Skeleton Merger: an Unsupervised Aligned Keypoint Detector
  ⭐code
3D point cloud
Point cloud compression
- VoxelContext-Net: An Octree based Framework for Point Cloud Compression
Point cloud recognition
- 3D Spatial Recognition without Spatially Labeled 3D
  🏠project
Point cloud segmentation
- SCF-Net: Learning Spatial Contextual Features for Large-Scale Point Cloud Segmentation
  ⭐code

2.Graph Neural Networks(GNN, GCN, GMN)

Sequential Graph Convolutional Network for Active Learning
Quantifying Explainers of Graph Neural Networks in Computational Pathology
⭐code
Binary Graph Neural Networks
⭐code
Amalgamating Knowledge from Heterogeneous Graph Neural Networks

GCN
Graph Matching Networks(GMN)
- LayoutGMN: Neural Graph Matching for Structural Layout Similarity
- Scene Essence

1.Unknown(Uncategorized)

Reconsidering Representation Alignment for Multi-view Clustering
⭐code
Self-supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map
Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces
😮oral⭐code🏠project
Data-Free Model Extraction
⭐code
Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning
😮oral
PatchmatchNet: Learned Multi-View Patchmatch Stereo
😮oral⭐code
Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning
⭐code🏠project
Semantic Palette: Guiding Scene Generation with Class Proportions
Multi-Objective Interpolation Training for Robustness to Label Noise
⭐code
Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations
⭐code
Simpler Certified Radius Maximization by Propagating Covariances
😮oral⭐code📺video
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
⭐code
Discovering Hidden Physics Behind Transport Dynamics
😮oral
Soft-IntroVAE: Analyzing and Improving the Introspective Variational Autoencoder
😮oral⭐code🏠project
Deep Gradient Projection Networks for Pan-sharpening
⭐code
Consensus Maximisation Using Influences of Monotone Boolean Functions
😮oral⭐code

Forecasting Irreversible Disease via Progression Learning
Causal Hidden Markov Model for Time Series Disease Forecasting
⭐code🏠project

Knowledge Evolution in Neural Networks
😮oral⭐code

RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words
⭐code
RSTNet: an image captioning model with adaptive attention that distinguishes visual words from non-visual words
Explainer: 14
Removing the Background by Adding the Background: Towards a Background Robust Self-supervised Video Representation Learning
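Removing the influence of the background by adding the background: background-robust self-supervised video representation learning
Explainer: 11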
Representative Batch Normalization with Feature Calibration
😮oral⭐code🏠project
Author homepage
Representative batch normalization based on feature calibration. Explainer: 4
Involution: Inverting the Inherence of Convolution for Visual Recognition
⭐code
Explainer: CVPR'21 | involution: a new neural network operator beyond convolution and self-attention
Spatially Consistent Representation Learning
⭐code
Limitations of Post-Hoc Feature Alignment for Robustness
⭐code
AutoDO: Robust AutoAugment for Biased Data with Label Noise via Scalable Probabilistic Implicit Differentiation
⭐code
Augmentation Strategies for Learning with Noisy Labels
⭐code
PGT: A Progressive Method for Training Models on Long Videos
😮oral⭐code
Generic Perceptual Loss for Modeling Structured Output Dependencies
Masksembles for Uncertainty Estimation
⭐code🏠project
Student-Teacher Learning from Clean Inputs to Noisy Inputs
Scene-Intuitive Agent for Remote Embodied Visual Grounding
Meta-Mining Discriminative Samples for Kinship Verification
Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
⭐code📺video
Paper released
Diverse Branch Block: Building a Convolution as an Inception-like Unit
⭐code
OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations
Disentangled Cycle Consistency for Highly-realistic Virtual Try-On
⭐code
Stylized Neural Painting
⭐code🏠project📺video
Stylized Neural Painting proposes an image-to-painting translation method that generates vivid, realistic paintings with controllable styles.
Confluent Vessel Trees with Accurate Bifurcations
⭐code
Repopulating Street Scenes
Can We Characterize Tasks Without Labels or Features?
⭐code
Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding
Online Learning of a Probabilistic and Adaptive Scene Representation
Generative Modelling of BRDF Textures from Flash Images
⭐code🏠project
PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting
🏠project
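The authors' inverse rendering method, PhySG, reconstructs object geometry, materials, and illumination from a set of RGB input images, running end to end.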
Self-supervised Video Representation Learning by Context and Motion Decoupling
Dynamic Region-Aware Convolution
Brief explainer: 14
Meta Pseudo Labels
⭐code📺video
PQA: Perceptual Question Answering
CondenseNet V2: Sparse Feature Reactivation for Deep Networks
⭐code
CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching
⭐code
Neural Camera Simulators
⭐code
Lighting, Reflectance and Geometry Estimation from 360° Panoramic Stereo
⭐code
MetricOpt: Learning to Optimize Black-Box Evaluation Metrics
😮oral
Deep Stable Learning for Out-Of-Distribution
Sharing session
Learning a Self-Expressive Network for Subspace Clustering
Sharing session
Heterogeneous Grid Convolution for Adaptive, Efficient, and Controllable Computation
Extreme Rotation Estimation using Dense Correlation Volumes
🏠project
Decoupled Dynamic Filter Networks
🏠project📺video
MongeNet: Efficient Sampler for Geometric Deep Learning
⭐code🏠project📺video
Multi-Perspective LSTM for Joint Visual Representation Learning
⭐code
Quantum Permutation Synchronization
A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts
DriveGAN: Towards a Controllable High-Quality Neural Simulation
😮oral
Faster Meta Update Strategy for Noise-Robust Deep Learning
⭐code
NeRD: Neural 3D Reflection Symmetry Detector
⭐code
SSAN: Separable Self-Attention Network for Video Representation Learning
Scene-aware Generative Network for Human Motion Synthesis
Stochastic Whitening Batch Normalization
CLCC: Contrastive Learning for Color Constancy
⭐code
Magic Layouts: Structural Prior for Component Detection in User Interface Designs
GIRAFFE: Representing Scenes As Compositional Generative Neural Feature Fields
😮oral⭐code🏠project
Polygonal Building Extraction by Frame Field Learning
⭐code
MP3: A Unified Model To Map, Perceive, Predict and Plan
NewtonianVAE: Proportional Control and Goal Identification From Pixels via Physical Latent Spaces
Fast End-to-End Learning on Protein Surfaces
Flow Guided Transformable Bottleneck Networks for Motion Retargeting
Polka Lines: Learning Structured Illumination and Reconstruction for Active Stereo
Patch2Pix: Epipolar-Guided Pixel-Level Correspondences
⭐code📺video
Pixel-Aligned Volumetric Avatars
Learnable Motion Coherence for Correspondence Pruning
⭐code🏠project
DualGraph: A Graph-Based Method for Reasoning About Label Noise
Automatic Correction of Internal Units in Generative Neural Networks
Adaptive Rank Estimate in Robust Principal Component Analysis
Cluster-Wise Hierarchical Generative Model for Deep Amortized Clustering
3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding
Ranking Neural Checkpoints
On Focal Loss for Class-Posterior Probability Estimation: A Theoretical Perspective
Learning Deep Latent Variable Models by Short-Run MCMC Inference With Optimal Transport Correction
Learning the Best Pooling Strategy for Visual Semantic Embedding
⭐code🏠project
Backdoor Attacks Against Deep Learning Systems in the Physical World
Relevance-CAM: Your Model Already Knows Where To Look
⭐code
On Robustness and Transferability of Convolutional Neural Networks
Square Root Bundle Adjustment for Large-Scale Reconstruction
🏠project📺video
Crossing Cuts Polygonal Puzzles: Models and Solvers
Sparse Multi-Path Corrections in Fringe Projection Profilometry
Understanding the Behaviour of Contrastive Loss
Dual Contradistinctive Generative Autoencoder
⭐code
Metadata Normalization
⭐code
End-to-End Rotation Averaging With Multi-Source Propagation
⭐code
UV-Net: Learning From Boundary Representations
Mixed-Privacy Forgetting in Deep Networks
Double Low-Rank Representation With Projection Distance Penalty for Clustering
Lighting, Reflectance and Geometry Estimation From 360° Panoramic Stereo
Building Reliable Explanations of Unreliable Neural Networks: Locally Smoothing Perspective of Model Interpretation
DAT: Training Deep Networks Robust To Label-Noise by Matching the Feature Distributions
⭐code
End-to-End High Dynamic Range Camera Pipeline Optimization
Dual-GAN: Joint BVP and Noise Modeling for Remote Physiological Measurement
User-Guided Line Art Flat Filling With Split Filling Mechanism
KSM: Fast Multiple Task Adaption via Kernel-Wise Soft Mask Learning
Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization
Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression
⭐code
Group Whitening: Balancing Learning Efficiency and Representational Capacity
⭐code
Privacy-Preserving Collaborative Learning With Automatic Transformation Search
😮oral
Post-Hoc Uncertainty Calibration for Domain Drift Scenarios
⭐code
Efficient Initial Pose-Graph Generation for Global SfM
⭐code
Spk2ImgNet: Learning To Reconstruct Dynamic Scene From Continuous Spike Stream
A Dual Iterative Refinement Method for Non-Rigid Shape Matching
⭐code
Improving Accuracy of Binary Neural Networks Using Unbalanced Activation Distribution
Rotation-Only Bundle Adjustment
⭐code
HoHoNet: 360 Indoor Holistic Understanding with Latent Horizontal Features
⭐code
Cross-Iteration Batch Normalization
⭐code
Multimodal Contrastive Training for Visual Representation Learning
Spatially-Varying Outdoor Lighting Estimation From Intrinsics
Personalized Outfit Recommendation With Learnable Anchors
Architectural Adversarial Robustness: The Case for Deep Pursuit
SetVAE: Learning Hierarchical Composition for Generative Modeling of Set-Structured Data
⭐code
Truly shift-invariant convolutional neural networks
⭐code
Scalable Differential Privacy With Sparse Network Finetuning
OpenMix: Reviving Known Knowledge for Discovering Novel Visual Categories in an Open World
Event-Based Bispectral Photometry Using Temporally Modulated Illumination
Towards Extremely Compact RNNs for Video Recognition With Fully Decomposed Hierarchical Tucker Structure
Enriching ImageNet With Human Similarity Judgments and Psychological Embeddings
A Quasiconvex Formulation for Radial Cameras
BRepNet: A Topological Message Passing System for Solid Models
😮oral
Exploiting & Refining Depth Distributions With Triangulation Light Curtains
🏠project📺video
Multispectral Photometric Stereo for Spatially-Varying Spectral Reflectances: A Well Posed Problem?
⭐code
SOON: Scenario Oriented Object Navigation With Graph-Based Exploration
Mesoscopic Photogrammetry With an Unstabilized Phone Camera
⭐code
Convolutional Hough Matching Networks
😮oral⭐code🏠project
Learned Initializations for Optimizing Coordinate-Based Neural Representations
🏠project📺video
Patchwise Generative ConvNet: Training Energy-Based Models From a Single Natural Image for Internal Learning
LQF: Linear Quadratic Fine-Tuning
Positive-Congruent Training: Towards Regression-Free Model Updates
Shape from Sky: Polarimetric Normal Recovery Under The Sky
Orthogonal Over-Parameterized Training
😮oral
Optimal Gradient Checkpoint Search for Arbitrary Computation Graphs
😮oral⭐code
T-vMF Similarity for Regularizing Intra-Class Feature Distribution
⭐code
Defending Multimodal Fusion Models Against Single-Source Adversaries
Rotation Coordinate Descent for Fast Globally Optimal Rotation Averaging
😮oral⭐code
Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations
How Does Topology Influence Gradient Propagation and Model Performance of Deep Networks With DenseNet-Type Skip Connections?
⭐code
Deep Stable Learning for Out-of-Distribution Generalization
TrafficSim: Learning To Simulate Realistic Multi-Agent Behaviors
Sign-Agnostic Implicit Learning of Surface Self-Similarities for Shape Modeling and Reconstruction From Raw Point Clouds
Effective Sparsification of Neural Networks With Global Sparsity Constraint
Hyperdimensional computing as a framework for systematic aggregation of image descriptors
🏠project
Time Adaptive Recurrent Neural Network
⭐code
4D Hyperspectral Photoacoustic Data Restoration with Reliability Analysis
Neighborhood Normalization for Robust Geometric Feature Learning
⭐code
Neural Surface Maps
⭐code🏠project📺video
Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods
NormalFusion: Real-Time Acquisition of Surface Normals for High-Resolution RGB-D Scanning
Bilinear Parameterization for Non-Separable Singular Value Penalties
On the Difficulty of Membership Inference Attacks
⭐code
ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks
⭐code
Multi-Label Learning From Single Positive Labels
CompositeTasking: Understanding Images by Spatial Composition of Tasks
⭐code
Searching for Fast Model Families on Datacenter Accelerators
⭐code
Understanding and Simplifying Perceptual Distances
Bayesian Nested Neural Networks for Uncertainty Calibration and Adaptive Compression
⭐code
An Alternative Probabilistic Interpretation of the Huber Loss
Scale-Localized Abstract Reasoning
⭐code🌻dataset
Inferring CAD Modeling Sequences Using Zone Graphs
Partially View-Aligned Representation Learning With Noise-Robust Contrastive Loss
Blocks-World Cameras
The Affective Growth of Computer Vision
Polarimetric Normal Stereo
Uncalibrated Neural Inverse Rendering for Photometric Stereo of General Surfaces
RSG: A Simple but Effective Module for Learning Imbalanced Datasets
⭐code
Fast Sinkhorn Filters: Using Matrix Scaling for Non-Rigid Shape Correspondence With Functional Maps
⭐code
MetaSets: Meta-Learning on Point Sets for Generalizable Representations
⭐code
Isometric Multi-Shape Matching
Efficient Deformable Shape Correspondence via Multiscale Spectral Manifold Wavelets Preservation
TearingNet: Point Cloud Autoencoder To Learn Topology-Friendly Representations
Boosting Ensemble Accuracy by Revisiting Ensemble Diversity Metrics
Convolutional Dynamic Alignment Networks for Interpretable Classifications
😮oral⭐code
EDNet: Efficient Disparity Estimation With Cost Volume Combination and Attention-Based Spatial Residual
How Robust are Randomized Smoothing based Defenses to Data Poisoning?
⭐code
Generative Interventions for Causal Learning
Learning to Identify Correct 2D-2D Line Correspondences on Sphere
Domain-Independent Dominance of Adaptive Methods
⭐code
Combinatorial Learning of Graph Edit Distance via Dynamic Embedding
IMODAL: Creating Learnable User-Defined Deformation Models
⭐code
Robust Bayesian Neural Networks by Spectral Expectation Bound Regularization
⭐code
Neural Cellular Automata Manifold
MultiLink: Multi-Class Structure Recovery via Agglomerative Clustering and Model Selection
⭐code
A Sliced Wasserstein Loss for Neural Texture Synthesis
A Second-Order Approach to Learning with Instance-Dependent Label Noise
😮oral⭐code
Hilbert Sinkhorn Divergence for Optimal Transport
The Multi-Temporal Urban Development SpaceNet Dataset
Inverse Simulation: Reconstructing Dynamic Geometry of Clothed Humans via Optimal Control
Learning Decision Trees Recurrently Through Communication
⭐code
Learning the Predictability of the Future
⭐code🏠project
RaScaNet: Learning Tiny Models by Raster-Scanning Images
Joint Negative and Positive Learning for Noisy Labels
The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures
⭐code
Understanding Failures of Deep Networks via Robust Feature Extraction
⭐code
Gradient-based Algorithms for Machine Teaching
Geo-FARM: Geodesic Factor Regression Model for Misaligned Pre-Shape Responses in Statistical Shape Analysis
A Functional Approach to Rotation Equivariant Non-Linearities for Tensor Field Networks
Real-Time Sphere Sweeping Stereo From Multiview Fisheye Images
Taskology: Utilizing Task Relations at Scale
Soteria: Provable Defense against Privacy Leakage in Federated Learning from Representation Perspective
⭐code
Spatial Assembly Networks for Image Representation Learning
SKFAC: Training Neural Networks With Faster Kronecker-Factored Approximate Curvature
⭐code
Adversarial Invariant Learning
S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-Bit Neural Networks via Guided Distribution Calibration
⭐code
MaxUp: Lightweight Adversarial Training With Data Augmentation Improves Neural Network Training
Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball
Visual Grounding
- Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding
  ⭐code
- Refer-It-in-RGBD: A Bottom-Up Approach for 3D Visual Grounding in RGBD Images
Semantic matching
- Probabilistic Model Distillation for Semantic Correspondence
  ⭐code
- PatchMatch-Based Neighborhood Consensus for Semantic Correspondence
  ⭐code
- Discovering Relationships Between Object Categories via Universal Canonical Maps
Gradient compression
- Communication Efficient SGD via Gradient Sampling With Bayes Prior
Automatic manga generation
- Generating Manga From Illustrations via Mimicking Manga Creation Workflow
Joint learning
- EffiScene: Efficient Per-Pixel Rigidity Inference for Unsupervised Joint Learning of Optical Flow, Depth, Camera Pose and Motion Segmentation
DL
- DeepLM: Large-Scale Nonlinear Least Squares on Deep Learning Frameworks Using Stochastic Domain Decomposition
  ⭐code
Pose estimation (non-human)
- Globally Optimal Relative Pose Estimation With Gravity Prior
All-in-one
- Inception Convolution With Efficient Dilation Search
  Image recognition, human pose estimation, object detection, instance segmentation
Visual reasoning
- Transformation Driven Visual Reasoning
  ⭐code🏠project
Mesh saliency
- Mesh Saliency: An Independent Perceptual Measure or a Derivative of Image Saliency?
  ⭐code
3D scene interaction
- Populating 3D Scenes by Learning Human-Scene Interaction
  ⭐code🏠project📺video
Stereo Matching
- AdaStereo: A Simple and Efficient Approach for Adaptive Stereo Matching
- HITNet: Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching
  ⭐code
- Bilateral Grid Learning for Stereo Matching Networks
  ⭐code
Image-to-video synthesis
- Understanding Object Dynamics for Interactive Image-to-Video Synthesis
  ⭐code🏠project
Audio-Visual Navigation
- Semantic Audio-Visual Navigation
  ⭐code🏠project📺video
Font Generation
- DG-Font: Deformable Generative Networks for Unsupervised Font Generation
  ⭐code
Multi-Task Learning
- Deep Multi-Task Learning for Joint Localization, Perception, and Prediction
Visual Navigation
- Visual Navigation With Spatial Attention
Image Matching
- Co-Attention for Conditioned Image Matching
  ⭐code🏠project
Texture Recognition
- Deep Texture Recognition via Exploiting Cross-Layer Statistical Self-Similarity
Hyperspectral Image Reconstruction
- Learning Tensor Low-Rank Prior for Hyperspectral Image Reconstruction
Visual Odometry
- Spatiotemporal Registration for Event-Based Visual Odometry
  🌻dataset
Image Registration
- Learning-Based Image Registration With Meta-Regularization
Semantic Part Completion
- Towards Part-Based Understanding of RGB-D Scans
Pedestrian-Vehicle Interaction
- Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers
Affective Computing
- A Circular-Structured Representation for Visual Emotion Distribution Learning
Dense Image-to-Image Correspondence and Confidence Estimation
- Learning Accurate Dense Correspondences and When To Trust Them
  😮oral⭐code🏠project📺video
Seeing Objects in the Dark With a Deep-Red Flash
- Seeing in Extra Darkness Using a Deep-Red Flash

Table of Contents

74.Place Recognition

73.Object Re-identification

72.Gaze Estimation

71.Image-to-Image Translation

70.NLP

69.Transfer learning

68.Crowd Counting

67.Defect Detection

66.Optical Flow Estimation

65.Style Transfer

64.Speech processing

63.Image Processing

62.Free-Hand Sketches

61.Algorithms

60. SLAM/AR/Robotics

59.Capsule Network (deep learning models)

58.Metric Learning (similarity learning)

57.Sign Language Recognition

56.Computational Photography (optics, geometry, light-field imaging)

55.Graph Matching

54.Emotion Perception (emotion prediction)

53.Dataset

52. Image Generation/Synthesis

51.Contrastive Learning

50.OCR

49.Adversarial Learning

48.Image Representation

47.Vision-Language

46.Human-Object Interaction

45.Camera Localization

44. Image/video Captioning

43.Active Learning

42.Scene Flow Estimation

41. Representation Learning (image + caption)

40.Superpixel

39.Debiasing

38.Class-Incremental learning

37. Continual Learning

36.Action Detection and Recognition

35.Image Clustering

34.Image Classification

33.6D Pose Estimation

32.View Synthesis

31.Open-Set Recognition

30.Neural rendering

29.Human Pose Estimation

28.Dense prediction

27.Semantic Line Detection

26.Video Processing

25.3D

24.Reinforcement Learning

23.Autonomous Driving

22.Medical Imaging

21.Transformer

20.Person Re-Identification

19.Quantization/Pruning/Knowledge Distillation/Model Compression (including model scaling and optimization)

18.Aerial/Drones/Satellite/RS Image

17.Super-Resolution

16.Visual Question Answering

15.GAN

14.Few-Shot/Zero-Shot Learning, Domain Generalization/Adaptation

13.Image/Video Retrieval

12.Image Quality Assessment

11. Face

10.Neural Architecture Search