CVPR 2021: Latest News and Accepted Papers/Code (Continuously Updated)

This post is a rough roundup of the papers accepted to CVPR 2021; a more detailed summary will follow. Stay tuned.

Official site: http://cvpr2021.thecvf.com
Conference dates: June 19-25, 2021
Acceptance announcement: February 28, 2021

Accepted paper IDs:

CVPR 2021 accepted paper list! 27% acceptance rate!

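🎆🎆🎆Update: 20 papers added on March 16 (2 face + 1 pose + 1 3D + 1 segmentation + 1 tracking + 1 point cloud + 1 aerial + 2 image enhancement + 1 knowledge distillation + 1 human-object interaction + 1 video prediction + 1 GAN + 1 GNN + 4 uncategorized)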

Face
- 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
  ⭐code🏠project
- Cross-Domain Similarity Learning for Face Recognition in Unseen Domains
Pose
- Learning Compositional Representation for 4D Captures with Neural ODE
3D
- Beyond Image to Depth: Improving Depth Prediction using Echoes
  ⭐code🏠project
- Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
  😮oral⭐code🏠project📺video
Segmentation
- Monte Carlo Scene Search for 3D Scene Understanding
Tracking
- Learning a Proposal Classifier for Multiple Object Tracking
  ⭐code
Point Cloud
- Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding
Aerial Imagery
- ReDet: A Rotation-equivariant Detector for Aerial Object Detection
  ⭐code
Image Enhancement
- Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images
- Semi-Supervised Video Deraining with Dynamic Rain Generator
Knowledge Distillation
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
  ⭐code
Human-Object Interaction
- Detecting Human-Object Interaction via Fabricated Compositional Learning
  ⭐code
Video Prediction
- Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
  🏠project📺video
GAN
- DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
  ⭐code
GNN
- Binary Graph Neural Networks
Uncategorized

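🎆🎆🎆Update: 20 papers added on March 15 (1 face + 3 segmentation + 1 3D + 1 imaging + 1 medical + 1 quantization + 1 action recognition + 1 continual learning + 1 video generation + 1 ReID + 1 VQA + 2 GAN + 1 NAS + 1 6D + 3 uncategorized)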

Face
- CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
Segmentation
- Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
  ⭐code
- Rethinking BiSeNet For Real-time Semantic Segmentation
  ⭐code
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
  😮oral⭐code🏠project📺video
3D
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss
Imaging
- Deep Gaussian Scale Mixture Prior for Spectral Compressive Imaging
  ⭐code🏠project
Medical
- DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets
  ⭐code
Quantization
- Learnable Companding Quantization for Accurate Low-bit Neural Networks
Action Recognition
- ACTION-Net: Multipath Excitation for Action Recognition
  ⭐code
  Newly released paper
Continual Learning
- Training Networks in Null Space for Continual Learning
  😮oral⭐code
Image/Video Generation
- Playable Video Generation
  😮oral⭐code🏠project📺video
Reid
- Intra-Inter Camera Similarity for Unsupervised Person Re-Identification
  ⭐code
VQA
- Counterfactual VQA: A Cause-Effect Look at Language Bias
  ⭐code
GAN
- HumanGAN: A Generative Model of Humans Images
- HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms
  ⭐code
NAS
- Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
  ⭐code
6D
- FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
  😮oral⭐code
  Newly released paper
Uncategorized
- Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
  ⭐code
- CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching
  ⭐code
- Augmentation Strategies for Learning with Noisy Labels
  ⭐code

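| 🐱 | 🐶 | 🐭 | 🐹 | 🐯 |
| --- | --- | --- | --- | --- |
| ❌ | ❌ | ❌ | ❌ | Workshop Call for Papers |
| 60.Video Prediction | 59.Optics, Geometry, and Light-Field Imaging | 58.Graph Matching | 57.Affect Prediction | 56.Datasets |
| 55.Video Techniques | 54.Image/Video Generation | 53.Contrastive Learning | 52.OCR | 51.Adversarial Learning |
| 50.Image Representation | 49.Vision-Language Navigation (VLN) | 48.Human-Object Interaction (HOI) | 47.Camera Localization | 46.Image/Video Captioning |
| 45.Active Learning | 44.Action Prediction | 43.Representation Learning (Image + Caption) | 42.Superpixels | 41.Video-and-Language Learning |
| 40.Model De-biasing | 39.Class-Incremental Learning | 38.Continual Learning | 37.Video Frame Interpolation | 36.Action Detection and Recognition |
| 35.Image Clustering | 34.Image/Fine-Grained Classification | 33.6D Pose Estimation | 32.View Synthesis | 31.Open-Set Recognition |
| 30.Novel View Synthesis | 29.Pose Estimation | 28.Dense Prediction | 27.Face Anti-Spoofing | 26.Video Coding |
| 25.3D Vision | 24.Reinforcement Learning | 23.Autonomous Driving | 22.Medical Imaging | 21.Transformer |
| 20.Person Re-Identification/Crowd Counting | 19.Quantization, Pruning, Distillation, Model Compression and Optimization | 18.Aerial Imagery | 17.Super-Resolution | 16.Visual Question Answering |
| 15.GAN | 14.Few-/Zero-Shot Learning, Domain Adaptation, Domain Generalization | 13.Image Retrieval | 12.Image Enhancement | 11.Face Technology |
| 10.Neural Architecture Search | 9.Object Tracking | 8.Image Segmentation | 7.Object Detection | 6.Data Augmentation |
| 5.Anomaly Detection | 4.Self-/Semi-/Weakly Supervised Learning | 3.Point Clouds | 2.Graph Convolutional Networks (GNN) | 1.Uncategorized |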

60.Video Prediction

Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
🏠project📺video

59.Optics, Geometry, and Light-Field Imaging

Deep Gaussian Scale Mixture Prior for Spectral Compressive Imaging
⭐code🏠project

58.Graph Matching

Deep Graph Matching under Quadratic Constraint

57.Affect Prediction

Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality
🏠project

56.Datasets

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
🌻dataset

55.Video Techniques

VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
⭐code

54.Image/Video Generation

Spatially-Adaptive Pixelwise Networks for Fast Image Translation
🏠project
Uses a hypernetwork and implicit functions to achieve very fast image-to-image translation (18x faster than the baseline).
Image Generators with Conditionally-Independent Pixel Synthesis
😮oral⭐code

Im2Vec: Synthesizing Vector Graphics without Vector Supervision
😮oral⭐code🏠project
Video Generation
- Playable Video Generation
  😮oral⭐code🏠project📺video

53.Contrastive Learning

AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries
⭐code
Commentary: CVPR 2021 accepted paper: AdCo, adversarial contrastive learning

52.OCR

Scene Text Detection
- What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
  😮oral⭐code

51.Adversarial Learning

Simulating Unknown Target Models for Query-Efficient Black-box Attacks
⭐code
Black-box adversarial attack
Delving into Data: Effectively Substitute Training for Black-box Attack
A black-box attack method based on efficiently training a substitute model
Commentary: 8

50.Image Representation

Learning Continuous Image Representation with Local Implicit Image Function
😮oral⭐code🏠project📺video

49.Vision-Language Navigation

Structured Scene Memory for Vision-Language Navigation

48.Human-Object Interaction

Learning Asynchronous and Sparse Human-Object Interaction in Videos
QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
⭐code
Reformulating HOI Detection as Adaptive Set Prediction

Detecting Human-Object Interaction via Fabricated Compositional Learning
⭐code

47.Camera Localization

Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments
😮oral

46.Image/Video Captioning

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
⭐code🏠project📺video
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
A multimodal framework for video captioning, video question answering, and video dialogue
Open-book Video Captioning with Retrieve-Copy-Generate Network

45.Active Learning

Vab-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning

44.Action Prediction

Learning the Predictability of the Future
Predicting the future
⭐code🏠project📺video

43.Representation Learning (Image + Caption)

VirTex: Learning Visual Representations from Textual Annotations
⭐code

42.Superpixels

Learning the Superpixel in a Non-iterative and Lifelong Manner

41.Video-and-Language Learning

Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling
😮oral⭐code

40.Model De-biasing

Fair Attribute Classification through Latent Space De-biasing
⭐code🏠project

39.Class-Incremental Learning

IIRC: Incremental Implicitly-Refined Classification
🏠project
Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning

38.Continual Learning

Rainbow Memory: Continual Learning with a Memory of Diverse Samples
Training Networks in Null Space for Continual Learning
😮oral⭐code

37.Video Frame Interpolation

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
⭐code🏠project

36.Action Detection and Recognition

Coarse-Fine Networks for Temporal Activity Detection in Videos
3D CNNs with Adaptive Temporal Feature Resolutions
Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack
BASAR: Black-box Attack on Skeletal Action Recognition
📺video
TDN: Temporal Difference Networks for Efficient Action Recognition
⭐code
ACTION-Net: Multipath Excitation for Action Recognition
⭐code

Temporal Action Localization
- Modeling Multi-Label Action Dependencies for Temporal Action Localization
  😮oral
  Proposes an attention-based architecture that learns action dependencies in videos, targeting multi-label temporal action localization.
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
  Anchor-free temporal action localization based on learning salient boundary features
  Commentary: 10

35.Image Clustering

Improving Unsupervised Image Clustering With Robust Learning
⭐code
Improves unsupervised image clustering with robust learning

34.Image Classification

PML: Progressive Margin Loss for Long-tailed Age Classification
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
⭐code
Fine-grained Angular Contrastive Learning with Coarse Labels
😮oral
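Uses self-supervision for fine-grained classification from coarse labels. Coarse labels are easier and cheaper to obtain than fine-grained labels, which usually require domain experts.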
Graph-based High-Order Relation Discovery for Fine-grained Recognition
A fine-grained recognition method based on mining high-order relations between features
Commentary: 20

33.6D Pose Estimation

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
⭐code
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
⭐code
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
😮oral⭐code

32.View Synthesis

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis
NeX: Real-time View Synthesis with Neural Basis Expansion
😮oral🏠project📺video
A real-time view synthesis technique built on neural basis expansion

31.Open-Set Recognition

Counterfactual Zero-Shot and Open-Set Visual Recognition
⭐code
Few-shot Open-set Recognition by Transformation Consistency

30.Novel View Synthesis

DeRF: Decomposed Radiance Fields
🏠project
D-NeRF: Neural Radiance Fields for Dynamic Scenes
🏠project

29.Pose Estimation

PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers
📺video
Improves 3D human pose estimation by removing location-dependent perspective effects.
CanonPose: Self-supervised Monocular 3D Human Pose Estimation in the Wild
Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
⭐code
Monocular Real-time Full Body Capture with Inter-part Correlations
📺video
Behavior-Driven Synthesis of Human Dynamics
⭐code🏠project
Context Modeling in 3D Human Pose Estimation: A Unified Perspective
Learning Compositional Representation for 4D Captures with Neural ODE

28.Dense Prediction

Densely connected multidilated convolutional networks for dense prediction tasks
The proposed D3Net outperforms SOTA networks on semantic segmentation and music source separation.
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
⭐code

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
⭐code

27.Face Anti-Spoofing

Cross Modal Focal Loss for RGBD Face Anti-Spoofing

26.Video Coding

MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing
⭐code

25.3D Vision

A Deep Emulator for Secondary Motion of 3D Characters
Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction
😮oral🏠project📺video
Deep Implicit Templates for 3D Shape Representation
😮oral⭐code🏠project📺video
CVPR 2021 oral: Tsinghua researchers propose Deep Implicit Templates, greatly extending the capabilities of DIF
SMPLicit: Topology-aware Generative Model for Clothed People
🏠project

Depth Estimation

24.Reinforcement Learning

Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph
Unsupervised Learning for Robust Fitting: A Reinforcement Learning Approach

23.Autonomous Driving

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
⭐code
Winning solution of the ECCV 2020 Facebook Mapillary Visual Place Recognition Challenge

22.Medical Imaging

3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management
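Multimodal CT imaging alone can replace JHMI's current comprehensive multimodal diagnostic workflow, which requires tumor chemistry tests and DNA sequencing in addition to medical imaging. Diagnostic accuracy is comparable, and quantitative diagnostic precision is better.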
Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies
An important capability for intelligent PACS in oncology imaging, helping physicians read scans
Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization
A CT-based fracture/osteoporosis system
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
⭐code
Multi-institutional collaboration that uses federated learning to improve deep-learning-based MR image reconstruction
DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images
😮oral⭐code
DeepTag: an unsupervised deep learning method for motion tracking on cardiac tagging MR images
Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles

Medical Image Segmentation
- FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
- DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets
  ⭐code

21.Transformer

Transformer Interpretability Beyond Attention Visualization
⭐code
MIST: Multiple Instance Spatial Transformer Network
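Attempts differentiable top-K selection from heatmaps (MIST), now with some results on natural images as well. It enables detection and classification without any localization supervision (and that is not the only thing it can do!).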

Action Recognition and Detection
- 3D Vision Transformers for Action Recognition
  A 3D vision Transformer for action recognition
Object Detection
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
  😮oral⭐code
Image Processing
- Pre-Trained Image Processing Transformer
Human-Object Interaction
- End-to-End Human Object Interaction Detection with HOI Transformer
  ⭐code
Image Segmentation
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
  ⭐code🏠project
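  Rethinks semantic segmentation as a sequence-to-sequence problem using Transformers
  Commentary: 16
  Commentary: Transformers applied to semantic segmentation; formerly first on the ADE20K leaderboard (44.42% mIoU)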
- VisTR: End-to-End Video Instance Segmentation with Transformers
  😮oral⭐code

20.Person Re-Identification

Meta Batch-Instance Normalization for Generalizable Person Re-Identification
Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification
Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification
⭐code
Self-supervised 3D Reconstruction and Re-Projection for Texture Insensitive Person Re-identification
Texture-insensitive person re-identification based on self-supervised 3D reconstruction and re-projection
Commentary: 12
Intra-Inter Camera Similarity for Unsupervised Person Re-Identification
⭐code

Crowd Counting
- Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting

19.Quantization, Pruning, Distillation, Model Compression/Scaling and Optimization

Learning Student Networks in the Wild
ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
⭐code
RepVGG: Making VGG-style ConvNets Great Again
⭐code
Coordinate Attention for Efficient Mobile Network Design
⭐code

Pruning
- Manifold Regularized Dynamic Network Pruning
Model Scaling
- Fast and Accurate Model Scaling
  ⭐code
Quantization
- Learnable Companding Quantization for Accurate Low-bit Neural Networks
Knowledge Distillation
- Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation
  ⭐code

18.Aerial Imagery

Dogfight: Detecting Drones from Drone Videos

Aerial Image Segmentation
- PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
  ⭐code
Aerial Image Detection
- ReDet: A Rotation-equivariant Detector for Aerial Object Detection
  ⭐code

17.Super-Resolution

Data-Free Knowledge Distillation For Image Super-Resolution
AdderSR: Towards Energy Efficient Image Super-Resolution
⭐code
Cross-MPI: Cross-scale Stereo for Image Super-Resolution using Multiplane Images
🏠project📺video
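CVPR 2021: Cross-MPI is an end-to-end network that uses the underlying scene structure as a cue and achieves high-fidelity super-resolution even across a large (8x) resolution gap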
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
⭐code

Robust Reference-based Super-Resolution via C²-Matching
GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution
😮oral🏠project
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond
⭐code🏠project
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
Author homepage
A video super-resolution network based on controllable interpolation of spatio-temporal features
Commentary: 18

16.Visual Question Answering

Weakly-supervised Grounded Visual Question Answering using Capsules

Counterfactual VQA: A Cause-Effect Look at Language Bias
⭐code

15.GAN

Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
⭐code🏠project
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
Efficient Conditional GAN Transfer with Knowledge Propagation across Classes
⭐code
Anycost GANs for Interactive Image Synthesis and Editing
⭐code🏠project📺video
Anycost GAN adapts to a wide range of hardware and latency requirements and enables interactive image editing
TediGAN: Text-Guided Diverse Image Generation and Manipulation
⭐code🏠project📺video
Generative Hierarchical Features from Synthesizing Images
😮oral⭐code🏠project
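The authors argue that a pre-trained GAN generator can serve as a learned multi-scale loss. Training with it yields highly competitive hierarchical and disentangled visual features, which they call Generative Hierarchical Features (GH-Feat). They further show that GH-Feat benefits not only generative tasks but, more importantly, discriminative ones, including face verification, landmark detection, layout prediction, transfer learning, style mixing, and image editing.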
Teachers Do More Than Teach: Compressing Image-to-Image Models
PISE: Person Image Synthesis and Editing with Decoupled GAN
⭐code
LOHO: Latent Optimization of Hairstyles via Orthogonalization
Image-to-image Translation via Hierarchical Style Disentanglement
😮oral⭐code
Achieves hierarchical style disentanglement for image-to-image translation
CoMoGAN: continuous model-guided image-to-image translation
😮oral⭐code
HumanGAN: A Generative Model of Humans Images
HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms
⭐code
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
⭐code

14.Few-/Zero-Shot Learning, Domain Adaptation, Domain Generalization

Few-Shot Learning
- Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning
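- Learning Dynamic Alignment via Meta-filter for Few-shot Learning
  Author homepage
  Few-shot learning based on dynamic alignment, implemented with meta convolution kernels
  Commentary: 17
Domain Generalization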
- FSDR: Frequency Space Domain Randomization for Domain Generalization
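  Inspired by JPEG, which converts spatial images into multiple frequency components (FCs), the paper proposes Frequency Space Domain Randomization (FSDR), which randomizes images in frequency space by keeping the domain-invariant FCs (DIFs) and randomizing only the domain-variant FCs (DVFs).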
- Domain Generalization via Inference-time Label-Preserving Target Projections
  😮 Oral
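
The FSDR entry above describes a general pattern: move to frequency space, keep some components fixed, and randomize the rest. The block below is only a minimal sketch of that pattern, not the authors' implementation. It works on a 1-D signal with a hand-written DCT, and the function names, the `invariant_mask` argument, and the multiplicative noise model are all assumptions made for this illustration; how the paper decides which components are domain-invariant is not covered here.

```python
import math
import random


def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def randomize_frequencies(signal, invariant_mask, strength=0.5, rng=None):
    """Keep coefficients where invariant_mask is True; rescale the others randomly.

    Hypothetical helper: the mask and the noise model are assumptions.
    """
    rng = rng or random.Random()
    coeffs = dct(signal)
    for k, keep in enumerate(invariant_mask):
        if not keep:
            coeffs[k] *= 1.0 + rng.uniform(-strength, strength)
    return idct(coeffs)


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# Treat only the DC component as invariant, so the mean is preserved.
mask = [True] + [False] * 7
augmented = randomize_frequencies(signal, mask, rng=random.Random(0))
```

With only the DC coefficient marked as kept, every augmented signal has the same mean as the input while its higher-frequency content varies from draw to draw.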
Zero-Shot Learning
- Goal-Oriented Gaze Estimation for Zero-Shot Learning
  ⭐code

13.Image Retrieval

Probabilistic Embeddings for Cross-Modal Retrieval
QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval

12.Image Enhancement

Image Restoration
- Multi-Stage Progressive Image Restoration
  ⭐code
Shadow Removal
- Auto-Exposure Fusion for Single-Image Shadow Removal
  ⭐code
Deblurring
- DeFMO: Deblurring and Shape Recovery of Fast Moving Objects
  ⭐code📺video
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring
Reflection Removal
- Robust Reflection Removal with Reflection-free Flash-only Cues
  ⭐code
Dehazing
- Learning to Restore Hazy Video: A New Real-World Dataset and A New Method
  Learning to restore hazy video: a new real-world dataset and algorithm
  Commentary: 9
- Contrastive Learning for Compact Single Image Dehazing
  A compact image dehazing method based on contrastive learning
  Commentary: 5
Denoising
- Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images
Deraining
- Semi-Supervised Video Deraining with Dynamic Rain Generator
Exposure Correction
- Learning Multi-Scale Photo Exposure Correction
  ⭐code

11.Face Technology

Face Recognition
- A 3D GAN for Improved Large-pose Facial Recognition
- When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
  ⭐github
- MagFace: A Universal Representation for Face Recognition and Quality Assessment
  😮oral⭐code
  Face recognition plus quality assessment; an oral presentation this year. The code is still being cleaned up.
- WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
  🏠project
- ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
  😮oral🏠project📺video
- Spherical Confidence Learning for Face Recognition
  😮oral
  Face recognition based on confidence learning on the hyperspherical manifold
- Consistent Instance False Positive Improves Fairness in Face Recognition
  A method for improving face recognition fairness based on instance false-positive consistency
  Commentary: 7
- CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
- Cross-Domain Similarity Learning for Face Recognition in Unseen Domains
Deepfake Detection
- Multi-attentional Deepfake Detection
Face Image Quality Assessment
- SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance
  Unsupervised face image quality assessment based on similarity distribution distance
  Commentary: 6
3D Face Reconstruction
- Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection
  😮oral
  Learns to aggregate and personalize 3D face reconstruction from in-the-wild photo collections
- 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
  ⭐code🏠project

10.Neural Architecture Search

AttentiveNAS: Improving Neural Architecture Search via Attentive
HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens
ReNAS: Relativistic Evaluation of Neural Architecture Search
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object
Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search
Center for Machine Learning Research, Institute for Artificial Intelligence, Peking University
Contrastive Neural Architecture Search with Neural Architecture Comparators
⭐code
Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
⭐code

9.Object Tracking

Rotation Equivariant Siamese Networks for Tracking
Graph Attention Tracking
⭐code

Multi-Object Tracking
- Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
- Track to Detect and Segment: An Online Multi-Object Tracker
  🏠project📺video
- Multiple Object Tracking with Correlation Learning
- Learning a Proposal Classifier for Multiple Object Tracking
  ⭐code

8.Image Segmentation

Information-Theoretic Segmentation by Inpainting Error Maximization
Simultaneously Localize, Segment and Rank the Camouflaged Objects
⭐code
Capturing Omni-Range Context for Omnidirectional Segmentation
⭐code

Panoptic Segmentation
- 4D Panoptic LiDAR Segmentation
- Cross-View Regularization for Domain Adaptive Panoptic Segmentation
  😮oral
  A cross-view regularization method for domain-adaptive panoptic segmentation
- Part-aware Panoptic Segmentation
- Toward Joint Thing-and-Stuff Mining for Weakly Supervised Panoptic Segmentation
  Weakly supervised panoptic segmentation with joint thing-and-stuff mining
  Commentary: 15
Semantic Segmentation
- PLOP: Learning without Forgetting for Continual Semantic Segmentation
  ⭐code
- Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
  ⭐dataset📺video
- Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
- Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
- Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
  😮oral⭐code
- Learning Statistical Texture for Semantic Segmentation
- MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
  ⭐code
  Domain-aware meta loss correction for unsupervised domain adaptation in semantic segmentation
- Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
- Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
  ⭐code
- Rethinking BiSeNet For Real-time Semantic Segmentation
  ⭐code
Scene Understanding/Scene Parsing
- Exploring Data Efficient 3D Scene Understanding with Contrastive Scene Contexts
  😮oral🏠project📺video
- Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis
  🏠project
  Edge-oriented reasoning for 3D point-based scene graph analysis (scene understanding)
- Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation
  Scene graph generation (scene parsing)
- Monte Carlo Scene Search for 3D Scene Understanding
Matting
- Real-Time High Resolution Background Matting
  😮oral⭐code🏠project📺video
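  Newly open-sourced matting technique: real-time, fast, and high-resolution; 4K (30 fps), modern GPU (60 fps)
  Commentary: 4K at 30 fps on a single GPU; the University of Washington upgrades its real-time video matting, with fine hair detail preserved
Video Action Segmentation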
- Global2Local: Efficient Structure Search for Video Action Segmentation
  From global to local: efficient network structure search for video action segmentation
  Commentary: 19
Temporal Action Segmentation
- Temporal Action Segmentation from Timestamp Supervision
  ⭐code
LiDAR Segmentation
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
  😮oral⭐code
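  Ranked first on the SemanticKITTI leaderboard (as of the CVPR deadline) and achieved SOTA on nuScenes. It also generalizes well to other LiDAR-based tasks, including LiDAR panoptic segmentation and LiDAR 3D detection; work built on it also ranked first on the SemanticKITTI panoptic segmentation leaderboard.
Video Object Segmentation
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion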
  😮oral⭐code🏠project📺video

7.Object Detection

Multiple Instance Active Learning for Object Detection
⭐code
Positive-Unlabeled Data Purification in the Wild for Object Detection
Depth from Camera Motion and Object Detection
⭐github📺video
Towards Open World Object Detection
😮oral⭐code
General Instance Distillation for Object Detection
Distilling Object Detectors via Decoupled Features
MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection
😮oral
You Only Look One-level Feature (YOLOF)
⭐code
An effective object detector that does not need FPN
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
⭐code

Few-Shot Object Detection
- Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
  The first work to study semantic relation reasoning for few-shot detection, showing its potential to improve strong baselines.
- Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
  Center for Machine Learning Research, Institute for Artificial Intelligence, Peking University
- FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding
  ⭐code
Multi-Object Detection
- There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
  🏠project
3D Object Detection
- Categorical Depth Distribution Network for Monocular 3D Object Detection
  😮oral
- 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection
  ⭐code🏠project📺video
  More: CVPR 2021 | Semi-supervised 3D object detection using IoU prediction
- ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection
  ⭐code
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection
Rotated Object Detection
- Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
  ⭐code
Object Localization
- Unveiling the Potential of Structure-Preserving for Weakly Supervised Object Localization
  Weakly supervised object localization based on preserving structural information
  Commentary: 13
Dense Object Detection
- Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
  ⭐code
  Commentary: Painless accuracy gains for object detection with Generalized Focal Loss V2

6.Data Augmentation

KeepAugment: A Simple Information-Preserving Data Augmentation

5.Anomaly Detection

Multiresolution Knowledge Distillation for Anomaly Detection

4.Self-/Semi-/Weakly Supervised Learning

Weakly Supervised
- Weakly Supervised Learning of Rigid 3D Scene Flow
  ⭐code🏠project
Semi-Supervised
- Adaptive Consistency Regularization for Semi-Supervised Transfer Learning
  ⭐code
Self-Supervised
- Self-supervised Geometric Perception
  😮oral⭐code
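  The authors describe SGP as the first general framework for feature learning in geometric perception that needs no supervision from ground-truth geometric labels. SGP runs in an EM-like fashion: it alternates between robustly estimating geometric models to generate pseudo-labels and learning features under the supervision of those noisy pseudo-labels. Applied to camera pose estimation and point cloud registration, SGP performs on par with, or even better than, supervised oracles on large-scale real datasets.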

3.Point Clouds

Diffusion Probabilistic Models for 3D Point Cloud Generation
😮oral⭐code
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
😮oral⭐code
TPCN: Temporal Point Cloud Networks for Motion Forecasting
Temporal point cloud networks for motion forecasting
PointGuard: Provably Robust 3D Point Cloud Classification
How Privacy-Preserving are Line Clouds? Recovering Scene Details from 3D Lines
⭐code

Point Cloud Registration
Point Cloud Completion
- Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding

2.Graph Convolutional Networks (GNN)

Sequential Graph Convolutional Network for Active Learning
Quantifying Explainers of Graph Neural Networks in Computational Pathology
Binary Graph Neural Networks

1.Uncategorized

UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pretraining
Reconsidering Representation Alignment for Multi-view Clustering
Self-supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map
Instance Localization for Self-supervised Detection Pretraining
⭐code
Model-Contrastive Federated Learning
Proposes model-contrastive learning to address non-IID data in federated learning
Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces
😮Oral⭐code🏠project
Data-Free Model Extraction
⭐code
Single-Stage Instance Shadow Detection with Bidirectional Relation Learning
😮oral
⭐code
Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning
😮oral
PatchmatchNet: Learned Multi-View Patchmatch Stereo
😮oral⭐code
Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning
Semantic Palette: Guiding Scene Generation with Class Proportions
Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors
😮oral
POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
😮oral
Multi-Objective Interpolation Training for Robustness to Label Noise
⭐code
Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations
⭐code
Simpler Certified Radius Maximization by Propagating Covariances
😮oral📺video
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
Discovering Hidden Physics Behind Transport Dynamics
😮oral
Soft-IntroVAE: Analyzing and Improving the Introspective Variational Autoencoder
😮oral⭐code🏠project
Deep Gradient Projection Networks for Pan-sharpening
⭐code
Consensus Maximisation Using Influences of Monotone Boolean Functions
😮oral

Forecasting Irreversible Disease via Progression Learning
Causal Hidden Markov Model for Time Series Disease Forecasting
Towards Unified Surgical Skill Assessment

Knowledge Evolution in Neural Networks
😮oral⭐code

RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words
RSTNet: an image captioning model with adaptive attention that distinguishes visual words from non-visual words
Commentary: 14
Removing the Background by Adding the Background: Towards a Background Robust Self-supervised Video Representation Learning
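Removing the influence of the background by adding the background: background-robust self-supervised video representation learning
Commentary: 11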
Representative Batch Normalization with Feature Calibration
😮oral
Author homepage
A representative batch normalization method based on feature calibration
Commentary: 4
Learning Compositional Representation for 4D Captures with Neural ODE
Involution: Inverting the Inherence of Convolution for Visual Recognition
⭐code
Spatially Consistent Representation Learning
Limitations of Post-Hoc Feature Alignment for Robustness
AutoDO: Robust AutoAugment for Biased Data with Label Noise via Scalable Probabilistic Implicit Differentiation
⭐code
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
⭐code
CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching
⭐code
Augmentation Strategies for Learning with Noisy Labels
⭐code

Workshops: Call for Papers

Visual Perception for Navigation in Human Environments
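Call for papers for the 2nd Visual Perception for Navigation in Human Environments workshop ⚠️ Deadline: April 15
UG2+ Challenge
Aims to advance the analysis of "difficult" images by applying image restoration and enhancement algorithms to improve analysis performance. Participants are tasked with developing new algorithms that improve the analysis of images captured under problematic conditions.
👑 USD 10K in prizes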
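- Object detection in poor-visibility environments
  - (Semi-)supervised object detection in hazy conditions
  - (Semi-)supervised face detection in low-light conditions
- Action recognition in dark videos
  - Fully supervised action recognition in the dark
  - Semi-supervised action recognition in the dark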
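Continual Learning in Computer Vision (call for papers open)
Aims to bring together researchers and engineers from academia and industry to discuss the latest advances in continual learning.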
- Best paper award: 500 USD + 500 USD worth of Huawei cloud credits (HUAWEI)
- Overall Challenge winner: 1,000 USD + 500 USD worth of Huawei cloud credits (HUAWEI)
- Supervised-Learning track winner: 500 USD (HUAWEI)
- Reinforcement-Learning track winner: 500 USD (ServiceNow)
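4th UG2 Workshop and Competition: Bridging the Gap Between Computational Imaging and Visual Recognition
USD 100K in prizes! A major CVPR 2021 competition: the Security AI Challenger Program
- CVPR 2021 competition: an analysis of white-box adversarial attacks on defense models in the Security AI challenge
- Still chasing the ImageNet leaderboard? Finding where models are fragile is more valuable!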
Responsible Computer Vision
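⚠️ Deadline: March 25
This workshop will broadly discuss three main aspects of responsible AI in the context of computer vision: fairness; interpretability and transparency; and privacy.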
Holistic Video Understanding
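The goal is to build a video benchmark that integrates joint recognition of all semantic concepts, since a single class label per task is often insufficient to describe a video's overall content.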
ThreeDWorld Transport Challenge
⚠️ Deadline: June 1
📺video
FGVC 8
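The 8th Workshop on Fine-Grained Visual Categorization (FGVC8) will explore, through the lens of fine-grained visual understanding, topics including fine-grained learning, self-supervised learning, semi-supervised learning, matching, localization, domain adaptation, transfer learning, few-shot learning, machine teaching, multimodal learning (e.g., audio and video), crowdsourcing, and taxonomic prediction.
⚠️ Paper submission deadline: April 2
Topics of interest include: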
- Fine-grained categorization
  - Novel datasets and data collection strategies for fine-grained categorization
  - Appropriate error metrics for fine-grained categorization
  - Low/few shot learning
  - Self-supervised learning
  - Semi-supervised learning
  - Transfer-learning from known to novel subcategories
  - Attribute and part based approaches
  - Taxonomic predictions
  - Addressing long-tailed distributions
- Human-in-the-loop
  - Fine-grained categorization with humans in the loop
  - Embedding human experts’ knowledge into computational models
  - Machine teaching
  - Interpretable fine-grained models
- Multi-modal learning
  - Using audio and video data
  - Using geographical priors
  - Learning shape
- Fine-grained applications
  - Product recognition
  - Animal biometrics and camera traps
  - Museum collections
  - Agricultural
  - Medical
  - Fashion
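- Related challenges (some have already started on Kaggle)
  - GeoLifeCLEF2021
    Predict species presence using observations paired with aerial imagery and environmental features
  - Semi-iNat2021
    Semi-supervised fine-grained image classification built from iNaturalist data
  - iNatChallenge2021
    Image classification challenge over 10,000 animal and plant classes
  - iMet2021
    Fine-grained attribute classification of artworks
  - iMat-Fashion2021 (not yet started)
    Clothing instance segmentation and fine-grained attribute classification
  - Hotel-ID 2021
    Identify hotel rooms from images
  - HerbariumChallenge2021
    Identify specimens from a dataset of 2.5M images covering nearly 66,000 vascular plant species from the Americas, Oceania, and the Pacific
  - iWildCam2021
    Count the animals of each species in image sequences
  - PlantPathologyChallenge2021 (not yet started)
    Classify images of diseased plants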
