CVPR-2023-Papers

❣❣❣ CVPR 2023 paper categorization is complete

📢📢📢 Award-Winning Papers

🏆Best Paper

Planning-oriented Autonomous Driving
🏠project
Visual Programming: Compositional visual reasoning without training

🏆Best Student Paper

3D Registration with Maximal Cliques

🏆Honorable Mention

DynIBaR: Neural Dynamic Image-Based Rendering
🏠project

🏆Honorable Mention(Student)

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
🏠project

For categorized survey papers from previous years, see ↘️CV-Surveys (under construction)

Click here for the categorized 2024 papers

↘️WACV-2024-Papers

Click here for the categorized 2023 papers

↘️CVPR-2023-Papers ↘️WACV-2023-Papers ↘️ICCV-2023-Papers

Click here for the categorized 2022 papers

Click here for the categorized 2021 papers

Click here for the categorized 2020 papers

🐱	🐶	🐯	🐺
1.Others	2.Image Segmentation	3.Image Processing	4.Image Captioning
5.Object Detection	6.Object Tracking	7.Point Cloud	8.Action Detection
9.Human Pose Estimation	10.3D(3D vision)	11.Face	12.Image-to-Image Translation
13.GAN	14.Video	15.Transformer	16.Semi/self-supervised learning
17.Medical Image	18.Person Re-Identification	19.Neural Architecture Search	20.Autonomous vehicles
21.UAV/Remote Sensing/Satellite Image	22.Image Synthesis/Generation	23.Image Retrieval	24.Super-Resolution
25.Fine-Grained/Image Classification	26.GCN/GNN	27.Pose Estimation(object pose estimation)	28.Style Transfer
29.Augmented Reality/Virtual Reality/Robotics	30.Visual Question Answering	31.Vision-Language	32.Data Augmentation
33.Human-Object Interaction	34.Model Compression/Knowledge Distillation/Pruning	35.OCR	36.Optical Flow
37.Contrastive Learning	38.Meta-Learning	39.Continual Learning	40.Adversarial Learning
41.Incremental Learning	42.Metric Learning	43.Multi-Task Learning	44.Federated Learning
45.Dense Prediction	46.Scene Graph Generation	47.Few/Zero-Shot Learning/Domain Generalization/Adaptation	48.NLP
49.Image Geo-localization	50.Anomaly Detection	51.Optical, Geometric, and Light-Field Imaging	52.Human Motion Forecasting
53.Sign Language Translation	54.Benchmark/Dataset	55.Novel View Synthesis	56.Sound
57.Gaze Estimation	58.Neural rendering	59.Image/Video Compression	60.Industrial Anomaly Detection
61.Object Re-identification	62.Object Counting	63.edge detection	64.Motion Retargeting
65.Scene flow estimation	66.Clustering	67.Active Learning	68.Lifelong Learning
69.Reinforcement learning	70.Image Forgery Detection	71.visual reasoning	72.open-set recognition
73.Neural Radiance Fields	74.Machine Learning	75.Semantic Scene Completion	76.IP protection
77.sketch	78.Image/Video Editing	79.thermal imaging technology	80.Computer Graphics

80.Computer Graphics

Learning Anchor Transformations for 3D Garment Animation
⭐code
Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
⭐code
CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition
🏠project
FLEX: Full-Body Grasping Without Full-Body Grasps
🏠project

79.thermal imaging technology

What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging
⭐code

78.Image/Video Editing

PREIM3D: 3D Consistent Precise Image Attribute Editing from a Single Image
🏠project
Text-driven video editing
- Shape-aware Text-driven Layered Video Editing
  🏠project
Image Editing

77.sketch

Photo Pre-Training, but for Sketch
⭐code
Restoration of Hand-Drawn Architectural Drawings Using Latent Space Mapping With Degradation Generator
SECAD-Net: Self-Supervised CAD Reconstruction by Learning Sketch-Extrude Operations
⭐code

76.IP protection

Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection
Effective Ambiguity Attack Against Passport-Based DNN Intellectual Property Protection Schemes Through Fully Connected Layer Substitution

75.Semantic Scene Completion

Semantic Scene Completion With Cleaner Self
VoxFormer: Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion
⭐code

74.Machine Learning

Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets
Multi-Agent Automated Machine Learning
Towards Better Decision Forests: Forest Alternating Optimization
ERM-KTP: Knowledge-Level Machine Unlearning via Knowledge Transfer
⭐code
A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
⭐code
Novel class discovery
- Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery
  ⭐code
- Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery
Transfer learning

73.Neural Radiance Fields

Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization
Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder
Occlusion-Free Scene Recovery via Neural Radiance Fields
Grid-guided Neural Radiance Fields for Large Urban Scenes
🏠project
NeRFLight: Fast and Light Neural Radiance Fields using a Shared Feature Grid
GazeNeRF: 3D-Aware Gaze Redirection With Neural Radiance Fields
⭐code
SPARF: Neural Radiance Fields from Sparse and Noisy Poses
⭐code
Masked Wavelet Representation for Compact Neural Radiance Fields
⭐code
MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
⭐code
AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training
🏠project
JacobiNeRF: NeRF Shaping With Mutual Information Gradients
Robust Dynamic Radiance Fields
🏠project
Exact-NeRF: An Exploration of a Precise Volumetric Parameterization for Neural Radiance Fields
PaletteNeRF: Palette-Based Appearance Editing of Neural Radiance Fields
EditableNeRF: Editing Topologically Varying Neural Radiance Fields by Key Points
🏠project
SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene
🏠project
ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision
⭐code
Flow supervision for Deformable NeRF
Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields
🏠project
EventNeRF: Neural Radiance Fields From a Single Colour Event Camera
🏠project
SeaThru-NeRF: Neural Radiance Fields in Scattering Media
SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory
Complementary Intrinsics From Neural Radiance Fields and CNNs for Outdoor Scene Relighting
Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields
Removing Objects From Neural Radiance Fields
Grid-guided Neural Radiance Fields for Large Urban Scenes
⭐code
GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images
HandNeRF: Neural Radiance Fields for Animatable Interacting Hands
NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
⭐code
JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields
🏠project
Multi-Space Neural Radiance Fields
⭐code
DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields
⭐code
StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields
🏠project
Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields
🏠project
SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting With Neural Radiance Fields
F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories
🏠project
Clothed Human Performance Capture with a Double-layer Neural Radiance Fields
DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models
Deblurring
- BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields
  ⭐code
- DP-NeRF: Deblurred Neural Radiance Field With Physical Scene Priors
  ⭐code
  🏠project

72.open-set recognition

Glocal Energy-based Learning for Few-Shot Open-Set Recognition

71.visual reasoning

Visual Programming: Compositional visual reasoning without training
🏆Best Paper
Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices
⭐code
Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning
⭐code
Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge

70.Image Forgery Detection

Hierarchical Fine-Grained Image Forgery Detection and Localization
⭐code
Detecting and Grounding Multi-Modal Media Manipulation
⭐code
⭐code Misinformation detection
Evading DeepFake Detectors via Adversarial Statistical Consistency
Edge-Aware Regional Message Passing Controller for Image Forgery Localization
TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization
🏠project
Towards Universal Fake Image Detectors That Generalize Across Generative Models
Deepfake Detection
- Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
  ⭐code
- Dynamic Graph Learning With Content-Guided Spatial-Frequency Relation Reasoning for Deepfake Detection

69.Reinforcement learning

PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav
Local-Guided Global: Paired Similarity Representation for Visual Reinforcement Learning
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning
⭐code
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-per-Second
⭐code
Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning
🏠project

68.Lifelong Learning

Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning
⭐code

67.Active Learning

Re-thinking Federated Active Learning based on Inter-class Diversity
Box-Level Active Detection
⭐code
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning
⭐code

66.Clustering

DivClust: Controlling Diversity in Deep Clustering
MVC

65.Scene flow estimation

Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
⭐code
Self-Supervised 3D Scene Flow Estimation Guided by Superpoints
Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow

64.Motion Retargeting

Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
⭐code

63.edge detection

edge detection
- The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge Detector
  ⭐code

62.Object Counting

Zero-shot Object Counting
⭐code
Indiscernible Object Counting in Underwater Scenes
⭐code

61.Object Re-identification

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
⭐code
Large-scale Training Data Search for Object Re-identification
⭐code
Adaptive Sparse Pairwise Loss for Object Re-Identification
⭐code

60.Industrial Anomaly Detection

Defect localization
- PyramidFlow: High-Resolution Defect Contrastive Localization using Pyramid Normalizing Flow
Industrial anomaly detection
- Multimodal Industrial Anomaly Detection via Hybrid Fusion
  ⭐code
- OmniAL: A Unified CNN Framework for Unsupervised Anomaly Localization
Anomaly segmentation
- Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection
  ⭐code
  👍CVPR 2023 winning solution: a new breakthrough in zero-shot anomaly segmentation!

59.Image/Video Compression

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
Context-Based Trit-Plane Coding for Progressive Image Compression
⭐code
Learned Image Compression with Mixed Transformer-CNN Architectures
⭐code
LVQAC: Lattice Vector Quantization Coupled with Spatially Adaptive Companding for Efficient Learned Image Compression
Optimization-Inspired Cross-Attention Transformer for Compressive Sensing
⭐code
Multi-Realism Image Compression With a Conditional Generator
AccelIR: Task-aware Image Compression for Accelerating Neural Restoration
Video compression
Vector quantization
- NVTC: Nonlinear Vector Transform Coding
  ⭐code

58.Neural rendering

TMO: Textured Mesh Acquisition of Objects With a Mobile Device by Using Differentiable Rendering
Tensor4D: Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction and Rendering
Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur
🏠project
NeUDF: Leaning Neural Unsigned Distance Fields With Volume Rendering
DiffRF: Rendering-Guided 3D Radiance Field Diffusion
🏠project
Unsupervised Continual Semantic Adaptation Through Neural Rendering
Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes
🏠project
UV Volumes for Real-Time Rendering of Editable Free-View Human Performance
🏠project
Inverse Rendering of Translucent Objects Using Physical and Neural Renderers
ORCa: Glossy Objects As Radiance-Field Cameras
🏠project
MAIR: Multi-View Attention Inverse Rendering With 3D Spatially-Varying Lighting Estimation
🏠project
FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
🏠project
Learning To Render Novel Views From Wide-Baseline Stereo Pairs
🏠project
NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer
🏠project
FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization
🏠project
Local Implicit Ray Function for Generalizable Radiance Field Representation
⭐code
FitMe: Deep Photorealistic 3D Morphable Model Avatars
⭐code
Pointersect: Neural Rendering with Cloud-Ray Intersection
Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention
⭐code
ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field
WildLight: In-the-wild Inverse Rendering with a Flashlight
⭐code
FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
⭐code
NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination
MonoHuman: Animatable Human Neural Field from Monocular Video
⭐code
Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
⭐code
PlenVDB: Memory Efficient VDB-Based Radiance Fields for Fast Training and Rendering
Reaches 30 frames per second at 1280x720 output resolution on an iPhone 12.

57.Gaze Estimation

NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation
Source-free Adaptive Gaze Estimation by Uncertainty Reduction
⭐code
ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection

56.Sound + Vision

Conditional Generation of Audio from Video via Foley Analogies
⭐code
Vision Transformers Are Parameter-Efficient Audio-Visual Learners
Active speaker detection
- A Light Weight Model for Active Speaker Detection
  ⭐code
Audio-visual speech recognition
Audio-visual localization
- Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
  ⭐code
- Audio-Visual Grouping Network for Sound Localization from Mixtures
  ⭐code
Audio source separation
- Language-Guided Audio-Visual Source Separation via Trimodal Consistency
- iQuery: Instruments As Queries for Audio-Visual Sound Separation
Sound synthesis
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
  ⭐code
- ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration
Movie audio description
- AutoAD: Movie Description in Context
  🏠project
Scene image generation from sound
- Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Audio-visual anomaly detection
- Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
  ⭐code
Movie dubbing
- Learning To Dub Movies via Hierarchical Prosody Models
Dance generation
- EDGE: Editable Dance Generation From Music
  🏠project
- Music-Driven Group Choreography
Video saliency prediction
- CASP-Net: Rethinking Video Saliency Prediction From an Audio-Visual Consistency Perceptual Perspective
Audio-driven portrait animation
- DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
Auditory localization
- Egocentric Auditory Attention Localization in Conversations

55.Novel View Synthesis

Neural Pixel Composition for 3D-4D View Synthesis From Multi-Views
MixNeRF: Modeling a Ray with Mixture Density for Novel View Synthesis from Sparse Inputs
🏠project
NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis
🏠project
NeRDi: Single-View NeRF Synthesis With Language-Guided Diffusion As General Image Priors
Novel-View Acoustic Synthesis
🏠project
Cross-Guided Optimization of Radiance Fields With Multi-View Image Super-Resolution for High-Resolution Novel View Synthesis
Frequency-Modulated Point Cloud Rendering with Easy Editing
⭐code
Learning Neural Duplex Radiance Fields for Real-Time View Synthesis
🏠project
ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects
⭐code
Balanced Spherical Grid for Egocentric View Synthesis
Progressively Optimized Local Radiance Fields for Robust View Synthesis
⭐code
F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories
⭐code
Enhanced Stable View Synthesis
Consistent View Synthesis with Pose-Guided Diffusion Models
⭐code
Learning to Render Novel Views from Wide-Baseline Stereo Pairs
⭐code
Painting 3D Nature in 2D: View Synthesis of Natural Scenes From a Single Semantic Mask
🏠project
NoPe-NeRF: Optimising Neural Radiance Field With No Pose Prior
🏠project
Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis
⭐code
Efficient View Synthesis and 3D-Based Multi-Frame Denoising With Multiplane Feature Representations
NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds
DINER: Depth-aware Image-based NEural Radiance fields
🏠project
RefSR-NeRF: Towards High Fidelity and Super Resolution View Synthesis
⭐code
VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization
⭐code
DynIBaR: Neural Dynamic Image-Based Rendering
🏠project
🏆Honorable Mention
Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories

54.Benchmark/Dataset

Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset
A New Dataset Based on Images Taken by Blind People for Testing the Robustness of Image Classification Models Trained for ImageNet Categories
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets
Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline
ScaleDet: A Scalable Multi-Dataset Object Detector
JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking
🌻dataset
Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning
DF-Platter: Multi-Face Heterogeneous Deepfake Dataset
🌻dataset
HandsOff: Labeled Dataset Generation With No Additional Human Annotations
🌻dataset
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis
⭐code
ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations
🌻dataset
NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation
⭐code
MISC210K: A Large-Scale Dataset for Multi-Instance Semantic Correspondence
⭐code
StarCraftImage: A Dataset for Prototyping Spatial Reasoning Methods for Multi-Agent Environments
🏠project
Habitat-Matterport 3D Semantics Dataset
CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset
⭐code
A large-scale public Chinese video-text dataset
FLAG3D: A 3D Fitness Activity Dataset With Language Instruction
🏠project
Multi-Label Compound Expression Recognition: C-EXPR Database & Network
ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
🏠project
A dataset for hand-object manipulation
xFBD: Focused Building Damage Dataset and Analysis
A building damage dataset
Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo
🌻dataset
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
🌻dataset
CUDA: Convolution-based Unlearnable Datasets
🌻dataset
MVImgNet: A Large-scale Dataset of Multi-view Images
🌻dataset
V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception
🌻dataset
Vehicle-to-Vehicle (V2V) perception
Polynomial Implicit Neural Representations For Large Diverse Datasets
🌻dataset
MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset
🌻dataset
RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset
🌻dataset
Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts
⭐code
ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data
⭐code
CelebV-Text: A Large-Scale Facial Text-Video Dataset
⭐code
Facial text-to-video generation
Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method
⭐code
Artistic image aesthetics assessment
CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions
🏠project
A climbing motion dataset
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection
⭐code
🏠project A public short-video shot boundary detection dataset
V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting
⭐code
WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models
⭐code A synthetic dataset for object detection and weather classification under extreme weather conditions
CLOTH4D: A Dataset for Clothed Human Reconstruction
🌻dataset
A dataset for clothed human reconstruction
OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images
🌻dataset
A new dataset for omnipotent city understanding from multi-level and multi-view images.
RealImpact: A Dataset of Impact Sound Fields for Real Objects
⭐code
BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion
🏠project
GFIE: A Dataset and Baseline for Gaze-Following From 2D to 3D in Indoor Environments
🏠project
Benchmark
- A Soma Segmentation Benchmark in Full Adult Fly Brain
  ⭐code
- A New Comprehensive Benchmark for Semi-Supervised Video Anomaly Detection and Anticipation
- A Large-Scale Homography Benchmark
- Toward RAW Object Detection: A New Benchmark and a New Model
- MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding
- Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
  🏠project
- Advancing Visual Grounding With Scene Knowledge: Benchmark and Method
  ⭐code
- The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects
  ⭐code
- Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn
  ⭐code
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
  ⭐code
- GeoNet: Benchmarking Unsupervised Adaptation across Geographies
  ⭐code
- PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout
  ⭐code
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
- ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos
  🏠project
- Image Similarity
  - GeneCIS: A Benchmark for General Conditional Image Similarity
    🏠project
- ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos
  ⭐code
- Ultra-High Resolution Segmentation with Ultra-Rich Context: A Novel Benchmark
  ⭐code
- NewsNet: A Novel Benchmark for Hierarchical Temporal Segmentation
  ⭐code
- RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
  🏠project

53.Sign Language

Ham2Pose: Animating Sign Language Notation Into Pose Sequences
🏠project
Sign language translation
- Gloss Attention for Gloss-Free Sign Language Translation
  ⭐code
Sign language recognition
Sign language retrieval
- CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
  ⭐code

52.Human Motion

Semi-Weakly Supervised Object Kinematic Motion Prediction
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
MotionDiffuser: Controllable Multi-Agent Motion Prediction using Diffusion
Human motion prediction
Human motion synthesis
3D HM
- Generating Holistic 3D Human Motion from Speech
  🏠project

51.Computational Imaging (e.g., optical, geometric, and light-field imaging)

Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography
⭐code
TRACE: 5D Temporal Regression of Avatars With Dynamic Cameras in 3D Environments
⭐code
High-Fidelity Event-Radiance Recovery via Transient Event Frequency
⭐code
Real-Time Neural Light Field on Mobile Devices
🏠project
Accidental Light Probes
🏠project
DyLiN: Making Light Field Networks Dynamic
⭐code
Learning Rotation-Equivariant Features for Visual Correspondence
🏠project
Role of Transients in Two-Bounce Non-Line-of-Sight Imaging
Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution
Camera pose estimation
Shutter correction
- EvShutter: Transforming Events for Unconstrained Rolling Shutter Correction
  ⭐code
Camera calibration
- Perspective Fields for Single Image Camera Calibration
  🏠project
Geometric estimation
- Adaptive Annealing for Robust Geometric Estimation
Camera localization
- NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization
  ⭐code

50.Anomaly Detection

Revisiting Reverse Distillation for Anomaly Detection
SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection
Prototypical Residual Networks for Anomaly Detection and Localization
OpenMix: Exploring Outlier Samples for Misclassification Detection
⭐code
Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection
⭐code
Diversity-Measurable Anomaly Detection
SimpleNet: A Simple Network for Image Anomaly Detection and Localization
⭐code
DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
OOD

49.Image Geo-localization

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

48.NLP

Images Speak in Images: A Generalist Painter for In-Context Visual Learning
⭐code
CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes From Natural Language
Sarcasm detection (detecting whether sarcasm is present in text, or in other modalities such as images or comics)
- DIP: Dual Incongruity Perceiving Network for Sarcasm Detection
  ⭐code
NLQ
- NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory
  ⭐code
Visual Grounding
- Language Adaptive Weight Generation for Multi-Task Visual Grounding
Referring Expression Comprehension
- RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
  🏠project

47.Few/Zero-Shot Learning/Domain Generalization/Adaptation

DG
DA
ZSL
FSL

46.Scene Graph Generation

Unbiased Scene Graph Generation in Videos
IS-GGT: Iterative Scene Graph Generation With Generative Transformers
Prototype-based Embedding Network for Scene Graph Generation
⭐code
Devil's on the Edges: Selective Quad Attention for Scene Graph Generation
🏠project
Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space
Panoptic Video Scene Graph Generation
Fast Contextual Scene Graph Generation With Unbiased Context Augmentation

45.Dense Prediction

Ensemble-Based Blackbox Attacks on Dense Prediction
⭐code
DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
Probabilistic Prompt Learning for Dense Prediction
1% VS 100%: Parameter-Efficient Low Rank Adapter for Dense Predictions
DPF: Learning Dense Prediction Fields With Weak Supervision
⭐code
Dense detection
- One-to-Few Label Assignment for End-to-End Dense Detection
  ⭐code
Dense object localization
- Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization
  ⭐code

44.Federated Learning

Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization
Federated Learning With Data-Agnostic Distribution Fusion
How To Prevent the Poor Performance Clients for Personalized Federated Learning?
GradMA: A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting
Bias-Eliminating Augmentation Learning for Debiased Federated Learning
Make Landscape Flatter in Differentially Private Federated Learning
The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
Rethinking Federated Learning With Domain Shift: A Prototype View
⭐code
On the Effectiveness of Partial Variance Reduction in Federated Learning With Heterogeneous Data
Elastic Aggregation for Federated Optimization
FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning
Adaptive Channel Sparsity for Federated Learning Under System Heterogeneity
ScaleFL: Resource-Adaptive Federated Learning With Heterogeneous Clients
Reliable and Interpretable Personalized Federated Learning

43.Multi-Task Learning

Independent Component Alignment for Multi-Task Learning
Dynamic Neural Network for Multi-Task Learning Searching Across Diverse Network Topologies
AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning
⭐code
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
🏠project
Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing With Non-Learnable Primitives
⭐code
Hierarchical Prompt Learning for Multi-Task Learning

42.Metric Learning

Advancing Deep Metric Learning Through Multiple Batch Norms And Multi-Targeted Adversarial Examples
Deep Factorized Metric Learning
⭐code
Deep Semi-Supervised Metric Learning With Mixed Label Propagation
Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning

41.Incremental Learning

Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning
⭐code
AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning
⭐code
GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task
Class-incremental learning

40.Adversarial Learning

Adversarial Robustness via Random Projection Filters
Seasoning Model Soups for Robustness to Adversarial and Natural Distribution Shifts
Dynamic Generative Targeted Attacks With Pattern Injection
FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits
Enhancing the Self-Universality for Transferable Targeted Attacks
⭐code
Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization
🏠project
Revisiting Residual Networks for Adversarial Robustness
⭐code
Feature Separation and Recalibration for Adversarial Robustness
⭐code
CFA: Class-wise Calibrated Fair Adversarial Training
⭐code
Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations
🏠project
Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks
Black-box
Adversarial examples
Backdoor attacks
Adversarial attacks
Backdoor defense
- Backdoor Defense via Deconfounded Representation Learning
  ⭐code
Adversarial training

39.Continual Learning

Dealing With Cross-Task Class Discrimination in Online Continual Learning
Heterogeneous Continual Learning
Batch Model Consolidation: A Multi-Task Model Consolidation Framework
CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning
⭐code
Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
⭐code
Computationally Budgeted Continual Learning: What Does Matter?
⭐code
Preserving Linear Separability in Continual Learning by Backward Feature Projection
Regularizing Second-Order Influences for Continual Learning
⭐code
Rethinking Gradient Projection Continual Learning: Stability / Plasticity Feature Space Decoupling
MetaMix: Towards Corruption-Robust Continual Learning With Temporally Self-Adaptive Data Transformation
Exploring Data Geometry for Continual Learning
PCR: Proxy-based Contrastive Replay for Online Class-Incremental Continual Learning
Bilateral Memory Consolidation for Continual Learning
Adaptive Plasticity Improvement for Continual Learning
Real-Time Evaluation in Online Continual Learning: A New Hope
PIVOT: Prompting for Video Continual Learning

38.Meta-Learning(元学习)

Meta-Learning with a Geometry-Adaptive Preconditioner
⭐code
Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level
Ground-Truth Free Meta-Learning for Deep Compressive Sampling
HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization
Panoptic Compositional Feature Field for Editable Scene Rendering With Network-Inferred Labels via Metric Learning

37.Contrastive Learning(对比学习)

Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning
Difficulty-Based Sampling for Debiased Contrastive Representation Learning
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
⭐code
Twin Contrastive Learning with Noisy Labels
⭐code
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
Best of Both Worlds: Multimodal Contrastive Learning With Tabular and Imaging Data
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose
ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-Real Novel View Synthesis via Contrastive Learning
Hyperbolic Contrastive Learning for Visual Representations beyond Objects
⭐code
Non-Contrastive Learning
- Non-Contrastive Learning Meets Language-Image Pre-Training

36.Optical Flow(光流估计)

Rethinking Optical Flow from Geometric Matching Consistent Perspective
⭐code
DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling
AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation
TransFlow: Transformer as Flow Learner
Tangentially Elongated Gaussian Belief Propagation for Event-Based Incremental Optical Flow Estimation
⭐code
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation

35.OCR

Text Recognition
- Self-Supervised Implicit Glyph Attention for Text Recognition
Scene Text Detection
- Turning a CLIP Model into a Scene Text Detector
  ⭐code
Table Structure Recognition
- Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling
Font Generation
Handwritten Text Generation
- Disentangling Writer and Character Styles for Handwriting Generation
  ⭐code
- Handwritten Text Generation from Visual Archetypes
Vector Font Synthesis
- DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
  ⭐code
Graphic Document Generation
- Towards Flexible Multi-modal Document Models
  ⭐code
Text Detection
- Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution
  ⭐code
Document Processing
- Unifying Vision, Text, and Layout for Universal Document Processing
Scene Text Spotting

34.Model Compression/Knowledge Distillation/Pruning(模型压缩/知识蒸馏/剪枝)

Network Expansion for Practical Training Acceleration
⭐code
Accelerating Dataset Distillation via Model Augmentation
Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks
⭐code
Quantization
Pruning
MC
- Hard Sample Matters a Lot in Zero-Shot Quantization
KD
Lightweight Networks
- FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network
  ⭐code
De-Quantization
- ABCD: Arbitrary Bitwise Coefficient for De-Quantization

33.Human-Object Interaction(人物交互)

Affordance Diffusion: Synthesizing Hand-Object Interactions
HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models
⭐code
ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection
⭐code
Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework
Detecting Human-Object Contact in Images
🏠project
Category Query Learning for Human-Object Interaction Classification
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Relational Context Learning for Human-Object Interaction Detection
Visibility Aware Human-Object Interaction Tracking from Single RGB Camera
🏠project
Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
A Neural Modeling Pipeline on Multi-View Human-Object Interactions
Two-Hand Interaction
- Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes
  ⭐code
Hand-Object Interaction

32.Data Augmentation(数据增强)

Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns
SLACK: Stable Learning of Augmentations With Cold-Start and KL Regularization
⭐code
Learning Libraries
- PyPose: A Library for Robot Learning With Physics-Based Optimization
Keypoint Localization
- Few-shot Geometry-Aware Keypoint Localization
  ⭐code
Keypoint Detection
- Continuous Landmark Detection With 3D Queries

31.Vision-Language(视觉语言)

Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions
GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods
Learning To Exploit Temporal Structure for Biomedical Vision-Language Processing
REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory
Policy Adaptation from Foundation Model Feedback
🏠project
Learning Visual Representations via Language-Guided Sampling
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models
Scaling Language-Image Pre-Training via Masking
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
⭐code
Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment
⭐code
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning
⭐code
Teaching Structured Vision & Language Concepts to Vision & Language Models
⭐code
Leveraging per Image-Token Consistency for Vision-Language Pre-Training
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks
🏠project
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Open-vocabulary Attribute Detection
🏠project
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
⭐code
FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training
Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language
Task Residual for Tuning Vision-Language Models
⭐code
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator
Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization
Position-Guided Text Prompt for Vision-Language Pre-Training
⭐code
RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
⭐code
Seeing What You Miss: Vision-Language Pre-Training With Semantic Completion Learning
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
DeAR: Debiasing Vision-Language Models with Additive Residuals
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
⭐code
MAGVLT: Masked Generative Vision-and-Language Transformer
Top-Down Visual Attention from Analysis by Synthesis
🏠project
Accelerating Vision-Language Pretraining with Free Language Modeling
⭐code
Multi-Modal Representation Learning with Text-Driven Soft Masks
Fine-tuned CLIP models are efficient video learners
⭐code
MaPLe: Multi-modal Prompt Learning
⭐code
Learning to Name Classes for Vision and Language Models
Dynamic Inference With Grounding Based Vision and Language Models
Connecting Vision and Language with Video Localized Narratives
🏠project
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models
⭐code
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
⭐code
VILA: Learning Image Aesthetics From User Comments With Vision-Language Pretraining
⭐code
VLN
- Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
  🏠project
- Lana: A Language-Capable Navigator for Instruction Following and Generation
  ⭐code
- KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
  ⭐code
- Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
  ⭐code
- Iterative Vision-and-Language Navigation
- Behavioral Analysis of Vision-and-Language Navigation Agents
- Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation
- GeoVLN: Learning Geometry-Enhanced Visual Representation With Slot Attention for Vision-and-Language Navigation
- A New Path: Scaling Vision-and-Language Navigation With Synthetic Instructions and Imitation Learning
- Layout-Based Causal Inference for Object Navigation
  ⭐code
Video-Language
- Test of Time: Instilling Video-Language Models with a Sense of Time
  🏠project
- All in One: Exploring Unified Video-Language Pre-Training
  ⭐code
- HierVL: Learning Hierarchical Video-Language Embeddings
- An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling
  ⭐code
- Clover: Towards A Unified Video-Language Alignment and Fusion Model
  ⭐code
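  Clover, a video-text pre-training model, achieves the best zero-shot and fine-tuned performance on three text-video retrieval tasks (DiDeMo, MSRVTT, and LSMDC) and sets a new state of the art on 8 mainstream video question answering benchmarks.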
- VindLU: A Recipe for Effective Video-and-Language Pretraining
  ⭐code
LLM
- Learning Video Representations From Large Language Models
visual grounding
- EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
  ⭐code
Visual Dialog
- The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

30.Visual Answer Questions(视觉问答)

VQA
Video-QA

29.SLAM/Augmented Reality/Virtual Reality/Robotics(增强/虚拟现实/机器人)

Robotics
SLAM
Virtual Try-On
AR/VR
Mixed Reality
- MixSim: A Hierarchical Framework for Mixed Reality Traffic Simulation
  🏠project
Visual Localization
VPR(Visual Place Recognition)
- StructVPR: Distill Structural Knowledge With Weighting Samples for Visual Place Recognition
- Data-efficient Large Scale Place Recognition with Graded Similarity Supervision
  ⭐code
Visual Odometry
- PVO: Panoptic Visual Odometry
  ⭐code
- Modality-Invariant Visual Odometry for Embodied Vision

28.Style Transfer(风格迁移)

CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
⭐code
Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer
⭐code
Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer
Neural Preset for Color Style Transfer
🏠project
Learning Dynamic Style Kernels for Artistic Style Transfer
Inversion-Based Style Transfer With Diffusion Models
⭐code
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
⭐code
Text-Driven Indoor Scene Stylization
- Text2Scene: Text-Driven Indoor Scene Stylization With Part-Aware Details

27.Pose Estimation(物体姿势估计)

Object Pose Estimation
6D
4D
- Transfer4D: A Framework for Frugal Motion Capture and Deformation Transfer
Animal Pose Estimation

26.GCN/GNN

GNN
- Turning Strengths Into Weaknesses: A Certified Robustness Inspired Attack Framework Against Graph Neural Networks
- From Node Interaction To Hop Interaction: New Effective and Scalable Graph Learning Paradigm
  ⭐code

25.Fine-Grained/Image Classification(细粒度/图像分类)

Quantum-Inspired Spectral-Spatial Pyramid Network for Hyperspectral Image Classification
Learning Partial Correlation Based Deep Visual Representation for Image Classification
iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
Soft Augmentation for Image Classification
⭐code
Explaining Image Classifiers With Multiscale Directional Image Representation
Equiangular Basis Vectors
⭐code
Prefix Conditioning Unifies Language and Label Supervision
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Boosting Verified Training for Robust Image Classifications via Abstraction
⭐code
Semantic Prompt for Few-Shot Image Recognition
Regularization of polynomial networks for image recognition
⭐code
Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm
⭐code
Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
⭐code
Learning Bottleneck Concepts in Image Classification
🏠project
⭐code
PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification
⭐code
Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification
Few-Shot Image Classification
- ProD: Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification
Few-Shot Classification
- Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
  ⭐code
- Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings
  ⭐code
- Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation
Fine-Grained
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems
  ⭐code
- Fine-Grained Classification with Noisy Labels
- An Erudite Fine-Grained Visual Classification Model
  ⭐code
- Weakly Supervised Posture Mining for Fine-Grained Classification
- Learning Attribute and Class-Specific Representation Duet for Fine-Grained Fashion Analysis
Visual Recognition
- Adapting Shortcut With Normalizing Flow: An Efficient Tuning Framework for Visual Recognition
Long-Tailed Classification
- Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification
Long-Tailed Visual Recognition
- SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail
- Balanced Product of Calibrated Experts for Long-Tailed Recognition
  ⭐code
- FCC: Feature Clusters Compression for Long-Tailed Visual Recognition
  ⭐code
- Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment
  ⭐code
- Global and Local Mixture Consistency Cumulative Learning for Long-tailed Visual Recognitions
  ⭐code
- Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
- Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition
  ⭐code
- No One Left Behind: Improving the Worst Categories in Long-Tailed Learning
Multi-Label Classification
- Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification
  ⭐code
Multi-Label Recognition
- Exploring Structured Semantic Prior for Multi Label Recognition With Incomplete Labels
  ⭐code
- Texts as Images in Prompt Tuning for Multi-Label Image Recognition
  ⭐code
Multi-View Classification
- Exploring and Exploiting Uncertainty for Incomplete Multi-View Classification
Superclass Learning
- Superclass Learning with Representation Enhancement
Material Classification
- Thermal Spread Functions (TSF): Physics-Guided Material Classification

24.Super-Resolution(超分辨率)

Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
⭐code
Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation
⭐code
Toward Stable, Interpretable, and Lightweight Hyperspectral Super-Resolution
CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution
Zero-Shot Dual-Lens Super-Resolution
⭐code
Non-Line-of-Sight Imaging With Signal Superresolution Network
Kernel Aware Resampler
RobustNeRF: Ignoring Distractors With Robust Losses
Light Field Super-Resolution
- CutMIB: Boosting Light Field Super-Resolution via Multi-View Image Blending
  ⭐code
ISR
VSR
Text Image Super-Resolution
- Learning Generative Structure Prior for Blind Text Image Super-resolution
  ⭐code
Image Resampling
- Learning Steerable Function for Efficient Image Resampling

23.Image Retrieval(图像检索)

Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval
Asymmetric Feature Fusion for Image Retrieval
Improving Image Recognition by Retrieving From Web-Scale Image-Text Data
Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval
⭐code
Revisiting Self-Similarity: Structural Embedding for Image Retrieval
⭐code
Train/Test-Time Adaptation With Retrieval
Pic2Word: Mapping Pictures to Words for Zero-Shot Composed Image Retrieval
⭐code
Sketch-Based Image Retrieval
Video-Text Retrieval
- Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
  ⭐code
- Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval
Video-Text
- SViTT: Temporal Learning of Sparse Video-Text Transformers
  🏠project (video-text retrieval and question answering)
Multimodal Retrieval
- ImageBind: One Embedding Space To Bind Them All
  🏠project
  ⭐code
- Pix2map: Cross-Modal Retrieval for Inferring Street Maps From Images
Cross-Modal Retrieval
Image-Text Matching
- Learning Semantic Relationship among Instances for Image-Text Matching
  ⭐code
- Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network
  ⭐code
Image-Text Retrieval
- Multilateral Semantic Relations Modeling for Image Text Retrieval
- ViLEM: Visual-Language Error Modeling for Image-Text Retrieval
Text-Video Retrieval
- Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
  ⭐code
Video-Language Retrieval
- CLIPPING: Distilling CLIP-Based Models With a Student Base for Video-Language Retrieval
- Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval

22.Image Synthesis/Generation(图像合成)

LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation
⭐code
Zero-shot Generative Model Adaptation via Image-specific Prompt Learning
⭐code
TopNet: Transformer-based Object Placement Network for Image Compositing
Sketch-Based Generation
- Picture that Sketch: Photorealistic Image Generation from Abstract Sketches
  🏠project
Image-to-Video Synthesis
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models
  ⭐code
Poster Generation
- Unsupervised Domain Adaption with Pixel-level Discriminator for Image-aware Layout Generation
Text-to-Image Synthesis
prompting
- Diversity-Aware Meta Visual Prompting
  ⭐code
Image Generation
Video Generation
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
  🏠project
- Text-Driven Video Synthesis
  - Tell Me What Happened: Unifying Text-Guided Video Completion via Multimodal Masked Video Generation
Image Synthesis
Text-to-Motion Generation
- Being Comes From Not-Being: Open-Vocabulary Text-to-Motion Generation With Wordless Training
  🏠project
Texture Synthesis
- Neural Texture Synthesis With Guided Correspondence

21.UAV/Remote Sensing/Satellite Image(无人机/遥感/卫星图像)

TopDiG: Class-Agnostic Topological Directional Graph Extraction From Remote Sensing Images
Change-Aware Sampling and Contrastive Learning for Satellite Images
MethaneMapper: Spectral Absorption aware Hyperspectral Transformer for Methane Detection
ViTs for SITS: Vision Transformers for Satellite Image Time Series
Image Detection
- Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images
  ⭐code
Tracking
- Resource-Efficient RGBD Aerial Tracking
  ⭐code
LiDAR Localization
- SGLoc: Scene Geometry Encoding for Outdoor LiDAR Localization
UAV Object Detection
- Generalized UAV Object Detection via Frequency Domain Disentanglement

20.Autonomous vehicles(自动驾驶)

Autonomous Driving
MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving
⭐code
Trajectory Prediction
Place Recognition
- Data-efficient Large Scale Place Recognition with Graded Similarity Supervision
  ⭐code
- R2Former: Unified Retrieval and Reranking Transformer for Place Recognition
  ⭐code
Lane Detection
- BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points
  ⭐code
- Anchor3DLane: Learning To Regress 3D Anchors for Monocular 3D Lane Detection
  ⭐code
Bird's-Eye-View Recognition
- BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
- SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images
  🏠project

19.Neural Architecture Search(神经架构搜索)

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
⭐code
Differentiable Architecture Search With Random Features
Adversarially Robust Neural Architecture Search for Graph Neural Networks
MDL-NAS: A Joint Multi-Domain Learning Framework for Vision Transformer
HOTNAS: Hierarchical Optimal Transport for Neural Architecture Search
EMT-NAS: Transferring Architectural Knowledge Between Tasks From Different Datasets

18.Person Re-Identification(人员重识别)

Towards Modality-Agnostic Person Re-Identification With Descriptive Query
Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning
⭐code
Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification
⭐code
TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
⭐code
Person Retrieval
- Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
  ⭐code
- Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling
Clothes-Changing Re-Identification
Visible-Infrared Person Re-Identification (VIReID)
G-ReID
- Similarity Metric Learning for RGB-Infrared Group Re-Identification
  ⭐code
Pedestrian Detection
- VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision
  ⭐code
- Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection
Crowd Counting
Gait Recognition

17.Medical Image(医学影像)

Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images
⭐code
Causally-Aware Intraoperative Imputation for Overall Survival Time Prediction
Flexible-Cm GAN: Towards Precise 3D Dose Prediction in Radiotherapy
Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
3D Medical Imaging
- Geometric Visual Similarity Learning in 3D Medical Image Self-supervised Pre-training
  ⭐code
Image Registration
- Indescribable Multi-modal Spatial Evaluator
  ⭐code
Image Classification
- RankMix: Data Augmentation for Weakly Supervised Learning of Classifying Whole Slide Images With Diverse Sizes and Imbalanced Categories
- Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Space
- PEFAT: Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation and Feature Adversarial Training
  ⭐code
- Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification
  ⭐code
- A Loopback Network for Explainable Microvascular Invasion Classification
Report Generation
- Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
  ⭐code
- METransformer: Radiology Report Generation by Transformer With Multiple Learnable Expert Tokens
- KiUT: Knowledge-Injected U-Transformer for Radiology Report Generation
Medical Image Segmentation
- Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
- SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation
- Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation
  ⭐code
- Fair Federated Medical Image Segmentation via Client Contribution Estimation
- Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
  ⭐code
- Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization
- Weakly supervised segmentation with point annotations for histopathology images via contrast-based variational model
- MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation
  ⭐code
- Rethinking Few-Shot Medical Segmentation: A Vector Quantization View
- MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery
  ⭐code
- Ambiguous Medical Image Segmentation Using Diffusion Models
Medical Image Analysis
- Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data
- Directional Connectivity-based Segmentation of Medical Images
  ⭐code
Tumor Segmentation
- Label-Free Liver Tumor Segmentation
Medical Imaging Report Generation
- Interactive and Explainable Region-guided Radiology Report Generation
  ⭐code (automatic radiology report generation)
Slide Analysis
- Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning
  ⭐code
Cell Detection, Tracking, and Counting
- DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting
- Overlapped Cell on Tissue Dataset for Histopathology
- Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses
  ⭐code
- Masked Autoencoder Guided Segmentation at Pixel Resolution for Accurate, Self-Supervised Subcellular Structure Recognition
  ⭐code
Monocular Endoscope Tracking
- Constrained Evolutionary Diffusion Filter for Monocular Endoscope Tracking
Skin Cancer Diagnosis
- Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision
MRI Reconstruction
- Learning Federated Visual Prompt in Null Space for MRI Reconstruction
Biomedical
- Hierarchical Discriminative Learning Improves Visual Representations of Biomedical Microscopy
  🏠project

16.Semi/self-supervised learning(半/自监督)

Unsupervised Learning
Self-Supervised
Semi-Supervised
Weakly Supervised
- Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding
  ⭐code

15.Vision Transformers

Transformer-Based Learned Optimization
Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers
NLOST: Non-Line-of-Sight Imaging with Transformer
SVGformer: Representation Learning for Continuous Vector Graphics Using Transformers
Adversarial Normalization: I Can visualize Everything (ICE)
⭐code
Hint-Aug: Drawing Hints From Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning
⭐code
PanoSwin: A Pano-Style Swin Transformer for Panorama Understanding
D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers
NAR-Former: Neural Architecture Representation Learning Towards Holistic Attributes Prediction
⭐code
DropKey for Vision Transformer
Integrally Pre-Trained Transformer Pyramid Networks
⭐code
DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets
Trade-Off Between Robustness and Accuracy of Vision Transformers
A Light Touch Approach to Teaching Transformers Multi-view Geometry
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
⭐code
RGB no more: Minimally-decoded JPEG Vision Transformers
Making Vision Transformers Efficient from A Token Sparsification View
⭐code
Blur Interpolation Transformer for Real-World Motion from Blur
⭐code
Neighborhood Attention Transformer
⭐code
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
⭐code
Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
🏠project
Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch Corruptions
Latency Matters: Real-Time Action Forecasting Transformer
⭐code
OmniMAE: Single Model Masked Pretraining on Images and Videos
⭐code
MAGVIT: Masked Generative Video Transformer
🏠project
Learning Imbalanced Data with Vision Transformers
⭐code
Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves
🏠project
AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
⭐code
Generic-to-Specific Distillation of Masked Autoencoders
⭐code
BiFormer: Vision Transformer with Bi-Level Routing Attention
⭐code
Dual-path Adaptation from Image to Video Transformers
⭐code
Spherical Transformer for LiDAR-based 3D Recognition
⭐code
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
⭐code
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
Learning Expressive Prompting With Residuals for Vision Transformers
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
🏠project
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
⭐code
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
⭐code
DropKey
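👍CVPR 2023 | Efficiently mitigating overfitting in Vision Transformers with two lines of code: Meitu and UCAS jointly propose the DropKey regularization method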
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
⭐code
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
⭐code
TrojViT: Trojan Insertion in Vision Transformers
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
🏠project
ResFormer: Scaling ViTs with Multi-Resolution Training
⭐code
Vision Transformer With Super Token Sampling
⭐code
Vision Transformers Are Good Mask Auto-Labelers

14.Video

PointAvatar: Deformable Point-Based Head Avatars From Videos
Video Probabilistic Diffusion Models in Projected Latent Space
Masked Motion Encoding for Self-Supervised Video Representation Learning
⭐code
Modular Memorability: Tiered Representations for Video Memorability Prediction
⭐code
Language-Guided Music Recommendation for Video via Prompt Analogies
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
1000 FPS HDR Video With a Spike-RGB Hybrid Camera
🏠project
Egocentric Video Task Translation
🏠project
Relational Space-Time Query in Long-Form Videos
Spatial-Then-Temporal Self-Supervised Learning for Video Correspondence
⭐code
Few-Shot Referring Relationships in Videos
🏠project
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
🏠project
3D Video Loops From Asynchronous Input
🏠project
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking
⭐code
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
⭐code
StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates
🏠project
📺video
How You Feelin'? Learning Emotions and Mental States in Movie Scenes
🏠project
Video Moment Retrieval
- Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
- Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
  ⭐code
- Hierarchical Video-Moment Retrieval and Step-Captioning
  ⭐code
Video Highlight Detection
- Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
  ⭐code
Video Frame Interpolation
- Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
  ⭐code
- AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
  ⭐code
- Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation
- Exploring Discontinuity for Video Frame Interpolation
- Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields
  ⭐code
- A Unified Pyramid Recurrent Network for Video Frame Interpolation
  ⭐code
- Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time
  ⭐code
- BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation
  ⭐code
- Frame Interpolation Transformer and Uncertainty Guidance
- Range-Nullspace Video Frame Interpolation With Focalized Motion Estimation
Video Synthesis
- Decomposed Diffusion Models for High-Quality Video Generation
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
  ⭐code
- Align Your Latents: High-Resolution Video Synthesis With Latent Diffusion Models
  🏠project
Video Prediction
- MOSO: Decomposing MOtion, Scene and Object for Video Prediction
  ⭐code
- A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
  ⭐code
Video Understanding
- Selective Structured State-Spaces for Long-Form Video Understanding
- How you feelin'? Learning Emotions and Mental States in Movie Scenes
  ⭐code
- System-status-aware Adaptive Network for Online Streaming Video Understanding
- LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling
  ⭐code
- Therbligs in Action: Video Understanding Through Motion Primitives
- Streaming Video Model
  ⭐code
- Procedure-Aware Pretraining for Instructional Video Understanding
  ⭐code
- Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
  ⭐code
- Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
  ⭐code
Video Classification
- Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
  ⭐code
- Class Prototypes Based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos
  🏠project
Video Description
- Fine-grained Audible Video Description
Video Summarization
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses
  ⭐code
Video Recognition
- Frame Flexible Network
  ⭐code
  👍CVPR-2023 | FFN: A general Once-For-All framework for video recognition
- Use Your Head: Improving Long-Tail Video Recognition
Video Deflickering(去闪烁)
- Blind Video Deflickering by Neural Filtering with a Flawed Atlas
  ⭐code
Temporal Sentence Grounding(TSG)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos
- Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training
Video Anomaly Detection(VAD)
- Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection
- Video Event Restoration Based on Keyframes for Video Anomaly Detection
- Generating Anomalies for Video Anomaly Detection With Prompt-Based Feature Mapping
- Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection
- Look Around for Anomalies: Weakly-Supervised Anomaly Detection via Context-Motion Relational Learning
Video Anomaly Localization
- EVAL: Explainable Video Anomaly Localization
Video Mirror Detection
- Learning To Detect Mirrors From Videos via Dual Correspondences
  🏠project
Video Representation Learning
Video Paragraph Grounding
- Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding
Video Grounding
- Text-Visual Prompting for Efficient 2D Temporal Video Grounding
- WINNER: Weakly-Supervised hIerarchical decompositioN and aligNment for Spatio-tEmporal Video gRounding
- Iterative Proposal Refinement for Weakly-Supervised Video Grounding
- Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding
- ProTeGe: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding
Video Shadow Detection
- A Transformer Video Shadow Detection Framework
  🏠project
Video Keypoint Detection
- Recurrence Without Recurrence: Stable Video Landmark Detection With Deep Equilibrium Models
Video Emotion Detection
- Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network
Scene Detection
- Efficient Movie Scene Detection Using State-Space Transformers

13.GAN

Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond
Spider GAN: Leveraging Friendly Neighbors To Accelerate GAN Training
Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection
⭐code
MoStGAN-V: Video Generation With Temporal Motion Styles
⭐code
Sequential Training of GANs Against GAN-Classifiers Reveals Correlated "Knowledge Gaps" Present Among Independently Trained GAN Instances
Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration
⭐code
HumanGen: Generating Human Radiance Fields With Explicit Priors
Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological Image Virtual Re-Staining
GlassesGAN: Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspace Modeling
Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
🏠project
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
🏠project
GLeaD: Improving GANs With a Generator-Leading Task
🏠project
Transforming the Residuals for Real Image Editing With StyleGAN
⭐code
AdaptiveMix: Improving GAN Training via Feature Space Shrinkage
⭐code
NoisyTwins: Class-Consistent and Diverse Image Generation through StyleGANs
⭐code
Graph Transformer GANs for Graph-Constrained House Generation
Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models
⭐code
Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis
⭐code
VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs
⭐code
Discriminator-Cooperated Feature Map Distillation for GAN Compression
⭐code
Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field
⭐code
Text-to-Image Synthesis
- Scaling up GANs for Text-to-Image Synthesis
  🏠project
Diffusion Models
- How to Backdoor Diffusion Models?
  ⭐code
- Diffusion Probabilistic Model Made Slim
- VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
- Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
- Seeing Beyond the Brain: Conditional Diffusion Model With Sparse Masked Modeling for Vision Decoding
- Self-Guided Diffusion Models
- ObjectStitch: Object Compositing With Diffusion Model
- Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models
- Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
- RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models
  🏠project
- Dimensionality-Varying Diffusion Process
- TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets
  ⭐code
- Towards Practical Plug-and-Play Diffusion Models
  ⭐code
- All Are Worth Words: A ViT Backbone for Diffusion Models
- Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
  🏠project
- Binary Latent Diffusion
- Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
- Lookahead Diffusion Probabilistic Models for Refining Mean Estimation
- EDICT: Exact Diffusion Inversion via Coupled Transformations
  ⭐code
- ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts
GAN Inversion
- 3D GAN Inversion With Facial Symmetry Prior
  🏠project
- NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation
- Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion
  🏠project
- High-Fidelity 3D GAN Inversion by Pseudo-Multi-View Optimization
  🏠project

12.Image-to-Image Translation(图像到图像翻译)

3D-Aware Multi-Class Image-to-Image Translation With NeRFs
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
DSI2I: Dense Style for Unpaired Image-to-Image Translation
Fix the Noise: Disentangling Source Feature for Controllable Domain Translation
⭐code
LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data
🏠project
Unpaired Image-to-Image Translation With Shortest Path Regularization
⭐code
BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models
Image Translation
- Masked and Adaptive Transformer for Exemplar Based Image Translation
  ⭐code
Video Translation
- On the Difficulty of Unpaired Infrared-to-Visible Video Translation: Fine-Grained Content-Rich Patches Transfer
  ⭐code

11.Face(人脸)

Rethinking Feature-Based Knowledge Distillation for Face Recognition
Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning
Learning a 3D Morphable Face Reflectance Model From Low-Cost Data
CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent Search
Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis From Monocular Image
🏠project
Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in the Wild
Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces
⭐code
Improving Fairness in Facial Albedo Estimation via Visual-Textual Cues
Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization
⭐code
Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation
⭐code
Privacy-Preserving Adversarial Facial Features
BioNet: A Biologically-Inspired Network for Face Recognition
⭐code
High-Res Facial Appearance Capture From Polarized Smartphone Images
MARLIN: Masked Autoencoder for Facial Video Representation LearnINg
⭐code
Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition
🏠project
Disentanglement of Pose and Expression for General Video Portrait Editing
BlendFields: Few-Shot Example-Driven Facial Modeling
⭐code
Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
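👍CVPR 2023 | A long road ahead for face recognition: Tsinghua, Peking University, and others propose AT3D, an attack method against face recognition systems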
Collaborative Diffusion for Multi-Modal Face Generation and Editing
⭐code
👍CVPR 2023 | Collaborative Diffusion: How can different diffusion models collaborate?
Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation
DiffusionRig: Learning Personalized Priors for Facial Appearance Editing
⭐code
Probabilistic Knowledge Distillation of Face Ensembles
DCFace: Synthetic Face Generation with Dual Condition Diffusion Model
⭐code
Discrete Point-Wise Attack Is Not Enough: Generalized Manifold Adversarial Attack for Face Recognition
3D Face
- Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images
- Physical-World Optical Adversarial Attacks on 3D Face Recognition
- Learning a 3D Morphable Face Reflectance Model from Low-cost Data
  🏠project
- NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images
  ⭐code
- FaceLit: Neural 3D Relightable Faces
- Learning Neural Proto-face Field for Disentangled 3D Face Modeling In the Wild
- High-Fidelity 3D Face Generation From Natural Language Descriptions
  ⭐code
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
  ⭐code
- PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
Face Reconstruction
- A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images
  🏠project
- Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images
- FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction
  ⭐code
- Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation
- AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction
  🏠project
Face Restoration
- DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
Face Alignment
- DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment
  ⭐code
Face Anonymization
- Attribute-preserving Face Dataset Anonymization via Latent Code Optimization
  ⭐code
Face Super-Resolution
- Spatial-Frequency Mutual Learning for Face Super-Resolution
  ⭐code
Emotion Recognition
- Context De-confounded Emotion Recognition
- Decoupled Multimodal Distilling for Emotion Recognition
  ⭐code
- Multivariate, Multi-Frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation
- Learning Emotion Representations from Verbal and Nonverbal Communication
  ⭐code
Portrait Relighting
- LightPainter: Interactive Portrait Relighting with Freehand Scribble
Face Anti-Spoofing
- Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment
- Instance-Aware Domain Generalization for Face Anti-Spoofing
  ⭐code
Talking Head
- OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering
  ⭐code
- High-Fidelity Generalized Emotional Talking Face Generation With Multi-Modal Emotion Space Learning
- Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
- LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
  ⭐code
- Implicit Neural Head Synthesis via Controllable Local Deformation Fields
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors
  ⭐code
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
- High-Fidelity and Freely Controllable Talking Head Video Generation
  🏠project
- GANHead: Towards Generative Animatable Neural Head Avatars
  ⭐code
- One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
  🏠project
- MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
  🏠project
Face Segmentation
- Parameter Efficient Local Implicit Image Function Network for Face Segmentation
Eyeblink Detection
- Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video
3D Head Avatar Generation
- Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
  ⭐code
- Instant Volumetric Head Avatars
  🏠project
- Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars
  🏠project
- OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis
- PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
Facial Expression Recognition
- Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition
Micro-Expression Recognition
- Micron-BERT: BERT-based Facial Micro-Expression Recognition
  ⭐code
- Feature Representation Learning With Adaptive Displacement Generation and Transformer Fusion for Micro-Expression Recognition
- SelfME: Self-Supervised Motion Learning for Micro-Expression Recognition
Face Synthesis
- High-Fidelity 3D Face Generation from Natural Language Descriptions
  🏠project
- StyleGene: Crossover and Mutation of Region-Level Facial Genes for Kinship Face Synthesis
Face Forgery Detection
- AUNet: Learning Relations Between Action Units for Face Forgery Detection
- AltFreezing for More General Video Face Forgery Detection
Facial Action Unit Detection
- Biomechanics-guided Facial Action Unit Detection through Force Modeling
Face Video Editing
- Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
  🏠project
Face Quality Assessment
- Face Image Quality Assessment by Learning Sample Relative Classifiability
  ⭐code
Face Swapping
- 3D-Aware Face Swapping
  ⭐code
- Implicit Identity Driven Deepfake Face Swapping Detection
- StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping
  ⭐code
- Fine-Grained Face Swapping via Regional GAN Inversion
  🏠project
- DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion
Face Clustering
- Local Connectivity-Based Density Estimation for Face Clustering
  ⭐code
Face Retouching
- Blemish-Aware and Progressive Face Retouching With Limited Paired Data
3D Digital Avatars
- A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
  🏠project
- High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors
Audio-Driven Facial Reenactment
- Parametric Implicit Face Representation for Audio-Driven Facial Reenactment
Privacy Preservation
- DartBlur: Privacy Preservation With Detection Artifact Suppression
  ⭐code
Facial Landmark Detection
- 3D-Aware Facial Landmark Detection via Multi-View Consistent Training on Synthetic Data
- STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection
Head Capture
- Instant Multi-View Head Capture Through Learnable Registration
  🏠project
Age Estimation
- DAA: A Delta Age AdaIN Operation for Age Estimation via Binary Code Transformer

10.3D(三维重建\三维视觉)

Structured 3D Features for Reconstructing Controllable Avatars
🏠project
In-Hand 3D Object Scanning from an RGB Sequence
Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs
⭐code
3D Concept Learning and Reasoning from Multi-View Images
🏠project
LP-DIF: Learning Local Pattern-Specific Deep Implicit Function for 3D Objects and Scenes
🏠project
DynamicStereo: Consistent Dynamic Depth From Stereo Videos
🏠project
ARO-Net: Learning Implicit Fields from Anchored Radial Observations
G-MSM: Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors
⭐code
Magic3D: High-Resolution Text-to-3D Content Creation
🏠project
PointListNet: Deep Learning on 3D Point Lists
Omnimatte3D: Associating Objects and Their Effects in Unconstrained Monocular Video
HexPlane: A Fast Representation for Dynamic Scenes
🏠project
Energy-Efficient Adaptive 3D Sensing
🏠project
Objaverse: A Universe of Annotated 3D Objects
🏠project
Level-S2fM: Structure from Motion on Neural Level Set of Implicit Surfaces
🏠project
3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions
⭐code
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
🏠project
👍CVPR 2023 Award Candidate | OmniObject3D, a real-world high-precision 3D object dataset
Neural Scene Chronology
🏠project
3D Neural Field Generation Using Triplane Diffusion
🏠project
Learning Adaptive Dense Event Stereo From the Image Domain
GANmouflage: 3D Object Nondetection With Texture Fields
🏠project
Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging
Sphere-Guided Training of Neural Implicit Surfaces
🏠project
PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision
🏠project
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
⭐code
Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching
⭐code
SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field
⭐code
3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
⭐code
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°
⭐code
Persistent Nature: A Generative Model of Unbounded 3D Worlds
🏠project
TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
Robust Outlier Rejection for 3D Registration With Variational Bayes
⭐code
On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
⭐code
SUDS: Scalable Urban Dynamic Scenes
🏠project
Understanding and Improving Features Learned in Deep Functional Maps
⭐code
TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering
⭐code
Generalizable Local Feature Pre-training for Deformable Shape Analysis
⭐code
CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects
🏠project
CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes
🏠project
HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
⭐code
Multi-View Azimuth Stereo via Tangent Space Consistency
⭐code
3D Line Mapping Revisited
⭐code
NeRF-Supervised Deep Stereo
⭐code
Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
Stereo Matching
- Iterative Geometry Encoding Volume for Stereo Matching
  ⭐code
- Masked Representation Learning for Domain Generalized Stereo Matching
- Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation
- Domain Generalized Stereo Matching via Hierarchical Visual Transformation
- Unsupervised Deep Asymmetric Stereo Matching With Spatially-Adaptive Self-Similarity
- High-frequency Stereo Matching Network
  ⭐code
3D Vision
- Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training
  ⭐code
3D Reconstruction
- Neural Lens Modeling
  ⭐code
- Self-Supervised Super-Plane for Neural 3D Reconstruction
  ⭐code
- Multi-Sensor Large-Scale Dataset for Multi-View 3D Reconstruction
- ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction
- Towards Unbiased Volume Rendering of Neural Implicit Surfaces With Geometry Priors
- Multiview Compressive Coding for 3D Reconstruction
  🏠project
- Multi-View Reconstruction Using Signed Ray Distance Functions (SRDF)
- PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction
  🏠project
- RealFusion: 360° Reconstruction of Any Object From a Single Image
  🏠project
- Deep Polarization Reconstruction With PDAVIS Events
  ⭐code
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation
  ⭐code
- Distilling Neural Fields for Real-Time Articulated Shape Reconstruction
  🏠project
- High-Fidelity Clothed Avatar Reconstruction from a Single Image
- Efficient Second-Order Plane Adjustment
- SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
  🏠project
- Reconstructing Animatable Categories From Videos
  🏠project
- OReX: Object Reconstruction From Planar Cross-Sections Using Neural Fields
- Learning Articulated Shape with Keypoint Pseudo-labels from Web Images
  ⭐code
- SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction
  🏠project
- 3D Shape Reconstruction of Semi-Transparent Worms
- Power Bundle Adjustment for Large-Scale 3D Reconstruction
- PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces
  ⭐code
- AutoRecon: Automated 3D Object Discovery and Reconstruction
  ⭐code
- 3D Registration with Maximal Cliques
- VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos
  ⭐code
- NeUDF: Learning Neural Unsigned Distance Fields with Volume Rendering
  🏠project
- ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency
  🏠project
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
  ⭐code
- PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
  ⭐code
- Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices
  🏠project
- Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container
  ⭐code
- SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates
  ⭐code
- MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
  🏠project
- Scalable, Detailed and Mask-Free Universal Photometric Stereo
  ⭐code
- Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
- NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images
  🏠project
- Behind the Scenes: Density Fields for Single View Reconstruction
  🏠project
- VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction
- Surface Reconstruction(曲面重建)
  - NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
  - Octree Guided Unoriented Surface Reconstruction
  - Neuralangelo: High-Fidelity Neural Surface Reconstruction
    🏠project
  - Neural Kernel Surface Reconstruction
  - Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections
    ⭐code
Depth Estimation
- Fully Self-Supervised Depth Estimation from Defocus Clue
  ⭐code
- Gated Stereo: Joint Depth Estimation From Gated and Wide-Baseline Active Stereo Cues
  🏠project
- OmniVidar: Omnidirectional Depth Estimation From Multi-Fisheye Images
- SfM-TTR: Using Structure From Motion for Test-Time Refinement of Single-View Depth Networks
  ⭐code
- Shakes on a Plane: Unsupervised Depth Estimation From Unstabilized Photography
  🏠project
- Depth Estimation From Camera Image and mmWave Radar Point Cloud
  ⭐code
- Deep Depth Estimation From Thermal Image
  ⭐code
- LightedDepth: Video Depth Estimation in Light of Limited Inference View Angles
  ⭐code
- Trap Attention: Monocular Depth Estimation With Manual Traps
  ⭐code
- PlaneDepth: Self-supervised Depth Estimation via Orthogonal Planes
  ⭐code
- Depth Estimation From Indoor Panoramas With Neural Scene Representation
  ⭐code
- Polarimetric iToF: Measuring High-Fidelity Depth Through Scattering Media
- SCADE: NeRFs from Space Carving With Ambiguity-Aware Depth Estimates
  ⭐code
- iDisc: Internal Discretization for Monocular Depth Estimation
  🏠project
- HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions
  🏠project
- Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes
  ⭐code
- Temporally Consistent Online Depth Estimation Using Point-Based Fusion
  🏠project
- DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium
  ⭐code
- Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
  ⭐code
  👍CVPR2023 | Lite-Mono, a lightweight and efficient self-supervised depth estimation framework
Depth Completion
- CompletionFormer: Depth Completion with Convolutions and Vision Transformers
  ⭐code
- BEV@DC: Bird’s-Eye View Assisted Training for Depth Completion
Indoor Scene Reconstruction
- I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs
  🏠project
- SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
- U2RLE: Uncertainty-Guided 2-Stage Room Layout Estimation
Scene Reconstruction
- Neural Fields meet Explicit Geometric Representation for Inverse Rendering of Urban Scenes
  ⭐code
- Fast Monocular Scene Reconstruction With Global-Sparse Local-Dense Grids
- BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image
3D Scene Generation
- Patch-Based 3D Natural Scene Generation From a Single Example
  🏠project
- Diffusion-Based Generation, Optimization, and Planning in 3D Scenes
- MIME: Human-Aware 3D Scene Generation
  🏠project
Multi-View Stereo(MVS)
- Multi-View Stereo Representation Revisit: Region-Aware MVSNet
- Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo
  ⭐code
- RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo
- GeoMVSNet: Learning Multi-View Stereo with Geometry Perception
  ⭐code
3D Shape Classification
- Robust 3D Shape Classification via Non-Local Graph Attention Network
3D Images
- Seeing a Rose in Five Thousand Ways
3D Shape
- Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching
  ⭐code
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
  ⭐code
- Diffusion-SDF: Text-To-Shape via Voxelized Diffusion
  ⭐code
3D Shape Generation
- Diffusion-Based Signed Distance Fields for 3D Shape Generation
- TAPS3D: Text-Guided 3D Textured Shape Generation From Pseudo Supervision
3D Shape Reconstruction
- Teleidoscopic Imaging System for Microscale 3D Shape Reconstruction
- What You Can Reconstruct From a Shadow
3D Animation
- RaBit: Parametric Modeling of 3D Biped Cartoon Characters With a Topological-Consistent Dataset
  🏠project
- MagicPony: Learning Articulated 3D Animals in the Wild
Indoor Layout
- Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
  ⭐code
Video Reconstruction
- Learning Event Guided High Dynamic Range Video Reconstruction
  🏠project

9.Human Pose Estimation(人体姿态估计)

Hand Gesture
Human Body
Multi-Person Pose Forecasting
- Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting
  ⭐code
Human Parsing
- Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains
  ⭐code
Pose Transfer
- Zero-shot Pose Transfer for Unrigged Stylized 3D Characters
  🏠project
Avatar
- X-Avatar: Expressive Human Avatars
  🏠project
- Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition
  🏠project

8.Action Detection(人体动作检测与识别)

Video Test-Time Adaptation for Action Recognition
A Large-Scale Robustness Analysis of Video Action Recognition Models
How Can Objects Help Action Recognition?
MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition
⭐code
Dual-Path Adaptation From Image to Video Transformers
⭐code
Hybrid Active Learning via Deep Clustering for Video Action Detection
🏠project
Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
Learning Action Changes by Measuring Verb-Adverb Textual Relationships
⭐code
STMixer: A One-Stage Sparse Action Detector
AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation
Search-Map-Search: A Frame Selection Paradigm for Action Recognition
On the Benefits of 3D Pose and Tracking for Human Action Recognition
⭐code
SVFormer: Semi-Supervised Video Transformer for Action Recognition
Skeleton-Based Action Recognition
- Learning Discriminative Representations for Skeleton Based Action Recognition
- Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition
  🏠project
- 3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
- HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions
  ⭐code
- Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition
Keypoint-Based Action Recognition
- Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
Temporal Action Recognition
- TriDet: Temporal Action Detection with Relative Boundary Modeling
  ⭐code
- Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization
  ⭐code
- Post-Processing Temporal Action Detection
  ⭐code
- Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
- PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization
Open-Set Action Recognition
- Open Set Action Recognition via Multi-Label Evidential Learning
- Enlarging Instance-specific and Class-specific Information for Open-set Action Recognition
  ⭐code
MoCap-Based Action Recognition
- STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition
  ⭐code
Few-Shot Action Recognition
- MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition
  ⭐code
- Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition
Semi-Supervised Action Recognition
- TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
  ⭐code
Temporal Action Localization
- Boosting Weakly-Supervised Temporal Action Localization with Text Information
  ⭐code
- Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
- Two-Stream Networks for Weakly-Supervised Temporal Action Localization With Semantic-Aware Mechanisms
- Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization
- Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels
- Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization
- AdamsFormer for Spatial Action Localization in the Future
- Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Group Action Quality Assessment
- LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
  ⭐code
Group Activity Recognition
- An Actor-Centric Causality Graph for Asynchronous Temporal Inference in Group Activity

7.Point Cloud

FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
PointVector: A Vector Representation in Point Cloud Analysis
CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data
PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering
Adversarially Masking Synthetic To Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation
Parts2Words: Learning Joint Embedding of Point Clouds and Texts by Bidirectional Matching Between Parts and Words
Attention-Based Point Cloud Edge Sampling
Meta Architecture for Point Cloud Analysis
Building Rearticulable Models for Arbitrary 3D Objects From 4D Point Clouds
🏠project
Implicit Surface Contrastive Clustering for LiDAR Point Clouds
Poly-PC: A Polyhedral Network for Multiple Point Cloud Tasks at Once
TriVol: Point Cloud Rendering via Triple Volumes
PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos
GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds
Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
⭐code
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
SE-ORNet: Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondence
GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training
3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud
⭐code
SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds
SCPNet: Semantic Scene Completion on Point Cloud
NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds
⭐code
Rotation-Invariant Transformer for Point Cloud Matching
Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives
🏠project
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
⭐code
Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors
⭐code
Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent with Learned Distance Functions
⭐code
Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
Spatiotemporal Self-supervised Learning for Point Clouds in the Wild
⭐code
NerVE: Neural Volumetric Edges for Parametric Curve Extraction from Point Cloud
⭐code
IterativePFN: True Iterative Point Cloud Filtering
⭐code
Fast Point Cloud Generation With Straight Flows
GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds
3D Point Cloud
- Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
  ⭐code
- ToThePoint: Efficient Contrastive Learning of 3D Point Clouds via Recycling
- PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models
  🏠project
- Learnable Skeleton-Aware 3D Point Cloud Sampling
- GraVoS: Voxel Selection for 3D Point-Cloud Detection
- MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds
  ⭐code
- NeuralPCI: Spatio-temporal Neural Field for 3D Point Cloud Multi-frame Non-linear Interpolation
  ⭐code
- Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation
  ⭐code
Point Cloud Instance Segmentation
- ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution
Point Cloud Classification
- PointCert: Point Cloud Classification with Deterministic Certified Robustness Guarantees
- CAP: Robust Point Cloud Classification via Semantic and Structural Modeling
- ViewNet: A Novel Projection-Based Backbone With View Pooling for Few-Shot Point Cloud Classification
Point Cloud Completion
- ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer
  ⭐code
- Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion
- ACL-SPC: Adaptive Closed-Loop system for Self-Supervised Point Cloud Completion
  ⭐code
- AnchorFormer: Point Cloud Completion From Discriminative Nodes
  ⭐code
- Hyperspherical Embedding for Point Cloud Completion
Point Cloud Registration
- Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration
  ⭐code
- PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration
- Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
- Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting
  ⭐code
- BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud Registration
  ⭐code
Point Cloud Understanding
- Self-positioning Point-based Transformer for Point Cloud Understanding
  ⭐code
Point Cloud Reconstruction
- Learning To Measure the Point Cloud Reconstruction Loss in a Representation Space
Point Cloud Matching
- Neural Intrinsic Embedding for Non-Rigid Point Cloud Matching
Point Cloud Segmentation
- Improving Graph Representation for Point Cloud Segmentation via Attentive Filtering
Point Cloud Compression
- Efficient Hierarchical Entropy Model for Learned Point Cloud Compression

6.Object Tracking

Data-Driven Feature Tracking for Event Cameras
Autoregressive Visual Tracking
⭐code
Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking
🏠project
Unifying Short and Long-Term Tracking With Graph Hierarchies
🏠project
VideoTrack: Learning To Track Objects via Video Transformer
Tracking Through Containers and Occluders in the Wild
🏠project
Frame-Event Alignment and Fusion Network for High Frame Rate Tracking
Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking
⭐code
Joint Visual Grounding and Tracking with Natural Language Specification
⭐code
Generalized Relation Modeling for Transformer Tracking
⭐code
SeqTrack: Sequence to Sequence Learning for Visual Object Tracking
DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks
⭐code
CXTrack: Improving 3D Point Cloud Tracking With Contextual Information
Representation Learning for Visual Object Tracking by Masked Appearance Transfer
⭐code
3D-POP - An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds With Marker-Based Motion Capture
Multi-Object Tracking
Multi-Modal Tracking
- Visual Prompt Multi-Modal Tracking
  ⭐code
RGB-T tracking (object tracking that combines visible-light (RGB) and thermal infrared (T) images)
- Bridging Search Region Interaction With Template for RGB-T Tracking
  ⭐code
- Efficient RGB-T Tracking via Cross-Modality Distillation

5.Object Detection

Angelic Patches for Improving Third-Party Object Detector Performance
Enhanced Training of Query-Based Object Detection via Selective Query Recollection
The Differentiable Lens: Compound Lens Search Over Glass Surfaces and Materials for Object Detection
Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains
⭐code
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
NeRF-RPN: A General Framework for Object Detection in NeRFs
⭐code
Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors
Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration
⭐code
Gaussian Label Distribution Learning for Spherical Image Object Detection
Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments
Towards Unsupervised Object Detection From LiDAR Point Clouds
🏠project
Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation
⭐code
Recurrent Vision Transformers for Object Detection With Event Cameras
Learned Two-Plane Perspective Prior Based Image Resampling for Efficient Object Detection
Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection
YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
⭐code
MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding From Object Detection
⭐code
Doubly Right Object Recognition: A Why Prompt for Visual Rationales
Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection
⭐code
Unbalanced Optimal Transport: A Unified Framework for Object Detection
CLIP the Gap: A Single Domain Generalization Approach for Object Detection
Learning Transformations To Reduce the Geometric Shift in Object Detection
Object Detection With Self-Supervised Scene Adaptation
⭐code
Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR
⭐code
SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency
⭐code
Multiclass Confidence and Localization Calibration for Object Detection
⭐code
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
DynamicDet: A Unified Dynamic Architecture for Object Detection
⭐code
ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
⭐code
Curricular Object Manipulation in LiDAR-based Object Detection
⭐code
STDLens: Model Hijacking-resilient Federated Learning for Object Detection
⭐code
What Can Human Sketches Do for Object Detection?
⭐code
Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects
⭐code
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
⭐code
T-SEA: Transfer-based Self-Ensemble Attack on Object Detection
⭐code
👍CVPR 2023 | Peking University proposes T-SEA: a self-ensemble strategy for stronger black-box attack transferability
Knowledge Combination to Learn Rotated Detection Without Rotated Annotation
Universal Instance Perception as Object Discovery and Retrieval
⭐code
Continual Detection Transformer for Incremental Object Detection
Open-Vocabulary Object Detection
- Aligning Bag of Regions for Open-Vocabulary Object Detection
  ⭐code
- Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers
- DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training via Word-Region Alignment
- OvarNet: Towards Open-vocabulary Object Attribute Recognition
  👍CVPR 2023 | Xiaohongshu proposes OvarNet: a new SOTA for open-set prediction and a new take on "recognize anything"
- Learning To Detect and Segment for Open Vocabulary Object Detection
- Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
  ⭐code
- CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
Open-World Object Detection
- Annealing-Based Label-Transfer Learning for Open World Object Detection
  ⭐code
- CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
- PROB: Probabilistic Objectness for Open World Object Detection
  ⭐code
- CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection
- Detecting Everything in the Open World: Towards Universal Object Detection
  ⭐code
  👍CVPR 2023 | Annotate 500 classes, detect 7,000! Tsinghua University and collaborators propose UniDetector, a universal object detection algorithm
Object Localization
- LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
  🏠project
- Egocentric Audio-Visual Object Localization
- Unsupervised Object Localization: Observing the Background To Discover Objects
  ⭐code
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization
3D Object Detection
- Virtual Sparse Convolution for Multimodal 3D Object Detection
  ⭐code
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection
- BEVHeight: A Robust Framework for Vision-Based Roadside 3D Object Detection
- UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View
- PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection
  ⭐code
- AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection via Transformers
  ⭐code
- BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks
- 3D Video Object Detection With Learnable Object-Centric Global Optimization
  ⭐code
- ConQueR: Query Contrast Voxel-DETR for 3D Object Detection
  🏠project
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
  ⭐code
- Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection
  ⭐code
- Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection
  ⭐code
- Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark
  ⭐code
- Deep Dive Into Gradients: Better Optimization for 3D Object Detection With Gradient-Corrected IoU Supervision
  ⭐code
- AeDet: Azimuth-invariant Multi-view 3D Object Detection
  ⭐code
- FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection
  ⭐code
- itKD: Interchange Transfer-Based Knowledge Distillation for 3D Object Detection
- OcTr: Octree-Based Transformer for 3D Object Detection
- MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences
- Semi-Supervised Stereo-Based 3D Object Detection via Cross-View Consensus
- LinK: Linear Kernel for LiDAR-based 3D Perception
  ⭐code
- PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds
- PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer
  ⭐code
- Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection
  ⭐code
- X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
  ⭐code
- Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency
  ⭐code
- Viewpoint Equivariance for Multi-View 3D Object Detection
  ⭐code
- Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving
  ⭐code
- Collaboration Helps Camera Overtake LiDAR in 3D Detection
  ⭐code
- MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
  ⭐code
- MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
  ⭐code
- NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
- VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
  ⭐code
- Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
  ⭐code
- LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
  ⭐code
- PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
  ⭐code
- CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
  ⭐code
- Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
  ⭐code
End-to-End Object Detection
- Dense Distinct Query for End-to-End Object Detection
  ⭐code
Semi-Supervised Object Detection
- Active Teacher for Semi-Supervised Object Detection
  ⭐code
- Semi-DETR: Semi-Supervised Object Detection With Detection Transformers
- Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection
  ⭐code
- SOOD: Towards Semi-Supervised Oriented Object Detection
  ⭐code
- MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection
  ⭐code
Weakly Supervised Object Detection
- DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection
Few-Shot Object Detection
- NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging
- Generating Features with Increased Crop-related Diversity for Few-Shot Object Detection
- Meta-tuning Loss Functions and Data Augmentation for Few-shot Object Detection
- DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
  ⭐code
Domain Adaptive Object Detection
- 2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection
- AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection
  ⭐code
- CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection
- Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
  🏠project
- Domain Adaptive Detection Transformer With Information Fusion
- Harmonious Teacher for Cross-Domain Object Detection
- Contrastive Mean Teacher for Domain Adaptive Object Detectors
Weak-Shot Object Detection
- Weak-Shot Object Detection Through Mutual Knowledge Transfer
Salient Object Detection
- Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
- Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection
- Modeling the Distributional Uncertainty for Salient Object Detection Models
  ⭐code
- Test Time Adaptation With Regularized Loss for Weakly Supervised Salient Object Detection
- Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection
Infrared Object Detection
- Physically Adversarial Infrared Patches with Learnable Shapes and Locations
- TOPLight: Lightweight Neural Networks With Task-Oriented Pretraining for Visible-Infrared Recognition
Camouflaged Object Detection
- Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
  ⭐code
- Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction
  ⭐code
Dense Object Detection
- Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
  ⭐code
Co-Salient Object Detection
- Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
  ⭐code
- Co-Salient Object Detection With Uncertainty-Aware Group Exchange-Masking
Point Cloud Object Detection
- Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation
Object Discovery
- Object Discovery from Motion-Guided Tokens
Video Object Detection
- Feature Aggregated Queries for Transformer-Based Video Object Detectors
Small Object Detection
- Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
  ⭐code
- Distilling Scale-Aware Knowledge in Small Object Detector
- LSTFE-Net: Long Short-Term Feature Enhancement Network for Video Small Object Detection
  ⭐code
- Infrared Small Target Detection
  - Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision
    ⭐code
Line Segment Detection
- DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients
  ⭐code
Object Navigation
- CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation

4.Image Captioning

Video Captioning
Image Captioning
Visual Story Generation
- Make-A-Story: Visual Memory Conditioned Consistent Story Generation
3D Dense Captioning
- End-to-End 3D Dense Captioning With Vote2Cap-DETR

3.Image Processing (Low-Level Image Processing, Quality Assessment)

Initialization Noise in Image Gradients and Saliency Maps
Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models
⭐code
Tunable Convolutions with Parametric Multi-Loss Optimization
Image Colorization
- L-CoIns: Language-based Colorization with Instance Awareness
- Color Recovery
  - GamutMLP: A Lightweight MLP for Color Loss Recovery
    🏠project
Shadow Removal
- ShadowDiffusion: When Degradation Prior Meets Diffusion Model for Shadow Removal
- Document Image Shadow Removal Guided by Color-Aware Background
- DANI-Net: Uncalibrated Photometric Stereo by Differentiable Shadow Handling, Anisotropic Reflectance Modeling, and Neural Inverse Rendering
Image Restoration
- Efficient and Explicit Modelling of Image Hierarchies for Image Restoration
  ⭐code
- Visual Recognition-Driven Image Restoration for Multiple Degradation With Intrinsic Semantics Recovery
- Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions
- Generating Aligned Pseudo-Supervision From Non-Aligned Data for Image Restoration in Under-Display Camera
  ⭐code
- Comprehensive and Delicate: An Efficient Transformer for Image Restoration
- Ingredient-Oriented Multi-Degradation Learning for Image Restoration
- All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative Filters for Specific Degradations
- Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank
  ⭐code
- Burstormer: Burst Image Restoration and Enhancement Transformer
- Generative Diffusion Prior for Unified Image Restoration and Enhancement
- Bitstream-Corrupted JPEG Images are Restorable: Two-stage Compensation and Alignment Framework for Image Restoration
  ⭐code
- Learning Distortion Invariant Representation for Image Restoration From a Causality Perspective
  ⭐code
- Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack
- Robust Unsupervised StyleGAN Image Restoration
  🏠project
Image Inpainting
- NUWA-LIP: Language-guided Image Inpainting with Defect-free VQGAN
  ⭐code
- Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
- SmartBrush: Text and Shape Guided Object Inpainting With Diffusion Model
Video Restoration
- A Simple Baseline for Video Restoration With Grouped Spatial-Temporal Shift
  ⭐code
Video Inpainting
- Deep Stereo Video Inpainting
- Semi-Supervised Video Inpainting With Cycle Consistency Constraints
Image Lighting
- Controllable Light Diffusion for Portraits
Image Quality Assessment
- Quality-aware Pre-trained Models for Blind Image Quality Assessment
- Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
  ⭐code
- An Image Quality Assessment Dataset for Portraits
  ⭐code
- Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
Dehazing
- Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior
  ⭐code
- Curricular Contrastive Regularization for Physics-aware Single Image Dehazing
- Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring
  ⭐code
- RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors
Deraining
- Learning A Sparse Transformer Network for Effective Image Deraining
  ⭐code
- SmartAssign: Learning a Smart Knowledge Assignment Strategy for Deraining and Desnowing
  🏠project
Denoising
- Masked Image Training for Generalizable Deep Image Denoising
- Real-Time Controllable Denoising for Image and Video
- Patch-Craft Self-Supervised Training for Correlated Image Denoising
- Polarized Color Image Denoising
- sRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model
  ⭐code
- Zero-Shot Noise2Noise: Efficient Image Denoising Without Any Data
  🏠project
- HouseDiffusion: Vector Floorplan Generation via a Diffusion Model With Discrete and Continuous Denoising
  🏠project
- Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising
  ⭐code
- Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising
  ⭐code
- Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising
  ⭐code
- LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising
  ⭐code
- Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations
- Learning with Noisy labels via Self-supervised Adversarial Noisy Masking
- Learning from Noisy Labels with Decoupled Meta Label Purifier
Deblurring
- HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering
  ⭐code
- Neumann Network With Recursive Kernels for Single Image Defocus Deblurring
- K3DN: Disparity-Aware Kernel Estimation for Dual-Pixel Defocus Deblurring
- Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prior
- $\text{DC}^2$: Dual-Camera Defocus Control by Learning to Refocus
  ⭐code
- Self-Supervised Non-Uniform Kernel Estimation With Flow-Based Motion Prior for Blind Image Deblurring
  🏠project
- Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Time
  ⭐code
- Event-Based Frame Interpolation With Ad-Hoc Deblurring
- Deep Discriminative Spatial and Temporal Network for Efficient Video Deblurring
Deghosting
- A Unified HDR Imaging Method with Pixel and Patch Level
- SMAE: Few-shot Learning for HDR Deghosting with Saturation-Aware Masked Autoencoders
Reflective Flare Removal
- Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prior
  ⭐code
Image Deweathering
- WeatherStream: Light Transport Automation of Single Image Deweathering
  🏠project
Image Rescaling
- HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization
  ⭐code
Burst Restoration and Enhancement
- Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement
Image Enhancement
- Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement
- Realistic Saliency Guided Image Enhancement
- Learning a Simple Low-Light Image Enhancer From Paired Low-Light Instances
  ⭐code
- Low-Light Image Enhancement via Structure Modeling and Guidance
- You Do Not Need Additional Priors or Regularizers in Retinex-Based Low-Light Image Enhancement
Image Harmonization
- LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization
- Semi-supervised Parametric Real-world Image Harmonization
- PCT-Net: Full Resolution Image Harmonization Using Pixel-Wise Color Transformations
Image Exposure Correction
- Decoupling-and-Aggregating for Image Exposure Correction
Object Removal
- Automatic High Resolution Wire Segmentation and Removal
  ⭐code
Image Decomposition
- Light Source Separation and Intrinsic Image Decomposition Under AC Illumination
- Context-aware Pretraining for Efficient Blind Image Decomposition
- Unsupervised Intrinsic Image Decomposition With LiDAR Intensity
Image Reconstruction
- Raw Image Reconstruction With Learned Compact Metadata
  ⭐code
- Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
- High-Resolution Image Reconstruction With Latent Diffusion Models From Human Brain Activity
  🏠project
- PermutoSDF: Fast Multi-View Reconstruction with Implicit Surfaces using Permutohedral Lattices
  🏠project
Text-Driven Image Manipulation
- DeltaEdit: Exploring Text-Free Training for Text-Driven Image Manipulation
  ⭐code
Motion Blur
- Self-supervised Blind Motion Deblurring with Deep Expectation Maximization
Image Cropping
- Image Cropping With Spatial-Aware Feature and Rank Consistency
Image Relighting
- Weakly-supervised Single-view Image Relighting
  🏠project
- SunStage: Portrait Reconstruction and Relighting Using the Sun as a Light Stage
Blurry Frame Interpolation
- Event-Based Blurry Frame Interpolation Under Blind Exposure

2.Image Segmentation

MED-VT: Multiscale Encoder-Decoder Video Transformer With Application To Object Segmentation
SimpSON: Simplifying Photo Cleanup With Single-Click Distracting Object Segmentation Network
Towards Open-World Segmentation of Parts
Heat Diffusion Based Multi-Scale and Geometric Structure-Aware Transformer for Mesh Segmentation
MOVES: Manipulated Objects in Video Enable Segmentation
Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types for Semi-Weakly Segmentation in Expert-Driven Domains
Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation
VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation
⭐code
Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
OneFormer: One Transformer To Rule Universal Image Segmentation
🏠project
PanelNet: Understanding 360 Indoor Environment via Panel Representation
AutoFocusFormer: Image Segmentation off the Grid
MP-Former: Mask-Piloted Transformer for Image Segmentation
⭐code
Explicit Visual Prompting for Low-Level Structure Segmentations
⭐code
Focused and Collaborative Feedback Integration for Interactive Image Segmentation
⭐code
FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
🏠project
On five datasets across three downstream video segmentation tasks (VIS, VOS, and MOTS), plugging InstMove into existing SOTA models yields a further gain of 1 to 5 points.
Zero-Shot Segmentation
- Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation
  🏠project
  👍CVPR 2023 | Zhejiang University and NTU propose PADing, a universal zero-shot segmentation framework
3D Segmentation
- EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
  🏠project
Panoptic Segmentation
- CoMFormer: Continual Learning in Semantic and Panoptic Segmentation
- Center Focusing Network for Real-Time LiDAR Panoptic Segmentation
- Context-Aware Relative Object Queries To Unify Video Instance and Panoptic Segmentation
- Real-Time Panoptic Segmentation
  - You Only Segment Once: Towards Real-Time Panoptic Segmentation
    ⭐code
- Domain Adaptive Panoptic Segmentation
  - UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration
- Open-Vocabulary Panoptic Segmentation
  - Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models
    ⭐code
Instance Segmentation
- DynaMask: Dynamic Mask Selection for Instance Segmentation
  ⭐code
- Tree Instance Segmentation With Temporal Contour Graph
- Hi4D: 4D Instance Segmentation of Close Human Interaction
- Beyond mAP: Towards Better Evaluation of Instance Segmentation
- Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt
- Cut and Learn for Unsupervised Object Detection and Instance Segmentation
  ⭐code
- PartDistillation: Learning Parts From Instance Segmentation
  ⭐code
- Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections
  ⭐code
- AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation
- DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
  ⭐code
- FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
  ⭐code
- Camouflaged Instance Segmentation via Explicit De-Camouflaging
- Unsupervised Instance Segmentation
  - Exemplar-FreeSOLO: Enhancing Unsupervised Instance Segmentation With Exemplars
- Weakly Supervised Instance Segmentation
- Open-Vocabulary Instance Segmentation
  - Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
    ⭐code
- Zero-Shot Instance Segmentation
  - Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation
    🏠project
Semantic Segmentation
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model
  ⭐code
- Transformer Scale Gate for Semantic Segmentation
- Towards Better Stability and Adaptability: Improve Online Self-Training for Model Adaptation in Semantic Segmentation
- BAEFormer: Bi-Directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation
- Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation
- Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on the Edge
- SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation
- PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers
- Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions
- PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation
  ⭐code
- Understanding Imbalanced Semantic Segmentation Through Neural Collapse
  ⭐code
- Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation
  ⭐code
- Single Domain Generalization for LiDAR Semantic Segmentation
  ⭐code
- FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation
- Proximal Splitting Adversarial Attack for Semantic Segmentation
  ⭐code
- On Calibrating Semantic Segmentation Models: Analyses and an Algorithm
- Incrementer: Transformer for Class-Incremental Semantic Segmentation With Knowledge Distillation Focusing on Old Class
- Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers
- Endpoints Weight Fusion for Class Incremental Semantic Segmentation
- Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures
  ⭐code
- ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation
- Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation
- Dynamic Focus-Aware Positional Queries for Semantic Segmentation
  ⭐code
- Continual Semantic Segmentation With Automatic Memory Sample Selection
- Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision
- Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation
  ⭐code
- Federated Incremental Semantic Segmentation
  ⭐code
- Delivering Arbitrary-Modal Semantic Segmentation
  ⭐code
- Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
- A Simple Framework for Text-Supervised Semantic Segmentation
  ⭐code
  Clearly outperforms the previous state-of-the-art methods on the PASCAL VOC 2012, PASCAL Context, and COCO datasets.
- Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
- Generative Semantic Segmentation
  ⭐code
- Reliability in Semantic Segmentation: Are We on the Right Track?
  ⭐code
- Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
- Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
  ⭐code
- Instant Domain Augmentation for LiDAR Semantic Segmentation
  🏠project
- Open-Vocabulary Semantic Segmentation
  - Open Vocabulary Semantic Segmentation With Patch Aligned Contrastive Learning
  - Open-Vocabulary Semantic Segmentation With Mask-Adapted CLIP
    🏠project
  - Side Adapter Network for Open-Vocabulary Semantic Segmentation
    ⭐code
    👍CVPR2023 Highlight | Side Adapter Network: an extremely lightweight yet high-performing open-vocabulary semantic segmenter
- Open-World Semantic Segmentation
  - Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs
    ⭐code
- Domain-Adaptive Semantic Segmentation
  - DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation
    ⭐code
  - Weakly-Supervised Domain Adaptive Semantic Segmentation With Prototypical Contrastive Learning
  - Continuous Pseudo-Label Rectified Domain Adaptive Semantic Segmentation With Implicit Neural Representations
    ⭐code
- Domain-Generalized Semantic Segmentation
  - HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation
    ⭐code
  - Style Projected Clustering for Domain Generalized Semantic Segmentation
    ⭐code
- Unsupervised Semantic Segmentation
  - Leveraging Hidden Positives for Unsupervised Semantic Segmentation
    ⭐code
  - Network-Free, Unsupervised Semantic Segmentation With Synthetic Images
- Semi-Supervised Semantic Segmentation
  - Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation
    ⭐code
  - Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation
  - Hunting Sparsity: Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation
    ⭐code
  - Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation
  - LaserMix for Semi-Supervised LiDAR Semantic Segmentation
    ⭐code
  - Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation
  - Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation
- Weakly Supervised Semantic Segmentation
  - Token Contrast for Weakly-Supervised Semantic Segmentation
    ⭐code
  - CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation
  - Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation
    ⭐code
  - Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation
  - Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier and Reconstructor
    ⭐code
- Self-Supervised Semantic Segmentation
  - CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation
- Point Cloud Semantic Segmentation
  - Novel Class Discovery for 3D Point Cloud Semantic Segmentation
    ⭐code
- Zero-Shot Semantic Segmentation
  - Delving Into Shape-Aware Zero-Shot Semantic Segmentation
    ⭐code
  - ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation
- Few-Shot Semantic Segmentation
  - MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation
    ⭐code
  - A Strong Baseline for Generalized Few-Shot Semantic Segmentation
    ⭐code
  - Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation
- Long-Tailed Semantic Segmentation
  - Balancing Logit Variation for Long-Tailed Semantic Segmentation
    ⭐code
- 3D Semantic Segmentation
  - MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving
    ⭐code
  - 3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds
    ⭐code
- Open-Set Semantic Segmentation
  - Open-Set Semantic Segmentation for Point Clouds via Adversarial Prototype Framework
Interactive Segmentation
- Interactive Segmentation as Gaussian Process Classification
  ⭐code
- Interactive Segmentation of Radiance Fields
  🏠project
- Efficient Mask Correction for Click-Based Interactive Image Segmentation
  ⭐code
Few-Shot Segmentation
- Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
  ⭐code
- Rethinking the Correlation in Few-Shot Segmentation: A Buoys View
VSS
- Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
  ⭐code
- Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic Segmentation
- Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation
  ⭐code
VOS
- InstMove: Instance Motion for Object-centric Video Segmentation
  ⭐code
- Breaking the "Object" in Video Object Segmentation
- Look Before You Match: Instance Understanding Matters in Video Object Segmentation
- MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
- Boosting Video Object Segmentation via Space-time Correspondence Learning
  ⭐code
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
  ⭐code
- Two-shot Video Object Segmentation
  ⭐code
- Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping
VIS
- Mask-Free Video Instance Segmentation
  ⭐code
  🏠project
- MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos
  ⭐code
- A Generalized Framework for Video Instance Segmentation
  ⭐code
Scene Understanding
- FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
- SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
  🏠project
- Movies2Scenes: Using Movie Metadata To Learn Scene Representation
- Seeing With Sound: Long-range Acoustic Beamforming for Multimodal Scene Understanding
- Single View Scene Scale Estimation Using Scale Field
- Neural Part Priors: Learning To Optimize Part-Based Object Completion in RGB-D Scans
- 3D Scene Understanding
Matting
- Adaptive Human Matting for Dynamic Videos
  ⭐code
- Mask-Guided Matting in the Wild
- Ultrahigh Resolution Image/Video Matting With Spatio-Temporal Sparsity
  ⭐code
- End-to-End Video Matting With Trimap Propagation
  ⭐code
- Referring Image Matting
  ⭐code
Referring Image Segmentation
- PolyFormer: Referring Image Segmentation As Sequential Polygon Generation
  🏠project
- Zero-shot Referring Image Segmentation with Global-Local Context Features
  ⭐code
- Contrastive Grouping With Transformer for Referring Image Segmentation
Referring Expression Segmentation
- GRES: Generalized Referring Expression Segmentation
  🏠project
  👍CVPR23 Highlight | New multimodal task and dataset: NTU proposes GRES, the Generalized Referring Expression Segmentation problem
- Meta Compositional Referring Expression Segmentation
- Learning to Segment Every Referring Object Point by Point
  ⭐code
Motion Segmentation
- Unsupervised Space-Time Network for Temporally-Consistent Segmentation of Multiple Motions
Video Segmentation
- TarViS: A Unified Approach for Target-Based Video Segmentation
  ⭐code
Action Segmentation
- ASPnet: Action Segmentation With Shared-Private Representation of Multiple Data Sources
- Reducing the Label Bias for Timestamp Supervised Temporal Action Segmentation

1.Other (miscellaneous, to be categorized)

CIRCLE: Capture in Rich Contextual Environments
HDR Imaging With Spatially Varying Signal-to-Noise Ratios
Are Deep Neural Networks SMARTer Than Second Graders?
Blowing in the Wind: CycleNet for Human Cinemagraphs From Still Images
Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization
pCON: Polarimetric Coordinate Networks for Neural Scene Representations
Two-Stage Co-Segmentation Network Based on Discriminative Representation for Recovering Human Mesh From Videos
Ranking Regularization for Critical Rare Classes: Minimizing False Positives at a High True Positive Rate
Implicit View-Time Interpolation of Stereo Videos Using Multi-Plane Disparities and Non-Uniform Coordinates
LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction
Stare at What You See: Masked Image Modeling Without Reconstruction
Neural Kaleidoscopic Space Sculpting
HyperCUT: Video Sequence From a Single Blurry Image Using Unsupervised Ordering
Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders
Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision
Improved Distribution Matching for Dataset Condensation
Slimmable Dataset Condensation
LEGO-Net: Learning Regular Rearrangements of Objects in Rooms
Neuralizer: General Neuroimage Analysis Without Re-Training
DETRs With Hybrid Matching
⭐code
A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization
A-La-Carte Prompt Tuning (APT): Combining Distinct Data via Composable Prompting
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery From Sparse Image Ensemble
Decentralized Learning With Multi-Headed Distillation
On the Convergence of IRLS and Its Variants in Outlier-Robust Estimation
Learning Joint Latent Space EBM Prior Model for Multi-Layer Generator
Knowledge Combination To Learn Rotated Detection Without Rotated Annotation
FlowGrad: Controlling the Output of Generative ODEs With Gradients
Recurrent Homography Estimation Using Homography-Guided Image Warping and Focus Transformer
Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes
Boosting Transductive Few-Shot Fine-Tuning With Margin-Based Uncertainty Weighting and Probability Regularization
BiasAdv: Bias-Adversarial Augmentation for Model Debiasing
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
Why Is the Winner the Best?
HGNet: Learning Hierarchical Geometry From Points, Edges, and Surfaces
Revisiting the P3P Problem
RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor
BASiS: Batch Aligned Spectral Embedding Space
CRAFT: Concept Recursive Activation FacTorization for Explainability
Infinite Photorealistic Worlds using Procedural Generation
All-in-Focus Imaging From Event Focal Stack
Learning 3D Scene Priors With 2D Supervision
NeuWigs: A Neural Dynamic Model for Volumetric Hair Capture and Animation
CLIPPO: Image-and-Language Understanding from Pixels Only
⭐code
Towards Bridging the Performance Gaps of Joint Energy-Based Models
expOSE: Accurate Initialization-Free Projective Factorization Using Exponential Regularization
Learning Debiased Representations via Conditional Attribute Interpolation
Learning Neural Volumetric Representations of Dynamic Humans in Minutes
Bayesian Posterior Approximation With Stochastic Ensembles
RILS: Masked Visual Reconstruction in Language Semantic Space
RepMode: Learning to Re-Parameterize Diverse Experts for Subcellular Structure Prediction
Zero-Shot Model Diagnosis
Improving Visual Grounding by Encouraging Consistent Gradient-Based Explanations
⭐code
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders
Understanding and Improving Visual Prompting: A Label-Mapping Perspective
DegAE: A New Pretraining Paradigm for Low-Level Vision
LiDAR-in-the-Loop Hyperparameter Optimization
Understanding Deep Generative Models With Generalized Empirical Likelihoods
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
Compressing Volumetric Radiance Fields to 1 MB
⭐code
Label Information Bottleneck for Label Enhancement
⭐code
DNF: Decouple and Feedback Network for Seeing in the Dark
Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World
How To Prevent the Continuous Damage of Noises To Model Training?
ActMAD: Activation Matching To Align Distributions for Test-Time-Training
🏠project
Leveraging Temporal Context in Low Representational Power Regimes
🏠project
Guided Recommendation for Model Fine-Tuning
OT-Filter: An Optimal Transport Filter for Learning With Noisy Labels
E2PN: Efficient SE(3)-Equivariant Point Network
⭐code
Understanding Masked Image Modeling via Learning Occlusion Invariant Feature
Fine-Tuned CLIP Models Are Efficient Video Learners
⭐code
Visual Recognition by Request
Stitchable Neural Networks
🏠project
RUST: Latent Neural Scene Representations From Unposed Imagery
⭐code
Spatio-Focal Bidirectional Disparity Estimation From a Dual-Pixel Image
Four-View Geometry With Unknown Radial Distortion
Learning Optical Expansion From Scale Matching
⭐code
Don't Lie to Me! Robust and Efficient Explainability With Verified Perturbation Analysis
⭐code
Learning Transformation-Predictive Representations for Detection and Description of Local Features
Two-Way Multi-Label Loss
⭐code
Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization
⭐code
Dionysus: Recovering Scene Structures by Dividing Into Semantic Pieces
Noisy Correspondence Learning With Meta Similarity Correction
HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
Modeling Entities As Semantic Points for Visual Information Extraction in the Wild
🏠project
NeAT: Learning Neural Implicit Surfaces With Arbitrary Topologies From Multi-View Images
Learning a Deep Color Difference Metric for Photographic Images
DINN360: Deformable Invertible Neural Network for Latitude-Aware 360° Image Rescaling
⭐code
Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models
⭐code
Learning a Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation Models
⭐code
DynaFed: Tackling Client Data Heterogeneity With Global Dynamics
CUF: Continuous Upsampling Filters
Learning Decorrelated Representations Efficiently Using Fast Fourier Transform
Practical Network Acceleration With Tiny Sets
AstroNet: When Astrocyte Meets Artificial Neural Network
NeuralLift-360: Lifting an In-the-Wild 2D Photo to a 3D Object With 360° Views
⭐code
Command-Driven Articulated Object Understanding and Manipulation
⭐code
HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes With Iterative Intertwined Regularization
⭐code
Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction
⭐code
Gradient-Based Uncertainty Attribution for Explainable Bayesian Deep Learning
Class Adaptive Network Calibration
⭐code
OCTET: Object-Aware Counterfactual Explanations
⭐code
DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos
FFF: Fragment-Guided Flexible Fitting for Building Complete Protein Structures
Open-Set Representation Learning Through Combinatorial Embedding
A Unified HDR Imaging Method With Pixel and Patch Level
Accelerated Coordinate Encoding: Learning to Relocalize in Minutes Using RGB and Poses
⭐code
Switchable Representation Learning Framework With Self-Compatibility
Exploring and Utilizing Pattern Imbalance
Top-Down Visual Attention From Analysis by Synthesis
🏠project
Interactive Cartoonization With Controllable Perceptual Factors
Regularize Implicit Neural Representation by Itself
Delving Into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
Re-Basin via Implicit Sinkhorn Differentiation
⭐code
Towards Effective Visual Representations for Partial-Label Learning
Samples With Low Loss Curvature Improve Data Efficiency
⭐code
Learning Correspondence Uncertainty via Differentiable Nonlinear Least Squares
Tunable Convolutions With Parametric Multi-Loss Optimization
RelightableHands: Efficient Neural Relighting of Articulated Hand Models
🏠project
DyNCA: Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata
🏠project
Token Turing Machines
⭐code
Probabilistic Debiasing of Scene Graphs
⭐code
Few-Shot Non-Line-of-Sight Imaging With Signal-Surface Collaborative Regularization
The Dark Side of Dynamic Routing Neural Networks: Towards Efficiency Backdoor Injection
Generalized Decoding for Pixel, Image, and Language
🏠project
EC2: Emergent Communication for Embodied Control
Generalizable Local Feature Pre-Training for Deformable Shape Analysis
⭐code
On-the-Fly Category Discovery
⭐code
PyramidFlow: High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow
Efficient Verification of Neural Networks Against LVM-Based Specifications
TensoIR: Tensorial Inverse Rendering
🏠project
Learning From Unique Perspectives: User-Aware Saliency Modeling
LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs
⭐code
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge
⭐code
FFCV: Accelerating Training by Removing Data Bottlenecks
🏠project
Semidefinite Relaxations for Robust Multiview Triangulation
GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency
⭐code
Polynomial Implicit Neural Representations for Large Diverse Datasets
⭐code
Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption
Learning To Zoom and Unzoom
🏠project
Neural Vector Fields: Implicit Representation by Explicit Learning
⭐code
Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks
⭐code
Critical Learning Periods for Multisensory Integration in Deep Networks
Imitation Learning as State Matching via Differentiable Physics
⭐code
Probing Sentiment-Oriented Pre-Training Inspired by Human Sentiment Perception Mechanism
⭐code
Relightable Neural Human Assets From Multi-View Gradient Illuminations
⭐code
DINER: Disorder-Invariant Implicit Neural Representation
Robust Mean Teacher for Continual and Gradual Test-Time Adaptation
⭐code
A Probabilistic Framework for Lifelong Test-Time Adaptation
⭐code
Probing Neural Representations of Scene Perception in a Hippocampally Dependent Task Using Artificial Neural Networks
Decoupling Human and Camera Motion From Videos in the Wild
🏠project
DISC: Learning From Noisy Labels via Dynamic Instance-Specific Selection and Correction
⭐code
DC2: Dual-Camera Defocus Control by Learning To Refocus
FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs
"Seeing" Electric Network Frequency From Events
🏠project
Confidential and Private Decentralized Learning Based on Encryption-Friendly Distillation Loss
⭐code
Revealing the Dark Secrets of Masked Image Modeling
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
Adaptive Graph Convolutional Subspace Clustering
Graph Representation for Order-Aware Visual Transformation
Train-Once-for-All Personalization
Learning Sample Relationship for Exposure Correction
EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata
🏠project
Gradient norm aware minimization seeks first-order flatness and improves generalization
⭐code
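👍CVPR2023 | Tsinghua University proposes GAM: a "first-order flatness optimizer" for neural networks that significantly improves model generalization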
InstantAvatar: Learning Avatars From Monocular Video in 60 Seconds
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
Deep Deterministic Uncertainty: A New Simple Baseline
WIRE: Wavelet Implicit Neural Representations
Learning From Noisy Labels With Decoupled Meta Label Purifier
Architectural Backdoors in Neural Networks
Event-Based Shape From Polarization
Deep Hashing With Minimal-Distance-Separated Hash Centers
Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation
⭐code
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
🏠project
MetaCLUE: Towards Comprehensive Visual Metaphors Research
🏠project
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
⭐code
Sliced Optimal Partial Transport
Deep Learning of Partial Graph Matching via Differentiable Top-K
⭐code
Unsupervised Volumetric Animation
🏠project
Passive Micron-Scale Time-of-Flight With Sunlight Interferometry
Generalizable Implicit Neural Representations via Instance Pattern Composers
⭐code
On the Pitfall of Mixup for Uncertainty Calibration
UMat: Uncertainty-Aware Single Image High Resolution Material Capture
On Data Scaling in Masked Image Modeling
End-to-End Vectorized HD-Map Construction With Piecewise Bezier Curve
⭐code
Boundary Unlearning: Rapid Forgetting of Deep Networks via Shifting the Decision Boundary
MobileOne: An Improved One millisecond Mobile Backbone
⭐code
Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization
Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning
⭐code
Residual Degradation Learning Unfolding Framework With Mixing Priors Across Spectral and Spatial for Compressive Spectral Imaging
Robust and Scalable Gaussian Process Regression and Its Applications
⭐code
NeuralUDF: Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces With Arbitrary Topologies
🏠project
Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations
Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations
⭐code
Multiplicative Fourier Level of Detail
VGFlow: Visibility guided Flow Network for Human Reposing
Neural Dependencies Emerging From Learning Massive Categories
MaLP: Manipulation Localization Using a Proactive Scheme
🏠project
Efficient Robust Principal Component Analysis via Block Krylov Iteration and CUR Decomposition
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
⭐code
Learning 3D Representations From 2D Pre-Trained Models via Image-to-Point Masked Autoencoders
⭐code
MEGANE: Morphable Eyeglass and Avatar Network
🏠project
Solving relaxations of MAP-MRF problems: Combinatorial in-face Frank-Wolfe directions
EXCALIBUR: Encouraging and Evaluating Embodied Exploration
Learning To Predict Scene-Level Implicit 3D From Posed RGBD Data
SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries
🏠project
Learning Neural Parametric Head Models
🏠project
Integral Neural Networks
Simulated Annealing in Early Layers Leads to Better Generalization
Fresnel Microfacet BRDF: Unification of Polari-Radiometric Surface-Body Reflection
Improving Visual Representation Learning Through Perceptual Understanding
Probability-Based Global Cross-Modal Upsampling for Pansharpening
⭐code
SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy
Megahertz Light Steering Without Moving Parts
TempSAL - Uncovering Temporal Information for Deep Saliency Prediction
🏠project
Affection: Learning Affective Explanations for Real-World Visual Data
🏠project
Metadata-Based RAW Reconstruction via Implicit Neural Functions
Coaching a Teachable Student
Progressive Transformation Learning for Leveraging Virtual Images in Training
NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling
Spatial-Temporal Concept Based Explanation of 3D ConvNets
Overlooked Factors in Concept-Based Explanations: Dataset Choice, Concept Learnability, and Human Capability
⭐code
Neural Fourier Filter Bank
⭐code
ECON: Explicit Clothed Humans Optimized via Normal Integration
⭐code
Autonomous Manipulation Learning for Similar Deformable Objects via Only One Demonstration
Plateau-Reduced Differentiable Path Tracing
🏠project
Test Time Adaptation With Transformation Invariance
⭐code
Learning To Exploit the Sequence-Specific Prior Knowledge for Image Processing Pipelines Optimization
Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric
🏠project
CUDA: Convolution-based Unlearnable Datasets
Efficient On-Device Training via Gradient Filtering
Transfer Knowledge From Head to Tail: Uncertainty Calibration Under Long-Tailed Distribution
Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning
Disentangled Representation Learning for Unsupervised Neural Quantization
DA Wand: Distortion-Aware Selection Using Neural Mesh Parameterization
⭐code
🏠project
On Distillation of Guided Diffusion Models
Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes
⭐code
K-Planes: Explicit Radiance Fields in Space, Time, and Appearance
🏠project
Understanding Masked Autoencoders via Hierarchical Latent Variable Models
Co-Training 2^L Submodels for Visual Recognition
Masked Images Are Counterfactual Samples for Robust Fine-Tuning
⭐code
Learning Customized Visual Models With Retrieval-Augmented Knowledge
A Unified Spatial-Angular Structured Light for Single-View Acquisition of Shape and Reflectance
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery
⭐code
Reproducible Scaling Laws for Contrastive Language-Image Learning
⭐code
Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models
Invertible Neural Skinning
🏠project
Multi-Object Manipulation via Object-Centric Neural Scattering Functions
Fair Scratch Tickets: Finding Fair Sparse Networks Without Weight Training
Backdoor Cleansing With Unlabeled Data
⭐code
Full or Weak Annotations? An Adaptive Strategy for Budget-Constrained Annotation Campaigns
Executing Your Commands via Motion Diffusion in Latent Space
Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations
🏠project
Learning To Generate Image Embeddings With User-Level Differential Privacy
Revisiting the Stack-Based Inverse Tone Mapping
PACO: Parts and Attributes of Common Objects
⭐code
Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models
A General Regret Bound of Preconditioned Gradient Method for DNN Training
⭐code
A Practical Upper Bound for the Worst-Case Attribution Deviations
Perception and Semantic Aware Regularization for Sequential Confidence Calibration
⭐code
Deep Random Projector: Accelerated Deep Image Prior
⭐[code](https://github.com/sun-umn/DeepRandom-Projector)
Bias Mimicking: A Simple Sampling Approach for Bias Mitigation
⭐code
DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding via Coarse-To-Fine Contrastive Ranking
Structured Kernel Estimation for Photon-Limited Deconvolution
⭐code
FlexiViT: One Model for All Patch Sizes
⭐code
BiasBed - Rigorous Texture Bias Evaluation
⭐code
GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction
⭐code
Finding Geometric Models by Clustering in the Consensus Space
⭐code
Hierarchical Neural Memory Network for Low Latency Event Processing
🏠project
Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries
⭐code
PointConvFormer: Revenge of the Point-Based Convolution
A Practical Stereo Depth System for Smart Glasses
Differentiable Shadow Mapping for Efficient Inverse Graphics
Multi Domain Learning for Motion Magnification
⭐code
Re-Thinking Model Inversion Attacks Against Deep Neural Networks
⭐code
DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects
🏠project
Two-View Geometry Scoring Without Correspondences
🏠project
ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images
⭐code
Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
Analyzing Physical Impacts Using Transient Surface Wave Imaging
Adaptive Global Decay Process for Event Cameras
⭐code
Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy Labels
Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment
⭐code
Swept-Angle Synthetic Wavelength Interferometry
Shape, Pose, and Appearance From a Single Image via Bootstrapped Radiance Field Inversion
🏠project
Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples
⭐code
3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
EcoTTA: Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization
Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation
Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation
⭐code
DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis
🏠project
Virtual Occlusions Through Implicit Depth
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator
⭐code
Inverting the Imaging Process by Learning an Implicit Camera Model
⭐code
Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations
⭐code
Efficient Multimodal Fusion via Interactive Prompting
Representing Volumetric Videos as Dynamic MLP Maps
⭐code
Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
A Meta-Learning Approach to Predicting Performance and Data Requirements
Multimodal Prompting with Missing Modalities for Visual Recognition
⭐code
Masked Images Are Counterfactual Samples for Robust Fine-tuning
UniHCP: A Unified Model for Human-Centric Perceptions
⭐code
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
⭐code
Progressive Open Space Expansion for Open-Set Model Attribution
⭐code
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
⭐code
HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
⭐code
3D Cinemagraphy from a Single Image
🏠project
Masked Image Modeling with Local Multi-Scale Reconstruction
⭐code
Revisiting Rotation Averaging: Uncertainties and Robust Losses
⭐code
Unifying Layout Generation with a Decoupled Diffusion Model
Adversarial Counterfactual Visual Explanations
⭐code
Trainable Projected Gradient Method for Robust Fine-tuning
⭐code
Partial Network Cloning
⭐code
Extracting Class Activation Maps from Non-Discriminative Features as well
⭐code
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization
⭐code
Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark
⭐code
PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
⭐code
Boundary Unlearning
🏠project
ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals
VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions
Learning a Depth Covariance Function
⭐code
A Bag-of-Prototypes Representation for Dataset-Level Applications
CrOC: Cross-View Online Clustering for Dense Visual Representation Learning
⭐code
Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels
⭐code
Marching-Primitives: Shape Abstraction from Signed Distance Function
⭐code
Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
Robust Test-Time Adaptation in Dynamic Scenarios
⭐code
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck
⭐code
IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
Compacting Binary Neural Networks by Sparse Kernel Selection
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
⭐code
Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph
⭐code
Quantum Multi-Model Fitting
⭐code
Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
PMatch: Paired Masked Image Modeling for Dense Geometric Matching
⭐code
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
⭐code
Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
Why is the winner the best?
Disorder-invariant Implicit Neural Representation
⭐code
HypLiLoc: Towards Effective LiDAR Pose Regression with Hyperbolic Fusion
⭐code
Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints
🏠project
SMPConv: Self-moving Point Representations for Continuous Convolution
⭐code
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
⭐code
Delving into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
Wide-Angle Rectification via Content-Aware Conformal Mapping
🏠project
Large-capacity and Flexible Video Steganography via Invertible Neural Network
⭐code
SketchXAI: A First Look at Explainability for Human Sketches
⭐code
Hard Patches Mining for Masked Image Modeling
👍CVPR 2023 | HPM: Mining Hard Samples in Masked Learning for Solid Performance Gains!
Learning Geometry-aware Representations by Sketching
DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
⭐code
Investigating the Nature of 3D Generalization in Deep Neural Networks
⭐code
EC^2: Emergent Communication for Embodied Control
Generalizing Dataset Distillation via Deep Generative Prior
⭐code
🏠project
Learning Locally Editable Virtual Humans
🏠project
Class-Balancing Diffusion Models
SFD2: Semantic-guided Feature Detection and Description
⭐code
Computational Flash Photography Through Intrinsics
Deep Graph Reprogramming
LayoutDM: Transformer-based Diffusion Model for Layout Generation
MetaViewer: Towards a Unified Multi-View Representation
Learning Compact Representations for LiDAR Completion and Generation
🏠project
Multimodal(多模态)
- Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning
- PMR: Prototypical Modal Rebalance for Multimodal Learning
- Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling
- Towards Flexible Multi-Modal Document Models
- Multi-Modal Representation Learning With Text-Driven Soft Masks
- Align and Attend: Multimodal Summarization With Dual Contrastive Losses
  🏠project
- Improving Zero-Shot Generalization and Robustness of Multi-Modal Models
  ⭐code
- BEV-Guided Multi-Modality Fusion for Driving Perception
  ⭐code
- BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
- Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
  ⭐code
- Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce
- MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning
  ⭐code
Affordance Learning(启示学习)
- Leverage Interactive Affinity for Affordance Learning
  ⭐code
Feature Matching(特征匹配)
- PATS: Patch Area Transportation with Subdivision for Local Feature Matching
  🏠project
- Adaptive Spot-Guided Transformer for Consistent Local Feature Matching
  ⭐code
- Adaptive Assignment for Geometry Aware Local Feature Matching
  ⭐code
- DKM: Dense Kernelized Feature Matching for Geometry Estimation
  ⭐code
UV Prediction
- Normal-Guided Garment UV Prediction for Human Re-Texturing
vector quantization(矢量量化)
- Vector Quantization With Self-Attention for Quality-Independent Representation Learning
  🏠project

Click here for the 2020 paper collections by category

↘️CVPR-2020-Papers ↘️ECCV-2020-Papers

Click here for the 2021 paper collections by category

↘️ICCV-2021-Papers ↘️CVPR-2021-Papers

Click here for the 2022 paper collections by category

↘️CVPR-2022-Papers ↘️WACV-2022-Papers ↘️ECCV-2022-Papers

Table of Contents

80.Computer Graphics(计算机图形学)

79.thermal imaging technology(热敏成像技术)

78.Image/Video Editing(图像/视频编辑)

77.sketch(草图)

76.IP protection(知识产权保护)

75.Semantic Scene Completion(语义场景补全)

74.Machine Learning(机器学习)

73.Neural Radiance Fields(神经辐射场)

72.open-set recognition(开集识别)

71.visual reasoning(视觉推理)

70.Image Forgery Detection

69.Reinforcement learning(强化学习)

68.Lifelong Learning(终身学习)

67.Active Learning(主动学习)

66.Clustering(聚类)

65.Scene flow estimation(场景流估计)

64.Motion Retargeting(动作重定向)

63.edge detection(边缘检测)

62.Object Counting(物体计数)

61.Object Re-identification(物体重识别)

60.Industrial Anomaly Detection(工业缺陷检测)

59.Image/Video Compression(图像/视频压缩)

58.Neural rendering(神经渲染)

57.Gaze Estimation(视线估计)

56.Sound + Vision(声音与视觉)

55.Novel View Synthesis(视图合成)

54.Benchmark/Dataset(基准/数据集)

53.Sign Language(手语)

52.Human Motion(人体运动)

51.Computational Imaging, e.g. optical, geometric, and light-field imaging(计算成像)

50.Anomaly Detection(异常检测)

49.Image Geo-localization(图像地理位置识别)

48.NLP(自然语言处理)

47.Few/Zero-Shot Learning/Domain Generalization/Adaptation(小/零样本/域泛化/域适应)

46.Scene Graph Generation(场景图生成)

45.Dense Prediction(密集预测)

44.Federated Learning(联邦学习)

43.Multi-Task Learning(多任务学习)

42.Metric Learning(度量学习)

41.Incremental Learning(增量学习)

40.Adversarial Learning(对抗学习)

39.Continual Learning(持续学习)

38.Meta-Learning(元学习)

37.Contrastive Learning(对比学习)

36.Optical Flow(光流估计)

35.OCR

34.Model Compression/Knowledge Distillation/Pruning(模型压缩/知识蒸馏/剪枝)

33.Human-Object Interaction(人物交互)

32.Data Augmentation(数据增强)

31.Vision-Language(视觉语言)

30.Visual Question Answering(视觉问答)

29.SLAM/Augmented Reality/Virtual Reality/Robotics(增强/虚拟现实/机器人)

28.Style Transfer(风格迁移)

27.Pose Estimation(物体姿势估计)

26.GCN/GNN

25.Fine-Grained/Image Classification(细粒度/图像分类)

24.Super-Resolution(超分辨率)

23.Image Retrieval(图像检索)

22.Image Synthesis/Generation(图像合成)

21.UAV/Remote Sensing/Satellite Image(无人机/遥感/卫星图像)