
Automated report
deep-diver committed Dec 5, 2024
1 parent f503cfe commit ac55c11
Showing 18 changed files with 162 additions and 0 deletions.
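Each of the 18 added files is a paper entry sharing the same front-matter fields. A minimal sketch of the shared schema follows; field names are taken from the diffs below, while the placeholder values and comments are illustrative assumptions only:

date: "YYYY-MM-DD"         # date of the report entry
author: First Author       # first author of the paper
title: 'Paper Title'       # quoted when it contains a colon
thumbnail: ""              # left empty by the automated report
link: https://huggingface.co/papers/<paper-id>
summary: Short automated summary of the paper, often truncated with "..."
opinion: placeholder       # presumably filled in manually later
tags:
- ML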
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Shengyuan Zhang
title: Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
thumbnail: ""
link: https://huggingface.co/papers/2412.03515
summary: This paper presents a new method called ScoreLiDAR that improves the speed and quality of 3D LiDAR scene completion models used in autonomous vehicles. It speeds up completion by more than 5x on SemanticKITTI and outperforms existing models....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Jing Tan
title: 'Imagine360: Immersive 360 Video Generation from Perspective Anchor'
thumbnail: ""
link: https://huggingface.co/papers/2412.03552
summary: Imagine360 is a new framework that converts standard perspective videos into 360-degree equirectangular videos with rich and diverse motion patterns. It uses a dual-branch design, an antipodal mask, and elevation-aware designs to capture long-range motion dependencies and handle diverse perspective video inputs. Imagine360 outperforms other 360-degree video generation methods in graphics quality and motion coherence....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Zehuan Huang
title: 'MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.03558
summary: This paper presents MIDI, a new approach for creating 3D scenes from a single image. Unlike other methods that require multiple steps or complex processes, MIDI uses a multi-instance attention mechanism to capture interactions and spatial coherence between objects. The method models object completion during 3D generation and supervises interactions between 3D instances using limited scene-level data. MIDI achieves state-of-the-art performance in image-to-scene generation....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Shuai Tan
title: 'Mimir: Improving Video Diffusion Models for Precise Text Understanding'
thumbnail: ""
link: https://huggingface.co/papers/2412.03085
summary: Mimir is a new framework for generating videos from text descriptions that improves the precision of text understanding by combining the strengths of video diffusion models and large language models. It does this by using a special token fuser to blend the features from both types of models, allowing the video generation model to benefit from the learned video priors and the text-related capabilities of the language model. This results in higher quality videos with better text comprehension, esp...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Jun Xiang
title: 'One Shot, One Talk: Whole-body Talking Avatar from a Single Image'
thumbnail: ""
link: https://huggingface.co/papers/2412.01106
summary: This paper introduces a novel pipeline that creates a whole-body talking avatar from a single image, addressing the challenges of complex dynamic modeling and generalization to novel gestures and expressions. The authors use pose-guided image-to-video diffusion models to generate imperfect video frames as pseudo-labels and a 3DGS-mesh hybrid avatar representation to overcome dynamic modeling challenges. Experiments show that their method produces photorealistic, animatable, and expressive avatars from a single image....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Liao Qu
title: 'TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.03069
summary: TokenFlow is a unified image tokenizer that converts images into discrete tokens usable for both multimodal understanding and generation. Unlike prior tokenizers, it captures high-level semantics and fine-grained visual details at the same time, so the same tokens serve both comprehension and image-generation tasks. On understanding benchmarks it even surpasses some much larger multimodal systems....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Nick Stracke
title: 'CleanDIFT: Diffusion Features without Noise'
thumbnail: ""
link: https://huggingface.co/papers/2412.03439
summary: We present a method to fine-tune diffusion models to produce high-quality, noise-free semantic features for various tasks, outperforming previous methods and ensemble-based approaches at a lower cost....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Wujian Peng
title: 'Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning'
thumbnail: ""
link: https://huggingface.co/papers/2412.03565
summary: This paper proposes a method called Inst-IT to improve the understanding of specific elements in images and videos by Large Multimodal Models. It uses explicit visual prompts and instruction tuning, and the results show that it improves the models' performance on various image and video understanding tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Xiaoyan Xing
title: 'LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting'
thumbnail: ""
link: https://huggingface.co/papers/2412.00177
summary: LumiNet is a new method that combines generative models with latent intrinsic representations of light to relight indoor scenes. Given a source image and a target lighting image, it synthesizes a new image that takes on the target's lighting while keeping the original scene's geometry and colors, processing latent properties from both images and using a learned adaptor to transfer the lighting. LumiNet outperforms other methods at transferring complex lighting effects such as specular highlights and indirect illumination....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Lingen Li
title: 'NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images'
thumbnail: ""
link: https://huggingface.co/papers/2412.03517
summary: We present NVComposer, a new method for creating realistic images from different viewpoints using multiple sparse and unposed images. Our approach doesn't require external alignment and improves the flexibility and accessibility of existing methods. NVComposer uses a dual-stream diffusion model and a geometry-aware feature alignment module to generate target novel views together with the camera poses of the condition images. The approach achieves state-of-the-art performance in generative multi-view NVS tasks and shows imp...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Dar-Yen Chen
title: 'NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training'
thumbnail: ""
link: https://huggingface.co/papers/2412.02030
summary: NitroFusion, a new method for creating high-quality images in a single step, uses a pool of specialized discriminators that provide feedback on different aspects of the image, such as composition, color, and technique. This improves the quality of the generated images and allows users to choose the number of steps for the best balance of quality and speed....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Andreas Steiner
title: 'PaliGemma 2: A Family of Versatile VLMs for Transfer'
thumbnail: ""
link: https://huggingface.co/papers/2412.03555
summary: PaliGemma 2 is an improved version of the PaliGemma open Vision-Language Model (VLM) that uses the Gemma 2 family of language models and the SigLIP-So400m vision encoder. The models are trained at different resolutions to improve knowledge transfer. The family of models allows for analysis of factors impacting transfer performance and includes more and broader transfer tasks than PaliGemma, achieving state-of-the-art results in some tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Viet Nguyen
title: 'SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance'
thumbnail: ""
link: https://huggingface.co/papers/2412.02687
summary: This paper introduces SNOOPI, a new framework that improves training stability and adds support for negative prompt guidance in one-step diffusion models. It uses a random-scale classifier-free guidance approach and a training-free method called Negative-Away Steer Attention (NASA). SNOOPI significantly improves baseline models across various metrics and sets a new state-of-the-art benchmark for one-step diffusion models with an HPSv2 score of 31.08....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Alex Havrilla
title: Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
thumbnail: ""
link: https://huggingface.co/papers/2412.02980
summary: The paper investigates the impact of quality, diversity, and complexity in synthetic data generated by Large Language Models. It finds that quality is crucial for in-distribution generalization, diversity is essential for out-of-distribution generalization, and complexity is beneficial for both. The paper also highlights the trade-offs between these characteristics and their influence on model performance. Additionally, it discusses the importance of balancing these trade-offs for efficient rein...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Konstantin Chernyshev
title: 'U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs'
thumbnail: ""
link: https://huggingface.co/papers/2412.03205
summary: The abstract introduces U-MATH, a new benchmark for evaluating mathematical skills in language models, which consists of 1,100 unpublished open-ended university-level problems. The benchmark is balanced across six core subjects and includes 20% multimodal problems. The evaluation of various language models on U-MATH reveals that they struggle with both text-based and visual problems, achieving a maximum accuracy of only 63% and 45% respectively. The solution assessment is also challenging for la...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Jeongho Ju
title: 'VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models'
thumbnail: ""
link: https://huggingface.co/papers/2411.19103
summary: VARCO-VISION is an open-source Korean-English vision-language model that can understand and generate images and text in both languages. It performs well in various tasks and is available for researchers to use....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Duo Zheng
title: 'Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding'
thumbnail: ""
link: https://huggingface.co/papers/2412.00493
summary: The paper introduces a new model called Video-3D LLM that improves 3D scene understanding by treating 3D scenes as videos and adding 3D position information to the model's representations. The model also uses a technique called maximum coverage sampling to balance computational costs and performance. The authors show that their model outperforms other models on several 3D scene understanding tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Ziyi Yang
title: Weighted-Reward Preference Optimization for Implicit Model Fusion
thumbnail: ""
link: https://huggingface.co/papers/2412.03187
summary: This paper proposes a method called Weighted-Reward Preference Optimization (WRPO) for fusing different language models without requiring vocabulary alignment or merging their weight matrices. It improves the performance of the fused model and even outperforms some stronger models on several benchmarks....
opinion: placeholder
tags:
- ML
