Automated report

deep-diver · Dec 25, 2024 · 612d771
1 parent fb8ea8a
commit 612d771

Showing 10 changed files with 90 additions and 0 deletions.
diff --git a/current/2024-12-24 DepthLab: From Partial to Complete.yaml b/current/2024-12-24 DepthLab: From Partial to Complete.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-24"
+author: Zhiheng Liu
+title: 'DepthLab: From Partial to Complete'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18153
+summary: DepthLab is a foundation depth inpainting model that can complete missing depth data and preserve scale consistency. It can be used in various downstream tasks and outperforms current solutions in both performance and visual quality....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...sition Embedding: Enhancing Attention's Periodic Extension for Length Generalization.yaml b/...sition Embedding: Enhancing Attention's Periodic Extension for Length Generalization.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-24"
+author: Ermo Hua
+title: 'Fourier Position Embedding: Enhancing Attention''s Periodic Extension for Length Generalization'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.17739
+summary: Fourier Position Embedding (FoPE) is introduced as an enhancement to Rotary Position Embedding (RoPE) in Language Models (LMs). FoPE improves the periodic extension and length generalization of RoPE-based attention by addressing the adverse effects of linear layers and activation functions outside of attention, as well as insufficiently trained frequency components caused by time-domain truncation. FoPE constructs a Fourier series and zeroes out destructive frequency components, increasing model ro...
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-24 ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing.yaml b/current/2024-12-24 ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-24"
+author: Ziteng Wang
+title: 'ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.14711
+summary: ReMoE is a fully differentiable Mixture-of-Experts architecture that uses ReLU as the router instead of TopK+Softmax routing. It offers efficient dynamic allocation of computation across tokens and layers, and exhibits domain specialization. ReMoE consistently outperforms vanilla TopK-routed MoE across various model sizes, expert counts, and levels of granularity, and exhibits superior scalability with respect to the number of experts....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...2-24 SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval.yaml b/...2-24 SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-24"
+author: Aakash Mahalingam
+title: 'SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.15443
+summary: SKETCH is a new method that improves retrieval of information from large datasets by combining text retrieval and knowledge graphs. It produces more accurate and relevant responses than traditional methods, as shown on four different datasets....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...hLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding.yaml b/...hLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-25"
+author: Tatiana Zemskova
+title: '3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18450
+summary: A 3D scene graph represents a compact scene model, storing information about the objects and the semantic relationships between them, making its use promising for robotic tasks. When interacting with a user, an embodied intelligent agent should be capable of responding to various queries about the scene formulated in natural language. Large Language Models (LLMs) are beneficial solutions for user-robot interaction due to their natural language understanding and reasoning abilities. Recent method...
+opinion: placeholder
+tags:
+    - ML
diff --git a/...lti-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation.yaml b/...lti-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-25"
+author: Minghong Cai
+title: 'DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18597
+summary: We propose DiTCtrl, a method for generating videos with multiple sequential prompts using a Multi-Modal Diffusion Transformer (MM-DiT) architecture. Our method analyzes the attention mechanism of MM-DiT and utilizes mask-guided precise semantic control across different prompts with attention sharing to achieve smooth transitions and consistent object motion. We also introduce MPVBench, a new benchmark for evaluating multi-prompt video generation performance....
+opinion: placeholder
+tags:
+    - ML
diff --git a/... Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning.yaml b/... Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-25"
+author: Sungjin Park
+title: Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
+thumbnail: ""
+link: https://huggingface.co/papers/2412.15797
+summary: We introduce a new way to use many language models together to solve complex problems, called LE-MCTS. This method helps open-source models perform better on challenging reasoning tasks by choosing the best answer from different models based on a reward system. Our approach outperforms other methods and improves performance by up to 4.3% on certain math reasoning datasets....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-25 In Case You Missed It: ARC 'Challenge' Is Not That Challenging.yaml b/current/2024-12-25 In Case You Missed It: ARC 'Challenge' Is Not That Challenging.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-25"
+author: Łukasz Borchmann
+title: 'In Case You Missed It: ARC ''Challenge'' Is Not That Challenging'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.17758
+summary: The paper discusses how the evaluation setup of the ARC Challenge makes it seem more difficult than it actually is for modern LLMs. The paper also highlights how similar evaluation practices can lead to false assumptions about reasoning deficits in other benchmarks and offers guidelines to ensure that multiple-choice evaluations accurately reflect actual model capabilities....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-25 MotiF: Making Text Count in Image Animation with Motion Focal Loss.yaml b/current/2024-12-25 MotiF: Making Text Count in Image Animation with Motion Focal Loss.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-25"
+author: Shijie Wang
+title: 'MotiF: Making Text Count in Image Animation with Motion Focal Loss'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.16153
+summary: The paper proposes MotiF, an approach to improve text alignment and motion generation in text-guided image animation by focusing on regions with more motion. The authors also introduce TI2V Bench, a dataset for evaluating text-guided image animation, and conduct a human evaluation. MotiF outperforms nine open-source models on TI2V Bench, achieving an average preference of 72%....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...artGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models.yaml b/...artGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-25"
+author: Minghao Chen
+title: 'PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18608
+summary: PartGen is a method that separates 3D objects into meaningful parts and reconstructs them using multi-view diffusion models. This method can generate 3D objects from text or images and can complete or hallucinate missing parts....
+opinion: placeholder
+tags:
+    - ML