Automated report

deep-diver · Dec 4, 2024 · 5184297
1 parent 6b4c97b
commit 5184297

Showing 15 changed files with 135 additions and 0 deletions.
diff --git a/...-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?.yaml b/...-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-03"
+author: Kaixiong Gong
+title: 'AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.02611
+summary: This paper introduces AV-Odyssey Bench, a comprehensive audio-visual benchmark to evaluate the understanding of audio-visual information by multimodal large language models. It consists of 4,555 multiple-choice questions that require models to leverage clues from both visual and audio inputs. The benchmark aims to provide insights for future dataset collection and model development by revealing the limitations of current models....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability.yaml b/...Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-03"
+author: Zicheng Lin
+title: 'Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM''s Reasoning Capability'
+thumbnail: ""
+link: https://huggingface.co/papers/2411.19943
+summary: In this paper, we introduce cDPO, a novel approach that identifies and rewards critical tokens during the alignment process in Large Language Models (LLMs) to improve their reasoning capability. We use a contrastive estimation approach to identify critical tokens and extend the conventional DPO algorithms to token-level DPO for better alignment with critical token information. Our approach is evaluated on GSM8K and MATH500 benchmarks with Llama-3 (8B and 70B) and deepseek-math (7B) models, demon...
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-03 Free Process Rewards without Process Labels.yaml b/current/2024-12-03 Free Process Rewards without Process Labels.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-03"
+author: Lifan Yuan
+title: Free Process Rewards without Process Labels
+thumbnail: ""
+link: https://huggingface.co/papers/2412.01981
+summary: This paper introduces a new method for training a process reward model (PRM) without the need for manually annotated labels at every intermediate step. The method trains an outcome reward model (ORM) on cheaper response-level labels and shows that it outperforms a strong baseline using less than 1/38 of the training data. The performance can be further improved with majority voting and by scaling up instructions and responses. The method is more data-efficient and can keep improving generation m...
+opinion: placeholder
+tags:
+    - ML
diff --git a/... LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences.yaml b/... LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-03"
+author: Hongyan Zhi
+title: 'LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.01292
+summary: Researchers propose LSceneLLM, a framework that uses an LLM's visual preference to identify task-relevant areas in large 3D scenes, then uses a scene magnifier module to capture fine-grained details in these areas. They also introduce XR-Scene, a benchmark for large scene understanding tasks, and show that their method outperforms existing methods....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...askRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation.yaml b/...askRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-03"
+author: Minhyun Lee
+title: 'MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation'
+thumbnail: ""
+link: https://huggingface.co/papers/2411.19067
+summary: This paper introduces a new way to improve the performance of Referring Image Segmentation (RIS) by using a method called Masked Referring Image Segmentation (MaskRIS). This method uses image and text masking, followed by Distortion-aware Contextual Learning (DCL) to make the model more robust to things like occlusions and incomplete information. The authors demonstrate that their method outperforms existing methods in both fully supervised and weakly supervised settings, and achieves new state-...
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-03 Scaling Image Tokenizers with Grouped Spherical Quantization.yaml b/current/2024-12-03 Scaling Image Tokenizers with Grouped Spherical Quantization.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-03"
+author: Jiangtao Wang
+title: Scaling Image Tokenizers with Grouped Spherical Quantization
+thumbnail: ""
+link: https://huggingface.co/papers/2412.02632
+summary: We propose Grouped Spherical Quantization (GSQ) for image tokenizers, which improves reconstruction quality and enables efficient scaling. Our findings reveal distinct behaviors at high and low spatial compression levels, and we show that GSQ can represent high-dimensional latent spaces more efficiently....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...12-03 VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation.yaml b/...12-03 VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-03"
+author: Mingzhe Zheng
+title: 'VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.02259
+summary: VideoGen-of-Thought (VGoT) is a new method for creating multi-shot videos that are cohesive and have a logical storyline. It does this by breaking the video creation process into smaller steps, including generating scripts, keyframes, and individual shots, and by ensuring that the characters and story remain consistent throughout the video....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...ent/2024-12-04 A dynamic parallel method for performance optimization on hybrid CPUs.yaml b/...ent/2024-12-04 A dynamic parallel method for performance optimization on hybrid CPUs.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-04"
+author: Luo Yu
+title: A dynamic parallel method for performance optimization on hybrid CPUs
+thumbnail: ""
+link: https://huggingface.co/papers/2411.19542
+summary: A new method for running AI models on hybrid CPUs has been introduced to balance the workload of each core and improve inference performance. This method allows Neural Speed to use more than 90% of the memory bandwidth on two hybrid Intel CPUs....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...nt/2024-12-04 Generating a Low-code Complete Workflow via Task Decomposition and RAG.yaml b/...nt/2024-12-04 Generating a Low-code Complete Workflow via Task Decomposition and RAG.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-04"
+author: Orlando Marquez Ayala
+title: Generating a Low-code Complete Workflow via Task Decomposition and RAG
+thumbnail: ""
+link: https://huggingface.co/papers/2412.00239
+summary: 'The paper introduces two design patterns for AI-based systems: Task Decomposition and Retrieval-Augmented Generation (RAG). These patterns are used to create a complex real-world GenAI application for enterprise users: Workflow Generation. The paper discusses the trade-offs of these patterns in terms of software quality attributes and provides recommendations for AI practitioners....'
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-04 MALT: Improving Reasoning with Multi-Agent LLM Training.yaml b/current/2024-12-04 MALT: Improving Reasoning with Multi-Agent LLM Training.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-04"
+author: Sumeet Ramesh Motwani
+title: 'MALT: Improving Reasoning with Multi-Agent LLM Training'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.01928
+summary: The paper introduces a new approach called MALT that trains multiple LLMs to work together on reasoning problems. The LLMs are given specific roles and work together in a sequential way. The paper also proposes a method for generating synthetic data and assigning rewards to improve each model's performance. The approach is evaluated on three datasets and shows improvements over the baseline model....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...t/2024-12-04 Motion Prompting: Controlling Video Generation with Motion Trajectories.yaml b/...t/2024-12-04 Motion Prompting: Controlling Video Generation with Motion Trajectories.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-04"
+author: Daniel Geng
+title: 'Motion Prompting: Controlling Video Generation with Motion Trajectories'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.02700
+summary: The paper introduces a new method for controlling video generation by using motion trajectories, which can be sparse or dense and encode any number of trajectories. This method can be used to control camera and object motion, transfer motion, and edit images. The results show realistic physics and emergent behaviors, and the method outperforms previous methods in quantitative evaluations and human studies....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...inders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation.yaml b/...inders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-04"
+author: Junyuan Zhang
+title: 'OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.02592
+summary: 'This paper introduces OHRBench, a benchmark for evaluating the impact of OCR on Retrieval-Augmented Generation (RAG) systems. It identifies two types of OCR noise: Semantic Noise and Formatting Noise, and demonstrates the vulnerability of RAG systems to these noises. The paper also discusses the potential of using Vision-Language Models (VLMs) without OCR in RAG systems....'
+opinion: placeholder
+tags:
+    - ML
diff --git a/...nt/2024-12-04 OmniCreator: Self-Supervised Unified Generation with Universal Editing.yaml b/...nt/2024-12-04 OmniCreator: Self-Supervised Unified Generation with Universal Editing.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-04"
+author: Haodong Chen
+title: 'OmniCreator: Self-Supervised Unified Generation with Universal Editing'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.02114
+summary: OmniCreator is a self-supervised framework that can generate or edit images and videos based on text prompts. It uses original text-video pairs to learn the relationship between text and video, and can generate high-quality videos or edit existing videos to match a given text prompt. The framework also works with images and has been tested on a new dataset called OmniBench-99, where it outperformed other models....
+opinion: placeholder
+tags:
+    - ML
diff --git a/.../2024-12-04 Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS.yaml b/.../2024-12-04 Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-04"
+author: Alessandro Scirè
+title: Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS
+thumbnail: ""
+link: https://huggingface.co/papers/2411.19655
+summary: This paper introduces LLM-OASIS, a large-scale resource for training end-to-end factuality evaluators for Large Language Models (LLMs). LLM-OASIS addresses the limitations of existing resources by being task- and domain-agnostic, large in size, and designed for complex verification tasks. The paper demonstrates that LLM-OASIS presents a significant challenge for state-of-the-art LLMs, highlighting its potential to drive future research in the field....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval.yaml b/...-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-04"
+author: Dhiman Paul
+title: 'VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.01558
+summary: This paper proposes VideoLights, a new system for finding important parts of videos and finding specific moments in them. It uses a special method to combine video and text information, and it also uses a large language model to help improve the results. The system performs better than other systems on several tests, and the code is available online....
+opinion: placeholder
+tags:
+    - ML