generated from codingpot/newsletter_awesome_articles
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 6b4c97b
commit 5184297
Showing 15 changed files with 135 additions and 0 deletions.
9 changes: 9 additions & 0 deletions
...-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?.yaml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-03" | ||
author: Kaixiong Gong | ||
title: 'AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.02611 | ||
summary: This paper introduces AV-Odyssey Bench, a comprehensive audio-visual benchmark to evaluate the understanding of audio-visual information by multimodal large language models. It consists of 4,555 multiple-choice questions that require models to leverage clues from both visual and audio inputs. The benchmark aims to provide insights for future dataset collection and model development by revealing the limitations of current models.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-03" | ||
author: Zicheng Lin | ||
title: 'Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM''s Reasoning Capability' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2411.19943 | ||
summary: In this paper, we introduce cDPO, a novel approach that identifies and rewards critical tokens during the alignment process in Large Language Models (LLMs) to improve their reasoning capability. We use a contrastive estimation approach to identify critical tokens and extend the conventional DPO algorithms to token-level DPO for better alignment with critical token information. Our approach is evaluated on GSM8K and MATH500 benchmarks with Llama-3 (8B and 70B) and deepseek-math (7B) models, demon... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-03 Free Process Rewards without Process Labels.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-03" | ||
author: Lifan Yuan | ||
title: Free Process Rewards without Process Labels | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.01981 | ||
summary: This paper introduces a new method for training a process reward model (PRM) without the need for manually annotated labels at every intermediate step. The method trains an outcome reward model (ORM) on cheaper response-level labels and shows that it outperforms a strong baseline using less than 1/38 of the training data. The performance can be further improved with majority voting and by scaling up instructions and responses. The method is more data-efficient and can keep improving generation m... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
... LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-03" | ||
author: Hongyan Zhi | ||
title: 'LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.01292 | ||
summary: Researchers propose LSceneLLM, a framework that uses an LLM's visual preference to identify task-relevant areas in large 3D scenes, then uses a scene magnifier module to capture fine-grained details in these areas. They also introduce XR-Scene, a benchmark for large scene understanding tasks, and show that their method outperforms existing methods.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...askRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-03" | ||
author: Minhyun Lee | ||
title: 'MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2411.19067 | ||
summary: This paper introduces a new way to improve the performance of Referring Image Segmentation (RIS) by using a method called Masked Referring Image Segmentation (MaskRIS). This method uses image and text masking, followed by Distortion-aware Contextual Learning (DCL) to make the model more robust to things like occlusions and incomplete information. The authors demonstrate that their method outperforms existing methods in both fully supervised and weakly supervised settings, and achieves new state-... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-03 Scaling Image Tokenizers with Grouped Spherical Quantization.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-03" | ||
author: Jiangtao Wang | ||
title: Scaling Image Tokenizers with Grouped Spherical Quantization | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.02632 | ||
summary: We propose Grouped Spherical Quantization (GSQ) for image tokenizers, which improves reconstruction quality and enables efficient scaling. Our findings reveal distinct behaviors at high and low spatial compression levels, and we show that GSQ can represent high-dimensional latent spaces more efficiently.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...12-03 VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-03" | ||
author: Mingzhe Zheng | ||
title: 'VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.02259 | ||
summary: VideoGen-of-Thought (VGoT) is a new method for creating multi-shot videos that are cohesive and have a logical storyline. It does this by breaking the video creation process into smaller steps, including generating scripts, keyframes, and individual shots, and by ensuring that the characters and story remain consistent throughout the video.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...ent/2024-12-04 A dynamic parallel method for performance optimization on hybrid CPUs.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-04" | ||
author: Luo Yu | ||
title: A dynamic parallel method for performance optimization on hybrid CPUs | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2411.19542 | ||
summary: A new method for running AI models on hybrid CPUs has been introduced to balance the workload of each core and improve inference performance. This method allows Neural Speed to use more than 90% of the memory bandwidth on two hybrid Intel CPUs.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...nt/2024-12-04 Generating a Low-code Complete Workflow via Task Decomposition and RAG.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-04" | ||
author: Orlando Marquez Ayala | ||
title: Generating a Low-code Complete Workflow via Task Decomposition and RAG | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.00239 | ||
summary: 'The paper introduces two design patterns for AI-based systems: Task Decomposition and Retrieval-Augmented Generation (RAG). These patterns are used to create a complex real-world GenAI application for enterprise users: Workflow Generation. The paper discusses the trade-offs of these patterns in terms of software quality attributes and provides recommendations for AI practitioners....' | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-04 MALT: Improving Reasoning with Multi-Agent LLM Training.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-04" | ||
author: Sumeet Ramesh Motwani | ||
title: 'MALT: Improving Reasoning with Multi-Agent LLM Training' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.01928 | ||
summary: The paper introduces a new approach called MALT that trains multiple LLMs to work together on reasoning problems. The LLMs are given specific roles and work together in a sequential way. The paper also proposes a method for generating synthetic data and assigning rewards to improve each model's performance. The approach is evaluated on three datasets and shows improvements over the baseline model.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...t/2024-12-04 Motion Prompting: Controlling Video Generation with Motion Trajectories.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-04" | ||
author: Daniel Geng | ||
title: 'Motion Prompting: Controlling Video Generation with Motion Trajectories' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.02700 | ||
summary: The paper introduces a new method for controlling video generation by using motion trajectories, which can be sparse or dense and encode any number of trajectories. This method can be used to control camera and object motion, transfer motion, and edit images. The results show realistic physics and emergent behaviors, and the method outperforms previous methods in quantitative evaluations and human studies.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...inders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-04" | ||
author: Junyuan Zhang | ||
title: 'OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.02592 | ||
summary: 'This paper introduces OHRBench, a benchmark for evaluating the impact of OCR on Retrieval-Augmented Generation (RAG) systems. It identifies two types of OCR noise: Semantic Noise and Formatting Noise, and demonstrates the vulnerability of RAG systems to these noises. The paper also discusses the potential of using Vision-Language Models (VLMs) without OCR in RAG systems....' | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...nt/2024-12-04 OmniCreator: Self-Supervised Unified Generation with Universal Editing.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-04" | ||
author: Haodong Chen | ||
title: 'OmniCreator: Self-Supervised Unified Generation with Universal Editing' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.02114 | ||
summary: OmniCreator is a self-supervised framework that can generate or edit images and videos based on text prompts. It uses original text-video pairs to learn the relationship between text and video, and can generate high-quality videos or edit existing videos to match a given text prompt. The framework also works with images and has been tested on a new dataset called OmniBench-99, where it outperformed other models.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
.../2024-12-04 Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-04" | ||
author: Alessandro Scirè | ||
title: Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2411.19655 | ||
summary: This paper introduces LLM-OASIS, a large-scale resource for training end-to-end factuality evaluators for Large Language Models (LLMs). LLM-OASIS addresses the limitations of existing resources by being task- and domain-agnostic, large in size, and designed for complex verification tasks. The paper demonstrates that LLM-OASIS presents a significant challenge for state-of-the-art LLMs, highlighting its potential to drive future research in the field.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-04" | ||
author: Dhiman Paul | ||
title: 'VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.01558 | ||
summary: This paper proposes VideoLights, a new system for finding important parts of videos and finding specific moments in them. It uses a special method to combine video and text information, and it also uses a large language model to help improve the results. The system performs better than other systems on several tests, and the code is available online.... | ||
opinion: placeholder | ||
tags: | ||
- ML |