generated from codingpot/newsletter_awesome_articles
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 3cc161e, commit aabbdc2
Showing 11 changed files with 99 additions and 0 deletions.
9 changes: 9 additions & 0 deletions
current/2024-12-17 Are Your LLMs Capable of Stable Reasoning?.yaml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-17" | ||
author: Junnan Liu | ||
title: Are Your LLMs Capable of Stable Reasoning? | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.13147 | ||
summary: This paper introduces G-Pass@k, a new evaluation metric that measures the performance and stability of Large Language Models (LLMs) in complex reasoning tasks. It also presents LiveMathBench, a dynamic benchmark of challenging mathematical problems. The authors find that LLMs have room for improvement in their realistic reasoning capabilities, highlighting the need for more robust evaluation methods.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...nsional Insights: Benchmarking Real-World Personalization in Large Multimodal Models.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-17" | ||
author: YiFan Zhang | ||
title: 'Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.12606 | ||
summary: The Multi-Dimensional Insights benchmark is a new tool that tests large multimodal models' ability to understand and analyze images in real-world scenarios. It includes questions for different age groups and shows that these models still need to improve in meeting people's needs.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...12-18 Compressed Chain of Thought: Efficient Reasoning Through Dense Representations.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-18" | ||
author: Jeffrey Cheng | ||
title: 'Compressed Chain of Thought: Efficient Reasoning Through Dense Representations' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.13171 | ||
summary: We propose a method called Compressed Chain-of-Thought (CCoT) to generate continuous and variable-length contemplation tokens for language models during inference. These tokens represent explicit reasoning chains and can be applied to existing models. Our method improves accuracy by allowing the models to reason over dense contentful representations, and the level of reasoning can be adjusted by controlling the number of tokens generated.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...ons: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-18" | ||
author: Seungwook Han | ||
title: 'Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.12276 | ||
summary: This paper proposes a concept encoding-decoding mechanism to explain in-context learning (ICL) in transformers. The mechanism involves the model encoding different latent concepts into distinct, separable representations and building conditional decoding algorithms. The quality of concept encoding is causally related and predictive of ICL performance. The mechanism is validated across pretrained models of varying scales and through controlled finetuning.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-18" | ||
author: Mark Endo | ||
title: 'Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.13180 | ||
summary: This paper studies the issue of pruning visual tokens in Vision-Language Models and introduces a new approach called FEATHER that resolves this issue and improves performance on vision-centric localization benchmarks by more than 5 times compared to the original approach.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-18 MIVE: New Design and Benchmark for Multi-Instance Video Editing.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-18" | ||
author: Samuel Teodoro | ||
title: 'MIVE: New Design and Benchmark for Multi-Instance Video Editing' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.12877 | ||
summary: 'This paper introduces a new video editing framework called MIVE. It has two main components: DMS to prevent editing leakage and IPR to ensure precise editing. The paper also presents a new MIVE Dataset and an evaluation metric called CIA Score. MIVE outperforms existing methods in terms of editing accuracy, faithfulness, and leakage prevention....' | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...iEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-18" | ||
author: Shuting Wang | ||
title: 'OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.13018 | ||
summary: This paper presents OmniEval, a comprehensive RAG benchmark in the financial domain that assesses both retrieval and generation performance. It includes a multi-dimensional evaluation framework, automatic data generation, and robust evaluation metrics, highlighting the performance variations of RAG systems across diverse topics and tasks. The code is open-sourced.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...gent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-18" | ||
author: Yifei Zhou | ||
title: 'Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.13194 | ||
summary: The paper proposes a system called PAE that allows foundation model agents to discover and practice skills in the real world. It uses a task proposer to suggest tasks, a thought-based agent policy to attempt those tasks, and an autonomous success evaluator to assess the results. PAE is validated on vision-based web navigation tasks and outperforms other methods with real-world human-annotated benchmarks. The open-source code and checkpoints are available at https://yanqval.github.io/PAE/.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...Towards Exception Safety Code Generation with Intermediate Language Agents Framework.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-18" | ||
author: Xuanming Zhang | ||
title: 'Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.11713 | ||
summary: 'This academic paper introduces Seeker, a multi-agent framework that uses large language models to improve exception handling in code. The framework addresses three key issues: Insensitive Detection of Fragile Code, Inaccurate Capture of Exception Block, and Distorted Handling Solution. Seeker uses agents to assist LLMs in detecting, capturing, and resolving exceptions more effectively, providing valuable insights for future improvements in code reliability....' | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...ument QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-18" | ||
author: Manan Suri | ||
title: 'VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.10704 | ||
summary: This paper presents VisDoMBench, a benchmark for evaluating question answering systems in multi-document settings with rich multimodal content. The paper also introduces VisDoMRAG, a novel multimodal approach that combines visual and textual retrieval augmented generation, improving accuracy and answer verifiability. The approach outperforms unimodal and long-context LLM baselines by 12-20% in end-to-end multimodal document QA.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-18 When to Speak, When to Abstain: Contrastive Decoding with Abstention.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-18" | ||
author: Hyuhng Joon Kim | ||
title: 'When to Speak, When to Abstain: Contrastive Decoding with Abstention' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.12527 | ||
summary: This paper introduces Contrastive Decoding with Abstention (CDA), a method that helps large language models decide when to provide an answer and when to abstain, improving their reliability and trustworthiness.... | ||
opinion: placeholder | ||
tags: | ||
- ML |