Automated report

deep-diver · Dec 18, 2024 · aabbdc2
1 parent 3cc161e
commit aabbdc2

Showing 11 changed files with 99 additions and 0 deletions.
diff --git a/current/2024-12-17 Are Your LLMs Capable of Stable Reasoning?.yaml b/current/2024-12-17 Are Your LLMs Capable of Stable Reasoning?.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-17"
+author: Junnan Liu
+title: Are Your LLMs Capable of Stable Reasoning?
+thumbnail: ""
+link: https://huggingface.co/papers/2412.13147
+summary: This paper introduces G-Pass@k, a new evaluation metric that measures the performance and stability of Large Language Models (LLMs) in complex reasoning tasks. It also presents LiveMathBench, a dynamic benchmark of challenging mathematical problems. The authors find that LLMs have room for improvement in their realistic reasoning capabilities, highlighting the need for more robust evaluation methods....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...nsional Insights: Benchmarking Real-World Personalization in Large Multimodal Models.yaml b/...nsional Insights: Benchmarking Real-World Personalization in Large Multimodal Models.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-17"
+author: YiFan Zhang
+title: 'Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.12606
+summary: The Multi-Dimensional Insights benchmark is a new tool that tests large multimodal models' ability to understand and analyze images in real-world scenarios. It includes questions for different age groups and shows that these models still need to improve in meeting people's needs....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...12-18 Compressed Chain of Thought: Efficient Reasoning Through Dense Representations.yaml b/...12-18 Compressed Chain of Thought: Efficient Reasoning Through Dense Representations.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-18"
+author: Jeffrey Cheng
+title: 'Compressed Chain of Thought: Efficient Reasoning Through Dense Representations'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.13171
+summary: We propose a method called Compressed Chain-of-Thought (CCoT) to generate continuous and variable-length contemplation tokens for language models during inference. These tokens represent explicit reasoning chains and can be applied to existing models. Our method improves accuracy by allowing the models to reason over dense contentful representations, and the level of reasoning can be adjusted by controlling the number of tokens generated....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...ons: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers.yaml b/...ons: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-18"
+author: Seungwook Han
+title: 'Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.12276
+summary: This paper proposes a concept encoding-decoding mechanism to explain in-context learning (ICL) in transformers. The mechanism involves the model encoding different latent concepts into distinct, separable representations and building conditional decoding algorithms. The quality of concept encoding is causally related and predictive of ICL performance. The mechanism is validated across pretrained models of varying scales and through controlled finetuning....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration.yaml b/...the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-18"
+author: Mark Endo
+title: 'Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.13180
+summary: This paper studies the issue of pruning visual tokens in Vision-Language Models and introduces a new approach called FEATHER that resolves this issue and improves performance on vision-centric localization benchmarks by more than 5 times compared to the original approach....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-18 MIVE: New Design and Benchmark for Multi-Instance Video Editing.yaml b/current/2024-12-18 MIVE: New Design and Benchmark for Multi-Instance Video Editing.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-18"
+author: Samuel Teodoro
+title: 'MIVE: New Design and Benchmark for Multi-Instance Video Editing'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.12877
+summary: 'This paper introduces a new video editing framework called MIVE. It has two main components: DMS to prevent editing leakage and IPR to ensure precise editing. The paper also presents a new MIVE Dataset and an evaluation metric called CIA Score. MIVE outperforms existing methods in terms of editing accuracy, faithfulness, and leakage prevention....'
+opinion: placeholder
+tags:
+    - ML
diff --git a/...iEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain.yaml b/...iEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-18"
+author: Shuting Wang
+title: 'OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.13018
+summary: This paper presents OmniEval, a comprehensive RAG benchmark in the financial domain that assesses both retrieval and generation performance. It includes a multi-dimensional evaluation framework, automatic data generation, and robust evaluation metrics, highlighting the performance variations of RAG systems across diverse topics and tasks. The code is open-sourced....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...gent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents.yaml b/...gent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-18"
+author: Yifei Zhou
+title: 'Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.13194
+summary: The paper proposes a system called PAE that allows foundation model agents to discover and practice skills in the real world. It uses a task proposer to suggest tasks, a thought-based agent policy to attempt those tasks, and an autonomous success evaluator to assess the results. PAE is validated on vision-based web navigation tasks and outperforms other methods with real-world human-annotated benchmarks. The open-source code and checkpoints are available at https://yanqval.github.io/PAE/....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...Towards Exception Safety Code Generation with Intermediate Language Agents Framework.yaml b/...Towards Exception Safety Code Generation with Intermediate Language Agents Framework.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-18"
+author: Xuanming Zhang
+title: 'Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.11713
+summary: 'This academic paper introduces Seeker, a multi-agent framework that uses large language models to improve exception handling in code. The framework addresses three key issues: Insensitive Detection of Fragile Code, Inaccurate Capture of Exception Block, and Distorted Handling Solution. Seeker uses agents to assist LLMs in detecting, capturing, and resolving exceptions more effectively, providing valuable insights for future improvements in code reliability....'
+opinion: placeholder
+tags:
+    - ML
diff --git a/...ument QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation.yaml b/...ument QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-18"
+author: Manan Suri
+title: 'VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.10704
+summary: This paper presents VisDoMBench, a benchmark for evaluating question answering systems in multi-document settings with rich multimodal content. The paper also introduces VisDoMRAG, a novel multimodal approach that combines visual and textual retrieval augmented generation, improving accuracy and answer verifiability. The approach outperforms unimodal and long-context LLM baselines by 12-20% in end-to-end multimodal document QA....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-18 When to Speak, When to Abstain: Contrastive Decoding with Abstention.yaml b/current/2024-12-18 When to Speak, When to Abstain: Contrastive Decoding with Abstention.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-18"
+author: Hyuhng Joon Kim
+title: 'When to Speak, When to Abstain: Contrastive Decoding with Abstention'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.12527
+summary: This paper introduces Contrastive Decoding with Abstention (CDA), a method that helps large language models decide when to provide an answer and when to abstain, improving their reliability and trustworthiness....
+opinion: placeholder
+tags:
+    - ML