generated from codingpot/newsletter_awesome_articles
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 16603dd, commit 20a4fcd
Showing 13 changed files with 117 additions and 0 deletions.
9 changes: 9 additions & 0 deletions
current/2024-12-30 Edicho: Consistent Image Editing in the Wild.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Qingyan Bai | ||
title: 'Edicho: Consistent Image Editing in the Wild' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.21079 | ||
summary: Edicho is an algorithm that uses diffusion models to make image editing more consistent across different images, even when factors like object poses, lighting conditions, and photography environments change. It uses an attention manipulation module and a refined classifier-free guidance denoising strategy, both of which take into account pre-estimated image correspondence. The algorithm is compatible with most diffusion-based editing methods and has been shown to work well in various settings. T... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-30 Efficiently Serving LLM Reasoning Programs with Certaindex.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Yichao Fu | ||
title: Efficiently Serving LLM Reasoning Programs with Certaindex | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.20993 | ||
summary: Dynasor is a system that optimizes inference-time compute for LLM reasoning queries by tracking and scheduling requests within the queries and using Certaindex, a proxy that measures statistical reasoning progress based on model certainty, to guide compute allocation dynamically. It co-adapts scheduling with reasoning progress to balance accuracy, latency, and cost, reducing compute by up to 50% in batch processing and sustaining higher query rates or tighter latency SLOs in online serving.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...nstructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Yang Shen | ||
title: 'Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.18525 | ||
summary: This paper proposes Explanatory Instructions as a way to define computer vision tasks through detailed linguistic transformations. By training a vision-language model on a large dataset of image-instruction-output triplets, the model learns to follow these instructions and achieves zero-shot capabilities for both seen and unseen tasks.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...024-12-30 On the Compositional Generalization of Multimodal LLMs for Medical Imaging.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Zhenyang Cai | ||
title: On the Compositional Generalization of Multimodal LLMs for Medical Imaging | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.20070 | ||
summary: Med-MAT is a collection of 106 medical datasets used to study how multimodal large language models (MLLMs) can understand unseen medical images by combining learned elements. MLLMs can use this ability, called compositional generalization, to improve their performance on specific tasks and work well with different types of data and models.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...-12-30 OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Yujie Luo | ||
title: 'OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.20005 | ||
summary: OneKE is a Dockerized knowledge extraction system that can extract information from the web and PDF books, and is designed to support various domains. It uses multiple agents and a configurable knowledge base to improve performance, and has been evaluated on benchmark datasets and case studies, demonstrating its effectiveness and adaptability.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
... Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Chia-Yu Hung | ||
title: 'TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.21037 | ||
summary: TangoFlux is a fast and accurate text-to-audio model that can generate 30 seconds of audio in 3.7 seconds on a single GPU. It uses a new method called CLAP-Ranked Preference Optimization (CRPO) to improve the alignment of text and audio. TangoFlux outperforms other models in both objective and subjective tests, and the code and models are available for others to use.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-30 Training Software Engineering Agents and Verifiers with SWE-Gym.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Jiayi Pan | ||
title: Training Software Engineering Agents and Verifiers with SWE-Gym | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.21139 | ||
summary: SWE-Gym, a software engineering environment, is introduced to train agents and verifiers using real-world Python tasks. The paper presents a method to train language model-based SWE agents, achieving up to 19% absolute gains in resolve rate. The paper also introduces the use of verifiers trained on agent trajectories, resulting in a new state-of-the-art for open-weight SWE agents.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-31 Bringing Objects to Life: 4D generation from 3D objects.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-31" | ||
author: Ohad Rahamim | ||
title: 'Bringing Objects to Life: 4D generation from 3D objects' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.20422 | ||
summary: The paper describes a new method for animating user-provided 3D objects by using text prompts to guide the animation process. The method involves converting a 3D mesh into a 4D NeRF and then using an Image-to-Video diffusion model to animate the object. The paper also introduces an incremental viewpoint selection protocol and a masked Score Distillation Sampling loss to improve motion realism. The method is evaluated and found to outperform other approaches in terms of temporal coherence, prompt... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-31 Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-31" | ||
author: Xingyu Chen | ||
title: Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.21187 | ||
summary: This paper studies the issue of overthinking in o1-like LLMs, where too many computational resources are used for simple problems with little benefit. It introduces new efficiency metrics and proposes strategies to reduce computational overhead without sacrificing accuracy.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...ilitating large language model Russian adaptation with Learned Embedding Propagation.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-31" | ||
author: Mikhail Tikhomirov | ||
title: Facilitating large language model Russian adaptation with Learned Embedding Propagation | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.21140 | ||
summary: This paper introduces Learned Embedding Propagation (LEP) as a cost-efficient method for adapting large language models (LLMs) to specific languages. LEP has lower training data size requirements and minimizes the impact on existing LLM knowledge by using an ad-hoc embedding propagation procedure to implant new language knowledge into existing instruct-tuned variants. The authors evaluated LEP on four Russian vocabulary adaptations for LLaMa-3-8B and Mistral-7B, demonstrating that it is competit... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
... Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-31" | ||
author: Zhaojian Yu | ||
title: 'HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.21199 | ||
summary: 'This paper introduces a new task called self-invoking code generation to evaluate the problem-solving abilities of Large Language Models (LLMs). The task involves solving a base problem and using its solution to address a more complex problem. The paper proposes three new benchmarks: HumanEval Pro, MBPP Pro, and BigCodeBench-Lite Pro, specifically designed for this task. The experimental results show that most LLMs perform well on traditional code generation benchmarks but struggle with self-inv...' | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-31 PERSE: Personalized 3D Generative Avatars from A Single Portrait.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-31" | ||
author: Hyunsoo Cha | ||
title: 'PERSE: Personalized 3D Generative Avatars from A Single Portrait' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.21206 | ||
summary: PERSE is a method that creates a personalized 3D avatar from a single portrait, allowing for the editing of facial attributes in a continuous and disentangled latent space. It uses a synthetic attribute dataset and a novel pipeline to produce high-quality, photorealistic 2D videos. The method enforces smooth transitions in the latent space using a latent space regularization technique and produces high-quality avatars with interpolated attributes while preserving the identity of the reference pe... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-31 Slow Perception: Let's Perceive Geometric Figures Step-by-step.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-31" | ||
author: Haoran Wei | ||
title: 'Slow Perception: Let''s Perceive Geometric Figures Step-by-step' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.20631 | ||
summary: This paper introduces a concept called 'slow perception' where the model gradually perceives basic point-line combinations to reconstruct complex geometric structures. This approach aims to improve the accuracy of copying geometric figures, which is considered the first step to visual reasoning. The paper proposes a 'perceptual ruler' to trace each line stroke-by-stroke and suggests that a slower perception manner can lead to better results.... | ||
opinion: placeholder | ||
tags: | ||
- ML |