Automated report

deep-diver · Dec 31, 2024 · 20a4fcd
1 parent 16603dd
commit 20a4fcd

Showing 13 changed files with 117 additions and 0 deletions.
diff --git a/current/2024-12-30 Edicho: Consistent Image Editing in the Wild.yaml b/current/2024-12-30 Edicho: Consistent Image Editing in the Wild.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Qingyan Bai
+title: 'Edicho: Consistent Image Editing in the Wild'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.21079
+summary: Edicho is an algorithm that uses diffusion models to make image editing more consistent across different images, even when factors like object poses, lighting conditions, and photography environments change. It uses an attention manipulation module and a refined classifier-free guidance denoising strategy, both of which take into account pre-estimated image correspondence. The algorithm is compatible with most diffusion-based editing methods and has been shown to work well in various settings. T...
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-30 Efficiently Serving LLM Reasoning Programs with Certaindex.yaml b/current/2024-12-30 Efficiently Serving LLM Reasoning Programs with Certaindex.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Yichao Fu
+title: Efficiently Serving LLM Reasoning Programs with Certaindex
+thumbnail: ""
+link: https://huggingface.co/papers/2412.20993
+summary: Dynasor is a system that optimizes inference-time compute for LLM reasoning queries by tracking and scheduling requests within the queries and using Certaindex, a proxy that measures statistical reasoning progress based on model certainty, to guide compute allocation dynamically. It co-adapts scheduling with reasoning progress to balance accuracy, latency, and cost, reducing compute by up to 50% in batch processing and sustaining higher query rates or tighter latency SLOs in online serving....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...nstructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization.yaml b/...nstructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Yang Shen
+title: 'Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18525
+summary: This paper proposes Explanatory Instructions as a way to define computer vision tasks through detailed linguistic transformations. By training a vision-language model on a large dataset of image-instruction-output triplets, the model learns to follow these instructions and achieves zero-shot capabilities for both seen and unseen tasks....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...024-12-30 On the Compositional Generalization of Multimodal LLMs for Medical Imaging.yaml b/...024-12-30 On the Compositional Generalization of Multimodal LLMs for Medical Imaging.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Zhenyang Cai
+title: On the Compositional Generalization of Multimodal LLMs for Medical Imaging
+thumbnail: ""
+link: https://huggingface.co/papers/2412.20070
+summary: Med-MAT is a collection of 106 medical datasets used to study how multimodal large language models (MLLMs) can understand unseen medical images by combining learned elements. MLLMs can use this ability, called compositional generalization, to improve their performance on specific tasks and work well with different types of data and models....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...-12-30 OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System.yaml b/...-12-30 OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Yujie Luo
+title: 'OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.20005
+summary: OneKE is a Dockerized knowledge extraction system that can extract information from the web and PDF books, and is designed to support various domains. It uses multiple agents and a configurable knowledge base to improve performance, and has been evaluated on benchmark datasets and case studies, demonstrating its effectiveness and adaptability....
+opinion: placeholder
+tags:
+    - ML
diff --git a/... Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization.yaml b/... Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Chia-Yu Hung
+title: 'TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.21037
+summary: TangoFlux is a fast and accurate text-to-audio model that can generate 30 seconds of audio in 3.7 seconds on a single GPU. It uses a new method called CLAP-Ranked Preference Optimization (CRPO) to improve the alignment of text and audio. TangoFlux outperforms other models in both objective and subjective tests, and the code and models are available for others to use....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-30 Training Software Engineering Agents and Verifiers with SWE-Gym.yaml b/current/2024-12-30 Training Software Engineering Agents and Verifiers with SWE-Gym.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Jiayi Pan
+title: Training Software Engineering Agents and Verifiers with SWE-Gym
+thumbnail: ""
+link: https://huggingface.co/papers/2412.21139
+summary: SWE-Gym, a software engineering environment, is introduced to train agents and verifiers using real-world Python tasks. The paper presents a method to train language model-based SWE agents, achieving up to 19% absolute gains in resolve rate. The paper also introduces the use of verifiers trained on agent trajectories, resulting in a new state-of-the-art for open-weight SWE agents....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-31 Bringing Objects to Life: 4D generation from 3D objects.yaml b/current/2024-12-31 Bringing Objects to Life: 4D generation from 3D objects.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-31"
+author: Ohad Rahamim
+title: 'Bringing Objects to Life: 4D generation from 3D objects'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.20422
+summary: The paper describes a new method for animating user-provided 3D objects by using text prompts to guide the animation process. The method involves converting a 3D mesh into a 4D NeRF and then using an Image-to-Video diffusion model to animate the object. The paper also introduces an incremental viewpoint selection protocol and a masked Score Distillation Sampling loss to improve motion realism. The method is evaluated and found to outperform other approaches in terms of temporal coherence, prompt...
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-31 Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs.yaml b/current/2024-12-31 Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-31"
+author: Xingyu Chen
+title: Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
+thumbnail: ""
+link: https://huggingface.co/papers/2412.21187
+summary: This paper studies overthinking in o1-like LLMs, where excessive computational resources are spent on simple problems for little benefit. It introduces new efficiency metrics and proposes strategies to reduce computational overhead without sacrificing accuracy....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...ilitating large language model Russian adaptation with Learned Embedding Propagation.yaml b/...ilitating large language model Russian adaptation with Learned Embedding Propagation.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-31"
+author: Mikhail Tikhomirov
+title: Facilitating large language model Russian adaptation with Learned Embedding Propagation
+thumbnail: ""
+link: https://huggingface.co/papers/2412.21140
+summary: This paper introduces Learned Embedding Propagation (LEP) as a cost-efficient method for adapting large language models (LLMs) to specific languages. LEP has lower training data size requirements and minimizes the impact on existing LLM knowledge by using an ad-hoc embedding propagation procedure to implant new language knowledge into existing instruct-tuned variants. The authors evaluated LEP on four Russian vocabulary adaptations for LLaMa-3-8B and Mistral-7B, demonstrating that it is competit...
+opinion: placeholder
+tags:
+    - ML
diff --git a/... Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation.yaml b/... Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-31"
+author: Zhaojian Yu
+title: 'HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.21199
+summary: 'This paper introduces a new task called self-invoking code generation to evaluate the problem-solving abilities of Large Language Models (LLMs). The task involves solving a base problem and using its solution to address a more complex problem. The paper proposes three new benchmarks: HumanEval Pro, MBPP Pro, and BigCodeBench-Lite Pro, specifically designed for this task. The experimental results show that most LLMs perform well on traditional code generation benchmarks but struggle with self-inv...'
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-31 PERSE: Personalized 3D Generative Avatars from A Single Portrait.yaml b/current/2024-12-31 PERSE: Personalized 3D Generative Avatars from A Single Portrait.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-31"
+author: Hyunsoo Cha
+title: 'PERSE: Personalized 3D Generative Avatars from A Single Portrait'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.21206
+summary: PERSE is a method that creates a personalized 3D avatar from a single portrait, allowing for the editing of facial attributes in a continuous and disentangled latent space. It uses a synthetic attribute dataset and a novel pipeline to produce high-quality, photorealistic 2D videos. The method enforces smooth transitions in the latent space using a latent space regularization technique and produces high-quality avatars with interpolated attributes while preserving the identity of the reference pe...
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-31 Slow Perception: Let's Perceive Geometric Figures Step-by-step.yaml b/current/2024-12-31 Slow Perception: Let's Perceive Geometric Figures Step-by-step.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-31"
+author: Haoran Wei
+title: 'Slow Perception: Let''s Perceive Geometric Figures Step-by-step'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.20631
+summary: This paper introduces a concept called 'slow perception', in which the model gradually perceives basic point-line combinations to reconstruct complex geometric structures. This approach aims to improve the accuracy of copying geometric figures, which is considered the first step to visual reasoning. The paper proposes a 'perceptual ruler' to trace each line stroke-by-stroke and suggests that perceiving more slowly can lead to better results....
+opinion: placeholder
+tags:
+    - ML