Commit

Automated report
deep-diver committed Dec 31, 2024
1 parent 16603dd commit 20a4fcd
Showing 13 changed files with 117 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
date: "2024-12-30"
author: Qingyan Bai
title: 'Edicho: Consistent Image Editing in the Wild'
thumbnail: ""
link: https://huggingface.co/papers/2412.21079
summary: Edicho is an algorithm that uses diffusion models to make image editing more consistent across different images, even when factors like object poses, lighting conditions, and photography environments change. It uses an attention manipulation module and a refined classifier-free guidance denoising strategy, both of which take into account pre-estimated image correspondence. The algorithm is compatible with most diffusion-based editing methods and has been shown to work well in various settings. T...
opinion: placeholder
tags:
- ML
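The Edicho summary mentions a refined classifier-free guidance (CFG) denoising strategy. The sketch below shows only the standard CFG combination that such a strategy starts from, not Edicho's correspondence-aware refinement. It assumes flat Python lists stand in for noise-prediction tensors, and `cfg_combine` and `guidance_scale` are hypothetical names.

```python
from typing import Sequence


def cfg_combine(
    eps_uncond: Sequence[float], eps_cond: Sequence[float], guidance_scale: float
) -> list[float]:
    """Standard CFG, elementwise: eps_u + w * (eps_c - eps_u).

    Assumption: plain lists replace the latent noise tensors of a real model.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same shape")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy predictions for a 3-element latent (hypothetical values).
print(cfg_combine([0.0, 1.0, -1.0], [1.0, 1.0, 0.0], guidance_scale=7.5))  # [7.5, 1.0, 6.5]
```

With `guidance_scale=1.0` the result reduces to the conditional prediction; larger values push each denoising step further toward the condition.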
@@ -0,0 +1,9 @@
date: "2024-12-30"
author: Yichao Fu
title: Efficiently Serving LLM Reasoning Programs with Certaindex
thumbnail: ""
link: https://huggingface.co/papers/2412.20993
summary: Dynasor is a system that optimizes inference-time compute for LLM reasoning queries by tracking and scheduling requests within the queries and using Certaindex, a proxy that measures statistical reasoning progress based on model certainty, to guide compute allocation dynamically. It co-adapts scheduling with reasoning progress to balance accuracy, latency, and cost, reducing compute by up to 50% in batch processing and sustaining higher query rates or tighter latency SLOs in online serving....
opinion: placeholder
tags:
- ML
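The Dynasor summary describes Certaindex only as a proxy for reasoning progress based on model certainty. One way to picture such a signal is sketched below, assuming certainty is measured as agreement among answers extracted from several partial reasoning paths for the same query. The entropy-based formula, the `threshold` value, and the function names are illustrative assumptions, not the paper's definition.

```python
import math
from collections import Counter
from typing import Sequence


def answer_certainty(answers: Sequence[str]) -> float:
    """Return 1 - normalized entropy of the sampled answers (1.0 means all agree).

    Assumption: certainty is approximated by agreement among answers taken
    from several partial reasoning paths for one query.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    n = len(answers)
    counts = Counter(answers)
    if len(counts) == 1:
        return 1.0  # every sample agrees: zero entropy
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # log(n) is the maximum entropy, reached when all n answers differ.
    return 1.0 - entropy / math.log(n)


def should_stop(answers: Sequence[str], threshold: float = 0.8) -> bool:
    """Hypothetical scheduling rule: stop spending compute once certainty is high."""
    return answer_certainty(answers) >= threshold


print(should_stop(["42", "42", "42", "42"]))  # True
print(should_stop(["12", "7", "42", "3"]))  # False
```

A scheduler built on a signal like this would keep allocating reasoning steps to queries whose certainty is still low and release compute from queries that have already converged.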
@@ -0,0 +1,9 @@
date: "2024-12-30"
author: Yang Shen
title: 'Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization'
thumbnail: ""
link: https://huggingface.co/papers/2412.18525
summary: This paper proposes Explanatory Instructions as a way to define computer vision tasks through detailed linguistic transformations. By training a vision-language model on a large dataset of image-instruction-output triplets, the model learns to follow these instructions and achieves zero-shot capabilities for both seen and unseen tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-30"
author: Zhenyang Cai
title: On the Compositional Generalization of Multimodal LLMs for Medical Imaging
thumbnail: ""
link: https://huggingface.co/papers/2412.20070
summary: Med-MAT is a collection of 106 medical datasets used to study how multimodal large language models (MLLMs) can understand unseen medical images by combining learned elements. MLLMs can use this ability, called compositional generalization, to improve their performance on specific tasks and work well with different types of data and models....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-30"
author: Yujie Luo
title: 'OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System'
thumbnail: ""
link: https://huggingface.co/papers/2412.20005
summary: OneKE is a Dockerized knowledge extraction system that can extract information from the web and PDF books, and is designed to support various domains. It uses multiple agents and a configurable knowledge base to improve performance, and has been evaluated on benchmark datasets and case studies, demonstrating its effectiveness and adaptability....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-30"
author: Chia-Yu Hung
title: 'TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization'
thumbnail: ""
link: https://huggingface.co/papers/2412.21037
summary: TangoFlux is a fast and accurate text-to-audio model that can generate 30 seconds of audio in 3.7 seconds on a single GPU. It uses a new method called CLAP-Ranked Preference Optimization (CRPO) to improve the alignment of text and audio. TangoFlux outperforms other models in both objective and subjective tests, and the code and models are available for others to use....
opinion: placeholder
tags:
- ML
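The TangoFlux summary names CLAP-Ranked Preference Optimization (CRPO) without detailing its mechanics. A minimal sketch of the ranking step follows. It assumes several audio candidates per prompt have already been scored with CLAP, and that the highest- and lowest-scoring candidates become the chosen/rejected pair for preference optimization. The scores, file names, and `build_preference_pair` helper are hypothetical, and no CLAP model is called.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str  # candidate with the highest CLAP score
    rejected: str  # candidate with the lowest CLAP score


def build_preference_pair(
    prompt: str, candidates: Sequence[str], clap_scores: Sequence[float]
) -> PreferencePair:
    """Rank candidates by precomputed CLAP text-audio similarity scores."""
    if len(candidates) < 2 or len(candidates) != len(clap_scores):
        raise ValueError("need at least two candidates, each with one score")
    ranked = sorted(zip(clap_scores, candidates), key=lambda pair: pair[0])
    return PreferencePair(prompt, chosen=ranked[-1][1], rejected=ranked[0][1])


pair = build_preference_pair(
    "a dog barking in the rain",
    ["sample_a.wav", "sample_b.wav", "sample_c.wav"],
    [0.31, 0.52, 0.18],  # hypothetical CLAP scores
)
print(pair.chosen, pair.rejected)  # sample_b.wav sample_c.wav
```

Pairing the best candidate against the worst maximizes the score gap within each pair. Other pairing schemes are possible.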
@@ -0,0 +1,9 @@
date: "2024-12-30"
author: Jiayi Pan
title: Training Software Engineering Agents and Verifiers with SWE-Gym
thumbnail: ""
link: https://huggingface.co/papers/2412.21139
summary: SWE-Gym, a software engineering environment, is introduced to train agents and verifiers using real-world Python tasks. The paper presents a method to train language model-based SWE agents, achieving up to 19% absolute gains in resolve rate. The paper also introduces the use of verifiers trained on agent trajectories, resulting in a new state-of-the-art for open-weight SWE agents....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-31"
author: Ohad Rahamim
title: 'Bringing Objects to Life: 4D generation from 3D objects'
thumbnail: ""
link: https://huggingface.co/papers/2412.20422
summary: The paper describes a new method for animating user-provided 3D objects by using text prompts to guide the animation process. The method involves converting a 3D mesh into a 4D NeRF and then using an Image-to-Video diffusion model to animate the object. The paper also introduces an incremental viewpoint selection protocol and a masked Score Distillation Sampling loss to improve motion realism. The method is evaluated and found to outperform other approaches in terms of temporal coherence, prompt...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-31"
author: Xingyu Chen
title: Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
thumbnail: ""
link: https://huggingface.co/papers/2412.21187
summary: This paper studies the issue of overthinking in o1-like LLMs, where too many computational resources are used for simple problems with little benefit. It introduces new efficiency metrics and proposes strategies to reduce computational overhead without sacrificing accuracy....
opinion: placeholder
tags:
- ML
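The overthinking summary mentions new efficiency metrics without defining them. The sketch below is illustrative only. It assumes an outcome-oriented metric that measures what fraction of each response's tokens was needed to reach its first correct answer, scores responses that never become correct as 0, and averages over responses. The paper's exact definitions may differ, and `outcome_efficiency` is a hypothetical name.

```python
from typing import Optional, Sequence


def outcome_efficiency(
    first_correct_tokens: Sequence[Optional[int]], total_tokens: Sequence[int]
) -> float:
    """Average of (tokens up to the first correct answer / total tokens).

    `None` marks a response that never reaches a correct answer; it scores 0.
    A score near 1.0 means little was generated after the first correct answer.
    """
    if not total_tokens or len(first_correct_tokens) != len(total_tokens):
        raise ValueError("inputs must be non-empty and the same length")
    scores = [
        0.0 if first is None else first / total
        for first, total in zip(first_correct_tokens, total_tokens)
    ]
    return sum(scores) / len(scores)


# Hypothetical token counts for three responses to simple problems.
print(round(outcome_efficiency([100, None, 400], [400, 300, 400]), 3))  # 0.417
```

Under this metric the first response, which found the answer after 100 of 400 tokens, spent three quarters of its budget on re-checking, and that is the kind of overhead the paper aims to reduce.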
@@ -0,0 +1,9 @@
date: "2024-12-31"
author: Mikhail Tikhomirov
title: Facilitating large language model Russian adaptation with Learned Embedding Propagation
thumbnail: ""
link: https://huggingface.co/papers/2412.21140
summary: This paper introduces Learned Embedding Propagation (LEP) as a cost-efficient method for adapting large language models (LLMs) to specific languages. LEP has lower training data size requirements and minimizes the impact on existing LLM knowledge by using an ad-hoc embedding propagation procedure to implant new language knowledge into existing instruct-tuned variants. The authors evaluated LEP on four Russian vocabulary adaptations for LLaMa-3-8B and Mistral-7B, demonstrating that it is competit...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-31"
author: Zhaojian Yu
title: 'HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.21199
summary: 'This paper introduces a new task called self-invoking code generation to evaluate the problem-solving abilities of Large Language Models (LLMs). The task involves solving a base problem and using its solution to address a more complex problem. The paper proposes three new benchmarks: HumanEval Pro, MBPP Pro, and BigCodeBench-Lite Pro, specifically designed for this task. The experimental results show that most LLMs perform well on traditional code generation benchmarks but struggle with self-inv...'
opinion: placeholder
tags:
- ML
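To make the self-invoking task concrete, here is a hypothetical pair of problems in that style. They are not taken from HumanEval Pro, MBPP Pro, or BigCodeBench-Lite Pro. The first is a base problem, and the second is a harder problem whose solution must call the base solution.

```python
# Base problem: count the vowels in a string.
def count_vowels(s: str) -> int:
    return sum(1 for ch in s.lower() if ch in "aeiou")


# Self-invoking problem: return the word with the most vowels
# (the first such word wins on ties). Solving it requires reusing count_vowels.
def most_vowels(words: list[str]) -> str:
    if not words:
        raise ValueError("words must be non-empty")
    return max(words, key=count_vowels)


print(most_vowels(["sky", "banana", "tree"]))  # banana
```

A model can solve `count_vowels` correctly and still fail `most_vowels` if it does not reuse its own earlier solution correctly. This is the kind of gap the summary reports most LLMs struggle with.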
@@ -0,0 +1,9 @@
date: "2024-12-31"
author: Hyunsoo Cha
title: 'PERSE: Personalized 3D Generative Avatars from A Single Portrait'
thumbnail: ""
link: https://huggingface.co/papers/2412.21206
summary: PERSE is a method that creates a personalized 3D avatar from a single portrait, allowing for the editing of facial attributes in a continuous and disentangled latent space. It uses a synthetic attribute dataset and a novel pipeline to produce high-quality, photorealistic 2D videos. The method enforces smooth transitions in the latent space using a latent space regularization technique and produces high-quality avatars with interpolated attributes while preserving the identity of the reference pe...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-31"
author: Haoran Wei
title: 'Slow Perception: Let''s Perceive Geometric Figures Step-by-step'
thumbnail: ""
link: https://huggingface.co/papers/2412.20631
summary: This paper introduces a concept called 'slow perception' where the model gradually perceives basic point-line combinations to reconstruct complex geometric structures. This approach aims to improve the accuracy of copying geometric figures, which is considered the first step to visual reasoning. The paper proposes a 'perceptual ruler' to trace each line stroke-by-stroke and suggests that a slower perception manner can lead to better results....
opinion: placeholder
tags:
- ML
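The slow-perception summary describes a "perceptual ruler" that traces each line stroke by stroke. The sketch below is a rough geometric illustration of that idea. It assumes a line segment is split into equal steps no longer than a fixed ruler length, so longer lines take more steps. `ruler_points` and `ruler_len` are hypothetical, and the model-side training is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ruler_points(start: Point, end: Point, ruler_len: float) -> List[Point]:
    """Split segment start->end into equal steps, each at most ruler_len long."""
    if ruler_len <= 0:
        raise ValueError("ruler_len must be positive")
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    steps = max(1, math.ceil(length / ruler_len))  # longer lines -> more strokes
    return [
        (x0 + (x1 - x0) * k / steps, y0 + (y1 - y0) * k / steps)
        for k in range(steps + 1)
    ]


print(len(ruler_points((0.0, 0.0), (10.0, 0.0), ruler_len=4.0)))  # 4 (3 strokes)
```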
