generated from codingpot/newsletter_awesome_articles
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
6029362
commit f10d795
Showing 11 changed files with 99 additions and 0 deletions.
9 changes: 9 additions & 0 deletions
...From Elements to Design: A Layered Approach for Automatic Graphic Design Composition.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-29" | ||
author: Jiawei Lin | ||
title: 'From Elements to Design: A Layered Approach for Automatic Graphic Design Composition' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.19712 | ||
summary: This paper proposes a new method called LaDeCo to automatically compose graphic designs by dividing the elements into different layers and predicting their attributes. The method is designed to make the generation process smoother and clearer, and it can be used for various tasks such as resolution adjustment and element filling.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-29 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-29" | ||
author: Junying Chen | ||
title: HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.18925 | ||
summary: 'HuatuoGPT-o1 is a medical LLM that can perform complex reasoning and outperforms general and medical-specific baselines using only 40K verifiable problems. It uses a two-stage approach: a medical verifier to guide the search for a complex reasoning trajectory and reinforcement learning with verifier-based rewards to enhance complex reasoning. This approach is hoped to inspire advancements in reasoning across medical and other specialized domains....' | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...ent Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-29" | ||
author: Zehan Wang | ||
title: 'Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.18605 | ||
summary: This paper introduces a new model called Orient Anything that can accurately estimate the orientation of objects in a single image by learning from 3D models and synthetic-to-real transfer strategies. It achieves state-of-the-art accuracy and improves various applications such as spatial concept comprehension and 3D object pose adjustment. ... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
... Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-29" | ||
author: Ziang Yan | ||
title: 'Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.19326 | ||
summary: The paper proposes a new method called Task Preference Optimization (TPO) to improve multimodal large language models (MLLMs) by incorporating task-specific heads and rich visual labels during training. TPO significantly enhances the MLLM's multimodal capabilities and task-specific performance, and demonstrates robust zero-shot capabilities across various tasks.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Chenglin Yang | ||
title: 1.58-bit FLUX | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.18653 | ||
summary: We introduce 1.58-bit FLUX, a method to reduce the size of a text-to-image model while maintaining its performance. This is done without using any image data and by developing a custom kernel for the model. The result is a model that uses less storage, memory, and time for inference, while still generating good images.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...ch: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Yanlin Feng | ||
title: 'CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.18702 | ||
summary: To improve retrieval from graph data for large language models (LLMs), the paper proposes property graph views on top of RDF knowledge graphs. It introduces CypherBench, a benchmark with property graphs for efficient LLM querying. The paper also addresses challenges in converting RDF to property graphs and generating tasks for Cypher.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...-12-30 Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Liang Chen | ||
title: 'Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.18619 | ||
summary: 'This paper proposes a new taxonomy for multimodal learning that unifies both understanding and generation tasks within the Next Token Prediction (NTP) framework. The taxonomy covers five key aspects: multimodal tokenization, model architectures, task representation, datasets & evaluation, and open challenges. An associated GitHub repository is available at https://github.com/LMM101/Awesome-Multimodal-Next-Token-Prediction....' | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...024-12-30 SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Risa Shinoda | ||
title: 'SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.17606 | ||
summary: This paper introduces SBSFigures, a dataset for pre-training figure QA that uses a stage-by-stage pipeline to create chart figures with complete annotations and diverse topics, making it possible to achieve efficient training with a limited amount of real-world chart data.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
current/2024-12-30 Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Hua Farn | ||
title: Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.19512 | ||
summary: The paper introduces a method to improve downstream task performance in safety-aligned LLMs without relying on additional safety data. This method involves merging the weights of pre- and post-fine-tuned safety-aligned models, which helps maintain the safety of LLMs while enhancing their performance. The approach is effective in mitigating safety degradation and offers a practical solution for adapting safety-aligned LLMs.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...ent/2024-12-30 The Superposition of Diffusion Models Using the Itô Density Estimator.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Marta Skreta | ||
title: The Superposition of Diffusion Models Using the Itô Density Estimator | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.17762 | ||
summary: The Cambrian explosion of easily accessible pre-trained diffusion models suggests a demand for methods that combine multiple different pre-trained diffusion models without incurring the significant computational burden of re-training a larger combined model. In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel proposed framework termed superposition. Theoretically, we derive superposition from rigorous first principles stemmi... | ||
opinion: placeholder | ||
tags: | ||
- ML |
9 changes: 9 additions & 0 deletions
...o-shot Customized Video Generation with the Inherent Force of Video Diffusion Models.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-12-30" | ||
author: Tao Wu | ||
title: 'VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models' | ||
thumbnail: "" | ||
link: https://huggingface.co/papers/2412.19645 | ||
summary: This paper presents a new approach to create customized videos without needing additional models. It uses a method called Video Diffusion Model (VDM) to extract and inject subject features directly from reference images, and improves the consistency of subject appearance in the generated videos.... | ||
opinion: placeholder | ||
tags: | ||
- ML |
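
Every file added in this commit uses the same eight keys (`date`, `author`, `title`, `thumbnail`, `link`, `summary`, `opinion`, `tags`). The sketch below shows one way such an entry could be sanity-checked once it has been loaded into a Python dict. The helper name `check_entry` and the specific rules (ISO date, `https://` link, non-empty tag list, flagging `opinion: placeholder`) are assumptions made for illustration; nothing in the commit indicates the repository runs a check like this.

```python
from datetime import date

# Key names copied from the YAML entries in this commit.
REQUIRED_FIELDS = ("date", "author", "title", "thumbnail",
                   "link", "summary", "opinion", "tags")


def check_entry(entry):
    """Return a list of problems for one entry (hypothetical helper)."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in entry]

    # Assumed rule: dates are ISO strings such as "2024-12-30".
    if "date" in entry:
        try:
            date.fromisoformat(entry["date"])
        except (TypeError, ValueError):
            problems.append("date is not an ISO YYYY-MM-DD string")

    # Assumed rule: links are https URLs.
    if "link" in entry and not str(entry["link"]).startswith("https://"):
        problems.append("link is not an https URL")

    # Assumed rule: at least one tag is present.
    if "tags" in entry and not (isinstance(entry["tags"], list) and entry["tags"]):
        problems.append("tags is not a non-empty list")

    # Every entry in this commit still carries the placeholder opinion.
    if entry.get("opinion") == "placeholder":
        problems.append("opinion is still the placeholder value")

    return problems


# Values taken from the "1.58-bit FLUX" entry; the summary is elided here.
entry = {
    "date": "2024-12-30",
    "author": "Chenglin Yang",
    "title": "1.58-bit FLUX",
    "thumbnail": "",
    "link": "https://huggingface.co/papers/2412.18653",
    "summary": "...",
    "opinion": "placeholder",
    "tags": ["ML"],
}
print(check_entry(entry))
```

Run against the entries above, a check of this kind would report only the unfilled `opinion` field, since all eleven files set it to `placeholder`.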