Automated report

deep-diver · Dec 30, 2024 · f10d795
1 parent 6029362
Showing 11 changed files with 99 additions and 0 deletions.
diff --git a/...From Elements to Design: A Layered Approach for Automatic Graphic Design Composition.yaml b/...From Elements to Design: A Layered Approach for Automatic Graphic Design Composition.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-29"
+author: Jiawei Lin
+title: 'From Elements to Design: A Layered Approach for Automatic Graphic Design Composition'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.19712
+summary: This paper proposes a new method called LaDeCo to automatically compose graphic designs by dividing the elements into different layers and predicting their attributes. The method is designed to make the generation process smoother and clearer, and it can be used for various tasks such as resolution adjustment and element filling....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-29 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs.yaml b/current/2024-12-29 HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-29"
+author: Junying Chen
+title: HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18925
+summary: 'HuatuoGPT-o1 is a medical LLM that can perform complex reasoning and outperforms general and medical-specific baselines using only 40K verifiable problems. It uses a two-stage approach: a medical verifier to guide the search for a complex reasoning trajectory and reinforcement learning with verifier-based rewards to enhance complex reasoning. This approach is hoped to inspire advancements in reasoning across medical and other specialized domains....'
+opinion: placeholder
+tags:
+    - ML
diff --git a/...ent Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models.yaml b/...ent Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-29"
+author: Zehan Wang
+title: 'Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18605
+summary: This paper introduces a new model called Orient Anything that can accurately estimate the orientation of objects in a single image by learning from 3D models and synthetic-to-real transfer strategies. It achieves state-of-the-art accuracy and improves various applications such as spatial concept comprehension and 3D object pose adjustment. ...
+opinion: placeholder
+tags:
+    - ML
diff --git a/... Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.yaml b/... Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-29"
+author: Ziang Yan
+title: 'Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.19326
+summary: The paper proposes a new method called Task Preference Optimization (TPO) to improve multimodal large language models (MLLMs) by incorporating task-specific heads and rich visual labels during training. TPO significantly enhances the MLLM's multimodal capabilities and task-specific performance, and demonstrates robust zero-shot capabilities across various tasks....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-30 1.58-bit FLUX.yaml b/current/2024-12-30 1.58-bit FLUX.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Chenglin Yang
+title: 1.58-bit FLUX
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18653
+summary: We introduce 1.58-bit FLUX, a method to reduce the size of a text-to-image model while maintaining its performance. This is done without using any image data and by developing a custom kernel for the model. The result is a model that uses less storage, memory, and time for inference, while still generating good images....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...ch: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era.yaml b/...ch: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Yanlin Feng
+title: 'CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18702
+summary: To improve retrieval from graph data for large language models (LLMs), the paper proposes property graph views on top of RDF knowledge graphs. It introduces CypherBench, a benchmark with property graphs for efficient LLM querying. The paper also addresses challenges in converting RDF to property graphs and generating tasks for Cypher....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...-12-30 Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.yaml b/...-12-30 Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Liang Chen
+title: 'Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.18619
+summary: 'This paper proposes a new taxonomy for multimodal learning that unifies both understanding and generation tasks within the Next Token Prediction (NTP) framework. The taxonomy covers five key aspects: multimodal tokenization, model architectures, task representation, datasets & evaluation, and open challenges. An associated GitHub repository is available at https://github.com/LMM101/Awesome-Multimodal-Next-Token-Prediction....'
+opinion: placeholder
+tags:
+    - ML
diff --git a/...024-12-30 SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images.yaml b/...024-12-30 SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Risa Shinoda
+title: 'SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.17606
+summary: This paper introduces SBSFigures, a dataset for pre-training figure QA that uses a stage-by-stage pipeline to create chart figures with complete annotations and diverse topics, making it possible to achieve efficient training with a limited amount of real-world chart data....
+opinion: placeholder
+tags:
+    - ML
diff --git a/current/2024-12-30 Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging.yaml b/current/2024-12-30 Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Hua Farn
+title: Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
+thumbnail: ""
+link: https://huggingface.co/papers/2412.19512
+summary: The paper introduces a method to improve downstream task performance in safety-aligned LLMs without relying on additional safety data. This method involves merging the weights of pre- and post-fine-tuned safety-aligned models, which helps maintain the safety of LLMs while enhancing their performance. The approach is effective in mitigating safety degradation and offers a practical solution for adapting safety-aligned LLMs....
+opinion: placeholder
+tags:
+    - ML
diff --git a/...ent/2024-12-30 The Superposition of Diffusion Models Using the Itô Density Estimator.yaml b/...ent/2024-12-30 The Superposition of Diffusion Models Using the Itô Density Estimator.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Marta Skreta
+title: The Superposition of Diffusion Models Using the Itô Density Estimator
+thumbnail: ""
+link: https://huggingface.co/papers/2412.17762
+summary: The Cambrian explosion of easily accessible pre-trained diffusion models suggests a demand for methods that combine multiple different pre-trained diffusion models without incurring the significant computational burden of re-training a larger combined model. In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel proposed framework termed superposition. Theoretically, we derive superposition from rigorous first principles stemmi...
+opinion: placeholder
+tags:
+    - ML
diff --git a/...o-shot Customized Video Generation with the Inherent Force of Video Diffusion Models.yaml b/...o-shot Customized Video Generation with the Inherent Force of Video Diffusion Models.yaml
@@ -0,0 +1,9 @@
+date: "2024-12-30"
+author: Tao Wu
+title: 'VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models'
+thumbnail: ""
+link: https://huggingface.co/papers/2412.19645
+summary: This paper presents a new approach to create customized videos without needing additional models. It uses a method called Video Diffusion Model (VDM) to extract and inject subject features directly from reference images, and improves the consistency of subject appearance in the generated videos....
+opinion: placeholder
+tags:
+    - ML