Automated report
deep-diver committed Dec 20, 2024
1 parent 11b5fc4 commit ec20f9a
Showing 16 changed files with 144 additions and 0 deletions.
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Zihan Liu
title: 'AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling'
thumbnail: ""
link: https://huggingface.co/papers/2412.15084
summary: This paper introduces AceMath, a suite of math models trained to solve complex problems and evaluate candidate solutions, along with reward models that identify correct solutions. The instruction models are built through supervised fine-tuning, and the reward models are assessed on a new benchmark, AceMath-RewardBench. The resulting models outperform existing models and can be accessed at the provided link....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Jixuan He
title: Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
thumbnail: ""
link: https://huggingface.co/papers/2412.14462
summary: This paper introduces affordance-aware object insertion, a way to insert objects into scenes that accounts for how well the object fits the scene. The authors build a large dataset of objects and scenes and train a Mask-Aware Dual Diffusion (MADD) model, which jointly denoises the RGB image and the insertion mask so that the composited object looks natural. The model places objects plausibly in new scenes and generalizes well to in-the-wild images from the internet....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Wang Zhao
title: 'DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation'
thumbnail: ""
link: https://huggingface.co/papers/2412.15200
summary: This paper introduces DI-PCG, an efficient method for generating high-quality 3D assets through inverse procedural content generation. A lightweight diffusion transformer treats PCG parameters as the denoising target and uses observed images as conditions to control parameter generation. DI-PCG requires only 7.6M network parameters and 30 GPU hours to train, recovers parameters accurately, and generalizes well to in-the-wild images....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Yanpeng Sun
title: Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
thumbnail: ""
link: https://huggingface.co/papers/2412.14233
summary: We propose using visual specialists trained on annotated images to enhance image captions. Our approach, DCE, enriches captions with object attributes and inter-object relations, improving downstream visual understanding and reasoning. We will release the source code and a pipeline that makes it easy to combine other visual specialists....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Qihao Liu
title: 'Flowing from Words to Pixels: A Framework for Cross-Modality Evolution'
thumbnail: ""
link: https://huggingface.co/papers/2412.15213
summary: This paper proposes CrossFlow, a framework for cross-modal flow matching that directly maps one modality to another without relying on a noise distribution or a conditioning mechanism. Using variational encoders and classifier-free guidance, CrossFlow outperforms standard flow matching for text-to-image generation and matches or outperforms the state of the art for image captioning, depth estimation, and image super-resolution....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Xuekai Zhu
title: How to Synthesize Text Data without Model Collapse?
thumbnail: ""
link: https://huggingface.co/papers/2412.14689
summary: This paper investigates the impact of synthetic data on language model training and proposes a way to synthesize data without causing model collapse. By pre-training language models on different proportions of synthetic data, it reveals a negative correlation between the proportion of synthetic data and model performance, and it traces the problem to distributional shift and an over-concentration of n-gram features in synthetic data. To address these issues, the paper proposes token editing on human-produced data to obtain semi-synthetic data....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Hanlin Wang
title: 'LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis'
thumbnail: ""
link: https://huggingface.co/papers/2412.15214
summary: This paper introduces a new method for controlling object trajectories in image-to-video synthesis by adding a depth dimension to drag-based interaction, allowing for more precise manipulation of object movements and broadening the scope of creativity....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Yushi Bai
title: 'LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks'
thumbnail: ""
link: https://huggingface.co/papers/2412.15204
summary: LongBench v2 is a benchmark that tests LLMs' ability to understand and reason over long contexts across a variety of realistic tasks. It contains 503 multiple-choice questions with contexts of up to 2M words; the best model answering directly reaches only 50.1% accuracy, while a model that reasons for longer before answering reaches 57.7%, surpassing human experts....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Junjie Zhou
title: 'MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval'
thumbnail: ""
link: https://huggingface.co/papers/2412.14475
summary: This paper introduces MegaPairs, a method for synthesizing large amounts of training data for multimodal retrieval. It uses vision-language models and open-domain images to generate high-quality data, allowing a multimodal retriever to outperform a baseline trained on 70 times more data from existing datasets. MegaPairs scales easily and has already produced more than 26 million training instances. The paper also introduces several new models that achieve state-of-the-art zero-shot performance....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Guanting Dong
title: Progressive Multimodal Reasoning via Active Retrieval
thumbnail: ""
link: https://huggingface.co/papers/2412.14835
summary: AR-MCTS is a framework that enhances the reasoning abilities of large language models by combining Active Retrieval with Monte Carlo Tree Search to pull key supporting insights from a hybrid-modal retrieval corpus. This improves the diversity and reliability of the reasoning space and yields better performance on complex multimodal reasoning tasks....
opinion: placeholder
tags:
- ML
9 changes: 9 additions & 0 deletions current/2024-12-19 Qwen2.5 Technical Report.yaml
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Qwen
title: Qwen2.5 Technical Report
thumbnail: ""
link: https://huggingface.co/papers/2412.15115
summary: Qwen2.5 is a series of large language models designed to meet various needs, with significant improvements in pre-training and post-training stages. It includes different sizes and variants, such as base, instruction-tuned, quantized, and MoE (Mixture-of-Experts) models. Qwen2.5 has demonstrated top-tier performance on a wide range of language understanding and reasoning benchmarks, and is available through open-weight offerings and hosted solutions from Alibaba Cloud Model Studio....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Jiatong Li
title: 'TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.14642
summary: 'This paper introduces TOMG-Bench, a benchmark for evaluating the open-domain molecule generation capability of LLMs. It includes three tasks: molecule editing, molecule optimization, and customized molecule generation. The benchmark comes with an automated evaluation system and a new instruction-tuning dataset called OpenMolIns. Fine-tuned on OpenMolIns, Llama3.1-8B outperformed open-source general LLMs and even GPT-3.5-turbo on TOMG-Bench....'
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Enis Simsar
title: 'UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency'
thumbnail: ""
link: https://huggingface.co/papers/2412.15216
summary: We propose an unsupervised method for instruction-based image editing that does not need ground-truth edited images during training. Our method enforces Cycle Edit Consistency, requiring that applying an edit and then reversing it recovers the original image, which keeps edits faithful to the instruction while preserving image quality. Because no edited-image supervision is needed, the method can train on broader kinds of data and avoids the biases of datasets built with existing editing models, yielding edits with higher quality and precision....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-20"
author: Gagan Bhatia
title: 'DateLogicQA: Benchmarking Temporal Biases in Large Language Models'
thumbnail: ""
link: https://huggingface.co/papers/2412.13377
summary: 'This paper presents DateLogicQA, a benchmark for testing how well large language models understand and reason about dates and times. It identifies two types of bias that degrade temporal reasoning: Representation-Level Bias and Logical-Level Bias. Experiments show that current models frequently mishandle dates and times, and the benchmark provides a way to quantify these failures. The code is available on GitHub....'
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-20"
author: Hsin-Ping Huang
title: 'Move-in-2D: 2D-Conditioned Human Motion Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.13185
summary: Generating realistic human videos remains challenging, with the most effective methods currently relying on a human motion sequence as the control signal. Existing approaches often reuse motion extracted from other videos, which limits them to specific motion types and global scene matching. We propose Move-in-2D, a novel approach that generates human motion sequences conditioned on a scene image, allowing for diverse motion that adapts to different scenes....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-20"
author: Liyao Jiang
title: 'PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.14283
summary: PixelMan is a training-free method for consistent object editing with diffusion models that combines direct pixel manipulation with generation, and it is faster than prior training-based and training-free editing methods....
opinion: placeholder
tags:
- ML
