Automated report
deep-diver committed Dec 20, 2024
1 parent 11b5fc4 commit ec20f9a
Showing 16 changed files with 144 additions and 0 deletions.
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Zihan Liu
title: 'AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling'
thumbnail: ""
link: https://huggingface.co/papers/2412.15084
summary: This paper introduces AceMath, a suite of math models trained to solve complex problems and evaluate candidate solutions, along with reward models that identify correct solutions. The instruction models are built through supervised fine-tuning, and the reward models are assessed on a new benchmark, AceMath-RewardBench. The resulting models outperform existing models and can be accessed at the provided link....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Jixuan He
title: Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
thumbnail: ""
link: https://huggingface.co/papers/2412.14462
summary: This paper introduces affordance-aware object insertion, a way to insert objects into scenes that accounts for how well the object fits the scene. The authors build a large dataset of objects and scenes and train a Mask-Aware Dual Diffusion (MADD) model, which jointly denoises the RGB image and the insertion mask so that the composited object looks natural. The model places objects plausibly in new scenes and generalizes well to in-the-wild images from the internet....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Wang Zhao
title: 'DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation'
thumbnail: ""
link: https://huggingface.co/papers/2412.15200
summary: This paper introduces DI-PCG, an efficient method for generating high-quality 3D assets through inverse procedural content generation. A lightweight diffusion transformer treats PCG parameters as the denoising target and uses observed images as conditions to control parameter generation. DI-PCG requires only 7.6M network parameters and 30 GPU hours to train, recovers parameters accurately, and generalizes well to in-the-wild images....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Yanpeng Sun
title: Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
thumbnail: ""
link: https://huggingface.co/papers/2412.14233
summary: We propose using visual specialists trained on annotated images to enhance image captions. Our approach, DCE, enriches captions with object attributes and inter-object relations, improving downstream visual understanding and reasoning. We will release the source code and a pipeline that makes it easy to combine other visual specialists....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Qihao Liu
title: 'Flowing from Words to Pixels: A Framework for Cross-Modality Evolution'
thumbnail: ""
link: https://huggingface.co/papers/2412.15213
summary: This paper proposes CrossFlow, a framework for cross-modal flow matching that directly maps one modality to another without relying on a noise distribution or a conditioning mechanism. Using variational encoders and classifier-free guidance, CrossFlow outperforms standard flow matching for text-to-image generation and matches or outperforms the state of the art for image captioning, depth estimation, and image super-resolution....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Xuekai Zhu
title: How to Synthesize Text Data without Model Collapse?
thumbnail: ""
link: https://huggingface.co/papers/2412.14689
summary: This paper investigates the impact of synthetic data on language model training and proposes a way to synthesize data without causing model collapse. By pre-training language models on different proportions of synthetic data, it reveals a negative correlation between the proportion of synthetic data and model performance, and it traces the problem to distributional shift and an over-concentration of n-gram features in synthetic data. To address these issues, the paper proposes token editing on human-produced data to obtain semi-synthetic data....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Hanlin Wang
title: 'LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis'
thumbnail: ""
link: https://huggingface.co/papers/2412.15214
summary: This paper introduces a new method for controlling object trajectories in image-to-video synthesis by adding a depth dimension to drag-based interaction, allowing for more precise manipulation of object movements and broadening the scope of creativity....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Yushi Bai
title: 'LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks'
thumbnail: ""
link: https://huggingface.co/papers/2412.15204
summary: LongBench v2 is a benchmark that tests LLMs' ability to understand and reason over long contexts across a variety of realistic tasks. It contains 503 multiple-choice questions with contexts of up to 2M words; the best model answering directly reaches only 50.1% accuracy, while a model that reasons for longer before answering reaches 57.7%, surpassing human experts....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Junjie Zhou
title: 'MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval'
thumbnail: ""
link: https://huggingface.co/papers/2412.14475
summary: This paper introduces MegaPairs, a method for synthesizing large amounts of training data for multimodal retrieval. It uses vision-language models and open-domain images to generate high-quality data, allowing a multimodal retriever to outperform a baseline trained on 70 times more data from existing datasets. MegaPairs scales easily and has already produced more than 26 million training instances. The paper also introduces several new models that achieve state-of-the-art zero-shot performance....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Guanting Dong
title: Progressive Multimodal Reasoning via Active Retrieval
thumbnail: ""
link: https://huggingface.co/papers/2412.14835
summary: AR-MCTS is a framework that enhances the reasoning abilities of large language models by combining Active Retrieval with Monte Carlo Tree Search to pull key supporting insights from a hybrid-modal retrieval corpus. This improves the diversity and reliability of the reasoning space and yields better performance on complex multimodal reasoning tasks....
opinion: placeholder
tags:
- ML
9 changes: 9 additions & 0 deletions current/2024-12-19 Qwen2.5 Technical Report.yaml
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Qwen
title: Qwen2.5 Technical Report
thumbnail: ""
link: https://huggingface.co/papers/2412.15115
summary: Qwen2.5 is a series of large language models designed to meet various needs, with significant improvements in pre-training and post-training stages. It includes different sizes and variants, such as base, instruction-tuned, quantized, and MoE (Mixture-of-Experts) models. Qwen2.5 has demonstrated top-tier performance on a wide range of language understanding and reasoning benchmarks, and is available through open-weight offerings and hosted solutions from Alibaba Cloud Model Studio....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Jiatong Li
title: 'TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.14642
summary: 'This paper introduces TOMG-Bench, a benchmark for evaluating the open-domain molecule generation capability of LLMs. It includes three tasks: molecule editing, molecule optimization, and customized molecule generation. The benchmark comes with an automated evaluation system and a new instruction-tuning dataset called OpenMolIns. Fine-tuned on OpenMolIns, Llama3.1-8B outperformed open-source general LLMs and even GPT-3.5-turbo on TOMG-Bench....'
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-19"
author: Enis Simsar
title: 'UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency'
thumbnail: ""
link: https://huggingface.co/papers/2412.15216
summary: We propose an unsupervised method for instruction-based image editing that does not need ground-truth edited images during training. Our method enforces Cycle Edit Consistency, requiring that applying an edit and then reversing it recovers the original image, which keeps edits faithful to the instruction while preserving image quality. Because no edited-image supervision is needed, the method can train on broader kinds of data and avoids the biases of datasets built with existing editing models, yielding edits with higher quality and precision....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-20"
author: Gagan Bhatia
title: 'DateLogicQA: Benchmarking Temporal Biases in Large Language Models'
thumbnail: ""
link: https://huggingface.co/papers/2412.13377
summary: 'This paper presents DateLogicQA, a benchmark for testing how well large language models understand and reason about dates and times. It identifies two types of bias that degrade temporal reasoning: Representation-Level Bias and Logical-Level Bias. Experiments show that current models frequently mishandle dates and times, and the benchmark provides a way to quantify these failures. The code is available on GitHub....'
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-20"
author: Hsin-Ping Huang
title: 'Move-in-2D: 2D-Conditioned Human Motion Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.13185
summary: Generating realistic human videos remains challenging, with the most effective methods currently relying on a human motion sequence as the control signal. Existing approaches often reuse motion extracted from other videos, which limits them to specific motion types and global scene matching. We propose Move-in-2D, a novel approach that generates human motion sequences conditioned on a scene image, allowing for diverse motion that adapts to different scenes....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-20"
author: Liyao Jiang
title: 'PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.14283
summary: PixelMan is a training-free method for consistent object editing with diffusion models that combines direct pixel manipulation with generation, and it is faster than prior training-based and training-free editing methods....
opinion: placeholder
tags:
- ML
