Automated report
deep-diver committed Dec 9, 2024
1 parent c7c11af commit 1d9bed9
Showing 14 changed files with 126 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
date: "2024-12-08"
author: Wanting Zhang
title: '2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction'
thumbnail: ""
link: https://huggingface.co/papers/2412.03428
summary: The paper proposes a new method called 2DGS-Room that uses 2D Gaussian Splatting to create detailed indoor scenes. It uses seed points to control the distribution of the Gaussians and adds depth and normal information to make the scenes more accurate. The method also uses multi-view consistency to reduce artifacts and improve the overall quality of the reconstructions. Experiments show that this method performs better than existing ones for indoor scene reconstruction....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-08"
author: Hanqing Zhu
title: 'APOLLO: SGD-like Memory, AdamW-level Performance'
thumbnail: ""
link: https://huggingface.co/papers/2412.05270
summary: APOLLO is a new optimizer that achieves AdamW-level performance while using less memory, allowing for faster training and larger batch sizes on less powerful GPUs. It does this by approximating learning rate scaling using an auxiliary low-rank optimizer state based on random projection, which makes it highly tolerant to further memory reductions. The APOLLO series performs on-par with or better than AdamW, and provides significant system-level benefits, including enhanced throughput, improved mo...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-08"
author: LG AI Research
title: 'EXAONE 3.5: Series of Large Language Models for Real-world Use Cases'
thumbnail: ""
link: https://huggingface.co/papers/2412.04862
summary: EXAONE 3.5 is a series of large language models designed for real-world applications. They have exceptional instruction following capabilities, outstanding long-context comprehension, and competitive results compared to other models of similar sizes. They're available for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE. For commercial use, contact LG AI Research at [email protected]....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-08"
author: Zhe Chen
title: Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
thumbnail: ""
link: https://huggingface.co/papers/2412.05271
summary: InternVL 2.5 is an advanced multimodal large language model that improves upon InternVL 2.0 by scaling up model, data, and test-time configurations. It performs well on various benchmarks and is the first open-source model to surpass 70% on the MMMU benchmark. The model is available for the open-source community to use and build upon....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-08"
author: Kaiyi Huang
title: 'GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration'
thumbnail: ""
link: https://huggingface.co/papers/2412.04440
summary: 'GenMAC is a text-to-video generation model that uses multiple agents to collaborate and create complex scenes based on text prompts. It has three stages: design, generation, and redesign, with an iterative loop between the last two to correct and improve the generated video. The redesign stage uses four agents to verify the video, suggest corrections, and redesign the text prompt and layout. GenMAC also has a self-routing mechanism to choose the best correction agent for each scenario....'
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-08"
author: Yibin Wang
title: 'LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment'
thumbnail: ""
link: https://huggingface.co/papers/2412.04814
summary: The paper proposes a way to improve text-to-video models using human feedback. The authors build a dataset of human ratings and use it to train a model that predicts how well a video matches a text description. They then use this model to improve the text-to-video model, making it more likely to generate videos that match the text description....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-08"
author: Jarvis Guo
title: 'MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale'
thumbnail: ""
link: https://huggingface.co/papers/2412.05237
summary: This paper introduces a new method to create a large-scale multimodal instruction-tuning dataset with detailed rationales, which improves the reasoning capabilities of open-source multimodal large language models. The dataset is created using only open models and contains 12M instruction-response pairs. Experiments show that training MLLMs on this dataset improves performance on reasoning-intensive tasks and even non-reasoning-based benchmarks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-08"
author: Yi Chen
title: 'Moto: Latent Motion Token as the Bridging Language for Robot Manipulation'
thumbnail: ""
link: https://huggingface.co/papers/2412.04445
summary: This paper proposes Moto, a method to use video data to improve robot learning by converting video content into latent Motion Token sequences and pre-training a model called Moto-GPT. The paper also introduces a co-fine-tuning strategy to transfer learned motion priors to real robot actions, and experiments show that the fine-tuned Moto-GPT is effective in improving robot manipulation tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-08"
author: Trong-Tung Nguyen
title: 'SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion'
thumbnail: ""
link: https://huggingface.co/papers/2412.04301
summary: This paper introduces SwiftEdit, a fast and efficient tool for text-guided image editing. It uses a one-step inversion framework and a mask-guided editing technique to achieve instant results, which is at least 50 times faster than previous methods while maintaining similar editing quality. The project page is at https://swift-edit.github.io/....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-09"
author: Xiaohui Chen
title: 'CompCap: Improving Multimodal Large Language Models with Composite Captions'
thumbnail: ""
link: https://huggingface.co/papers/2412.05243
summary: A new framework called CompCap is introduced to improve the understanding of composite images by multimodal large language models. This is done by creating a dataset of 118K image-caption pairs and fine-tuning the models with this data, leading to improved performance on various benchmarks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-09"
author: Minzheng Wang
title: 'DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling'
thumbnail: ""
link: https://huggingface.co/papers/2412.04905
summary: The paper introduces a new research task called Dialogue Element MOdeling and a benchmark called DEMO to improve dialogue generation and assessment. The authors also build an agent that can model dialogue elements. Experiments show that existing language models still have room for improvement on this task, and that the authors' agent performs well in both in-domain and out-of-domain tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-09"
author: Ziyi Wu
title: 'Mind the Time: Temporally-Controlled Multi-Event Video Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.05263
summary: The paper introduces MinT, a video generator that can create sequences of events with precise timing control. It binds each event to a specific time period and uses a time-based positional encoding method to guide the cross-attention operation. This results in coherent videos with smoothly connected events, and it's the first model in the literature to offer control over event timing. MinT outperforms existing open-source models by a large margin....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-09"
author: Jixuan Fan
title: 'Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction'
thumbnail: ""
link: https://huggingface.co/papers/2412.04887
summary: Momentum-GS is a new method that improves the accuracy of large-scale scene reconstruction by using a moving average of a teacher Gaussian decoder to provide guidance to each block during training. It also dynamically adjusts the weight of each block based on its accuracy. This method outperforms existing techniques by 12.8% in LPIPS and requires fewer divided blocks....
opinion: placeholder
tags:
- ML
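
The Momentum-GS summary above mentions a moving average of a teacher decoder. A minimal sketch of one such update, assuming an exponential moving average over flat lists of parameters. The function name `ema_update` and the momentum value 0.9 are illustrative assumptions, not taken from the paper, and the per-block weighting the summary also mentions is not modeled here.

```python
def ema_update(teacher, student, momentum=0.9):
    """Blend student parameters into the teacher: t <- m * t + (1 - m) * s.

    `teacher` and `student` are equal-length lists of floats standing in
    for decoder weights (a simplification for illustration).
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    teacher = [0.0, 0.0]
    student = [1.0, 2.0]
    # Each step moves the teacher a fraction (1 - momentum) toward the student,
    # so the teacher changes more slowly and more smoothly than the student.
    for step in range(3):
        teacher = ema_update(teacher, student)
        print(step, teacher)
```

A momentum close to 1 keeps the teacher stable across training steps, which is the usual reason for using an averaged teacher as a guidance signal.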
@@ -0,0 +1,9 @@
date: "2024-12-09"
author: Avinash Paliwal
title: 'PanoDreamer: 3D Panorama Synthesis from a Single Image'
thumbnail: ""
link: https://huggingface.co/papers/2412.04827
summary: PanoDreamer is a new way to make a full 3D scene from one picture, by first guessing what the whole panorama and its depth would be, then filling in the missing parts, and finally turning it into 3D. This is different from other methods that create the scene bit by bit, and PanoDreamer does a better job at making the scene look right and feel whole....
opinion: placeholder
tags:
- ML
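
Every file in this commit follows the same nine-line layout, which is consistent with the totals reported above (14 files of 9 lines each give 126 additions). A minimal sketch of a consistency check for one record, assuming single-line values as in the files shown. The helper `top_level_keys` and the `EXPECTED_KEYS` list are hypothetical names introduced here, and the line-based parsing is a stand-in for a real YAML parser.

```python
EXPECTED_KEYS = ["date", "author", "title", "thumbnail",
                 "link", "summary", "opinion", "tags"]

# One record copied from the commit (the CompCap entry).
SAMPLE = """\
date: "2024-12-09"
author: Xiaohui Chen
title: 'CompCap: Improving Multimodal Large Language Models with Composite Captions'
thumbnail: ""
link: https://huggingface.co/papers/2412.05243
summary: A new framework called CompCap is introduced to improve the understanding of composite images by multimodal large language models. This is done by creating a dataset of 118K image-caption pairs and fine-tuning the models with this data, leading to improved performance on various benchmarks....
opinion: placeholder
tags:
- ML
"""


def top_level_keys(record_text):
    """Return the top-level keys of a record, in order of appearance.

    Assumes each key sits at the start of a line and each value fits on
    that line; list items (lines starting with '-') are skipped.
    """
    keys = []
    for line in record_text.splitlines():
        if not line or line[0] in " -#":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.append(key)
    return keys


if __name__ == "__main__":
    assert top_level_keys(SAMPLE) == EXPECTED_KEYS
    assert len(SAMPLE.splitlines()) == 9   # matches the "+1,9" hunk header
    assert 14 * len(SAMPLE.splitlines()) == 126  # matches the reported additions
    print("record layout is consistent")
```

Partitioning on the first colon keeps titles that themselves contain a colon, such as the quoted CompCap title, from being split into spurious keys.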
