
Automated report
deep-diver committed Dec 5, 2024
1 parent f503cfe commit ac55c11
Showing 18 changed files with 162 additions and 0 deletions.
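Each of the 18 added files is a paper entry sharing the same front-matter fields. A minimal sketch of the shared schema follows; field names are taken from the diffs below, while the placeholder values and comments are illustrative assumptions only:

date: "YYYY-MM-DD"         # date of the report entry
author: First Author       # first author of the paper
title: 'Paper Title'       # quoted when it contains a colon
thumbnail: ""              # left empty by the automated report
link: https://huggingface.co/papers/<paper-id>
summary: Short automated summary of the paper, often truncated with "..."
opinion: placeholder       # presumably filled in manually later
tags:
- ML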
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Shengyuan Zhang
title: Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
thumbnail: ""
link: https://huggingface.co/papers/2412.03515
summary: This paper presents a new method called ScoreLiDAR that improves the speed and quality of 3D LiDAR scene completion models used in autonomous vehicles. It speeds up completion by more than 5x on SemanticKITTI and outperforms existing models....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Jing Tan
title: 'Imagine360: Immersive 360 Video Generation from Perspective Anchor'
thumbnail: ""
link: https://huggingface.co/papers/2412.03552
summary: Imagine360 is a new framework that converts standard perspective videos into 360-degree equirectangular videos with rich and diverse motion patterns. It uses a dual-branch design, an antipodal mask, and elevation-aware designs to capture long-range motion dependencies and handle diverse perspective video inputs. Imagine360 outperforms other 360-degree video generation methods in graphics quality and motion coherence....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Zehuan Huang
title: 'MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.03558
summary: This paper presents MIDI, a new approach for creating 3D scenes from a single image. Unlike other methods that require multiple steps or complex processes, MIDI uses a multi-instance attention mechanism to capture interactions and spatial coherence between objects. The method models object completion during 3D generation and supervises interactions between 3D instances using limited scene-level data. MIDI achieves state-of-the-art performance in image-to-scene generation....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Shuai Tan
title: 'Mimir: Improving Video Diffusion Models for Precise Text Understanding'
thumbnail: ""
link: https://huggingface.co/papers/2412.03085
summary: Mimir is a new framework for generating videos from text descriptions that improves the precision of text understanding by combining the strengths of video diffusion models and large language models. It does this by using a special token fuser to blend the features from both types of models, allowing the video generation model to benefit from the learned video priors and the text-related capabilities of the language model. This results in higher quality videos with better text comprehension, esp...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Jun Xiang
title: 'One Shot, One Talk: Whole-body Talking Avatar from a Single Image'
thumbnail: ""
link: https://huggingface.co/papers/2412.01106
summary: This paper introduces a novel pipeline that creates a whole-body talking avatar from a single image, addressing the challenges of complex dynamic modeling and generalization to novel gestures and expressions. The authors use pose-guided image-to-video diffusion models to generate imperfect video frames as pseudo-labels and a 3DGS-mesh hybrid avatar representation to overcome dynamic modeling challenges. Experiments show that their method produces photorealistic, animatable, and expressive avatars from a single image....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Liao Qu
title: 'TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.03069
summary: TokenFlow is a unified image tokenizer that converts images into discrete tokens usable for both multimodal understanding and generation. Unlike prior tokenizers, it captures high-level semantics and fine-grained visual details at the same time, so the same tokens serve both comprehension and image-generation tasks. On understanding benchmarks it even surpasses some much larger multimodal systems....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Nick Stracke
title: 'CleanDIFT: Diffusion Features without Noise'
thumbnail: ""
link: https://huggingface.co/papers/2412.03439
summary: We present a method to fine-tune diffusion models to produce high-quality, noise-free semantic features for various tasks, outperforming previous methods and ensemble-based approaches at a lower cost....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Wujian Peng
title: 'Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning'
thumbnail: ""
link: https://huggingface.co/papers/2412.03565
summary: This paper proposes a method called Inst-IT to improve the understanding of specific elements in images and videos by Large Multimodal Models. It uses explicit visual prompts and instruction tuning, and the results show that it improves the models' performance on various image and video understanding tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Xiaoyan Xing
title: 'LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting'
thumbnail: ""
link: https://huggingface.co/papers/2412.00177
summary: LumiNet is a new method that combines generative models with latent intrinsic representations of light to relight indoor scenes. Given a source image and a target lighting image, it synthesizes a new image that takes on the target's lighting while keeping the original scene's geometry and colors, processing latent properties from both images and using a learned adaptor to transfer the lighting. LumiNet outperforms other methods at transferring complex lighting effects such as specular highlights and indirect illumination....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Lingen Li
title: 'NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images'
thumbnail: ""
link: https://huggingface.co/papers/2412.03517
summary: We present NVComposer, a new method for creating realistic images from different viewpoints using multiple sparse and unposed images. Our approach doesn't require external alignment and improves the flexibility and accessibility of existing methods. NVComposer uses a dual-stream diffusion model and a geometry-aware feature alignment module to generate target novel views together with the camera poses of the condition images. The approach achieves state-of-the-art performance in generative multi-view NVS tasks and shows imp...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Dar-Yen Chen
title: 'NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training'
thumbnail: ""
link: https://huggingface.co/papers/2412.02030
summary: NitroFusion, a new method for creating high-quality images in a single step, uses a pool of specialized discriminators that provide feedback on different aspects of the image, such as composition, color, and technique. This improves the quality of the generated images and allows users to choose the number of steps for the best balance of quality and speed....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Andreas Steiner
title: 'PaliGemma 2: A Family of Versatile VLMs for Transfer'
thumbnail: ""
link: https://huggingface.co/papers/2412.03555
summary: PaliGemma 2 is an improved version of the PaliGemma open Vision-Language Model (VLM) that uses the Gemma 2 family of language models and the SigLIP-So400m vision encoder. The models are trained at different resolutions to improve knowledge transfer. The family of models allows for analysis of factors impacting transfer performance and includes more and broader transfer tasks than PaliGemma, achieving state-of-the-art results in some tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Viet Nguyen
title: 'SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance'
thumbnail: ""
link: https://huggingface.co/papers/2412.02687
summary: This paper introduces SNOOPI, a new framework that improves training stability and adds support for negative prompt guidance in one-step diffusion models. It uses a random-scale classifier-free guidance approach and a training-free method called Negative-Away Steer Attention (NASA). SNOOPI significantly improves baseline models across various metrics and sets a new state-of-the-art benchmark for one-step diffusion models with an HPSv2 score of 31.08....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Alex Havrilla
title: Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models
thumbnail: ""
link: https://huggingface.co/papers/2412.02980
summary: The paper investigates the impact of quality, diversity, and complexity in synthetic data generated by Large Language Models. It finds that quality is crucial for in-distribution generalization, diversity is essential for out-of-distribution generalization, and complexity is beneficial for both. The paper also highlights the trade-offs between these characteristics and their influence on model performance. Additionally, it discusses the importance of balancing these trade-offs for efficient rein...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Konstantin Chernyshev
title: 'U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs'
thumbnail: ""
link: https://huggingface.co/papers/2412.03205
summary: The abstract introduces U-MATH, a new benchmark for evaluating mathematical skills in language models, which consists of 1,100 unpublished open-ended university-level problems. The benchmark is balanced across six core subjects and includes 20% multimodal problems. The evaluation of various language models on U-MATH reveals that they struggle with both text-based and visual problems, achieving a maximum accuracy of only 63% and 45% respectively. The solution assessment is also challenging for la...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Jeongho Ju
title: 'VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models'
thumbnail: ""
link: https://huggingface.co/papers/2411.19103
summary: VARCO-VISION is an open-source Korean-English vision-language model that can understand and generate images and text in both languages. It performs well in various tasks and is available for researchers to use....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Duo Zheng
title: 'Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding'
thumbnail: ""
link: https://huggingface.co/papers/2412.00493
summary: The paper introduces a new model called Video-3D LLM that improves 3D scene understanding by treating 3D scenes as videos and adding 3D position information to the model's representations. The model also uses a technique called maximum coverage sampling to balance computational costs and performance. The authors show that their model outperforms other models on several 3D scene understanding tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-05"
author: Ziyi Yang
title: Weighted-Reward Preference Optimization for Implicit Model Fusion
thumbnail: ""
link: https://huggingface.co/papers/2412.03187
summary: This paper proposes a method called Weighted-Reward Preference Optimization (WRPO) for fusing different language models without requiring vocabulary alignment or merging their weight matrices. It improves the performance of the fused model and even outperforms some stronger models on several benchmarks....
opinion: placeholder
tags:
- ML
