Automated report
deep-diver committed Nov 18, 2024
1 parent 0591c1b commit 956b4da
Showing 6 changed files with 54 additions and 0 deletions.
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Yushi Lan
title: 'GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation'
thumbnail: ""
link: https://huggingface.co/papers/2411.08033
summary: While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space ...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Guowei Xu
title: 'LLaVA-o1: Let Vision Language Models Reason Step-by-Step'
thumbnail: ""
link: https://huggingface.co/papers/2411.10440
summary: LLaVA-o1 is a new Vision-Language Model that can perform step-by-step reasoning on complex tasks. It uses a structured approach and a smaller dataset than other models, but still outperforms them on multimodal reasoning tasks. It also uses a new inference-time scaling method that helps it reason better....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Yongliang Wu
title: 'Number it: Temporal Grounding Videos like Flipping Manga'
thumbnail: ""
link: https://huggingface.co/papers/2411.10332
summary: In this paper, we introduce a new method called Number-Prompt (NumPro) that helps Vid-LLMs understand video content better by adding unique numbers to each frame. This makes it easier for Vid-LLMs to find specific moments in the video, and our experiments show that it improves performance for video temporal grounding tasks....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Zhennan Chen
title: Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
thumbnail: ""
link: https://huggingface.co/papers/2411.06558
summary: This paper presents RAG, a method for generating images based on regional descriptions. It decouples multi-region generation into two sub-tasks and allows users to modify specific unsatisfactory regions in the last generation without relying on additional inpainting models. RAG is tuning-free and can be applied to other frameworks to enhance their prompt-following ability....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Siyuan Hu
title: 'The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use'
thumbnail: ""
link: https://huggingface.co/papers/2411.10323
summary: A new AI model, Claude 3.5 Computer Use, is the first to offer computer use through a graphical user interface (GUI) in public beta. This case study explores its capabilities and limitations by designing tasks and providing an agent framework for deployment. The study aims to showcase its abilities and inspire future research into GUI agents....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Wang Qun
title: 'Xmodel-1.5: An 1B-scale Multilingual LLM'
thumbnail: ""
link: https://huggingface.co/papers/2411.10083
summary: Xmodel-1.5 is a new 1B-scale multilingual large language model that handles several languages, including Thai, Arabic, French, Chinese, and English. It performs well in these languages and can help with a range of tasks such as question answering. The authors also created a new evaluation set for Thai language understanding. They plan to keep improving the model and hope it supports research on multilingual understanding....
opinion: placeholder
tags:
- ML
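
The six added files share one flat layout: `key: value` pairs followed by a `tags:` list. As a minimal sketch, assuming only that layout, an entry could be read as below. `parse_entry` and `SAMPLE` are hypothetical names used for illustration and are not part of this commit, and the function is not a general YAML parser; it handles only the fields shown in this diff.

```python
def parse_entry(text):
    """Parse one flat metadata entry into a dict.

    Assumes the layout seen in this commit: `key: value` lines, plus
    `- item` lines that belong to the most recent key with no value.
    """
    entry = {}
    list_key = None  # key currently collecting `- item` lines, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and list_key is not None:
            entry[list_key].append(line[2:].strip())
            continue
        # Split on the first colon only, so URLs and titles keep theirs.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if value == "":
            entry[key] = []  # a bare `key:` starts a list
            list_key = key
        else:
            # Drop one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            entry[key] = value
            list_key = None
    return entry


# Sample copied from one of the entries in this diff (summary omitted).
SAMPLE = '''date: "2024-11-18"
author: Guowei Xu
title: 'LLaVA-o1: Let Vision Language Models Reason Step-by-Step'
thumbnail: ""
link: https://huggingface.co/papers/2411.10440
opinion: placeholder
tags:
- ML
'''

entry = parse_entry(SAMPLE)
# Every entry in this commit still carries the literal value "placeholder".
needs_opinion = entry.get("opinion") == "placeholder"
```

A check such as `needs_opinion` is one way to flag entries whose `opinion` field has not been filled in yet.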