generated from codingpot/newsletter_awesome_articles
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 0591c1b commit 956b4da
Showing 6 changed files with 54 additions and 0 deletions.
9 changes: 9 additions & 0 deletions
...4-11-18 GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation.yaml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Yushi Lan
title: 'GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation'
thumbnail: ""
link: https://huggingface.co/papers/2411.08033
summary: While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space ...
opinion: placeholder
tags:
- ML
9 changes: 9 additions & 0 deletions
current/2024-11-18 LLaVA-o1: Let Vision Language Models Reason Step-by-Step.yaml
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Guowei Xu
title: 'LLaVA-o1: Let Vision Language Models Reason Step-by-Step'
thumbnail: ""
link: https://huggingface.co/papers/2411.10440
summary: LLaVA-o1 is a new Vision-Language Model that can perform step-by-step reasoning on complex tasks. It uses a structured approach and a smaller dataset than other models, but still outperforms them on multimodal reasoning tasks. It also uses a new inference-time scaling method that helps it reason better....
opinion: placeholder
tags:
- ML
9 changes: 9 additions & 0 deletions
current/2024-11-18 Number it: Temporal Grounding Videos like Flipping Manga.yaml
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Yongliang Wu
title: 'Number it: Temporal Grounding Videos like Flipping Manga'
thumbnail: ""
link: https://huggingface.co/papers/2411.10332
summary: In this paper, we introduce a new method called Number-Prompt (NumPro) that helps Vid-LLMs understand video content better by adding unique numbers to each frame. This makes it easier for Vid-LLMs to find specific moments in the video, and our experiments show that it improves performance for video temporal grounding tasks....
opinion: placeholder
tags:
- ML
9 changes: 9 additions & 0 deletions
...024-11-18 Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement.yaml
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Zhennan Chen
title: Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
thumbnail: ""
link: https://huggingface.co/papers/2411.06558
summary: This paper presents RAG, a method for generating images based on regional descriptions. It decouples multi-region generation into two sub-tasks and allows users to modify specific unsatisfied regions in the last generation without relying on additional inpainting models. RAG is tuning-free and can be applied to other frameworks as an enhancement to the prompt following property....
opinion: placeholder
tags:
- ML
9 changes: 9 additions & 0 deletions
...4-11-18 The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use.yaml
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Siyuan Hu
title: 'The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use'
thumbnail: ""
link: https://huggingface.co/papers/2411.10323
summary: A new AI model, Claude 3.5 Computer Use, is the first to offer computer use through a graphical user interface (GUI) in public beta. This case study explores its capabilities and limitations by designing tasks and providing an agent framework for deployment. The study aims to showcase its abilities and inspire future research into GUI agents....
opinion: placeholder
tags:
- ML
9 changes: 9 additions & 0 deletions
current/2024-11-18 Xmodel-1.5: An 1B-scale Multilingual LLM.yaml
@@ -0,0 +1,9 @@
date: "2024-11-18"
author: Wang Qun
title: 'Xmodel-1.5: An 1B-scale Multilingual LLM'
thumbnail: ""
link: https://huggingface.co/papers/2411.10083
summary: Xmodel-1.5 is a new, big computer program that can understand and speak many languages, including Thai, Arabic, French, Chinese, and English. It's really good at understanding and speaking these languages, and it can help with lots of different tasks like answering questions. The people who made it also made a new test for Thai language understanding. They want to keep making it better and hope it helps with research about understanding different languages....
opinion: placeholder
tags:
- ML
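
All six added files share the same nine-line layout, and each still has `opinion: placeholder` and an empty `thumbnail`. A reviewer could check entries like these with a small script. The following is only a sketch, not tooling from this repository: `REQUIRED_KEYS`, `parse_entry`, and `check_entry` are hypothetical names, the key list is inferred from the files in this commit, and the parser assumes the flat `key: value` layout seen here rather than general YAML (a real pipeline would more likely use a YAML library).

```python
# Hypothetical key list, inferred from the six files added in this commit.
REQUIRED_KEYS = ("date", "author", "title", "thumbnail",
                 "link", "summary", "opinion", "tags")


def unquote(value):
    """Strip one pair of matching quotes; YAML escape rules are ignored."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def parse_entry(text):
    """Parse the flat `key: value` layout used by these entries.

    Not a general YAML parser: it handles scalar values and lists
    written as `- item` lines directly under a bare `key:` line.
    """
    entry = {}
    current_list = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current_list is not None:
            current_list.append(line[2:].strip())
            continue
        # Split on the first colon only, so colons in titles/URLs survive.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"unrecognized line: {raw!r}")
        value = value.strip()
        if value == "":
            current_list = []          # bare `key:` starts a list
            entry[key.strip()] = current_list
        else:
            current_list = None
            entry[key.strip()] = unquote(value)
    return entry


def check_entry(entry):
    """Return a list of human-readable problems (empty if none found)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in entry]
    if entry.get("opinion") == "placeholder":
        problems.append("opinion is still the placeholder")
    if entry.get("thumbnail") == "":
        problems.append("thumbnail is empty")
    return problems


# One entry from this commit; the summary is abbreviated here for space.
SAMPLE = """\
date: "2024-11-18"
author: Guowei Xu
title: 'LLaVA-o1: Let Vision Language Models Reason Step-by-Step'
thumbnail: ""
link: https://huggingface.co/papers/2411.10440
summary: LLaVA-o1 is a new Vision-Language Model that can perform step-by-step reasoning on complex tasks....
opinion: placeholder
tags:
- ML
"""

if __name__ == "__main__":
    for problem in check_entry(parse_entry(SAMPLE)):
        print(problem)
```

Run against the sample, the check reports the placeholder opinion and the empty thumbnail, which matches what every file in this commit would report.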