generated from codingpot/newsletter_awesome_articles
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent de75a27, commit f479f3c
Showing 7 changed files with 76 additions and 0 deletions.
10 changes: 10 additions & 0 deletions
...-03-11 CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
date: "2024-03-11" | ||
author: Zhengyi Wang | ||
title: 'CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model' | ||
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05034.png | ||
link: https://huggingface.co/papers/2403.05034 | ||
summary: The Convolutional Reconstruction Model (CRM) is a fast, high-fidelity feed-forward generative model that uses a convolutional U-Net and Flexicubes to create a high-resolution, textured 3D mesh from a single image in just 10 seconds, without any test-time optimization. The model leverages the strengths of convolutional layers for pixel-level alignment and strong bandwidth to generate a high-quality mesh from sparse 3D data.... | ||
opinion: placeholder | ||
tags: | ||
- Computer Vision | ||
- Deep Learning |
12 changes: 12 additions & 0 deletions
...t/2024-03-11 CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
date: "2024-03-11" | ||
author: Wendi Zheng | ||
title: 'CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion' | ||
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05121.png | ||
link: https://huggingface.co/papers/2403.05121 | ||
summary: The paper presents CogView3, a text-to-image generation framework that uses relay diffusion to create low-resolution images and then applies super-resolution. It increases computational efficiency and image detail refinement, outperforming the current state-of-the-art model by 77.0% in human evaluations while requiring less inference time.... | ||
opinion: placeholder | ||
tags: | ||
- Unsupervised Learning | ||
- Deep Learning | ||
- Natural Language Processing | ||
- Computer Vision |
11 changes: 11 additions & 0 deletions
current/2024-03-11 DeepSeek-VL: Towards Real-World Vision-Language Understanding.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
date: "2024-03-11" | ||
author: Haoyu Lu | ||
title: 'DeepSeek-VL: Towards Real-World Vision-Language Understanding' | ||
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05525.png | ||
link: https://huggingface.co/papers/2403.05525 | ||
summary: This paper introduces DeepSeek-VL, a new open-source Vision-Language Model designed for real-world applications. It has a diverse dataset, a use case taxonomy, and a hybrid vision encoder that processes high-resolution images efficiently. The model prioritizes strong language abilities and shows better user experiences as a chatbot, with state-of-the-art performance on visual-language benchmarks.... | ||
opinion: placeholder | ||
tags: | ||
- Deep Learning | ||
- Computer Vision | ||
- Natural Language Processing |
12 changes: 12 additions & 0 deletions
...ent/2024-03-11 ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
date: "2024-03-11" | ||
author: Xiwei Hu | ||
title: 'ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment' | ||
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05135.png | ||
link: https://huggingface.co/papers/2403.05135 | ||
summary: This paper proposes ELLA, a method for improving the ability of text-to-image diffusion models to understand complex and lengthy prompts by using powerful large language models. The paper also introduces a new benchmark for evaluating dense prompt following called DPG-Bench and demonstrates the effectiveness of ELLA through various experiments.... | ||
opinion: placeholder | ||
tags: | ||
- Supervised Learning | ||
- Deep Learning | ||
- Natural Language Processing | ||
- Computer Vision |
12 changes: 12 additions & 0 deletions
... Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
date: "2024-03-11" | ||
author: Machel Reid | ||
title: 'Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context' | ||
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05530.png | ||
link: https://huggingface.co/papers/2403.05530 | ||
summary: Gemini 1.5 Pro is a multimodal model that can recall and reason over information from millions of tokens of context including multiple long documents, hours of video and audio. It achieves near-perfect recall on long-context retrieval tasks across modalities and improves on state-of-the-art performance in long-document QA, long-video QA, and long-context ASR. It also has the ability to translate English to Kalamang at a similar level to a person who learned from the same content.... | ||
opinion: placeholder | ||
tags: | ||
- Deep Learning | ||
- Natural Language Processing | ||
- Computer Vision | ||
- Speech Recognition and Synthesis |
9 changes: 9 additions & 0 deletions
...3-11 Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
date: "2024-03-11" | ||
author: Marco De Nadai | ||
title: Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks | ||
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05185.png | ||
link: https://huggingface.co/papers/2403.05185 | ||
summary: The paper introduces a new recommendation system, 2T-HGNN, for Spotify's audiobooks. The system uses Heterogeneous Graph Neural Networks (HGNNs) and a Two Tower (2T) model to recommend audiobooks to users based on their podcast and music preferences. It improves the quality of the recommendations and increases the number of new audiobooks started and streaming rates.... | ||
opinion: placeholder | ||
tags: | ||
- Recommender Systems |
10 changes: 10 additions & 0 deletions
...or: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
date: "2024-03-11" | ||
author: Yabo Zhang | ||
title: 'VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models' | ||
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05438.png | ||
link: https://huggingface.co/papers/2403.05438 | ||
summary: VideoElevator is a method that improves the quality of text-to-video diffusion models by using the strengths of text-to-image diffusion models. It enhances temporal consistency and adds more realistic details to the generated videos.... | ||
opinion: placeholder | ||
tags: | ||
- Deep Learning | ||
- Computer Vision |