
Commit

Automated report
deep-diver committed Mar 11, 2024
1 parent de75a27 commit f479f3c
Showing 7 changed files with 76 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
date: "2024-03-11"
author: Zhengyi Wang
title: 'CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model'
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05034.png
link: https://huggingface.co/papers/2403.05034
summary: The Convolutional Reconstruction Model (CRM) is a fast, high-fidelity feed-forward generative model that uses a convolutional U-Net and Flexicubes to create a high-resolution, textured 3D mesh from a single image in just 10 seconds, without any test-time optimization. The model relies on the pixel-level alignment and high bandwidth of convolutional layers to generate a high-quality mesh from sparse 3D data....
opinion: placeholder
tags:
- Computer Vision
- Deep Learning
@@ -0,0 +1,12 @@
date: "2024-03-11"
author: Wendi Zheng
title: 'CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion'
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05121.png
link: https://huggingface.co/papers/2403.05121
summary: The paper presents CogView3, a text-to-image generation framework that uses relay diffusion, first creating low-resolution images and then applying super-resolution. This improves both computational efficiency and the refinement of image detail. CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluations while requiring only about half the inference time....
opinion: placeholder
tags:
- Unsupervised Learning
- Deep Learning
- Natural Language Processing
- Computer Vision
@@ -0,0 +1,11 @@
date: "2024-03-11"
author: Haoyu Lu
title: 'DeepSeek-VL: Towards Real-World Vision-Language Understanding'
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05525.png
link: https://huggingface.co/papers/2403.05525
summary: This paper introduces DeepSeek-VL, a new open-source Vision-Language Model designed for real-world applications. It combines a diverse training dataset, a use case taxonomy, and a hybrid vision encoder that processes high-resolution images efficiently. The model prioritizes strong language abilities and offers a better user experience as a chatbot, with state-of-the-art performance on visual-language benchmarks....
opinion: placeholder
tags:
- Deep Learning
- Computer Vision
- Natural Language Processing
@@ -0,0 +1,12 @@
date: "2024-03-11"
author: Xiwei Hu
title: 'ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment'
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05135.png
link: https://huggingface.co/papers/2403.05135
summary: This paper proposes ELLA, a method for improving the ability of text-to-image diffusion models to understand complex and lengthy prompts by using powerful large language models. The paper also introduces a new benchmark for evaluating dense prompt following called DPG-Bench and demonstrates the effectiveness of ELLA through various experiments....
opinion: placeholder
tags:
- Supervised Learning
- Deep Learning
- Natural Language Processing
- Computer Vision
@@ -0,0 +1,12 @@
date: "2024-03-11"
author: Machel Reid
title: 'Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context'
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05530.png
link: https://huggingface.co/papers/2403.05530
summary: Gemini 1.5 Pro is a multimodal model that can recall and reason over information from millions of tokens of context including multiple long documents, hours of video and audio. It achieves near-perfect recall on long-context retrieval tasks across modalities and improves on state-of-the-art performance in long-document QA, long-video QA, and long-context ASR. It also has the ability to translate English to Kalamang at a similar level to a person who learned from the same content....
opinion: placeholder
tags:
- Deep Learning
- Natural Language Processing
- Computer Vision
- Speech Recognition and Synthesis
@@ -0,0 +1,9 @@
date: "2024-03-11"
author: Marco De Nadai
title: Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05185.png
link: https://huggingface.co/papers/2403.05185
summary: The paper introduces a new recommendation system, 2T-HGNN, for Spotify's audiobooks. The system combines Heterogeneous Graph Neural Networks (HGNNs) with a Two Tower (2T) model to recommend audiobooks to users based on their podcast and music preferences. It improves recommendation quality and increases both the rate at which users start new audiobooks and streaming rates....
opinion: placeholder
tags:
- Recommender Systems
@@ -0,0 +1,10 @@
date: "2024-03-11"
author: Yabo Zhang
title: 'VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models'
thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05438.png
link: https://huggingface.co/papers/2403.05438
summary: VideoElevator is a method that improves the quality of text-to-video diffusion models by using the strengths of text-to-image diffusion models. It enhances temporal consistency and adds more realistic details to the generated videos....
opinion: placeholder
tags:
- Deep Learning
- Computer Vision
