diff --git a/current/2024-03-11 CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model.yaml b/current/2024-03-11 CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model.yaml
new file mode 100644
index 00000000..1dd6a950
--- /dev/null
+++ b/current/2024-03-11 CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model.yaml
@@ -0,0 +1,10 @@
+date: "2024-03-11"
+author: Zhengyi Wang
+title: 'CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model'
+thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05034.png
+link: https://huggingface.co/papers/2403.05034
+summary: The Convolutional Reconstruction Model (CRM) is a fast, high-fidelity feed-forward generative model that uses a convolutional U-Net and Flexicubes to create a high-resolution, textured 3D mesh from a single image in just 10 seconds, without any test-time optimization. The model leverages the strengths of convolutional layers for pixel-level alignment and strong bandwidth to generate a high-quality mesh from sparse 3D data....
+opinion: placeholder
+tags:
+ - Computer Vision
+ - Deep Learning
diff --git a/current/2024-03-11 CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion.yaml b/current/2024-03-11 CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion.yaml
new file mode 100644
index 00000000..d0e0a00d
--- /dev/null
+++ b/current/2024-03-11 CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion.yaml
@@ -0,0 +1,12 @@
+date: "2024-03-11"
+author: Wendi Zheng
+title: 'CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion'
+thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05121.png
+link: https://huggingface.co/papers/2403.05121
+summary: The paper presents CogView3, a text-to-image generation framework that uses relay diffusion to create low-resolution images and then applies super-resolution. It increases computational efficiency and image detail refinement, outperforming the current state-of-the-art model by 77.0% in human evaluations while requiring less inference time....
+opinion: placeholder
+tags:
+ - Unsupervised Learning
+ - Deep Learning
+ - Natural Language Processing
+ - Computer Vision
diff --git a/current/2024-03-11 DeepSeek-VL: Towards Real-World Vision-Language Understanding.yaml b/current/2024-03-11 DeepSeek-VL: Towards Real-World Vision-Language Understanding.yaml
new file mode 100644
index 00000000..f9e03228
--- /dev/null
+++ b/current/2024-03-11 DeepSeek-VL: Towards Real-World Vision-Language Understanding.yaml
@@ -0,0 +1,11 @@
+date: "2024-03-11"
+author: Haoyu Lu
+title: 'DeepSeek-VL: Towards Real-World Vision-Language Understanding'
+thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05525.png
+link: https://huggingface.co/papers/2403.05525
+summary: This paper introduces DeepSeek-VL, a new open-source Vision-Language Model designed for real-world applications. It has a diverse dataset, a use case taxonomy, and a hybrid vision encoder that processes high-resolution images efficiently. The model prioritizes strong language abilities and shows better user experiences as a chatbot, with state-of-the-art performance on visual-language benchmarks....
+opinion: placeholder
+tags:
+ - Deep Learning
+ - Computer Vision
+ - Natural Language Processing
diff --git a/current/2024-03-11 ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment.yaml b/current/2024-03-11 ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment.yaml
new file mode 100644
index 00000000..6326526b
--- /dev/null
+++ b/current/2024-03-11 ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment.yaml
@@ -0,0 +1,12 @@
+date: "2024-03-11"
+author: Xiwei Hu
+title: 'ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment'
+thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05135.png
+link: https://huggingface.co/papers/2403.05135
+summary: This paper proposes ELLA, a method for improving the ability of text-to-image diffusion models to understand complex and lengthy prompts by using powerful large language models. The paper also introduces a new benchmark for evaluating dense prompt following called DPG-Bench and demonstrates the effectiveness of ELLA through various experiments....
+opinion: placeholder
+tags:
+ - Supervised Learning
+ - Deep Learning
+ - Natural Language Processing
+ - Computer Vision
diff --git a/current/2024-03-11 Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.yaml b/current/2024-03-11 Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.yaml
new file mode 100644
index 00000000..7ed37396
--- /dev/null
+++ b/current/2024-03-11 Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.yaml
@@ -0,0 +1,12 @@
+date: "2024-03-11"
+author: Machel Reid
+title: 'Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context'
+thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05530.png
+link: https://huggingface.co/papers/2403.05530
+summary: Gemini 1.5 Pro is a multimodal model that can recall and reason over information from millions of tokens of context including multiple long documents, hours of video and audio. It achieves near-perfect recall on long-context retrieval tasks across modalities and improves on state-of-the-art performance in long-document QA, long-video QA, and long-context ASR. It also has the ability to translate English to Kalamang at a similar level to a person who learned from the same content....
+opinion: placeholder
+tags:
+ - Deep Learning
+ - Natural Language Processing
+ - Computer Vision
+ - Speech Recognition and Synthesis
diff --git a/current/2024-03-11 Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks.yaml b/current/2024-03-11 Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks.yaml
new file mode 100644
index 00000000..0ffef1f5
--- /dev/null
+++ b/current/2024-03-11 Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks.yaml
@@ -0,0 +1,9 @@
+date: "2024-03-11"
+author: Marco De Nadai
+title: Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
+thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05185.png
+link: https://huggingface.co/papers/2403.05185
+summary: The paper introduces a new recommendation system, 2T-HGNN, for Spotify's audiobooks. The system uses Heterogeneous Graph Neural Networks (HGNNs) and a Two Tower (2T) model to recommend audiobooks to users based on their podcast and music preferences. It improves the quality of the recommendations and increases the number of new audiobooks started and streaming rates....
+opinion: placeholder
+tags:
+ - Recommender Systems
diff --git a/current/2024-03-11 VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models.yaml b/current/2024-03-11 VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models.yaml
new file mode 100644
index 00000000..bf8ed778
--- /dev/null
+++ b/current/2024-03-11 VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models.yaml
@@ -0,0 +1,10 @@
+date: "2024-03-11"
+author: Yabo Zhang
+title: 'VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models'
+thumbnail: https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2403.05438.png
+link: https://huggingface.co/papers/2403.05438
+summary: VideoElevator is a method that improves the quality of text-to-video diffusion models by using the strengths of text-to-image diffusion models. It enhances temporal consistency and adds more realistic details to the generated videos....
+opinion: placeholder
+tags:
+ - Deep Learning
+ - Computer Vision