generated from codingpot/newsletter_awesome_articles
Commit c7143a8 (1 parent: bae5dc6)
Showing 14 changed files with 126 additions and 0 deletions.
...ng into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Yuanhao Cai
title: Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation
thumbnail: ""
link: https://huggingface.co/papers/2411.14384
summary: This paper introduces DiffusionGS, a single-stage method for generating 3D content from 2D images. It is faster and produces better results than other methods, and it can handle different view directions and object-centric inputs. The authors also developed a training strategy that improves the model's ability to generalize to different scenes and objects....
opinion: placeholder
tags:
- ML
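Every article file in this commit follows the same flat YAML schema (date, author, title, thumbnail, link, summary, opinion, tags). As a minimal sketch of how such an entry could be consumed downstream — assuming the files stay flat with a single `tags` list, and using a hypothetical `parse_entry` helper that is not part of this repository — a stdlib-only parser might look like:

```python
def parse_entry(text: str) -> dict:
    """Parse one flat newsletter YAML entry into a dict.

    Minimal sketch: handles `key: value` pairs and a trailing
    `tags:` block of `- item` lines; not a full YAML parser.
    """
    entry, tags = {}, []
    in_tags = False
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if in_tags and line.lstrip().startswith("- "):
            tags.append(line.lstrip()[2:].strip())
            continue
        in_tags = False
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip().strip("\"'")
        if key == "tags" and not value:
            in_tags = True  # following `- item` lines belong to the tag list
        else:
            entry[key] = value
    if tags:
        entry["tags"] = tags
    return entry

sample = '''date: "2024-11-22"
author: Xin Dong
title: 'Hymba: A Hybrid-head Architecture for Small Language Models'
link: https://huggingface.co/papers/2411.13676
tags:
- ML'''
print(parse_entry(sample)["author"])  # prints: Xin Dong
```

Splitting on only the first colon keeps URLs and titles that contain `:` intact, which matters for entries like the Hymba and OpenScholar files above.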
...-22 Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Javier Ferrando
title: Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
thumbnail: ""
link: https://huggingface.co/papers/2411.14257
summary: We use sparse autoencoders to understand why large language models hallucinate and find that they have internal representations about their own capabilities in recognizing entities. These representations can steer the model to refuse to answer questions about known entities or to hallucinate attributes of unknown entities. They also have a causal effect on the chat model's refusal behavior, suggesting that chat finetuning has repurposed this existing mechanism. We explore the mechanistic role of...
opinion: placeholder
tags:
- ML
...soning Ability of Multimodal Large Language Models via Mixed Preference Optimization.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Weiyun Wang
title: Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
thumbnail: ""
link: https://huggingface.co/papers/2411.10442
summary: Researchers have developed a new method called Mixed Preference Optimization (MPO) to improve the reasoning abilities of multimodal large language models (MLLMs). They created a large dataset for multimodal reasoning and used it along with MPO to enhance the performance of MLLMs, particularly in chain-of-thought tasks. The new model, InternVL2-8B-MPO, outperforms previous models and shows comparable performance to larger models....
opinion: placeholder
tags:
- ML
current/2024-11-22 Hymba: A Hybrid-head Architecture for Small Language Models.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Xin Dong
title: 'Hymba: A Hybrid-head Architecture for Small Language Models'
thumbnail: ""
link: https://huggingface.co/papers/2411.13676
summary: Hymba is a small language model with a hybrid-head architecture that combines different mechanisms for processing language more efficiently. It uses attention to retain important information, storing it in special tokens. This makes it faster and less memory-hungry than other models while also improving its language understanding....
opinion: placeholder
tags:
- ML
...sight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Yuhao Dong
title: 'Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models'
thumbnail: ""
link: https://huggingface.co/papers/2411.14432
summary: Insight-V is a new method that uses a multi-agent system to improve the reasoning abilities of large language models in vision-language tasks. It creates long and diverse reasoning paths and uses a summary agent to judge and summarize the results. This leads to better performance on multi-modal benchmarks that require visual reasoning....
opinion: placeholder
tags:
- ML
...: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Ruiyuan Gao
title: 'MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control'
thumbnail: ""
link: https://huggingface.co/papers/2411.13807
summary: MagicDriveDiT is a new method for generating long, high-resolution videos for autonomous driving. It builds on the DiT architecture, improving how the model is trained and adding adaptive controls that raise video quality. The resulting videos surpass those of other methods, and the approach supports many different autonomous-driving tasks....
opinion: placeholder
tags:
- ML
current/2024-11-22 Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Yu Zhao
title: 'Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions'
thumbnail: ""
link: https://huggingface.co/papers/2411.14405
summary: Marco-o1 is a model that uses fine-tuning, MCTS, reflection mechanisms, and innovative reasoning strategies to solve complex real-world problems. It focuses on finding open-ended resolutions in various domains where clear standards are absent and rewards are challenging to quantify....
opinion: placeholder
tags:
- ML
current/2024-11-22 Multimodal Autoregressive Pre-training of Large Vision Encoders.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Enrico Fini
title: Multimodal Autoregressive Pre-training of Large Vision Encoders
thumbnail: ""
link: https://huggingface.co/papers/2411.14402
summary: This paper presents AIMV2, a family of generalist vision encoders that excel in various downstream tasks, including multimodal evaluations and vision benchmarks. The encoders are characterized by a straightforward pre-training process, scalability, and remarkable performance. The AIMV2-3B encoder achieves 89.5% accuracy on ImageNet-1k with a frozen trunk and outperforms state-of-the-art contrastive models in multimodal image understanding across diverse settings....
opinion: placeholder
tags:
- ML
current/2024-11-22 Natural Language Reinforcement Learning.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Xidong Feng
title: Natural Language Reinforcement Learning
thumbnail: ""
link: https://huggingface.co/papers/2411.14251
summary: This paper proposes a new approach called Natural Language Reinforcement Learning (NLRL) that uses natural language to represent decision-making problems and solve them using large language models (LLMs). The authors demonstrate the effectiveness and efficiency of their approach through experiments on various games....
opinion: placeholder
tags:
- ML
...4-11-22 OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Akari Asai
title: 'OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs'
thumbnail: ""
link: https://huggingface.co/papers/2411.14199
summary: OpenScholar is a specialized retrieval-augmented LM that assists scientists in synthesizing scientific literature by identifying relevant passages from 45 million open-access papers and providing citation-backed responses. It outperforms GPT-4o and PaperQA2 in correctness and citation accuracy, and experts prefer its responses over expert-written ones in human evaluations....
opinion: placeholder
tags:
- ML
current/2024-11-22 Patience Is The Key to Large Language Model Reasoning.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Yijiong Yu
title: Patience Is The Key to Large Language Model Reasoning
thumbnail: ""
link: https://huggingface.co/papers/2411.13082
summary: Recent advancements in the field of large language models, particularly through the Chain of Thought (CoT) approach, have demonstrated significant improvements in solving complex problems. However, existing models either tend to sacrifice detailed reasoning for brevity due to user preferences, or require extensive and expensive training data to learn complicated reasoning ability, limiting their potential in solving complex tasks. To bridge this gap, following the concept of scaling test-time, w...
opinion: placeholder
tags:
- ML
current/2024-11-22 Stable Flow: Vital Layers for Training-Free Image Editing.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Omri Avrahami
title: 'Stable Flow: Vital Layers for Training-Free Image Editing'
thumbnail: ""
link: https://huggingface.co/papers/2411.14430
summary: The paper proposes a method to identify 'vital layers' within Diffusion Transformer (DiT) models, crucial for image formation, to perform consistent image edits via selective injection of attention features. The authors also introduce an improved image inversion method for flow models and evaluate their approach through qualitative and quantitative comparisons, along with a user study, demonstrating its effectiveness across multiple applications....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Zihao Huang
title: Ultra-Sparse Memory Network
thumbnail: ""
link: https://huggingface.co/papers/2411.12364
summary: This paper proposes UltraMem, a new architecture that combines a large-scale, ultra-sparse memory layer to address the limitations of existing models. It significantly reduces inference latency while maintaining model performance and demonstrates favorable scaling properties. Experiments show that it achieves state-of-the-art inference speed and model performance within a given computational budget....
opinion: placeholder
tags:
- ML
... Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages.yaml (9 additions, 0 deletions)
@@ -0,0 +1,9 @@
date: "2024-11-22"
author: Bethel Melesse Tessema
title: 'UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages'
thumbnail: ""
link: https://huggingface.co/papers/2411.14343
summary: We developed a method to collect text data for low-resource languages from the Common Crawl corpus efficiently, resulting in larger datasets than before. Our approach, UnifiedCrawl, filters and extracts common crawl using minimal compute resources. We fine-tuned multilingual LLMs using this data and efficient adapter methods (QLoRA), which significantly boosted performance on low-resource languages while minimizing VRAM usage. Our experiments showed improvements in language modeling perplexity a...
opinion: placeholder
tags:
- ML
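All fourteen entries added in this commit share the same keys, and every one still carries `opinion: placeholder`. As a hedged sketch of a consistency check over a parsed entry — `validate_entry` and the key list are assumptions for illustration, not repository code — one might write:

```python
import re

# Keys every newsletter entry in this commit carries.
REQUIRED_KEYS = ("date", "author", "title", "thumbnail",
                 "link", "summary", "opinion", "tags")

def validate_entry(entry: dict) -> list:
    """Return a list of problems found in one parsed newsletter entry."""
    problems = ["missing key: " + k for k in REQUIRED_KEYS if k not in entry]
    if "date" in entry and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", entry["date"]):
        problems.append("date is not YYYY-MM-DD")
    if "link" in entry and not entry["link"].startswith("https://huggingface.co/"):
        problems.append("unexpected link host")
    if entry.get("opinion") == "placeholder":
        problems.append("opinion still a placeholder")
    return problems

entry = {
    "date": "2024-11-22",
    "author": "Akari Asai",
    "title": "OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs",
    "thumbnail": "",
    "link": "https://huggingface.co/papers/2411.14199",
    "summary": "...",
    "opinion": "placeholder",
    "tags": ["ML"],
}
print(validate_entry(entry))  # ['opinion still a placeholder']
```

Run over this commit, such a check would flag only the placeholder opinions, which matches the files as committed.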