Commit
Automated report
deep-diver committed Dec 4, 2024
1 parent 6b4c97b commit 5184297
Showing 15 changed files with 135 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
date: "2024-12-03"
author: Kaixiong Gong
title: 'AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?'
thumbnail: ""
link: https://huggingface.co/papers/2412.02611
summary: This paper introduces AV-Odyssey Bench, a comprehensive audio-visual benchmark to evaluate the understanding of audio-visual information by multimodal large language models. It consists of 4,555 multiple-choice questions that require models to leverage clues from both visual and audio inputs. The benchmark aims to provide insights for future dataset collection and model development by revealing the limitations of current models....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-03"
author: Zicheng Lin
title: 'Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM''s Reasoning Capability'
thumbnail: ""
link: https://huggingface.co/papers/2411.19943
summary: In this paper, we introduce cDPO, a novel approach that identifies and rewards critical tokens during the alignment process in Large Language Models (LLMs) to improve their reasoning capability. We use a contrastive estimation approach to identify critical tokens and extend the conventional DPO algorithms to token-level DPO for better alignment with critical token information. Our approach is evaluated on GSM8K and MATH500 benchmarks with Llama-3 (8B and 70B) and deepseek-math (7B) models, demon...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-03"
author: Lifan Yuan
title: Free Process Rewards without Process Labels
thumbnail: ""
link: https://huggingface.co/papers/2412.01981
summary: This paper introduces a new method for training a process reward model (PRM) without the need for manually annotated labels at every intermediate step. The method trains an outcome reward model (ORM) on cheaper response-level labels and shows that it outperforms a strong baseline using less than 1/38 of the training data. The performance can be further improved with majority voting and by scaling up instructions and responses. The method is more data-efficient and can keep improving generation m...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-03"
author: Hongyan Zhi
title: 'LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences'
thumbnail: ""
link: https://huggingface.co/papers/2412.01292
summary: Researchers propose LSceneLLM, a framework that uses an LLM's visual preference to identify task-relevant areas in large 3D scenes, then uses a scene magnifier module to capture fine-grained details in these areas. They also introduce XR-Scene, a benchmark for large scene understanding tasks, and show that their method outperforms existing methods....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-03"
author: Minhyun Lee
title: 'MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation'
thumbnail: ""
link: https://huggingface.co/papers/2411.19067
summary: This paper introduces a new way to improve the performance of Referring Image Segmentation (RIS) by using a method called Masked Referring Image Segmentation (MaskRIS). This method uses image and text masking, followed by Distortion-aware Contextual Learning (DCL) to make the model more robust to things like occlusions and incomplete information. The authors demonstrate that their method outperforms existing methods in both fully supervised and weakly supervised settings, and achieves new state-...
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-03"
author: Jiangtao Wang
title: Scaling Image Tokenizers with Grouped Spherical Quantization
thumbnail: ""
link: https://huggingface.co/papers/2412.02632
summary: We propose Grouped Spherical Quantization (GSQ) for image tokenizers, which improves reconstruction quality and enables efficient scaling. Our findings reveal distinct behaviors at high and low spatial compression levels, and we show that GSQ can represent high-dimensional latent spaces more efficiently....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-03"
author: Mingzhe Zheng
title: 'VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.02259
summary: VideoGen-of-Thought (VGoT) is a new method for creating multi-shot videos that are cohesive and have a logical storyline. It does this by breaking the video creation process into smaller steps, including generating scripts, keyframes, and individual shots, and by ensuring that the characters and story remain consistent throughout the video....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Luo Yu
title: A dynamic parallel method for performance optimization on hybrid CPUs
thumbnail: ""
link: https://huggingface.co/papers/2411.19542
summary: A new method for running AI models on hybrid CPUs has been introduced to balance the workload of each core and improve inference performance. This method allows Neural Speed to use more than 90% of the memory bandwidth on two hybrid Intel CPUs....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Orlando Marquez Ayala
title: Generating a Low-code Complete Workflow via Task Decomposition and RAG
thumbnail: ""
link: https://huggingface.co/papers/2412.00239
summary: 'The paper introduces two design patterns for AI-based systems: Task Decomposition and Retrieval-Augmented Generation (RAG). These patterns are used to create a complex real-world GenAI application for enterprise users: Workflow Generation. The paper discusses the trade-offs of these patterns in terms of software quality attributes and provides recommendations for AI practitioners....'
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Sumeet Ramesh Motwani
title: 'MALT: Improving Reasoning with Multi-Agent LLM Training'
thumbnail: ""
link: https://huggingface.co/papers/2412.01928
summary: The paper introduces a new approach called MALT that trains multiple LLMs to work together on reasoning problems. The LLMs are given specific roles and work together in a sequential way. The paper also proposes a method for generating synthetic data and assigning rewards to improve each model's performance. The approach is evaluated on three datasets and shows improvements over the baseline model....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Daniel Geng
title: 'Motion Prompting: Controlling Video Generation with Motion Trajectories'
thumbnail: ""
link: https://huggingface.co/papers/2412.02700
summary: The paper introduces a new method for controlling video generation by using motion trajectories, which can be sparse or dense and encode any number of trajectories. This method can be used to control camera and object motion, transfer motion, and edit images. The results show realistic physics and emergent behaviors, and the method outperforms previous methods in quantitative evaluations and human studies....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Junyuan Zhang
title: 'OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation'
thumbnail: ""
link: https://huggingface.co/papers/2412.02592
summary: 'This paper introduces OHRBench, a benchmark for evaluating the impact of OCR on Retrieval-Augmented Generation (RAG) systems. It identifies two types of OCR noise: Semantic Noise and Formatting Noise, and demonstrates the vulnerability of RAG systems to these noises. The paper also discusses the potential of using Vision-Language Models (VLMs) without OCR in RAG systems....'
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Haodong Chen
title: 'OmniCreator: Self-Supervised Unified Generation with Universal Editing'
thumbnail: ""
link: https://huggingface.co/papers/2412.02114
summary: OmniCreator is a self-supervised framework that can generate or edit images and videos based on text prompts. It uses original text-video pairs to learn the relationship between text and video, and can generate high-quality videos or edit existing videos to match a given text prompt. The framework also works with images and has been tested on a new dataset called OmniBench-99, where it outperformed other models....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Alessandro Scirè
title: Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS
thumbnail: ""
link: https://huggingface.co/papers/2411.19655
summary: This paper introduces LLM-OASIS, a large-scale resource for training end-to-end factuality evaluators for Large Language Models (LLMs). LLM-OASIS addresses the limitations of existing resources by being task- and domain-agnostic, large in size, and designed for complex verification tasks. The paper demonstrates that LLM-OASIS presents a significant challenge for state-of-the-art LLMs, highlighting its potential to drive future research in the field....
opinion: placeholder
tags:
- ML
@@ -0,0 +1,9 @@
date: "2024-12-04"
author: Dhiman Paul
title: 'VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval'
thumbnail: ""
link: https://huggingface.co/papers/2412.01558
summary: This paper proposes VideoLights, a new system for finding important parts of videos and finding specific moments in them. It uses a special method to combine video and text information, and it also uses a large language model to help improve the results. The system performs better than other systems on several tests, and the code is available online....
opinion: placeholder
tags:
- ML
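
Each of the 15 added files uses the same nine-line layout: `date`, `author`, `title`, `thumbnail`, `link`, `summary`, `opinion`, and a `tags` list. The sketch below shows one way such a record could be checked for missing keys before committing. It is an illustration only: `parse_record`, `missing_keys`, `unquote`, and `EXPECTED_KEYS` are hypothetical names that do not come from this repository, and the line-based reader is an assumption that only covers the flat layout seen in this diff, not YAML in general.

```python
# Hypothetical consistency check for the flat records shown in this diff.
# This is NOT a general YAML parser: it only understands top-level
# "key: value" lines and "- item" list entries that follow a bare "key:".

EXPECTED_KEYS = ["date", "author", "title", "thumbnail",
                 "link", "summary", "opinion", "tags"]


def unquote(value):
    """Strip one layer of quotes; '' inside single quotes becomes '."""
    if len(value) >= 2 and value[0] == value[-1] == "'":
        return value[1:-1].replace("''", "'")
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def parse_record(text):
    """Turn one record into a dict of strings and lists of strings."""
    record = {}
    current_list = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("- "):
            if current_list is None:
                raise ValueError(f"list item without a key: {line!r}")
            current_list.append(unquote(line[2:].strip()))
            continue
        # Split on the first colon only, so URLs and titles stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got: {line!r}")
        value = value.strip()
        if value:
            record[key] = unquote(value)
            current_list = None
        else:
            # A bare "key:" starts a list, as with "tags:" in the diff.
            current_list = []
            record[key] = current_list
    return record


def missing_keys(record):
    """Return the expected keys that the record does not define."""
    return [key for key in EXPECTED_KEYS if key not in record]


# One record copied from the diff above.
SAMPLE = """\
date: "2024-12-03"
author: Jiangtao Wang
title: Scaling Image Tokenizers with Grouped Spherical Quantization
thumbnail: ""
link: https://huggingface.co/papers/2412.02632
summary: We propose Grouped Spherical Quantization (GSQ) for image tokenizers, which improves reconstruction quality and enables efficient scaling. Our findings reveal distinct behaviors at high and low spatial compression levels, and we show that GSQ can represent high-dimensional latent spaces more efficiently....
opinion: placeholder
tags:
- ML
"""

if __name__ == "__main__":
    record = parse_record(SAMPLE)
    print("missing keys:", missing_keys(record))
    print("tags:", record["tags"])
```

Splitting on the first colon keeps values such as the `link` URL and titles containing `: ` in one piece. A real pipeline would more likely load the files with a full YAML library; the hand-rolled reader here only keeps the sketch dependency-free.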
