We introduce the Egocentric Video Understanding Dataset (EVUD), an instruction-tuning dataset for training VLMs on video captioning and question answering tasks specific to egocentric videos.
- The AlanaVLM paper is now on arXiv!
- All the checkpoints developed for this project are available on Hugging Face
- The EVUD dataset is available on Hugging Face
Create and activate a virtual environment, then install the requirements:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Together with our generated data released on Hugging Face, we also release all the scripts needed to reproduce our data generation pipeline:
The generated data follows the LLaVA JSON format.
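As a rough illustration, a single LLaVA-style instruction record pairs a media file with a list of human/assistant conversation turns. The sketch below is only indicative: the exact field names and media token used in the released EVUD files (e.g. `"video"` vs `"image"`, `<video>` vs `<image>`) may differ, and the file path and text shown are made up for the example.

```python
import json

# Illustrative sketch of one LLaVA-style record; field names and the
# media placeholder token are assumptions, not the exact EVUD schema.
example = {
    "id": "evud_000001",                     # unique sample identifier
    "video": "videos/clip_000001.mp4",       # path to the source egocentric clip
    "conversations": [
        {"from": "human", "value": "<video>\nWhat is the camera wearer doing?"},
        {"from": "gpt", "value": "They are slicing vegetables on a cutting board."},
    ],
}

# LLaVA-style training files are typically a JSON list of such records.
with open("example.json", "w") as f:
    json.dump([example], f, indent=2)
```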