The goal is to surpass EMO in output quality, with support for both Audio2Video and Video2Video.
Developing this project requires organizing datasets, reading many research articles, and GPU resources, all of which cost a considerable amount of money. Your sponsorship is therefore crucial to us.
Sponsors will have priority access to project updates and model weights.
- Foundation Dataset Preparation: Collect and curate a comprehensive dataset of high-quality audio-visual pairs.
- Model Implementation: Implement the model using PyTorch.
- Large-Scale Model Training & Tuning
- Model Release: Package the model weights and configuration files for easy sharing and replication.