MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
Zhiqiang Xia*,
Zhaokang Chen*,
Bin Wu†,
Chao Li,
Kwok-Wai Hung,
Chao Zhan,
Yingjie He,
Wenjiang Zhou
(*co-first author, †Corresponding Author, [email protected])
github huggingface HuggingfaceSpace [project](coming soon) Technical report (coming soon)
We have pursued the world simulator vision since March 2023, believing that diffusion models can simulate the world. MuseV was a milestone achieved around July 2023. Amazed by the progress of Sora, we decided to open-source MuseV in the hope that it will benefit the community. Next, we will move on to the promising diffusion+transformer scheme.
We will soon release MuseTalk, a real-time, high-quality lip sync model that can be combined with MuseV as a complete virtual human generation solution. Please stay tuned!
MuseV is a diffusion-based virtual human video generation framework, which
- supports infinite-length generation using a novel Visual Conditioned Parallel Denoising scheme.
- provides a checkpoint for virtual human video generation, trained on a human dataset.
- supports Image2Video, Text2Image2Video, and Video2Video.
- is compatible with the Stable Diffusion ecosystem, including base_model, lora, controlnet, etc.
- supports multi-reference-image technology, including IPAdapter, ReferenceOnly, ReferenceNet, and IPAdapterFaceID.
- will include training code (coming very soon).
- [03/27/2024] released the MuseV project and the trained models musev and muse_referencenet.
- [03/30/2024] added a Hugging Face Space Gradio demo for generating videos in a GUI.