Jiahui (Gabriel) Huang,
Yuhe Jin,
Kwang Moo Yi,
Leonid Sigal
We introduce layered controllable video generation: without any supervision, we decompose the initial frame of a video into foreground and background layers, and the user can then control the video generation process simply by manipulating the foreground mask.
arXiv | BibTeX | Project Page
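At its core, the layered formulation composites a foreground layer over a background layer with a soft mask, and editing that mask is what gives the user control. Below is a minimal, illustrative sketch of the composition step in PyTorch; the function and tensor names are our own assumptions for exposition, not the repo's actual API.

```python
import torch

def compose(foreground: torch.Tensor,
            background: torch.Tensor,
            mask: torch.Tensor) -> torch.Tensor:
    """Alpha-composite a foreground layer over a background layer.

    foreground, background: (B, 3, H, W) images in [0, 1]
    mask:                   (B, 1, H, W) soft foreground mask in [0, 1]
    """
    # x = m * f + (1 - m) * b: the mask selects foreground pixels,
    # its complement selects background pixels.
    return mask * foreground + (1.0 - mask) * background

if __name__ == "__main__":
    b, h, w = 1, 64, 64
    f = torch.rand(b, 3, h, w)    # foreground layer (illustrative)
    bg = torch.rand(b, 3, h, w)   # background layer (illustrative)
    m = torch.zeros(b, 1, h, w)
    m[..., 16:48, 16:48] = 1.0    # a square foreground region the user can move
    frame = compose(f, bg, m)
    print(frame.shape)            # torch.Size([1, 3, 64, 64])
```

Controlling generation then amounts to editing the mask (e.g. translating it) before the next frame is generated.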
A suitable conda environment named `control` can be created and activated with:
```
conda env create -f environment.yaml
conda activate control
python setup.py install
```
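Optionally, confirm that the environment can see your GPU before running anything (this assumes a PyTorch setup, which the `--gpus` flag used in the training command below implies):

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```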
First, download the inference model here and put it in `checkpoints/`.
To run the demo:

```
python demo.py
```
Training on your own dataset can be beneficial to get better tokens, and hence better results, for your domain. These are the steps to follow:
- download the full resolution BAIR Robot Pushing dataset from here.
- extract the data; it should have the following structure:

  ```
  $ data_path/{split}/
  ├── vid1
  │   ├── 00000.png
  │   ├── 00001.png
  │   ├── ...
  ├── vid2
  │   ├── 00000.png
  │   ├── 00001.png
  │   ├── ...
  ├── ...
  ```
  where `{split}` is one of `train`/`test`.
- create two text files, `xx_train.txt` and `xx_test.txt`, that point to the files in your training and test set respectively. You can use the helper script (a sketch of it is given after this list): `python scripts/make_txt.py --data_path <your data path>`
- adapt `configs/bair.yaml` to point to these two files
- run `python main.py --base configs/bair.yaml -t True --gpus 0,1` to train on two GPUs. Use `--gpus 0,` (with a trailing comma) to train on a single GPU.
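For reference, here is a hedged sketch of what a helper like `scripts/make_txt.py` plausibly does: walk `data_path/{split}/` and write one frame path per line to `xx_train.txt` / `xx_test.txt`. The actual script in the repo may differ in naming and output format.

```python
import argparse
from pathlib import Path

def write_split(data_path: Path, split: str) -> None:
    # Illustrative assumption: one path per line, e.g. data_path/train/vid1/00000.png
    out = data_path / f"xx_{split}.txt"
    with out.open("w") as fh:
        for frame in sorted((data_path / split).glob("*/*.png")):
            fh.write(f"{frame}\n")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path", type=Path, required=True)
    args = parser.parse_args()
    for split in ("train", "test"):
        write_split(args.data_path, split)
```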
```
@inproceedings{Huang2022LayeredCV,
  title={Layered Controllable Video Generation},
  author={Jiahui Huang and Yuhe Jin and Kwang Moo Yi and Leonid Sigal},
  booktitle={ECCV},
  year={2022}
}
```