This is the official code repository for BEVGen: "Street-View Image Generation from a Bird’s-Eye View Layout" by Alexander Swerdlow, Runsheng Xu, and Bolei Zhou.
We provide pinned dependencies in `pinned_requirements.txt` for reproducibility, but suggest first attempting to install with the unpinned dependencies:

```bash
pip install -r requirements.txt
```
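If that fails, you can fall back to the pinned versions instead:

```bash
pip install -r pinned_requirements.txt
```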
An Nvidia GPU is required for training and inference. This project has been tested on an Nvidia A5000 with Python 3.9.13 and CUDA 11.6 on Ubuntu 20.04.
Note: If you need training code or support for nuScenes, see this branch.
The Argoverse 2 dataset can be downloaded here.

Extract the compressed files and set this directory in `configs/paths` as `argoverse_dir`. We recommend setting `ARGOVERSE_DATA_DIR=...` as an environment variable before running, rather than modifying the config directly, since some scripts require the environment variable to be defined (see the example after the directory tree). The structure should look as follows:
```
av2
└───sensor
    ├───test
    ├───train
    └───val
        ├───0aa4e8f5-2f9a-39a1-8f80-c2fdde4405a2
        └───...
```
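For example (the path below is a placeholder; point it at wherever you extracted the dataset):

```bash
export ARGOVERSE_DATA_DIR=/path/to/av2
```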
In addition, to run 2nd-stage training, you must pre-generate the BEV representation:

```bash
python multi_view_generation/scripts/argoverse_preprocess.py --multiprocess --save-dir ${ARGOVERSE_DATA_DIR}/generated_bev_representation
```
Then, you must define `bev_dir_name=generated_bev_representation` as an argument for the Argoverse dataset in the datamodule config.
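For example, as a Hydra command-line override (a sketch: the script name `train.py` and the exact key path are assumptions; check `configs/datamodule` for the real structure):

```bash
# Hypothetical key path; adjust to match your datamodule config
python train.py experiment=muse_stage_two_multi_view \
  'datamodule.train.bev_dir_name=generated_bev_representation'
```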
We provide pre-trained weights for the Argoverse dataset. This model uses a bidirectional transformer decoder that outperforms the masking order described in the original paper. The original masking strategies are still available but may require additional configuration before training.
The following commands will download the weights from Hugging Face:

```bash
mkdir -p pretrained
wget https://huggingface.co/aswerdlow/bevgen/resolve/main/argoverse_rgb.ckpt -P pretrained
wget https://huggingface.co/aswerdlow/bevgen/resolve/main/argoverse_bev.ckpt -P pretrained
wget https://huggingface.co/aswerdlow/bevgen/resolve/main/argoverse_stage_two.tar.gz -P pretrained && tar -xf pretrained/argoverse_stage_two.tar.gz -C pretrained
```
Inference:

```bash
CUDA_VISIBLE_DEVICES=0 \
python generate.py \
experiment=muse_stage_two_multi_view \
datamodule=stage_2_argoverse_generate \
'modes=[argoverse,generate]' \
trainer.devices=1 extras.mini_dataset=False \
datamodule.batch_size=16 \
'datamodule.test.eval_generate="$OUTPUT_PATH"'
```
You must replace `$OUTPUT_PATH` with a local path. To profile the code, append `debug=profile` to the command.
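For example, this is simply the inference command above with profiling enabled:

```bash
CUDA_VISIBLE_DEVICES=0 \
python generate.py \
experiment=muse_stage_two_multi_view \
datamodule=stage_2_argoverse_generate \
'modes=[argoverse,generate]' \
trainer.devices=1 extras.mini_dataset=False \
datamodule.batch_size=16 \
'datamodule.test.eval_generate="$OUTPUT_PATH"' \
debug=profile
```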
More info on hydra can be found here.
`multi_view_generation/bev_utils/argoverse.py` handles all data loading for Argoverse 2. Note that not all combinations of configurations were tested and some may silently fail. Refer to `configs/datamodule` for examples of valid configurations.
Note that `multi_view_generation/utils/callback.py` handles most saving/logging of images during training and inference. We save data in three places: WandB, the run directory, and, if configured, a separate directory passed to the callback as the `save_dir` argument. You may also wish to enable `rand_str` if you generate multiple samples with the same `sample_token` and save them to the same directory.
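A sketch of the corresponding overrides (the callback key `callbacks.image_logger` is a guess; look up the real name in your callback config):

```bash
# Hypothetical callback key; adjust to match your callback config
python generate.py \
  'callbacks.image_logger.save_dir=/path/to/samples' \
  callbacks.image_logger.rand_str=True
```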
`multi_view_generation/scripts/metrics*.py` generate the metrics reported in the paper.
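To list the available metric scripts:

```bash
ls multi_view_generation/scripts/metrics*.py
```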
We define the camera angle in [0, 2π), measured counterclockwise relative to the ego reference frame. All BEV representations have the ego vehicle frame at the center of the segmented image, pointing upwards.
This codebase contains a large number of programs, each with many possible configurations, that were used for development and testing. These are not necessary for the core training and inference but are nonetheless provided to aid future research. However, expect any functionality not directly described in this document to require modification before it works correctly.
Specific errata are as follows:
- The DeepSpeed optimizer along with 16-bit precision is required for stage 2 training. DeepSpeed's sparse transformer module only supports fp16 and their optimizer automatically scales the lr and retries in case of NaNs.
- Stage 2 training can become unstable with small LR tweaks. Monitor the gradients and loss carefully.
- The attention mask passed to the sparse attention module cannot have any empty rows, even if those rows will be ignored; empty rows cause NaN outputs.
- When resuming training, be careful about which weights are loaded. The 1st-stage weights are loaded first, then the model checkpoint (if available), and then the global checkpoint (if available). The model checkpoint (`+model.ckpt_path=...`) only loads model weights, while the global checkpoint (`ckpt_path=...`) loads model weights, optimizer states, etc. See the sketch after this list.
- DeepSpeed does not currently support changing the number of GPUs (the world size) when resuming from a global checkpoint.
- We use pyrootutils so that we can run from anywhere as a script and use either absolute or relative imports. The import must be at the top of the file to work correctly.
- Wandb logs show incorrect step counts for some graphs by default. Switch the x-axis to `global_step` to get an accurate step count.
- To run 2nd-stage training with the camera bias and sparse masking, you must call `save_cam_data` in the respective dataset class. This saves the camera intrinsics/extrinsics so the model can be initialized before any data is passed through.
- Some scripts require the environment variable `SAVE_DATA_DIR` to be defined.
- Both the nuScenes and Argoverse datasets save cached files in `~/.cache/nuscenes` and `~/.cache/av2` respectively. This speeds up instantiating the datasets, but if you change your dataset files, you must remember to delete the related cache files (see the sketch after this list).
- Some nested hydra experiment configs require changing the paths of their parent configs depending on how they are called, e.g. `../config_name` vs. `config_name`, and `./config_name` vs. `parent/config_name`.
- Generation will sometimes complete successfully despite only running on a portion of the 'test' set. When provided with an `eval_generate` argument, the nuScenes dataloader will detect which instances have already been generated and remove those from the test set to mitigate this issue.
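For reference, a sketch of the two checkpoint arguments described above (the script name `train.py` and the checkpoint paths are placeholders; substitute your own):

```bash
# Load model weights only (1st-stage weights are loaded first); path is a placeholder
python train.py experiment=muse_stage_two_multi_view '+model.ckpt_path=pretrained/model.ckpt'

# Load model weights, optimizer states, etc. from a global checkpoint; path is a placeholder
python train.py experiment=muse_stage_two_multi_view 'ckpt_path=logs/last.ckpt'
```

And to clear the dataset caches after changing your dataset files:

```bash
rm -rf ~/.cache/nuscenes ~/.cache/av2
```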