Universal Visual Decomposer:
Long-Horizon Manipulation Made Easy

[Website] [arXiv] [PDF] [Installation] [Usage] [BibTex]

Installation

Follow the mujoco-py installation instructions and, if using Ubuntu, install the following apt packages:

sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf

Create a conda environment with Python 3.9:

conda create -n uvd python==3.9 -y && conda activate uvd

Install any or all of the standalone visual foundation models from their own repos before setting up UVD, to avoid dependency conflicts, e.g.:

VIP

git clone https://github.com/facebookresearch/vip.git
cd vip && pip install -e .
python -c "from vip import load_vip; vip = load_vip()"

R3M

git clone https://github.com/facebookresearch/r3m.git
cd r3m && pip install -e .
python -c "from r3m import load_r3m; r3m = load_r3m('resnet50')"

LIV (& CLIP)

git clone https://github.com/penn-pal-lab/LIV.git
cd LIV && pip install -e . && cd liv/models/clip && pip install -e .
python -c "from liv import load_liv; liv = load_liv()"

VC1

git clone https://github.com/facebookresearch/eai-vc.git 
cd eai-vc && pip install -e vc_models

DINOv2 and ResNet pretrained with ImageNet-1k are directly loaded via torch hub and torchvision.

From the UVD repo directory, install the remaining dependencies:

pip install -e .

Usage

We provide a simple API for decomposing RGB videos:

import torch
import uvd

# (N sub-goals, *video frame shape)
subgoals = uvd.get_uvd_subgoals(
    "/PATH/TO/VIDEO.*",   # video filename or (L, *video frame shape) video numpy array
    preprocessor_name="vip",    # Literal["vip", "r3m", "liv", "clip", "vc1", "dinov2"]
    device="cuda" if torch.cuda.is_available() else "cpu",  # device for loading frozen preprocessor
    return_indices=False,   # True if only want the list of subgoal timesteps
)

or run

python demo.py

to host a Gradio demo locally with different choices of visual representations.
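
With return_indices=True, the API call above returns subgoal timesteps instead of frames. One way such indices might be used is to label every frame with the subgoal it is currently heading toward. The snippet below is a minimal sketch of that idea, assuming the indices are a non-empty, sorted list of frame positions; assign_subgoals is a hypothetical helper and not part of the uvd package, and the example indices are made up.

```python
from bisect import bisect_left
from typing import List, Sequence


def assign_subgoals(subgoal_indices: Sequence[int], num_frames: int) -> List[int]:
    """Map each frame t to the timestep of the first subgoal at or after t.

    Hypothetical helper (not part of uvd). Assumes `subgoal_indices` is a
    non-empty, sorted sequence of frame timesteps.
    """
    if not subgoal_indices:
        raise ValueError("subgoal_indices must be non-empty")
    last = len(subgoal_indices) - 1
    targets = []
    for t in range(num_frames):
        # Position of the first subgoal timestep >= t; clamp past the final subgoal.
        i = min(bisect_left(subgoal_indices, t), last)
        targets.append(subgoal_indices[i])
    return targets


# Made-up indices for a 6-frame video, for illustration only.
print(assign_subgoals([2, 5], 6))  # [2, 2, 2, 5, 5, 5]
```

Frames after the last subgoal are clamped to it, so the final goal stays active until the end of the video.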

Simulation Data

We post-processed the data released by the original Relay-Policy-Learning project, keeping only the successful trajectories and adapting the control and observations to those used in our paper, by running:

python datasets/data_gen.py raw_data_path=/PATH/TO/RAW_DATA

Also consider changing Builder = LinuxCPUExtensionBuilder to Builder = LinuxGPUExtensionBuilder in PATH/TO/CONDA/envs/uvd/lib/python3.9/site-packages/mujoco_py/builder.py to enable (multi-)GPU acceleration.
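
The builder change can be made by hand in an editor. If a scripted edit is preferred, the following is a rough sketch that performs the same text substitution and keeps a backup copy. The function name force_gpu_builder and the .bak suffix are illustrative choices, and the sketch assumes the assignment appears in builder.py exactly as written above.

```python
from pathlib import Path

CPU_LINE = "Builder = LinuxCPUExtensionBuilder"
GPU_LINE = "Builder = LinuxGPUExtensionBuilder"


def force_gpu_builder(builder_py: Path) -> bool:
    """Swap the CPU builder assignment for the GPU one in the given file.

    Hypothetical helper. Returns True if the file was changed, False if the
    CPU assignment was not found (already patched, or a different layout).
    """
    source = builder_py.read_text()
    if CPU_LINE not in source:
        return False
    # Keep the untouched original next to the patched file.
    backup = builder_py.with_name(builder_py.name + ".bak")
    backup.write_text(source)
    builder_py.write_text(source.replace(CPU_LINE, GPU_LINE))
    return True
```

It would be called with the builder.py path given above, e.g. force_gpu_builder(Path("PATH/TO/CONDA/envs/uvd/lib/python3.9/site-packages/mujoco_py/builder.py")), with the placeholder replaced by the actual conda location.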

Runtime Benchmark

UVD is intended as an off-the-shelf method that applies to any existing policy learning framework or model, across BC and RL. To show that its runtime overhead is negligible, we provide minimal benchmarking scripts under the ./scripts directory:

python scripts/benchmark_decomp.py /PATH/TO/VIDEO

Pass --preprocessor_name to use another preprocessor (default vip) and --n to set the number of repeated iterations (default 100).
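
For readers who want to time their own calls in the same repeated-iteration style, here is a minimal sketch of such a timing loop. It is not the code in scripts/benchmark_decomp.py; time_repeated is a hypothetical helper, and the lambda is a stand-in workload that would be replaced by the call being measured.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_repeated(fn: Callable[[], object], n: int = 100) -> Tuple[float, float, List[float]]:
    """Call `fn` n times; return (mean, stdev, per-call durations) in seconds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    mean = statistics.fmean(durations)
    # Sample standard deviation needs at least two measurements.
    stdev = statistics.stdev(durations) if n > 1 else 0.0
    return mean, stdev, durations


# Stand-in workload; in practice this would wrap a decomposition call.
mean, stdev, _ = time_repeated(lambda: sum(range(10_000)), n=100)
print(f"{mean * 1e3:.3f} ms +/- {stdev * 1e3:.3f} ms over 100 runs")
```

time.perf_counter is used because it is a monotonic, high-resolution clock intended for measuring short durations.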

For inference or rollouts, we benchmark the runtime with:

python scripts/benchmark_inference.py

Pass --policy to choose the MLP or causal GPT policy; --preprocessor_name to use another preprocessor (default vip); --use_uvd as a boolean flag to choose between UVD and no decomposition (i.e. conditioning on the final goal); and --n to set the number of repeated iterations (default 100). The default episode horizon is 300. We found that running from the terminal is almost 2 s slower per episode than running directly from a Python IDE (e.g. PyCharm, under the script directory and run as a script instead of a module), but the general trend that UVD introduces negligible extra runtime still holds.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{zhang2024universal,
  title={Universal visual decomposer: Long-horizon manipulation made easy},
  author={Zhang, Zichen and Li, Yunshuang and Bastani, Osbert and Gupta, Abhishek and Jayaraman, Dinesh and Ma, Yecheng Jason and Weihs, Luca},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={6973--6980},
  year={2024},
  organization={IEEE}
}
