
Fast SR-UNet

This repository contains the implementation of [1]: an SR-UNet architecture combined with a GAN-based training procedure for obtaining a fast neural network that enables better bitrate performance than the H.265 codec at the same quality, or better quality at the same bitrate.

Requirements:

  • PyTorch and torchvision with CUDA support: $ conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
  • LPIPS: $ pip install lpips
  • FFmpeg compiled with the H.265 (libx265) codec and the VMAF metric. My build is included in the helper/ directory, but it will likely not work on other systems; for reference, check the official FFmpeg compilation guide and the VMAF GitHub repository. A quick way to check your own build is shown below.
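
To verify that your FFmpeg build supports both libx265 and VMAF (an optional sanity check, not part of the original setup):

$ ffmpeg -hide_banner -encoders | grep 265   # should list libx265
$ ffmpeg -hide_banner -filters | grep vmaf   # should list libvmaf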

The dataset:

The dataset used for training is BVI-DVC. Two helper scripts prepare it: compress_train_videos.sh spatially compresses and encodes each video, and extract_train_frames.sh then extracts the individual frames.
The training dataset should follow this naming scheme (assuming the videos are encoded with CRF 23); a quick consistency check is sketched after the layout below:

  [DATASET_DIR]/
      frames_HQ/
          [clipName1]/
              [clipName1]_001.png
              [clipName1]_002.png
              ...
              [clipName1]_064.png
          [clipName2]/
              ...
          [clipNameN]/
              ...
      frames_QF23/
          [clipName1]/
              [clipName1]_001.png
              [clipName1]_002.png
              ...
              [clipName1]_064.png
          [clipName2]/
              ...
          [clipNameN]/
              ...
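
As a consistency check after extraction, you can verify that every clip has the same number of frames in frames_HQ and frames_QF23 (a minimal sketch, assuming the layout above; adjust the CRF folder name to your setup):

$ for d in [DATASET_DIR]/frames_HQ/*/; do \
    clip=$(basename "$d"); \
    hq=$(ls "$d" | wc -l); lq=$(ls "[DATASET_DIR]/frames_QF23/$clip" | wc -l); \
    [ "$hq" -eq "$lq" ] || echo "$clip: $hq HQ frames vs $lq QF23 frames"; \
  done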

Training the model:

To train the SR-UNet described in the paper for 2x Super Resolution (as used in the model for the 540p -> 1080p upscaling), you can use this command:

$ python train.py --arch srunet --device 0 --upscale 2 --export [EXPORT_DIR] \
--epochs 80 --dataset [DATASET_DIR] --crf 23

Or, since most of these arguments are defaults, simply:

$ python train.py --dataset [DATASET_DIR]

For more information about the other parameters, inspect utils.py or try

$ python train.py -h

However, in the bandwidth experiments we employed a lighter model, trained on a range of CRFs, that performs an easier 1.5x upscale (720p -> 1080p); in the command below, the effective 1.5x factor comes from combining --upscale 2 with --downsample 0.75. It can be obtained as follows:

$ python train.py --arch srunet --layer_multiplier 0.7 --n_filters 48 --downsample 0.75 --device 0 \
--upscale 2 --export [EXPORT_DIR] --epochs 80 --dataset [DATASET_DIR] --crf [CRF]

Testing the models:

You may want to test your models. In our paper we tested on the 1080p clips available from [Derf's Collection](https://media.xiph.org/video/derf/) in Y4M format. For preparing the test set (of encoded clips) you can use the compress_test_videos.sh helper script; an example of the underlying encoding command is shown after the layout below. This time the test set is structured as follows, and there is no need to extract each frame:

    [TEST_DIR]/
        encoded540CRF23/
            aspen_1080p.mp4
            crowd_run_1080p50.mp4
            ducks_take_off_1080p50.mp4
            ...
            touchdown_pass_1080p.mp4
        aspen_1080p.y4m
        crowd_run_1080p50.y4m
        ducks_take_off_1080p50.y4m
        ...
        touchdown_pass_1080p.y4m
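
For reference, producing one of the encoded clips above boils down to an FFmpeg invocation along these lines (a hedged sketch of what compress_test_videos.sh does; the script's exact flags may differ):

$ ffmpeg -i aspen_1080p.y4m -vf scale=-2:540 -c:v libx265 -crf 23 \
  [TEST_DIR]/encoded540CRF23/aspen_1080p.mp4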

Finally, for testing a model (e.g. the one performing the 1.5x upscale) whose name is [MODEL_NAME], you can use the command:

$ python evaluate_model.py --model [MODEL_NAME] --arch srunet --layer_mult 0.7 --n_filters 48 \
--downsample 0.75 --device 0 --upscale 2 --crf 23 --test_dir [TEST_DIR] --testinputres 720 --testoutputres 1080

Finally, the experimental results will be printed on screen and also saved to a .csv file.

Inference with the model using render.py

You can use the render.py script to upscale your clips with the model in real time. Examples:

  • For 2x upscaling
    $ python render.py --clipname path/to/clip.mp4 --model models/srunet_2x_crf23.pth
    
  • For 1.5x upscaling
    $ python render.py --clipname path/to/clip.mp4 --model models/srunet_1.5x_crf23.pth \
    --layer_mult 0.7 --n_filters 48 --downsample 0.75
    

You will notice that by default the output is split into two halves: the input on the left and the upscaled version on the right. You can show only the upscaled version by adding the flag --show-only-upscaled.
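
For example, to render only the upscaled output of the 2x model:

$ python render.py --clipname path/to/clip.mp4 --model models/srunet_2x_crf23.pth \
  --show-only-upscaled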

Regarding performance, my GTX 1080 Ti is enough for rendering at 30 fps when upscaling 540p -> 1080p, and 25 fps when upscaling 720p -> 1080p. Note that in the paper we also employed NVIDIA Apex to speed up inference times.

Examples

  • TajMahal: check this link for the complete clip.
  • Venice

References:

This code is the implementation of my Master's Degree thesis, from which my supervisors and I wrote the paper:

  • [1] Federico Vaccaro, Marco Bertini, Tiberio Uricchio, and Alberto Del Bimbo. "Fast video visual quality and resolution improvement using SR-UNet". Accepted at ACM MM '21.
