Encoder Performance

Christian Lehmann edited this page Dec 13, 2024 · 72 revisions

The encoder performance tests focus on the most relevant HD (1920x1080) and UHD (3840x2160) resolution use cases, jointly denoted as the HD4K use case in the following, with random access encoding as defined in the JVET common test conditions (CTC) [10]. Unless stated otherwise, all results are reported for the CTC test sequences, i.e. classes A1 and A2 for UHD and class B for HD. Constant QP encoding is used with quantization parameter (QP) values of 22, 27, 32 and 37 according to the VTM CTC. Rate-distortion results are reported as Bjøntegaard delta rate (BD-rate) [11, 12], which evaluates the compression performance based on the following weighted averages of the per-component peak signal-to-noise ratio (PSNR) and multiscale structural similarity measure (MS-SSIM) [13] values:

PSNRYUV = (6 * PSNRY + PSNRCb + PSNRCr) / 8

MS-SSIMYUV = (6 * MS-SSIMY + MS-SSIMCb + MS-SSIMCr) / 8
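As a small illustration of the 6:1:1 weighting above, the following Python sketch combines per-component values into a single YUV score. The function name and the sample values are hypothetical and only serve to show the arithmetic; they are not measured results.

```python
def weighted_yuv(y: float, cb: float, cr: float) -> float:
    """Combine per-component values with the 6:1:1 weighting used above."""
    return (6.0 * y + cb + cr) / 8.0


# Hypothetical per-component PSNR values in dB (illustrative only).
psnr_yuv = weighted_yuv(y=40.0, cb=44.0, cr=44.0)
print(f"PSNR_YUV = {psnr_yuv:.2f} dB")
```

The same function applies unchanged to the per-component MS-SSIM values.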

For VVenC and x265, multi-threading with 8 threads was enabled for all results. HM and VTM do not support multi-threading. All tests were performed on Dell servers with an AMD EPYC 7502P 32-core processor @ 2.5 GHz. This platform has been used since v1.7.0, and all anchors and older versions were remeasured on it. Before that, the performance was measured on Supermicro servers with Intel Xeon E5-2697A v4 processors @ 2.6 GHz.

PSNR Optimized Use Case

The PSNRYUV BD-rate gain of VVenC over the HEVC test model reference software HM-17.0 is shown in Figure 1. The PSNRYUV BD-rate represents the approximate average bit-rate savings between two encoders at the same objective quality (as quantified by PSNRYUV above). Lower values mean larger bit-rate savings with respect to the HM-17.0 anchor. Please note the logarithmic scale of the encoder runtime relative to HM-17.0.
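For readers unfamiliar with the metric, the BD-rate computation can be sketched as follows. This is a minimal pure-Python sketch of the classic cubic-fit variant of the Bjøntegaard metric, assuming exactly four rate points per encoder. It is not the script used to produce the results on this page, and all function names and example data are hypothetical.

```python
import math


def _cubic_through(xs, ys):
    """Coefficients c0..c3 of the cubic passing through four (x, y) points."""
    n = 4
    # Augmented Vandermonde system, solved by Gauss-Jordan elimination
    # with partial pivoting.
    a = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def _integral_from_zero(coeffs, upper):
    """Integral of the polynomial from 0 to upper."""
    return sum(c * upper ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))


def bd_rate(ref_rates, ref_quality, test_rates, test_quality):
    """BD-rate in percent; negative means the test encoder needs less bitrate."""
    lo = max(min(ref_quality), min(test_quality))
    hi = min(max(ref_quality), max(test_quality))
    width = hi - lo
    # Shift the quality axis so the overlap starts at 0, which keeps the
    # 4x4 system well conditioned.
    p_ref = _cubic_through([q - lo for q in ref_quality],
                           [math.log10(r) for r in ref_rates])
    p_test = _cubic_through([q - lo for q in test_quality],
                            [math.log10(r) for r in test_rates])
    avg_log_diff = (_integral_from_zero(p_test, width)
                    - _integral_from_zero(p_ref, width)) / width
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Hypothetical data: the test encoder needs 30% less bitrate at every quality.
quality_db = [42.0, 39.5, 37.0, 34.5]
ref_kbps = [8000.0, 4000.0, 2000.0, 1000.0]
test_kbps = [0.7 * r for r in ref_kbps]
example_bd_rate = bd_rate(ref_kbps, quality_db, test_kbps, quality_db)
print(f"BD-rate: {example_bd_rate:.2f}%")
```

The sketch fits the logarithm of the bitrate as a cubic function of quality for each encoder, integrates both curves over the overlapping quality range, and converts the average difference back into a percentage. The recommended practices in [12] may differ in detail, for example in the interpolation method.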

With the slower preset and multi-threading enabled, the BD-rate gain of VVenC over HM is similar to that of VTM-19.2 under CTC, while VVenC runs about 20x faster than VTM for HD4K sequences.

With the faster preset and multi-threading enabled, the BD-rate gain over HM is still approx. 13.2%, with a speedup of about 2400x over VTM-19.2 for HD4K sequences. Compared to HM, the speedup is around 380x.

As a good tradeoff between encoder runtime and BD-rate performance, we recommend the medium preset with multi-threading enabled. Here, the BD-rate gain over HM is approx. 35.6%, which is close to the slower preset and VTM CTC, while the encoder runs about 310x faster than VTM-19.2 for HD4K sequences. Compared to HM-17.0, this is an encoder runtime speedup of 49x. A summary of all results is shown in Table III.

Table III: PSNRYUV BD-rate and multi-threaded (8 threads) encoder speedup for HD, UHD and combined (HD4K) test sequences, for VVenC v1.13.0.

| Preset | HD: PSNR BD-rate vs. HM | HD: Speedup vs. HM | HD: Speedup vs. VTM | UHD: PSNR BD-rate vs. HM | UHD: Speedup vs. HM | UHD: Speedup vs. VTM | HD4K: PSNR BD-rate vs. HM | HD4K: Speedup vs. HM | HD4K: Speedup vs. VTM |
|---|---|---|---|---|---|---|---|---|---|
| faster | -10.9% | 350x | 2300x | -15.0% | 400x | 2400x | -13.2% | 380x | 2400x |
| fast | -25.6% | 150x | 990x | -27.1% | 170x | 1000x | -26.4% | 160x | 1000x |
| medium | -34.7% | 41x | 270x | -36.3% | 56x | 340x | -35.6% | 49x | 310x |
| slow | -38.9% | 12.0x | 80x | -40.2% | 17.0x | 100x | -39.6% | 14x | 92x |
| slower | -41.8% | 2.7x | 17x | -43.1% | 3.8x | 23x | -42.5% | 3.2x | 20x |
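The two speedup columns can be cross-checked against each other. Assuming that each speedup is the ratio of the anchor runtime to the VVenC runtime (an assumption, since the exact definition is not spelled out here), dividing the speedup vs. VTM by the speedup vs. HM cancels the VVenC runtime and yields the implied VTM-to-HM runtime ratio. The sketch below does this for the HD4K columns of Table III; the function name is hypothetical.

```python
# HD4K columns of Table III: preset -> (speedup vs. HM, speedup vs. VTM)
hd4k_speedups = {
    "faster": (380, 2400),
    "fast": (160, 1000),
    "medium": (49, 310),
    "slow": (14, 92),
    "slower": (3.2, 20),
}


def implied_vtm_over_hm(speedup_vs_hm: float, speedup_vs_vtm: float) -> float:
    """(T_VTM / T_VVenC) / (T_HM / T_VVenC) = T_VTM / T_HM."""
    return speedup_vs_vtm / speedup_vs_hm


for preset, (s_hm, s_vtm) in hd4k_speedups.items():
    ratio = implied_vtm_over_hm(s_hm, s_vtm)
    print(f"{preset:>6}: VTM/HM runtime ratio ~ {ratio:.1f}")
```

Under this assumption, every preset implies a VTM runtime between 6x and 7x that of HM, so the rounded table entries are mutually consistent.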

[Figure image: VVenC MT preset history]

Figure 1: PSNRYUV BD-rate gain and relative encoder runtime for VVenC in comparison to HM-17.0 and VTM (JVET HD and UHD test sequences, MCTF enabled for HM-17.0 and VTM-19.2). Results are given for the 5 preset options: faster, fast, medium, slow and slower. VVenC runs multi-threaded with 6 threads for versions <= 0.2 and 8 threads for versions >= 1.0. Lower PSNRYUV BD-rate values mean better compression for the same objective quality in terms of PSNRYUV.

Additionally, Figure 2 includes multi-threaded results for the HEVC open-source encoder x265 v3.5 at comparable speed presets [14]. For the comparison with VVenC, x265 was also configured to run with 8 threads. Besides sequence-specific parameters, the following parameter settings were used for x265:

--preset {0,1,2,3,…,9} --tune psnr --crf {17,22,27,32} --keyint 1s --min-keyint 1s --profile main10 --output-depth 10
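The parameter line above describes a grid of presets and CRF values. The following sketch expands that grid into one argument list per run. It uses only the options shown above and makes two labeled assumptions: that the preset list covers every integer from 0 to 9, and that "1s" stands for one second worth of frames at the sequence frame rate. Input, output and threading options are sequence- and setup-specific and are deliberately left out; the function name is hypothetical.

```python
from itertools import product

PRESETS = range(10)            # assumption: presets 0 through 9 inclusive
CRF_VALUES = (17, 22, 27, 32)  # CRF values from the parameter line above


def x265_args(preset: int, crf: int, fps: float, tune: str = "psnr") -> list[str]:
    """Build the non-sequence-specific x265 arguments for one run."""
    # Assumption: "1s" means an intra period of one second, i.e. round(fps) frames.
    keyint = str(round(fps))
    return [
        "--preset", str(preset),
        "--tune", tune,
        "--crf", str(crf),
        "--keyint", keyint,
        "--min-keyint", keyint,
        "--profile", "main10",
        "--output-depth", "10",
    ]


runs = [x265_args(p, c, fps=60) for p, c in product(PRESETS, CRF_VALUES)]
print(len(runs), "runs; first:", " ".join(runs[0]))
```

Passing `tune="ssim"` gives the corresponding grid for an SSIM-tuned configuration.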

[Figure image: VVenC MT preset]

Figure 2: PSNRYUV BD-rate gain and relative encoder runtime in comparison to HM-17.0 for VVenC and x265 running with 8 threads. Lower PSNRYUV BD-rate values mean better compression for the same objective quality in terms of PSNRYUV.

Perceptually Optimized Quantization Parameter Adaptation

To improve the perceived (subjective) coding quality, VVenC supports a low-complexity quantization parameter adaptation (QPA) algorithm based on the simplified model of the human visual system adopted in the XPSNR psychovisual video quality measure [15]. To evaluate the quality of the perceptually optimized quantization parameter adaptation (PQPA), especially in comparison to the approaches used in VTM and x265, MS-SSIMYUV (defined above) is used as a measure of subjective video quality.

Figure 3 shows the MS-SSIMYUV BD-rate gain of VVenC over the HEVC test model reference software HM-16.24 (lower is better). For the VTM simulation, JVET's common test conditions (CTC) with additional PQPA are used (--PerceptQPA=1). With PQPA enabled, the speedups achieved over HM and VTM are similar to the non-PQPA results presented in the previous section, which demonstrates the low complexity of the PQPA algorithm. The MS-SSIMYUV based BD-rates also show that an additional bit-rate reduction can be achieved. In particular, the slower preset realizes an MS-SSIM BD-rate gain of more than 4% over VTM CTC without PQPA. We recommend using the medium preset with multi-threading and PQPA enabled as a good tradeoff between encoder runtime and resulting perceived video quality. A summary of the MS-SSIMYUV results for PQPA is shown in Table IV.

[Figure image: VVenC MT QPA preset history]

Figure 3: MS-SSIMYUV BD-rate gain and encoder runtime in comparison to HM-16.24 for VTM and VVenC with perceptually optimized quantization parameter adaptation enabled for HD4K sequences (MCTF enabled for HM-16.24 and VTM-19.2). VVenC results are given for the 5 preset options: faster, fast, medium, slow and slower. VVenC runs multi-threaded with 6 threads for versions <= 0.2 and 8 threads for versions >= 1.0. Lower MS-SSIMYUV BD-rate values mean better compression for the same quality in terms of MS-SSIMYUV.

Additionally, Figure 4 includes multi-threaded results for the HEVC open-source encoder x265 v3.5 tuned for SSIM at comparable speed presets [14]. For the comparison with VVenC, x265 was also configured to run with 8 threads. Besides sequence-specific parameters, the following parameter settings were used for x265:

--preset {0,1,2,3,…,9} --tune ssim --crf {17,22,27,32} --keyint 1s --min-keyint 1s --profile main10 --output-depth 10

[Figure image: VVenC MT QPA]

Figure 4: MS-SSIMYUV BD-rate gain and encoder runtime in comparison to HM-16.24 for VVenC with QPA enabled and x265 with --tune=ssim. Both VVenC and x265 run with 8 threads. Lower MS-SSIMYUV BD-rate values mean better compression for the same quality in terms of MS-SSIMYUV.

Table IV: MS-SSIMYUV BD-rate gain and multi-threaded encoder speedup for HD and UHD test sequences for VVenC v1.12.0 with perceptually optimized QPA enabled.

| Preset | HD: MS-SSIM BD-rate vs. HM | HD: Speedup vs. HM | HD: Speedup vs. VTM | UHD: MS-SSIM BD-rate vs. HM | UHD: Speedup vs. HM | UHD: Speedup vs. VTM | HD4K: MS-SSIM BD-rate vs. HM | HD4K: Speedup vs. HM | HD4K: Speedup vs. VTM |
|---|---|---|---|---|---|---|---|---|---|
| faster | -21.0% | 340x | 2300x | -20.6% | 380x | 2600x | -20.8% | 360x | 2500x |
| fast | -32.4% | 150x | 1000x | -31.7% | 170x | 1100x | -32.0% | 160x | 1100x |
| medium | -38.9% | 41x | 280x | -40.0% | 56x | 390x | -39.5% | 48x | 330x |
| slow | -42.5% | 12x | 83x | -43.8% | 17x | 120x | -43.2% | 14x | 100x |
| slower | -44.8% | 2.6x | 18x | -46.5% | 3.8x | 26x | -45.7% | 3.2x | 22x |

Rate Control

To support encoding with a predefined target rate instead of a fixed QP, for which the final bitrate is generally unknown beforehand, VVenC includes one- and two-pass rate control [16, 17]. The one-pass (GOP-wise) rate control uses a short look-ahead window to collect information on the fly by performing a fast encoding of all frames in the group of pictures (GOP) covered by the window. It is intended for applications that cannot perform a full first pass through the entire sequence. The two-pass (sequence-wise) rate control includes a first, fast encoding pass in which the statistics for the entire sequence are collected in advance. This information is then used in the second pass to improve the rate control performance at the cost of increased encoding time. In addition, rate capping by means of a maximum rate parameter is supported in one- and two-pass mode [18].

PSNRYUV BD-rate results of both rate control variants for HD4K content relative to a fixed QP VVenC encoding are shown in Table V. Note that 10-second versions of the CTC sequences from classes A1 and A2 are used for the rate control tests. The target rates for the rate control runs were set to the bitrates produced by the fixed QP runs. All runs were executed using 8 threads and an intra period of 1 second. The one-pass approach achieves BD-rate performance similar to that of the fixed QP encoding while keeping the encoding runtime overhead low. The BD-rate performance of the two-pass version is sometimes even better than that of the fixed QP encoding. For both rate control variants, the relative computational overhead decreases as presets become slower, because the complexity of the look-ahead pass is constant. The average bitrate deviation is around 2% for the one-pass approach and around 1.3% for the two-pass approach.
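The exact definition of the average bitrate deviation is not given here. A minimal sketch, assuming it is the mean absolute relative difference between achieved and target bitrate over all runs, could look as follows; the function name and the sample bitrates are hypothetical.

```python
def mean_abs_rate_deviation(targets, achieved):
    """Mean absolute deviation of achieved from target bitrate, in percent."""
    if not targets or len(targets) != len(achieved):
        raise ValueError("need equally sized, non-empty bitrate lists")
    total = sum(abs(a - t) / t for t, a in zip(targets, achieved))
    return 100.0 * total / len(targets)


# Hypothetical target and achieved bitrates in kbit/s (illustrative only).
target_kbps = [1000.0, 2000.0, 4000.0, 8000.0]
achieved_kbps = [1020.0, 1960.0, 4080.0, 7840.0]
deviation = mean_abs_rate_deviation(target_kbps, achieved_kbps)
print(f"average bitrate deviation: {deviation:.1f}%")
```

Using the absolute value prevents overshoots and undershoots from cancelling each other in the average.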

Table V: PSNRYUV BD-rate and relative encoding runtime for 1- and 2-pass rate control on HD4K sequences in comparison to a VVenC fixed QP encoding for all presets. Encoders were running with 8 threads and version v1.12.0.

| Preset | 1-pass RC: PSNRYUV BD-rate vs. fixed QP | 1-pass RC: Encoding time vs. fixed QP | 2-pass RC: PSNRYUV BD-rate vs. fixed QP | 2-pass RC: Encoding time vs. fixed QP |
|---|---|---|---|---|
| faster | 1.84% | 116% | 0.40% | 118% |
| fast | 2.19% | 107% | 1.02% | 113% |
| medium | 2.27% | 107% | 1.14% | 109% |
| slow | 2.45% | 106% | 1.25% | 102% |
| slower | 2.57% | 104% | 1.38% | 105% |

For improved perceived video quality, the rate control can be used in combination with the PQPA method introduced above. The MS-SSIMYUV BD-rate results for a combination of rate control and QPA over a fixed QP VVenC encoding with QPA are shown in Table VI. The results are similar to the results shown in Table V.

Table VI: MS-SSIMYUV BD-rate and relative encoding runtime for 1- and 2-pass rate control on HD4K sequences in comparison to a VVenC fixed QP encoding with QPA for all presets. Encoders were running with 8 threads and version v1.12.0.

| Preset | 1-pass RC: MS-SSIMYUV BD-rate vs. fixed QP with QPA | 1-pass RC: Encoding time vs. fixed QP with QPA | 2-pass RC: MS-SSIMYUV BD-rate vs. fixed QP with QPA | 2-pass RC: Encoding time vs. fixed QP with QPA |
|---|---|---|---|---|
| faster | 4.67% | 116% | 2.10% | 124% |
| fast | 5.04% | 109% | 2.72% | 117% |
| medium | 2.44% | 111% | 0.99% | 114% |
| slow | 2.48% | 102% | 1.34% | 104% |
| slower | 2.40% | 96% | 0.96% | 101% |

To test the rate control under conditions that correspond to real-world use cases, experiments were conducted with an intra period of approximately 4 seconds. The results without and with PQPA are shown in Table VII and Table VIII, respectively. They are similar to the results for an intra period of 1 second in Table V and Table VI, which confirms the robustness of the rate control methods in VVenC.

Table VII: PSNRYUV BD-rate and relative encoding runtime for 1- and 2-pass rate control on HD4K sequences in comparison to a VVenC fixed QP encoding for all presets with intra period size 4 seconds. Encoders were running with 8 threads and version v1.12.0.

| Preset | 1-pass RC: PSNRYUV BD-rate vs. fixed QP | 1-pass RC: Encoding time vs. fixed QP | 2-pass RC: PSNRYUV BD-rate vs. fixed QP | 2-pass RC: Encoding time vs. fixed QP |
|---|---|---|---|---|
| faster | 0.60% | 120% | 0.18% | 121% |
| fast | 1.02% | 108% | 0.59% | 116% |
| medium | 1.27% | 107% | 0.96% | 108% |
| slow | 1.40% | 105% | 0.97% | 102% |
| slower | 1.48% | 106% | 1.07% | 104% |

The MS-SSIMYUV BD-rate results for a combination of rate control and QPA over a fixed QP VVenC encoding with QPA for 4 seconds intra period are shown in Table VIII. The results are similar to the results shown in Table VI.

Table VIII: MS-SSIMYUV BD-rate and relative encoding runtime for 1- and 2-pass rate control on HD4K sequences in comparison to a VVenC fixed QP encoding with QPA for all presets with intra period size 4 seconds. Encoders were running with 8 threads and version v1.12.0.

| Preset | 1-pass RC: MS-SSIMYUV BD-rate vs. fixed QP with QPA | 1-pass RC: Encoding time vs. fixed QP with QPA | 2-pass RC: MS-SSIMYUV BD-rate vs. fixed QP with QPA | 2-pass RC: Encoding time vs. fixed QP with QPA |
|---|---|---|---|---|
| faster | 2.87% | 110% | 3.05% | 120% |
| fast | 3.39% | 100% | 2.51% | 113% |
| medium | 1.66% | 106% | 1.28% | 110% |
| slow | 1.78% | 105% | 1.19% | 103% |
| slower | 1.59% | 107% | 0.93% | 105% |

References

  • [10] F. Bossen, X. Li, V. Seregin, K. Sharman, and K. Sühring, “VTM and HM common test conditions and software reference configurations for SDR 4:2:0 10-bit video,” Doc. JVET-Y2010 of Joint Video Experts Team (JVET), Feb. 2022. [Online]. Available: https://www.jvet-experts.org/doc_end_user/current_document.php?id=11471
  • [11] G. Bjøntegaard, “Improvement of BD-PSNR Model,” Doc. VCEG-AI11 of ITU-T SG16/Q6, Berlin, Germany, July 2008. [Online]. Available: http://wftp3.itu.int/av-arch/video-site/0807_Ber/
  • [12] ITU-T and ISO/IEC JTC 1, Working practices using objective metrics for evaluation of video coding efficiency experiments, Technical Paper ITU-T HSTP-VID-WPOM and ISO/IEC DTR 23002-8, 2020.
  • [13] Z. Wang, E. Simoncelli, and A. C. Bovik, “Multi-Scale Structural Similarity for Image Quality Assessment,” in Proc. IEEE Asilomar Conf. Signals, Systems, and Comp., Pacific Grove, Nov. 2003.
  • [14] x265 software repository, version 3.4. Available online: https://github.com/videolan/x265/tree/Release_3.4
  • [15] C. R. Helmrich, S. Bosse, H. Schwarz, D. Marpe, and T. Wiegand, “A Study of the Extended Perceptually Weighted Peak Signal-to-Noise Ratio (XPSNR) for Video Compression with Different Resolutions and Bit Depths,” ITU Journal: ICT Discoveries – Special Issue: The Future of Video and Immersive Media, vol. 3, no. 1, May 2020. [Online]. Available: https://www.itu.int/en/journal/2020/001/Pages/08.aspx, https://github.com/fraunhoferhhi/xpsnr
  • [16] C. R. Helmrich, I. Zupancic, J. Brandenburg, V. George, A. Wieckowski, and B. Bross, “Visually Optimized Two-Pass Rate Control for Video Coding Using the Low-Complexity XPSNR Model,” in Proc. IEEE Int. Conf. Visual Communications and Image Processing (VCIP), Munich, Dec. 2021. DOI: 10.1109/VCIP53242.2021.9675364
  • [17] C. R. Helmrich, C. Bartnik, J. Brandenburg, V. George, T. Hinz, C. Lehmann, I. Zupancic, A. Wieckowski, B. Bross, and D. Marpe, “A Scene Change and Noise Aware Rate Control Method for VVenC, an Open VVC Encoder Implementation,” in Proc. IEEE Picture Coding Symposium (PCS), San Jose, Dec. 2022. DOI: 10.1109/PCS56426.2022.10018041
  • [18] C. R. Helmrich, C. Bartnik, J. Brandenburg, A. Wieckowski, B. Bross, and D. Marpe, “A Constrained Variable Bit Rate (CVBR) Algorithm for VVenC, an Open VVC Encoder Implementation,” in Proc. IEEE International Conf. Visual Communications and Image Processing (VCIP), Jeju, Dec. 2023.