RDO Postprocess #460

Draft: wants to merge 16 commits into main

YunHsiao

This is one possible implementation for #361. It is based on Richard Geldreich's original ERT implementation (also subsequently used in the KTX2 implementation for UASTC), with some interface tweaks that make the solver more general and more performant to integrate with.
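For readers less familiar with ERT, the general shape of such a post-process can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this PR: blocks are opaque tuples, `decode` is a caller-supplied stand-in for ASTC decompression, candidates are whole earlier blocks (ERT-style transforms can also splice in partial byte ranges), and the rate model (`match_bits`, `literal_bits`) is a toy assumption rather than a real LZ cost estimate. All names are hypothetical. A trial block replaces the original only if it lowers the cost `SSE + lam * bits`.

```python
from typing import Callable, List, Sequence, Tuple

Block = Tuple[int, ...]  # hypothetical stand-in for one encoded 128-bit block

def sse(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two texel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rate_bits(block: Block, history: List[Block], match_bits: int, literal_bits: int) -> int:
    # Toy rate model (assumption): a block that repeats a recent block costs one
    # LZ match; anything else is charged as a full literal block.
    return match_bits if block in history else literal_bits

def rdo_postprocess(blocks: Sequence[Block], sources: Sequence[Sequence[float]],
                    decode: Callable[[Block], Sequence[float]], window: int = 8,
                    lam: float = 1.0, max_mse: float = float("inf"),
                    match_bits: int = 16, literal_bits: int = 128) -> List[Block]:
    out: List[Block] = []
    for i, (blk, src) in enumerate(zip(blocks, sources)):
        history = out[max(0, i - window):i]  # already-emitted blocks in the lookback window
        best = blk
        best_cost = sse(decode(blk), src) + lam * rate_bits(blk, history, match_bits, literal_bits)
        for cand in reversed(history):  # nearest blocks first
            err = sse(decode(cand), src)
            if err / len(src) > max_mse:
                continue  # aggressively reject trials that are too lossy
            cost = err + lam * rate_bits(cand, history, match_bits, literal_bits)
            if cost < best_cost:  # strict: ties keep the current choice
                best, best_cost = cand, cost
        out.append(best)
    return out
```

With an identity `decode`, calling `rdo_postprocess` on `[(10, 10), (10, 11), (50, 50), (10, 10)]` (used as both blocks and sources) replaces the second block with a copy of the first, a small error for a large rate saving, but keeps `(50, 50)`, whose error would outweigh the saving. Raising `lam` trades more quality for more repetition; `max_mse` plays the role of an aggressive rejection threshold.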

@YunHsiao
Author

YunHsiao commented Mar 28, 2024

I've reached a point where I think most of the work is solid, but I'd love some second opinions:

  • The whole process is implemented as a separate step with an independent ParallelManager
  • Optimization is disabled by default, and the CLI options closely resemble the relevant parameters in KTX2
  • The difference functions for trial blocks are routed to the existing specialized ones for different planes and partitions, with a new one added for the constant symbolic block case
  • HDR, swizzle, and different weights should work, but I haven't looked into them much yet
  • By default, the extra texels in edge blocks are just value-clamped at the image borders and treated as normal blocks. This may unnecessarily increase the measured error, since trial blocks may differ in those 'non-existent' texels. I kept this behavior for simplicity; handling it properly would mean changing all 3 compute_symbolic_block_difference functions just for this case and adding more involved masking operations (see compute_block_mse).
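The masking alternative mentioned in the last point amounts to scoring only the texels that lie inside the image. A minimal single-channel sketch with hypothetical names (it is not astcenc's `compute_block_mse`), assuming texels are stored row-major within the block:

```python
def masked_block_mse(decoded, source, bw, bh, bx, by, img_w, img_h):
    """MSE over the in-image texels of block (bx, by) with a bw x bh footprint.

    decoded and source are flat row-major lists of bw * bh single-channel values.
    Texels past the right or bottom image edge (clamped padding) are ignored.
    """
    total, count = 0.0, 0
    for y in range(bh):
        for x in range(bw):
            if bx * bw + x >= img_w or by * bh + y >= img_h:
                continue  # padding texel: not part of the real image
            d = decoded[y * bw + x] - source[y * bw + x]
            total += d * d
            count += 1
    return total / count if count else 0.0
```

For a 2x2 block at block coordinates (1, 1) of a 3x3 image, only one texel is real, so `masked_block_mse([1, 9, 9, 9], [0, 0, 0, 0], 2, 2, 1, 1, 3, 3)` returns 1.0, while the same data evaluated at (0, 0) returns 61.0 because all four texels count.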

@solidpixel
Contributor

Have you looked at image quality to bitrate trade-offs, and how this compares to just using a larger ASTC block size to reduce bitrate?

@MarkCallow
Contributor

> Have you looked at image quality to bitrate trade-offs, and how this compares to just using a larger ASTC block size to reduce bitrate?

I'm not sure I understand the question. RDO is all about finding a good balance between quality (distortion) and (bit)rate. The intent is to reduce entropy between blocks so that the data is smaller after Deflate-style lossless compression. I wonder if there is a point at which a larger block size produces a smaller result than RDO+Deflate; it would be good to know. But keep in mind that the purpose of RDO+Deflate is to reduce file and transmission size. It does not affect the GPU memory used.
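A quick way to see the effect described here: every ASTC block is 128 bits (16 bytes) regardless of footprint, so RDO cannot change the raw size, but making blocks repeat lets an LZ-based compressor encode them as cheap back-references. The sketch below uses Python's `zlib` as a Deflate stand-in and seeded random bytes as hypothetical stand-ins for encoded blocks; the "every other block repeats" pattern is an exaggerated assumption, not what a real RDO pass produces.

```python
import random
import zlib

BLOCK_BYTES = 16  # every ASTC block is 128 bits, whatever its footprint

def make_blocks(n, seed=0):
    """Hypothetical incompressible 'encoded blocks' (random bytes)."""
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(BLOCK_BYTES)) for _ in range(n)]

def with_repeats(blocks):
    # Mimic an RDO pass that replaced every odd block with its predecessor.
    out = list(blocks)
    for i in range(1, len(out), 2):
        out[i] = out[i - 1]
    return out

blocks = make_blocks(1000)
plain = b"".join(blocks)
rdo = b"".join(with_repeats(blocks))

# Raw (GPU memory) size is identical; only the Deflate-compressed size differs.
print("raw:", len(plain), len(rdo))
print("deflated:", len(zlib.compress(plain, 9)), len(zlib.compress(rdo, 9)))
```

Both raw buffers are 16000 bytes, matching the point that RDO leaves GPU memory untouched; only the compressed size shrinks.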

@solidpixel
Contributor

solidpixel commented Apr 11, 2024

> the purpose of RDO+Deflate is to reduce file and transmission size.

Yes, but as you noted, ASTC can do the same thing natively (just use a larger block size), which gives file, transmission, AND memory savings. There isn't much point adding this if it ends up on the wrong side of the Pareto frontier vs. just using a lower ASTC bitrate in the compression.

Ignoring that, I'd still like the PR to include some analysis of the new feature and quality/bitrate which can be added to the documentation so we can explain when it should be used, and how to use it effectively.

@MarkCallow
Contributor

> Pareto frontier

What is this?

> I'd still like the PR to include some analysis of the new feature and quality/bitrate which can be added to the documentation so we can explain when it should be used, and how to use it effectively.

Absolutely. From that we can determine if it is useful. We need to compare the size of the zstd-compressed file with and without RDO to determine its effectiveness.

@solidpixel
Contributor

solidpixel commented Apr 11, 2024

> Pareto frontier
> What is this?

Scatter-plot whatever pair of metrics you care about for a compressor (quality vs. performance for lossy compressors, quality vs. size for RDO, size vs. performance for lossless compressors). The Pareto frontier is the line connecting the best compressor/configuration at each quality or performance setting.

For a new optimisation to be useful, it needs to be above the frontier: for example, giving better performance for the same quality, or vice versa. If you're behind the frontier, there isn't any point; better options already exist.
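A minimal sketch of finding the frontier for a size/quality pair, assuming smaller size and higher quality are both better. The points are hypothetical `(name, size, quality)` tuples:

```python
def pareto_frontier(points):
    """Return the non-dominated (name, size, quality) points, smallest size first."""
    # Sort by size ascending; among equal sizes, put the best quality first.
    ordered = sorted(points, key=lambda p: (p[1], -p[2]))
    frontier = []
    best_quality = float("-inf")
    for name, size, quality in ordered:
        # A point survives only if it beats every smaller-or-equal-size point on quality.
        if quality > best_quality:
            frontier.append((name, size, quality))
            best_quality = quality
    return frontier

points = [("A", 100, 30.0), ("B", 120, 29.0), ("C", 150, 35.0), ("D", 150, 34.0)]
print(pareto_frontier(points))  # [('A', 100, 30.0), ('C', 150, 35.0)]
```

Here `B` is dominated by `A` (bigger and worse) and `D` by `C` (same size, worse quality), so only `A` and `C` remain. In the terms used above, a new mode is only useful if its points would show up in this list.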

@YunHsiao
Author

YunHsiao commented Jun 7, 2024

Okay, I finally had some time to get to this. Although I wasn't able to put together a full performance report, some simple tests can still provide useful insights:

For a 2048x2048 test image, the following script is run:

./astcenc-avx2.exe -tl test.jpg 6x6.png 6x6 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 6x6r.png 6x6 -thorough -dimage -rdo
7z a 6x6.7z 6x6.astc
7z a 6x6r.7z 6x6r.astc

./astcenc-avx2.exe -tl test.jpg 8x6.png 8x6 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 8x6r.png 8x6 -thorough -dimage -rdo
7z a 8x6.7z 8x6.astc
7z a 8x6r.7z 8x6r.astc

./astcenc-avx2.exe -tl test.jpg 8x8.png 8x8 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 8x8r.png 8x8 -thorough -dimage -rdo
7z a 8x8.7z 8x8.astc
7z a 8x8r.7z 8x8r.astc

./astcenc-avx2.exe -tl test.jpg 10x10.png 10x10 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 10x10r.png 10x10 -thorough -dimage -rdo
7z a 10x10.7z 10x10.astc
7z a 10x10r.7z 10x10r.astc

./astcenc-avx2.exe -tl test.jpg 12x12.png 12x12 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 12x12r.png 12x12 -thorough -dimage -rdo
7z a 12x12.7z 12x12.astc
7z a 12x12r.7z 12x12r.astc
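The same runs can be scripted as a loop. A sketch in Python, assuming `astcenc-avx2.exe` and `7z` are callable exactly as in the commands above and that the binary was built from this PR's branch (the `-rdo` option comes from this PR). It replays the commands verbatim, printing them in dry-run mode by default, and after a real run reports each archive's size in bytes via `os.path.getsize`:

```python
import os
import subprocess

ASTCENC = "./astcenc-avx2.exe"  # binary name as used in the thread
BLOCK_SIZES = ["6x6", "8x6", "8x8", "10x10", "12x12"]

def build_commands(block_sizes, source="test.jpg"):
    """Return the argument lists from the script above, in the same order."""
    cmds = []
    for bs in block_sizes:
        cmds.append([ASTCENC, "-tl", source, f"{bs}.png", bs, "-thorough", "-dimage"])
        cmds.append([ASTCENC, "-tl", source, f"{bs}r.png", bs, "-thorough", "-dimage", "-rdo"])
        cmds.append(["7z", "a", f"{bs}.7z", f"{bs}.astc"])
        cmds.append(["7z", "a", f"{bs}r.7z", f"{bs}r.astc"])
    return cmds

def run(block_sizes, dry_run=True):
    for cmd in build_commands(block_sizes):
        if dry_run:
            print(" ".join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing command
    if not dry_run:
        for bs in block_sizes:
            print(bs, os.path.getsize(f"{bs}.7z"), os.path.getsize(f"{bs}r.7z"))

if __name__ == "__main__":
    run(BLOCK_SIZES)
```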

The key results are as follows:

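| Block Size | Non-RDO PSNR (dB) | RDO PSNR (dB) | Non-RDO 7z Size | RDO 7z Size |
|------------|-------------------|---------------|-----------------|-------------|
| 6x6        | 36.1594           | 34.7948       | 3491            | 2711        |
| 8x6        | 34.2204           | 33.3474       | 2552            | 2107        |
| 8x8        | 32.5107           | 31.9004       | 1919            | 1581        |
| 10x10      | 29.8623           | 29.5536       | 1253            | 1032        |
| 12x12      | 28.1800           | 27.9452       | 864             | 695         |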
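To read the table as trade-offs, the relative archive saving and the PSNR cost of RDO can be computed per block size. A small sketch using the figures above as given (the thread does not state the unit of the size columns, but the ratio does not depend on it):

```python
# block size: (non-RDO PSNR, RDO PSNR, non-RDO size, RDO size), from the table above
RESULTS = {
    "6x6": (36.1594, 34.7948, 3491, 2711),
    "8x6": (34.2204, 33.3474, 2552, 2107),
    "8x8": (32.5107, 31.9004, 1919, 1581),
    "10x10": (29.8623, 29.5536, 1253, 1032),
    "12x12": (28.1800, 27.9452, 864, 695),
}

def rdo_deltas(results):
    """Return (block size, % archive saving, PSNR drop in dB) for each row."""
    rows = []
    for bs, (psnr, psnr_rdo, size, size_rdo) in results.items():
        saving_pct = 100.0 * (size - size_rdo) / size
        rows.append((bs, saving_pct, psnr - psnr_rdo))
    return rows

for bs, saving, drop in rdo_deltas(RESULTS):
    print(f"{bs:>6}: {saving:5.1f}% smaller, {drop:.3f} dB lower PSNR")
```

On these numbers RDO saves roughly 17% to 22% of the archive size at every block size, while its PSNR cost falls from about 1.36 dB at 6x6 to about 0.23 dB at 12x12.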

So I guess it may not be a ground-breaking optimization:

  • As @solidpixel suggested, 8x6 seems like a better option than 6x6-rdo, considering that RDO only reduces transmission size and is usually a time-consuming process.

But in many cases it may still provide interesting trade-offs:

  • RDO fills the gap between neighboring block sizes (e.g. the 8x6-rdo metrics sit comfortably between 8x6 and 8x8, and adjusting the compression quality alone wouldn't make this much of a difference)
  • We can still push past the limits of 12x12, the largest 2D block size
  • In our own use cases, when comparing outputs with near-identical PSNRs (e.g. 8x6 vs. 6x6-rdo), we tend to feel the RDO ones almost always look better (perhaps because of the smaller block size), but of course this is subjective.
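Ordering the ten configurations from the table by archive size shows the "filling the gap" point concretely: each RDO result lands between its own block size and the next larger footprint, and PSNR rises monotonically along the same order. A short check using the table's figures:

```python
# (configuration, archive size, PSNR in dB) taken from the results table
POINTS = [
    ("6x6", 3491, 36.1594), ("6x6-rdo", 2711, 34.7948),
    ("8x6", 2552, 34.2204), ("8x6-rdo", 2107, 33.3474),
    ("8x8", 1919, 32.5107), ("8x8-rdo", 1581, 31.9004),
    ("10x10", 1253, 29.8623), ("10x10-rdo", 1032, 29.5536),
    ("12x12", 864, 28.1800), ("12x12-rdo", 695, 27.9452),
]

by_size = sorted(POINTS, key=lambda p: p[1])
for name, size, psnr in by_size:
    print(f"{name:>10} {size:>5} {psnr:.4f}")

# True when PSNR strictly increases with archive size across all ten points
monotonic = all(a[2] < b[2] for a, b in zip(by_size, by_size[1:]))
print("PSNR increases with size:", monotonic)
```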

@solidpixel
Contributor

Super, thanks. I'll try and take a look next week.

@sindney

sindney commented Jul 29, 2024

> the purpose of RDO+Deflate is to reduce file and transmission size.
>
> Yes, but as you noted ASTC can do the same thing natively (just use a larger block size) which gives file, transmission, AND memory savings. There isn't much point adding this if it ends up the wrong side of the Pareto frontier vs just using a lower ASTC bitrate in the compression.
>
> Ignoring that, I'd still like the PR to include some analysis of the new feature and quality/bitrate which can be added to the documentation so we can explain when it should be used, and how to use it effectively.

We use smaller asymmetrical block sizes (like 8x5, 8x6, 10x8) to compress specular maps, normal maps, etc., and for diffuse-like textures we use an automatic block-size selection method that moves up to larger settings (as far as 12x12) instead of the original one as long as the PSNR stays above a certain threshold. In large-scale projects this achieved a package-size reduction (i.e. compressed ASTC file size) similar to 6x6/8x8 with RDO.
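The thread does not describe that selection method in detail; one plausible shape is to try footprints from largest to smallest and keep the first whose PSNR clears a threshold. A hypothetical sketch, where `psnr_of` stands in for an encode-and-measure step:

```python
def footprint(block_size):
    """Texels per block for a 'WxH' string, e.g. '8x6' -> 48."""
    w, h = map(int, block_size.split("x"))
    return w * h

def pick_block_size(candidates, psnr_of, min_psnr):
    """Largest footprint (lowest bitrate) whose PSNR meets min_psnr.

    psnr_of is a hypothetical callback that encodes the texture at the given
    block size and returns the resulting PSNR in dB.
    """
    for bs in sorted(candidates, key=footprint, reverse=True):
        if psnr_of(bs) >= min_psnr:
            return bs
    # Nothing met the threshold: fall back to the highest-bitrate option.
    return min(candidates, key=footprint)
```

With the non-RDO PSNR values from the earlier table and a 32 dB threshold, this picks `8x8`; lowering the threshold to 28 dB picks `12x12`.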

You're correct that this PR needs more data, but even with some basic data, the potential is not bad:

> Okay, I finally had some time to get to this. Although I wasn't able to put together a full performance report, some simple tests can still provide useful insights:

I think it's worth investigating RDO support further, not only for YunHsiao's reasons, but also because on mobile platforms we sometimes pack ASTC textures block by block at runtime to support a kind of texture packing without real-time ASTC encoding/decoding. This requires the input textures to share the same block size, which limits our options. With RDO we can fine-tune quality and compression rate by selecting different PSNR thresholds, to make some extra room in package size.
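Why runtime packing forces a single block size: if every texture shares the same footprint, an atlas can be assembled by copying 16-byte blocks, with no decode or re-encode. A minimal sketch with hypothetical names, assuming raw block payloads (the 16-byte .astc file header stripped) stored in row-major block order and block-aligned placement:

```python
BLOCK_BYTES = 16  # every ASTC block is 128 bits, whatever its footprint

def blit_blocks(dst, dst_w_blocks, src, src_w_blocks, src_h_blocks, dst_bx, dst_by):
    """Copy a grid of encoded blocks into a larger grid at a block-aligned offset.

    dst is a bytearray and src a bytes-like object, both holding raw block payloads
    in row-major block order. Both must use the same block footprint; otherwise the
    copied blocks would not line up with the atlas grid.
    """
    if dst_bx + src_w_blocks > dst_w_blocks:
        raise ValueError("source does not fit horizontally")
    if (dst_by + src_h_blocks) * dst_w_blocks * BLOCK_BYTES > len(dst):
        raise ValueError("source does not fit vertically")
    row_bytes = src_w_blocks * BLOCK_BYTES
    for row in range(src_h_blocks):
        s = row * row_bytes
        d = ((dst_by + row) * dst_w_blocks + dst_bx) * BLOCK_BYTES
        dst[d:d + row_bytes] = src[s:s + row_bytes]  # same length: dst size unchanged
```

Because no texel data is decoded, the atlas can be assembled at load time; the price is that every packed texture must share one block size, which is the constraint described above.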

By the way, apart from how well it affects package size, the encoding speed is currently a big problem...

@solidpixel
Contributor

> By the way, apart from how well it affects package size, the encoding speed is currently a big problem...

Agreed. The original BC implementation has the same problem, because it's a somewhat brute-force approach to finding RDO matches. I suspect it's too slow to be usable at scale, given the marginal benefits over just choosing other ASTC block sizes.
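To see why a brute-force search scales poorly, count the trial evaluations: a lookback pass tries up to `window` earlier blocks for every block. A back-of-the-envelope sketch; `window` is a hypothetical parameter, not a value taken from either implementation, and a search that also tries several partial substitutions per candidate multiplies the count further:

```python
def num_blocks(width, height, bw, bh):
    # Ceiling division: partial edge blocks still occupy a full block.
    return -(-width // bw) * -(-height // bh)

def trial_count(width, height, bw, bh, window):
    # One decode + difference evaluation per (block, candidate) pair.
    return num_blocks(width, height, bw, bh) * window

print(num_blocks(2048, 2048, 6, 6))       # 116964
print(trial_count(2048, 2048, 6, 6, 64))  # 7485696
```

For the 2048x2048 test image at 6x6, a 64-block window already means about 7.5 million trial decodes.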

@YunHsiao
Author

Alright, like I said in the beginning, this is only one possible approach. It is not ideal, but to some extent it perfectly works. I came looking for advice on the implementation, and as far as it goes, it has held up in our production usage. @solidpixel Feel free to close this if this approach is not expected to be merged.

@MarkCallow
Contributor

Please explain why/how it "perfectly works" and, if possible, give some more comparisons between this and choosing other block sizes.

@YunHsiao
Author

YunHsiao commented Sep 3, 2024

It perfectly works because any invalid trial blocks are rejected aggressively, or at least I tried my best to make sure they are.

In practice, on top of a reasonable block-size selection scheme, if the transmission-size savings outweigh the extra cook time, it is a perfectly valid optimization to consider. For our Unreal project, it can cut ~500 MB from the package size with carefully selected compression parameters and no obvious visual impact. Also, with DDC the build is always incremental, so overall it is not that expensive.

@solidpixel
Contributor

> @solidpixel Feel free to close this if this approach is not expected to be merged.

At the moment I don't have any work scheduled for astcenc, so I'm really only merging essentials: platform support, compiler support, and bug fixes. I'd like to add RDO support, but I'm wary of adding something new to test and maintain at the moment.

Labels
None yet
Projects
None yet

4 participants