RDO Postprocess #460
Conversation
I've reached a point where I think most of the work is solid, but I'd love some second opinions:
Have you looked at the image quality to bitrate trade-offs, and how this compares to just using a larger ASTC block size to reduce bitrate?
I'm not sure I understand the question. RDO is all about finding a good balance between quality (distortion) and (bit)rate. The intent is to reduce entropy between blocks, which shrinks the output of Deflate-style lossless compression. I do wonder whether there is a point at which a larger block size produces a smaller result than RDO+Deflate; it would be good to know. But keep in mind that the purpose of RDO+Deflate is to reduce file and transmission size. It does not affect the GPU memory used.
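To make the trade-off concrete, here is a minimal sketch of the standard rate-distortion decision rule; the struct and function names are illustrative, not taken from this PR:

```cpp
// Illustrative sketch only. A trial encoding of a block is accepted when
// it lowers the combined Lagrangian cost J = D + lambda * R, where D is
// distortion (e.g. MSE against the original texels) and R is an estimate
// of the bits the block will cost after Deflate-style compression.
struct TrialResult
{
    float distortion;   // e.g. mean squared error vs. the source image
    float bit_estimate; // estimated compressed size of the block, in bits
};

// lambda trades quality for rate: higher values accept more distortion
// in exchange for smaller compressed output.
static bool accept_trial(const TrialResult& current,
                         const TrialResult& trial,
                         float lambda)
{
    float j_current = current.distortion + lambda * current.bit_estimate;
    float j_trial   = trial.distortion   + lambda * trial.bit_estimate;
    return j_trial < j_current;
}
```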
Yes, but as you noted ASTC can do the same thing natively (just use a larger block size), which gives file, transmission, AND memory savings. There isn't much point adding this if it ends up on the wrong side of the Pareto frontier versus just using a lower ASTC bitrate in the compression. Ignoring that, I'd still like the PR to include some analysis of the new feature's quality/bitrate behaviour, which can be added to the documentation so we can explain when it should be used and how to use it effectively.
Absolutely. From that we can determine if it is useful. We need to compare the zstd-compressed file sizes with and without RDO to determine its effectiveness.
Scatter plot whatever pair of metrics you care about for a compressor (quality-performance for lossy compressors, quality-size for RDO, size-performance for lossless compressors). The Pareto frontier is the line through the best compressor/configuration at each quality or performance setting. For a new optimisation to be useful it needs to be above the frontier, e.g. giving better performance for the same quality, or vice versa. If you're behind the frontier, then there isn't any point; better options already exist.
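For illustration, a minimal sketch of extracting the frontier from measured points, assuming each configuration is summarised as a (compressed size, PSNR) pair; the names are hypothetical:

```cpp
// Illustrative sketch: compute the Pareto frontier of a set of
// (compressed size, quality) measurements.
#include <algorithm>
#include <vector>

struct Point
{
    double size_bytes; // smaller is better
    double psnr_db;    // larger is better
};

// Sort by size, then keep each point that improves on the best quality
// seen so far: the survivors form the Pareto frontier. A new compressor
// configuration is only interesting if it lands above this line.
static std::vector<Point> pareto_frontier(std::vector<Point> points)
{
    std::sort(points.begin(), points.end(),
              [](const Point& a, const Point& b) {
                  if (a.size_bytes != b.size_bytes)
                      return a.size_bytes < b.size_bytes;
                  return a.psnr_db > b.psnr_db;
              });

    std::vector<Point> frontier;
    double best_psnr = -1.0;
    for (const Point& p : points)
    {
        if (p.psnr_db > best_psnr)
        {
            frontier.push_back(p);
            best_psnr = p.psnr_db;
        }
    }
    return frontier;
}
```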
Okay, finally had some time to get to this... Although I wasn't able to make a full performance report, some simple tests can still provide useful insights. For a 2048x2048 test image, the following script is run:

```sh
./astcenc-avx2.exe -tl test.jpg 6x6.png 6x6 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 6x6r.png 6x6 -thorough -dimage -rdo
7z a 6x6.7z 6x6.astc
7z a 6x6r.7z 6x6r.astc
./astcenc-avx2.exe -tl test.jpg 8x6.png 8x6 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 8x6r.png 8x6 -thorough -dimage -rdo
7z a 8x6.7z 8x6.astc
7z a 8x6r.7z 8x6r.astc
./astcenc-avx2.exe -tl test.jpg 8x8.png 8x8 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 8x8r.png 8x8 -thorough -dimage -rdo
7z a 8x8.7z 8x8.astc
7z a 8x8r.7z 8x8r.astc
./astcenc-avx2.exe -tl test.jpg 10x10.png 10x10 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 10x10r.png 10x10 -thorough -dimage -rdo
7z a 10x10.7z 10x10.astc
7z a 10x10r.7z 10x10r.astc
./astcenc-avx2.exe -tl test.jpg 12x12.png 12x12 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 12x12r.png 12x12 -thorough -dimage -rdo
7z a 12x12.7z 12x12.astc
7z a 12x12r.7z 12x12r.astc
```

The key results are as follows:
So I guess it may not be a ground-breaking optimization:
But in many cases it still may provide interesting trade-offs:
Super, thanks. I'll try and take a look next week.
We use smaller asymmetrical block sizes like 8x5/8x6/10x8 to compress specular maps, normal maps, etc., and an automatic block size calculation method that raises the footprint over the original setting, up to 12x12, whenever PSNR stays above a certain point, to compress diffuse-like textures. In large-scale projects this achieved a similar optimization effect, in terms of package size (i.e. the compressed ASTC files), to 6x6/8x8 with RDO. You're correct that this PR needs more data, but from some basic data the potential is not bad:
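As a hypothetical illustration of the auto block size selection scheme described above; the PSNR measurement callback is an assumed stand-in for a real encode pass, not our actual pipeline code:

```cpp
// Illustrative sketch: walk candidate footprints from lowest to highest
// bitrate and keep the largest one whose PSNR clears a per-texture-type
// quality threshold.
#include <array>

struct BlockSize { unsigned x, y; };

template <typename MeasurePsnr> // callable: double(BlockSize), PSNR in dB
BlockSize pick_block_size(MeasurePsnr&& psnr_of, double min_psnr_db)
{
    // Candidate footprints, lowest bitrate (largest blocks) first.
    const std::array<BlockSize, 5> candidates {{
        {12, 12}, {10, 10}, {8, 8}, {6, 6}, {4, 4}
    }};

    for (BlockSize bs : candidates)
    {
        if (psnr_of(bs) >= min_psnr_db)
        {
            return bs; // largest footprint that meets the quality bar
        }
    }
    return candidates.back(); // fall back to the highest-bitrate footprint
}
```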
I think it's worth investigating RDO support further, not only because of yunhsiao's point, but also because on mobile platforms we sometimes pack ASTC textures by block at runtime, to support a kind of texture packing without real-time encoding/decoding of the ASTC format. That requires the input textures' block sizes to be the same, which limits our case. With RDO we can fine-tune the quality and compression rate by selecting different boundary PSNRs, making some extra room in terms of package size. By the way, apart from how well it affects package size, the encoding speed is currently a big problem...
Agree. The original BC implementation has the same problem because it's a somewhat brute-force approach to finding RDO matches. I suspect it's too slow to be usable at scale, given the marginal benefits over just choosing other ASTC block sizes.
Alright, as I said at the beginning, this is only one possible approach to do this; it is not ideal, but to some extent it perfectly works. I came for advice on the implementation, and so far it has stood up to our production usage. @solidpixel Feel free to close this if this approach is not expected to be merged.
Please explain why/how it "perfectly works" and, if possible, give some more comparisons between this and choosing other block sizes. |
It perfectly works because any invalid trial blocks are rejected aggressively, or at least I tried my best to make them so. In practice, on top of a reasonable block-size selection scheme, if the transmission size saving outweighs the cook time required, then it is a perfectly valid optimization to consider. For our Unreal project it cut the package size by ~500 MB with carefully selected compression parameters, with no obvious visual impact. Also, with the DDC the build is always incremental, so overall it is not that expensive.
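A rough sketch of the kind of accept/reject loop being described, based on the general shape of Geldreich's ert; the names, the error callback, and the rate estimate are assumptions, not this PR's code:

```cpp
// Illustrative sketch: for each block, search a window of earlier blocks
// for a candidate whose bytes can be reused; reject any trial above a
// distortion cap, and otherwise accept the best rate/distortion gain.
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t BLOCK_BYTES  = 16;  // one ASTC block is always 128 bits
constexpr size_t MATCH_WINDOW = 256; // how far back to search for reuse

// error_of(i, bits) is assumed to decode 'bits' as block i and return
// the distortion against the source image.
void rdo_postprocess(uint8_t* blocks, size_t block_count,
                     float (*error_of)(size_t, const uint8_t*),
                     float lambda,     // rate/distortion trade-off weight
                     float max_error)  // hard cap for aggressive rejection
{
    for (size_t i = 1; i < block_count; i++)
    {
        size_t first = (i > MATCH_WINDOW) ? i - MATCH_WINDOW : 0;
        const uint8_t* best_src  = nullptr;
        float          best_gain = 0.0f;

        for (size_t j = first; j < i; j++)
        {
            const uint8_t* prev = blocks + j * BLOCK_BYTES;
            float err = error_of(i, prev);

            // Aggressively reject any trial above the distortion cap.
            if (err > max_error)
                continue;

            // Reusing earlier bytes creates an LZ match for the later
            // Deflate/zstd pass, so score roughly BLOCK_BYTES of rate
            // saved against the weighted distortion introduced.
            float gain = float(BLOCK_BYTES) - lambda * err;
            if (gain > best_gain)
            {
                best_gain = gain;
                best_src  = prev;
            }
        }

        if (best_src != nullptr)
            std::memcpy(blocks + i * BLOCK_BYTES, best_src, BLOCK_BYTES);
    }
}
```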
At the moment, I don't have any work scheduled for astcenc, so I'm really only merging essentials for platform support, compiler support, and bug fixes. I'd like to add RDO support, but I'm just wary of adding something new to test and maintain right now.
This is one possible implementation for #361, based on Richard Geldreich's original ert implementation (also subsequently used in the KTX2 implementation for UASTC), with some interface tweaks that make the solver more general and more performant to integrate with.
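As a loose guess at what a "more general" solver interface could look like (not the PR's actual API): the ERT core only needs a callback to score a trial block's error, so it can stay agnostic of the texture format.

```cpp
// Hypothetical interface sketch; all names are assumptions.
#include <cstddef>
#include <cstdint>
#include <functional>

struct rdo_config
{
    float  lambda;      // rate/distortion trade-off weight
    float  max_error;   // reject trials above this distortion
    size_t window_size; // LZ match search window, in blocks
};

// The caller supplies the format-specific behaviour; the solver itself
// only rewrites fixed-size blocks to create compressible matches.
using block_error_fn =
    std::function<float(size_t block_index, const uint8_t* trial_bits)>;

void rdo_solve(uint8_t* blocks, size_t block_count,
               const rdo_config& config, const block_error_fn& error_of);
```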