RDO Postprocess #460
Conversation
I've reached a point where I think most of the work is solid, but I'd love some second opinions:
Have you looked at the image quality to bitrate trade-offs, and how this compares to just using a larger ASTC block size to reduce bitrate?
I'm not sure I understand the question. RDO is all about finding a good balance between quality (distortion) and (bit)rate. The intent is to reduce entropy between blocks, which shrinks the output of Deflate-style lossless compression. I do wonder whether there is a point at which a larger block size produces a smaller result than RDO+Deflate; it would be good to know. But keep in mind that the purpose of RDO+Deflate is to reduce file and transmission size. It does not affect the GPU memory used.
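To make the trade-off concrete, here is a minimal sketch of the standard rate-distortion decision rule; the struct and function names are illustrative, not taken from this PR:

```cpp
// Illustrative sketch only. A trial encoding of a block is accepted when
// it lowers the combined Lagrangian cost J = D + lambda * R, where D is
// distortion (e.g. MSE against the original texels) and R is an estimate
// of the bits the block will cost after Deflate-style compression.
struct TrialResult
{
    float distortion;   // e.g. mean squared error vs. the source image
    float bit_estimate; // estimated compressed size of the block, in bits
};

// lambda trades quality for rate: higher values accept more distortion
// in exchange for smaller compressed output.
static bool accept_trial(const TrialResult& current,
                         const TrialResult& trial,
                         float lambda)
{
    float j_current = current.distortion + lambda * current.bit_estimate;
    float j_trial   = trial.distortion   + lambda * trial.bit_estimate;
    return j_trial < j_current;
}
```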
Yes, but as you noted ASTC can do the same thing natively (just use a larger block size), which gives file, transmission, AND memory savings. There isn't much point adding this if it ends up on the wrong side of the Pareto frontier versus just using a lower ASTC bitrate in the compression. Ignoring that, I'd still like the PR to include some analysis of the new feature's quality/bitrate behaviour, which can be added to the documentation so we can explain when it should be used and how to use it effectively.
Absolutely. From that we can determine if it is useful. We need to compare the zstd-compressed file sizes with and without RDO to determine its effectiveness.
Scatter plot whatever pair of metrics you care about for a compressor (quality-performance for lossy compressors, quality-size for RDO, size-performance for lossless compressors). The Pareto frontier is the line through the best compressor/configuration at each quality or performance setting. For a new optimisation to be useful it needs to be above the frontier, e.g. giving better performance for the same quality, or vice versa. If you're behind the frontier, then there isn't any point; better options already exist.
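For illustration, a minimal sketch of extracting the frontier from measured points, assuming each configuration is summarised as a (compressed size, PSNR) pair; the names are hypothetical:

```cpp
// Illustrative sketch: compute the Pareto frontier of a set of
// (compressed size, quality) measurements.
#include <algorithm>
#include <vector>

struct Point
{
    double size_bytes; // smaller is better
    double psnr_db;    // larger is better
};

// Sort by size, then keep each point that improves on the best quality
// seen so far: the survivors form the Pareto frontier. A new compressor
// configuration is only interesting if it lands above this line.
static std::vector<Point> pareto_frontier(std::vector<Point> points)
{
    std::sort(points.begin(), points.end(),
              [](const Point& a, const Point& b) {
                  if (a.size_bytes != b.size_bytes)
                      return a.size_bytes < b.size_bytes;
                  return a.psnr_db > b.psnr_db;
              });

    std::vector<Point> frontier;
    double best_psnr = -1.0;
    for (const Point& p : points)
    {
        if (p.psnr_db > best_psnr)
        {
            frontier.push_back(p);
            best_psnr = p.psnr_db;
        }
    }
    return frontier;
}
```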
Okay, finally had some time to get to this... Although I wasn't able to make a full performance report, some simple tests can still provide useful insights. For a 2048x2048 test image, the following script is run:

```sh
./astcenc-avx2.exe -tl test.jpg 6x6.png 6x6 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 6x6r.png 6x6 -thorough -dimage -rdo
7z a 6x6.7z 6x6.astc
7z a 6x6r.7z 6x6r.astc
./astcenc-avx2.exe -tl test.jpg 8x6.png 8x6 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 8x6r.png 8x6 -thorough -dimage -rdo
7z a 8x6.7z 8x6.astc
7z a 8x6r.7z 8x6r.astc
./astcenc-avx2.exe -tl test.jpg 8x8.png 8x8 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 8x8r.png 8x8 -thorough -dimage -rdo
7z a 8x8.7z 8x8.astc
7z a 8x8r.7z 8x8r.astc
./astcenc-avx2.exe -tl test.jpg 10x10.png 10x10 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 10x10r.png 10x10 -thorough -dimage -rdo
7z a 10x10.7z 10x10.astc
7z a 10x10r.7z 10x10r.astc
./astcenc-avx2.exe -tl test.jpg 12x12.png 12x12 -thorough -dimage
./astcenc-avx2.exe -tl test.jpg 12x12r.png 12x12 -thorough -dimage -rdo
7z a 12x12.7z 12x12.astc
7z a 12x12r.7z 12x12r.astc
```

The key results are as follows:
So I guess it may not be a ground-breaking optimization:
But in many cases it still may provide interesting trade-offs:
Super, thanks. I'll try and take a look next week.
We use smaller asymmetrical block sizes like 8x5/8x6/10x8 to compress specular maps, normal maps, etc., and an automatic block size calculation method that raises the footprint over the original setting, up to 12x12, whenever PSNR stays above a certain point, to compress diffuse-like textures. In large-scale projects this achieved a similar optimization effect, in terms of package size (i.e. the compressed ASTC files), to 6x6/8x8 with RDO. You're correct that this PR needs more data, but from some basic data the potential is not bad:
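As a hypothetical illustration of the auto block size selection scheme described above; the PSNR measurement callback is an assumed stand-in for a real encode pass, not our actual pipeline code:

```cpp
// Illustrative sketch: walk candidate footprints from lowest to highest
// bitrate and keep the largest one whose PSNR clears a per-texture-type
// quality threshold.
#include <array>

struct BlockSize { unsigned x, y; };

template <typename MeasurePsnr> // callable: double(BlockSize), PSNR in dB
BlockSize pick_block_size(MeasurePsnr&& psnr_of, double min_psnr_db)
{
    // Candidate footprints, lowest bitrate (largest blocks) first.
    const std::array<BlockSize, 5> candidates {{
        {12, 12}, {10, 10}, {8, 8}, {6, 6}, {4, 4}
    }};

    for (BlockSize bs : candidates)
    {
        if (psnr_of(bs) >= min_psnr_db)
        {
            return bs; // largest footprint that meets the quality bar
        }
    }
    return candidates.back(); // fall back to the highest-bitrate footprint
}
```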
I think it's worth investigating RDO support further, not only because of yunhsiao's point, but also because on mobile platforms we sometimes pack ASTC textures by block at runtime, to support a kind of texture packing without real-time encoding/decoding of the ASTC format. That requires the input textures' block sizes to be the same, which limits our case. With RDO we can fine-tune the quality and compression rate by selecting different boundary PSNRs, making some extra room in terms of package size. By the way, apart from how well it affects package size, the encoding speed is currently a big problem...
Agree. The original BC implementation has the same problem because it's a somewhat brute-force approach to finding RDO matches. I suspect it's too slow to be usable at scale, given the marginal benefits over just choosing other ASTC block sizes.
Alright, as I said at the beginning, this is only one possible approach to do this; it is not ideal, but to some extent it perfectly works. I came for advice on the implementation, and so far it has stood up to our production usage. @solidpixel Feel free to close this if this approach is not expected to be merged.
Please explain why/how it "perfectly works" and, if possible, give some more comparisons between this and choosing other block sizes. |
It perfectly works because any invalid trial blocks are rejected aggressively, or at least I tried my best to make them so. In practice, on top of a reasonable block-size selection scheme, if the transmission size saving outweighs the cook time required, then it is a perfectly valid optimization to consider. For our Unreal project it cut the package size by ~500 MB with carefully selected compression parameters, with no obvious visual impact. Also, with the DDC the build is always incremental, so overall it is not that expensive.
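A rough sketch of the kind of accept/reject loop being described, based on the general shape of Geldreich's ert; the names, the error callback, and the rate estimate are assumptions, not this PR's code:

```cpp
// Illustrative sketch: for each block, search a window of earlier blocks
// for a candidate whose bytes can be reused; reject any trial above a
// distortion cap, and otherwise accept the best rate/distortion gain.
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t BLOCK_BYTES  = 16;  // one ASTC block is always 128 bits
constexpr size_t MATCH_WINDOW = 256; // how far back to search for reuse

// error_of(i, bits) is assumed to decode 'bits' as block i and return
// the distortion against the source image.
void rdo_postprocess(uint8_t* blocks, size_t block_count,
                     float (*error_of)(size_t, const uint8_t*),
                     float lambda,     // rate/distortion trade-off weight
                     float max_error)  // hard cap for aggressive rejection
{
    for (size_t i = 1; i < block_count; i++)
    {
        size_t first = (i > MATCH_WINDOW) ? i - MATCH_WINDOW : 0;
        const uint8_t* best_src  = nullptr;
        float          best_gain = 0.0f;

        for (size_t j = first; j < i; j++)
        {
            const uint8_t* prev = blocks + j * BLOCK_BYTES;
            float err = error_of(i, prev);

            // Aggressively reject any trial above the distortion cap.
            if (err > max_error)
                continue;

            // Reusing earlier bytes creates an LZ match for the later
            // Deflate/zstd pass, so score roughly BLOCK_BYTES of rate
            // saved against the weighted distortion introduced.
            float gain = float(BLOCK_BYTES) - lambda * err;
            if (gain > best_gain)
            {
                best_gain = gain;
                best_src  = prev;
            }
        }

        if (best_src != nullptr)
            std::memcpy(blocks + i * BLOCK_BYTES, best_src, BLOCK_BYTES);
    }
}
```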
At the moment, I don't have any work scheduled for astcenc, so I'm really only merging essentials for platform support, compiler support, and bug fixes. I'd like to add RDO support, but I'm just wary of adding something new to test and maintain right now.
This is one possible implementation for #361, based on Richard Geldreich's original ert implementation (also subsequently used in the KTX2 implementation for UASTC), with some interface tweaks that make the solver more general and more performant to integrate with.
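As a loose guess at what a "more general" solver interface could look like (not the PR's actual API): the ERT core only needs a callback to score a trial block's error, so it can stay agnostic of the texture format.

```cpp
// Hypothetical interface sketch; all names are assumptions.
#include <cstddef>
#include <cstdint>
#include <functional>

struct rdo_config
{
    float  lambda;      // rate/distortion trade-off weight
    float  max_error;   // reject trials above this distortion
    size_t window_size; // LZ match search window, in blocks
};

// The caller supplies the format-specific behaviour; the solver itself
// only rewrites fixed-size blocks to create compressible matches.
using block_error_fn =
    std::function<float(size_t block_index, const uint8_t* trial_bits)>;

void rdo_solve(uint8_t* blocks, size_t block_count,
               const rdo_config& config, const block_error_fn& error_of);
```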