I've been wondering if the bias in quantized blocks is maybe more important than the scale, since we have all these RMS norms in the model anyway. Anecdotal evidence points to Q6_K (which only has a scale) sometimes being worse than Q5_K and Q4_K (which have a bias too). Has anyone given this some thought?
Or, a more dynamic approach: what if during quantization we analyze the block values and then store just a single value per block, and use a bit to decide whether to apply it as a scale or bias depending on the distribution of values in the block?
ChatGPT helped me write down the algorithm:
Proposal: Adaptive Scale/Bias Quantization Scheme
The goal is to analyze each block of values during quantization and determine whether applying a scale or a bias is more appropriate, adapting dynamically to the data distribution for better accuracy.
Block Analysis:
For each block of values x, calculate:
x_min: the minimum value in the block.
x_max: the maximum value in the block.
mean: the average of the values in the block.
Compute the range of the block:
range = x_max - x_min
Calculate how centered the mean is relative to the block:
centeredness = absolute(mean - (x_min + x_max) / 2)
Decision Making:
Compare centeredness against a chosen threshold T to decide whether to use a bias or a scale (a C sketch of both steps follows this list):
If centeredness is greater than T, the block values are not centered around zero, so a bias is more appropriate.
Otherwise, use a scale.
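A minimal C sketch of the analysis and decision steps above (the function name, signature, and caller-supplied threshold T are illustrative, not an existing API):

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

// Analyze one block of n values (n > 0 assumed) and decide the mode:
// returns true for bias mode, false for scale mode.
static bool block_prefers_bias(const float * x, size_t n, float T,
                               float * out_mean, float * out_range) {
    float x_min = x[0];
    float x_max = x[0];
    float sum   = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] < x_min) x_min = x[i];
        if (x[i] > x_max) x_max = x[i];
        sum += x[i];
    }
    const float mean  = sum / (float) n;
    const float range = x_max - x_min;
    // Distance of the mean from the midpoint of the block's value range.
    const float centeredness = fabsf(mean - (x_min + x_max) / 2.0f);

    *out_mean  = mean;
    *out_range = range;
    return centeredness > T; // above threshold -> store a bias
}
```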
Storing the Scale/Bias:
If using a bias, store the mean (mean) as a 16-bit float, and set the least significant bit (LSB) to 1 to indicate it's a bias.
If using a scale, store the range (range) as a 16-bit float, and set the LSB to 0 to indicate it's a scale.
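The mode bit could be packed into the LSB of the raw fp16 bit pattern, something like this (names are illustrative; the fp32→fp16 conversion itself, e.g. GGML_FP32_TO_FP16 in ggml, is omitted):

```c
#include <stdbool.h>
#include <stdint.h>

// 'f16_bits' is the raw 16-bit encoding of either the mean (bias mode)
// or the range (scale mode).
static uint16_t tag_scale_or_bias(uint16_t f16_bits, bool is_bias) {
    return (uint16_t) ((f16_bits & ~(uint16_t) 1) | (is_bias ? 1 : 0));
}

static bool is_bias_mode(uint16_t f16_bits) {
    return (f16_bits & 1) != 0;
}
```

Overwriting the mantissa LSB costs at most one ULP of fp16 precision on the stored value, which should be negligible next to the block quantization error itself.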
Quantization Process:
When transforming each value x in the block:
If the LSB is 1 (indicating bias mode), quantize using:
x_quantized = (x - mean) / range
If the LSB is 0 (indicating scale mode), quantize using:
x_quantized = x / range
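A sketch of the forward transform as written (the zero-range guard is mine). One open point: bias mode still divides by range, but only the mean is stored, so dequantization would need some convention for recovering the range:

```c
#include <stdbool.h>
#include <stddef.h>

// Quantize one block of n values following the two formulas above.
static void quantize_block(const float * x, float * q, size_t n,
                           bool bias_mode, float mean, float range) {
    const float r = range != 0.0f ? range : 1.0f; // guard constant blocks
    for (size_t i = 0; i < n; ++i) {
        q[i] = bias_mode ? (x[i] - mean) / r  // LSB == 1: bias mode
                         : x[i] / r;          // LSB == 0: scale mode
    }
}
```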
This method lets the quantization process adaptively choose between scaling and biasing for each block, improving the handling of blocks whose values aren't centered around zero. Since the mode bit is folded into the LSB of the stored 16-bit value, the adaptive choice comes at no extra storage cost per block.