
Quantization Fixes #35

Merged: 5 commits into main, Apr 25, 2024

Conversation

@Satrat (Contributor) commented on Apr 25, 2024:

A few changes so we match PyTorch quantization more closely:

  • Adding an averaging constant and renaming MinMaxObserver to MovingAverageMinMaxObserver
  • Using torch.aminmax to calculate mins and maxes in the observer (this fixed some errors we had against the PyTorch quantization; I'm really not sure why)
  • Slight implementation changes to the scale/zero-point calculation (a sketch of the overall observer logic follows this list)
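
For illustration, here is a minimal sketch of what the moving-average observer logic looks like, assuming per-tensor asymmetric quantization. The class name matches the rename above, but the method names, defaults, and exact formulas are assumptions for illustration, not the actual implementation from this PR:

```python
import torch


class MovingAverageMinMaxObserver:
    """Tracks running min/max of observed tensors via an exponential moving average.

    Illustrative sketch only; names, defaults, and formulas are assumptions,
    not the implementation merged in this PR.
    """

    def __init__(self, averaging_constant: float = 0.01):
        self.averaging_constant = averaging_constant
        self.min_val = None
        self.max_val = None

    def update(self, observed: torch.Tensor):
        # torch.aminmax computes both extrema in a single pass
        min_val, max_val = torch.aminmax(observed)
        if self.min_val is None:
            self.min_val, self.max_val = min_val, max_val
        else:
            # blend the new extrema into the running values
            c = self.averaging_constant
            self.min_val = self.min_val + c * (min_val - self.min_val)
            self.max_val = self.max_val + c * (max_val - self.max_val)
        return self.min_val, self.max_val

    def calculate_qparams(self, quant_min: int = 0, quant_max: int = 255):
        # standard asymmetric quantization parameters; the tracked range is
        # extended to include zero so that zero is exactly representable
        min_val = torch.clamp(self.min_val, max=0.0)
        max_val = torch.clamp(self.max_val, min=0.0)
        scale = (max_val - min_val) / float(quant_max - quant_min)
        scale = torch.clamp(scale, min=torch.finfo(torch.float32).eps)
        zero_point = quant_min - torch.round(min_val / scale)
        zero_point = torch.clamp(zero_point, quant_min, quant_max).to(torch.int64)
        return scale, zero_point
```

With quant_min=0 and quant_max=255 this mirrors the standard uint8 asymmetric scheme, where clamping the tracked range to include zero keeps the zero point exactly representable.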

Before these changes we were consistently seeing a 10-15% perplexity regression compared to PyTorch quantization. Now we match within 2% and are no longer consistently worse. We still don't have an exact match on the scale/zero-point calculations, but I'm chalking that up to C++ vs. Python differences (some of the PyTorch quantization is implemented in C++).

The perplexity test is in neuralmagic/sparseml#2246.

The diff under discussion, from the observer's quantization-parameter calculation:

```diff
     :param observed: observed tensor to calculate quantization parameters for
     :return: tuple of scale and zero point derived from the observed tensor
     """

-    min_val = torch.tensor([observed.min()])
-    max_val = torch.tensor([observed.max()])
+    min_val, max_val = torch.aminmax(observed)
```
A reviewer (Contributor) commented:

I wonder why this makes a difference
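
One observable difference, for context on that question: torch.aminmax reduces over the whole tensor in a single pass and returns 0-dim tensors, whereas the old code wrapped each extremum into a shape-[1] tensor. The values themselves should agree, so the sketch below only demonstrates the shape difference; the PR doesn't confirm the actual root cause of the discrepancy:

```python
import torch

x = torch.randn(4, 8)

# old approach: two separate full-tensor reductions, each wrapped in a shape-[1] tensor
old_min = torch.tensor([x.min()])
old_max = torch.tensor([x.max()])

# new approach: one fused reduction returning 0-dim tensors
new_min, new_max = torch.aminmax(x)

# the extrema themselves match...
assert torch.equal(old_min.squeeze(), new_min)
assert torch.equal(old_max.squeeze(), new_max)

# ...but the shapes differ, which can change downstream broadcasting
print(old_min.shape, new_min.shape)  # torch.Size([1]) torch.Size([])
```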

@bfineran merged commit b3edb25 into main on Apr 25, 2024 (2 checks passed)
@bfineran deleted the sa/observer_fix branch on Apr 25, 2024 at 13:59