
[Observers] group size + channel wise + per token #32

Merged: 25 commits merged into main from group-size on May 3, 2024
Conversation

horheynm (Member) commented Apr 19, 2024

Starter script:

from copy import deepcopy

import torch
from torch.nn import Linear

from compressed_tensors.quantization.lifecycle.calibration import set_module_for_calibration
from compressed_tensors.quantization.lifecycle.frozen import freeze_module_quantization
from compressed_tensors.quantization.lifecycle.initialize import (
    initialize_module_for_quantization,
)
from compressed_tensors.quantization.quant_args import QuantizationArgs
from compressed_tensors.quantization.quant_scheme import QuantizationScheme

num_bits = 8

quantization_scheme = QuantizationScheme(
    # group_size=-1 selects channel-wise quantization; set a positive value
    # (e.g. group_size=4) for group-wise quantization
    input_activations=QuantizationArgs(num_bits=num_bits, symmetric=False, group_size=-1),
    weights=QuantizationArgs(num_bits=num_bits, symmetric=True),
    targets=["*"],
)

layer = Linear(4, 4)
layer.weight.data *= 100

# overwrite the forward pass and register zero_point and scale
initialize_module_for_quantization(layer, quantization_scheme)

set_module_for_calibration(layer)

layer(torch.randn(2, 4, 4))

# snapshot the layer after the first calibration pass for later comparison
initialized_layer = deepcopy(layer)

# calibrate the layer on each forward pass
for _ in range(10):
    layer(torch.randn(4, 4))

# freeze: observers stop updating after any further forward pass
freeze_module_quantization(layer)
for _ in range(10):
    layer(torch.randn(4, 4))
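
To sanity-check the lifecycle above, a minimal inspection sketch; the attribute names (weight_scale, input_scale) are assumptions about what initialize_module_for_quantization registers on the module, not confirmed API:

# attribute names below are assumptions about what
# initialize_module_for_quantization registers on the module
print(layer.weight_scale.shape)   # assumed weight quantization scale
print(layer.input_scale.shape)    # assumed input-activation scale

# after freeze_module_quantization, the observed scale should no longer change
before = layer.input_scale.clone()
layer(torch.randn(4, 4))
assert torch.equal(before, layer.input_scale)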

bfineran previously approved these changes Apr 25, 2024
horheynm changed the title from "group size" to "[Observers] group size + channel wise quantization" on Apr 25, 2024
@Satrat (Contributor) left a comment

The main thing I see missing here is that we aren't actually using the strategy field of QuantizationArgs. It makes sense to support group_size=-1 as channel-wise, but the code would be more readable if, instead of checking the group size, we checked the QuantizationArgs.strategy enum. That would also make it easier to extend when we add the token strategy.

Maybe we could add a validator to QuantizationArgs so that if the user specifies group_size=-1, we automatically set channel as the strategy.
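
A minimal sketch of the validator being suggested (pydantic v2 syntax for illustration; the QuantizationStrategy enum and its values here are assumptions, not the library's actual definitions):

from enum import Enum
from typing import Optional

from pydantic import BaseModel, model_validator


class QuantizationStrategy(str, Enum):
    # hypothetical enum values for illustration
    TENSOR = "tensor"
    CHANNEL = "channel"
    GROUP = "group"
    TOKEN = "token"


class QuantizationArgs(BaseModel):
    num_bits: int = 8
    symmetric: bool = True
    group_size: Optional[int] = None
    strategy: QuantizationStrategy = QuantizationStrategy.TENSOR

    @model_validator(mode="after")
    def infer_strategy_from_group_size(self):
        # group_size=-1 is shorthand for channel-wise quantization;
        # any positive group_size implies group-wise quantization
        if self.group_size == -1:
            self.strategy = QuantizationStrategy.CHANNEL
        elif self.group_size is not None and self.group_size > 0:
            self.strategy = QuantizationStrategy.GROUP
        return self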

Resolved review threads (outdated): src/compressed_tensors/quantization/lifecycle/forward.py, src/compressed_tensors/quantization/observers/base.py
horheynm changed the title from "[Observers] group size + channel wise quantization" to "[Observers] group size + channel wise + per token" on Apr 29, 2024
@bfineran (Contributor) left a comment

Let's also add at least a simple test for each strategy that validates that a forward pass runs and that the scales/zero points have the expected shapes.
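
A rough pytest sketch of the kind of test being requested, reusing the lifecycle API from the starter script above; the weight_scale attribute name and the expected element counts are assumptions, not confirmed behavior of this PR:

import pytest
import torch
from torch.nn import Linear

from compressed_tensors.quantization.lifecycle.calibration import set_module_for_calibration
from compressed_tensors.quantization.lifecycle.initialize import (
    initialize_module_for_quantization,
)
from compressed_tensors.quantization.quant_args import QuantizationArgs
from compressed_tensors.quantization.quant_scheme import QuantizationScheme


@pytest.mark.parametrize(
    "weight_args,expected_scale_numel",
    [
        (QuantizationArgs(num_bits=8, symmetric=True), 1),                 # per-tensor: one scale (assumed)
        (QuantizationArgs(num_bits=8, symmetric=True, group_size=-1), 4),  # channel-wise: one per output channel (assumed)
        (QuantizationArgs(num_bits=8, symmetric=True, group_size=2), 8),   # group-wise: 4 rows x 2 groups (assumed)
    ],
)
def test_forward_and_scale_shape(weight_args, expected_scale_numel):
    scheme = QuantizationScheme(weights=weight_args, targets=["*"])
    layer = Linear(4, 4)
    initialize_module_for_quantization(layer, scheme)
    set_module_for_calibration(layer)

    # the forward pass should run without error for every strategy
    layer(torch.randn(2, 4))

    # weight_scale is an assumed attribute name for the registered scale
    assert layer.weight_scale.numel() == expected_scale_numel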

bfineran previously approved these changes May 2, 2024
@Satrat (Contributor) left a comment

LGTM once the test failures are fixed!

@Satrat merged commit 05c1487 into main on May 3, 2024
2 checks passed
@Satrat deleted the group-size branch on May 3, 2024 at 17:59
@rahul-tuli restored the group-size branch on May 6, 2024 at 15:17
@bfineran deleted the group-size branch on May 8, 2024 at 20:18