
Update convert_and_optimize_asr.py #1659

Merged 2 commits into recipes on Feb 7, 2024

Conversation

zhuo-yoyowz
Contributor

Update convert_and_optimize_asr.py with quantization code for the Whisper model

Contributor

@adrianboguszewski left a comment


Thanks, @zhuo-yoyowz. Did you test it in the app? Will it work with Optimum Intel (the app uses the HF interface)?

Is it possible to quantize it using OVQuantizer from Optimum Intel?

@zhuo-yoyowz
Contributor Author

> Thanks, @zhuo-yoyowz. Did you test it in the app? Will it work with Optimum Intel (the app uses the HF interface)?
>
> Is it possible to quantize it using OVQuantizer from Optimum Intel?

Hi Adrian, I've tested it in app.py. Without changing any code in app.py, the current pipeline can also load and compile the quantized model successfully. I haven't tested OVQuantizer from Optimum Intel yet; I'm still a bit uncertain about how to set up the configuration for the calibration dataset and how to define the preprocess function.
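For reference, a minimal sketch of how the OVQuantizer flow could be wired up, assuming OVQuantizer supports this model class; `preprocess_fn`, the sample count, and `quantized_model_path` are illustrative names, not code from this PR:

```python
from optimum.intel import OVQuantizer
from transformers import AutoProcessor

# Hypothetical sketch, untested against this PR's script.
processor = AutoProcessor.from_pretrained(MODEL_NAME)

def preprocess_fn(sample):
    # Turn a raw librispeech sample into the input features Whisper expects.
    audio = sample["audio"]
    return processor(audio["array"], sampling_rate=audio["sampling_rate"])

quantizer = OVQuantizer.from_pretrained(ov_model)
calibration_dataset = quantizer.get_calibration_dataset(
    "librispeech_asr",
    dataset_config_name="clean",
    dataset_split="validation",
    num_samples=50,
    preprocess_function=preprocess_fn,
    preprocess_batch=False,  # process one sample at a time
)
quantizer.quantize(calibration_dataset=calibration_dataset, save_directory=quantized_model_path)
```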

Contributor

@adrianboguszewski left a comment


Does it work for you? If I use it in the app, I don't get a meaningful transcription, just a random word.

```python
    decoder_calibration_data)

calibration_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)
for sample in tqdm(islice(calibration_dataset, calibration_dataset_size), desc="Collecting calibration data",
```
Contributor

@adrianboguszewski Feb 1, 2024


tqdm is causing some errors for me (is it expecting a notebook?)
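A likely fix, assuming the script currently imports the notebook-only tqdm variant: use `tqdm.auto`, which falls back to the console implementation outside notebooks. A sketch, with names taken from the snippet above:

```python
from itertools import islice
from tqdm.auto import tqdm  # console-safe; only uses widgets inside a notebook

for sample in tqdm(islice(calibration_dataset, calibration_dataset_size),
                   desc="Collecting calibration data", total=calibration_dataset_size):
    ...
```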

Comment on lines 150 to 160
```python
if not output_dir.exists():
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
        MODEL_NAME, ov_config=ov_config, export=True, compile=False, load_in_8bit=False
    )
    ov_model.half()
    ov_model.save_pretrained(output_dir)
else:
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
        output_dir, ov_config=ov_config, compile=False
    )
```

Contributor


I wouldn't check whether the model is already converted. I don't assume one will convert to FP16 first and then to INT8.
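A sketch of what this comment suggests: always export in one step instead of branching on `output_dir`; names are taken from the snippet above, and the exact call in the final script may differ.

```python
from optimum.intel import OVModelForSpeechSeq2Seq

# Always export; no FP16-first existence check.
ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
    MODEL_NAME, ov_config=ov_config, export=True, compile=False, load_in_8bit=False
)
```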


```python
CALIBRATION_DATASET_SIZE = 50
quantized_distil_model_path = model_dir / (MODEL_NAME.rsplit("/")[-1] + "-INT8")
ov_model.to("AUTO")
```
Contributor


Why the AUTO device here? Shouldn't it be CPU or nothing?
Is compilation needed?

Contributor Author


Hi Adrian, I've replaced the code with an updated version that uses Optimum Intel for weight compression directly. Please help review. Thanks~
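For reference, a sketch of the direct weight-compression path in Optimum Intel that this describes; this is an assumption about the shape of the updated code, not the merged diff itself:

```python
from optimum.intel import OVModelForSpeechSeq2Seq

# 8-bit weight compression applied at export time; `quantized_distil_model_path`
# comes from the snippet reviewed above.
ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
    MODEL_NAME, export=True, compile=False, load_in_8bit=True
)
ov_model.save_pretrained(quantized_distil_model_path)
```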

@adrianboguszewski
Contributor

Good job!

@adrianboguszewski merged commit 9cf8836 into recipes on Feb 7, 2024
1 check passed
@adrianboguszewski deleted the zhuo-yoyowz-patch-1 branch on February 7, 2024 at 14:37