Update convert_and_optimize_asr.py #1659
Conversation
Update convert_and_optimize_asr.py with quantization code for the Whisper model
Thanks, @zhuo-yoyowz. Did you test it in the app? Will it work with Optimum Intel (app uses HF interface)?
Is it possible to quantize it using OVQuantizer from Optimum Intel?
Hi Adrian, I've tested it in app.py. Without changing any code in app.py, the current pipeline can also load and compile the quantized model successfully. I haven't tested OVQuantizer from Optimum Intel yet; I'm still a bit uncertain about how to set the configuration for the calibration dataset and how to define the preprocess function.
Does it work for you? If I use it in the app I don't get a meaningful transcription. Just a random word.
    decoder_calibration_data)

calibration_dataset = load_dataset("librispeech_asr", "clean", split="validation", streaming=True)
for sample in tqdm(islice(calibration_dataset, calibration_dataset_size), desc="Collecting calibration data",
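The pattern in the diff above bounds how much of the streaming dataset is consumed. A minimal stdlib sketch of the same idea, with a dummy generator standing in for `datasets.load_dataset(..., streaming=True)` (the generator and its fields are illustrative assumptions, not the real LibriSpeech schema):

```python
from itertools import islice

# Stand-in for the streaming dataset; in the real script this iterable
# comes from datasets.load_dataset("librispeech_asr", ..., streaming=True).
def fake_streaming_dataset():
    for i in range(1000):
        yield {"audio": {"array": [0.0] * 16, "sampling_rate": 16000}, "id": i}

calibration_dataset_size = 50
calibration_data = []
# islice stops after calibration_dataset_size samples, so the (potentially
# huge) streaming dataset is never fully downloaded or iterated.
for sample in islice(fake_streaming_dataset(), calibration_dataset_size):
    calibration_data.append(sample)

print(len(calibration_data))  # 50
```

The point of `streaming=True` plus `islice` is that only the first `calibration_dataset_size` samples are ever fetched.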
tqdm is causing some errors for me (expecting a notebook?)
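The "expecting a notebook" symptom typically appears when `tqdm.notebook` is imported outside Jupyter. A hedged sketch of one common workaround: use `tqdm.auto` (which picks a console or notebook frontend automatically) and degrade to a no-op wrapper if tqdm is not installed at all:

```python
# Fall back to a pass-through wrapper when tqdm is unavailable, so the
# calibration loop behaves the same in plain scripts and in notebooks.
try:
    from tqdm.auto import tqdm  # selects console or notebook bar automatically
except ImportError:
    def tqdm(iterable, **kwargs):
        return iterable  # no progress bar, but the loop still runs

processed = [x * 2 for x in tqdm(range(5), desc="Collecting calibration data")]
print(processed)  # [0, 2, 4, 6, 8]
```

This keeps the loop body unchanged whichever branch is taken.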
if not output_dir.exists():
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
        MODEL_NAME, ov_config=ov_config, export=True, compile=False, load_in_8bit=False
    )
    ov_model.half()
    ov_model.save_pretrained(output_dir)
else:
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
        output_dir, ov_config=ov_config, compile=False
    )
I wouldn't check if the model is converted. I don't assume one will convert to FP16 first and then to INT8.
CALIBRATION_DATASET_SIZE = 50
quantized_distil_model_path = model_dir / (MODEL_NAME.rsplit("/")[-1] + "-INT8")
ov_model.to("AUTO")
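The output-path construction in the diff above is pure string and `pathlib` logic, so it can be sketched and checked in isolation. The `MODEL_NAME` value below is an illustrative assumption, not necessarily the one the script ships with:

```python
from pathlib import Path

# Illustrative model id; any "org/repo" Hugging Face id works the same way.
MODEL_NAME = "distil-whisper/distil-large-v2"
model_dir = Path("model")

# rsplit("/")[-1] keeps only the repo name, dropping the org prefix;
# the "-INT8" suffix marks the quantized variant of the model directory.
quantized_distil_model_path = model_dir / (MODEL_NAME.rsplit("/")[-1] + "-INT8")
print(quantized_distil_model_path.name)  # distil-large-v2-INT8
```

Since `rsplit("/")[-1]` and `split("/")[-1]` are equivalent here, either spelling yields the same directory name.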
Why AUTO device here? Shouldn't be CPU or nothing?
Is compilation needed?
Hi Adrian, I've replaced the code with an updated version that uses Optimum Intel for weight compression directly. Please help review. Thanks!
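For reference, Optimum Intel can apply 8-bit weight compression during export via the `load_in_8bit` flag on `from_pretrained`. A hedged sketch (model id and save path are illustrative assumptions; the export call is gated behind a flag since it requires `optimum[openvino]` and a model download):

```python
# Sketch of direct weight compression via Optimum Intel.
RUN_EXPORT = False  # set True in an environment with optimum[openvino] installed

export_kwargs = {
    "export": True,        # convert the PyTorch checkpoint to OpenVINO IR
    "compile": False,      # defer compilation until a target device is chosen
    "load_in_8bit": True,  # compress weights to INT8 during export
}

if RUN_EXPORT:
    from optimum.intel import OVModelForSpeechSeq2Seq
    ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
        "distil-whisper/distil-large-v2", **export_kwargs  # illustrative id
    )
    ov_model.save_pretrained("model/distil-large-v2-INT8")
```

This is weight-only compression, which needs no calibration dataset, in contrast to the activation quantization discussed earlier in the thread.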
Good job!