
Enhance faster_whisper Engine #128

Merged: 7 commits into ahmetoner:main on Oct 14, 2023

Conversation

@EvilFreelancer (Contributor) commented Sep 11, 2023

Hi, developers!

This pull request introduces the following improvements and fixes:

  • Updated NVIDIA CUDA Container: The nvidia/cuda container has been updated from 11.7.0 to 11.7.1, since version 11.7.0 is no longer available.

  • Modifications in faster_whisper Engine:

    • The model_converter function within the engine has been enhanced. Previously, it hardcoded float16; this is now a variable that defaults to float16, allowing greater flexibility.
    • In core.py, the environment variable ASR_QUANTIZATION has been added. This lets users pick a quantization level: float16, int16, int8, int8_float16, or int4. Note that the int4 mode seems to have an issue and may need further investigation. A sketch of the idea follows this list.
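
A minimal sketch of the second bullet, assuming the variable is read once at startup (names here are illustrative, not the exact diff):

```python
import os

# ASR_QUANTIZATION selects the CTranslate2 compute type; float16 preserves
# the previously hardcoded behaviour when the variable is unset.
model_quantization = os.getenv("ASR_QUANTIZATION", "float16")
```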

Regarding issue #127, if there's an inherent bug in the original codebase, I would prefer to remove the MAPPING. Instead of:

```python
import torch
from threading import Lock
from faster_whisper import WhisperModel

# model_path, model_quantization, and MAPPING are defined earlier in core.py
if torch.cuda.is_available():
    model = WhisperModel(model_path, device="cuda", compute_type=MAPPING[model_quantization])
else:
    model = WhisperModel(model_path, device="cpu", compute_type="int8")
model_lock = Lock()
```

I suggest the following:

```python
model = WhisperModel(model_path, device="cuda", compute_type=model_quantization)
```
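
If the CPU fallback is meant to stay, the surrounding block would shrink to the sketch below. This assumes the suggestion only replaces the CUDA branch; faster_whisper hands compute_type straight to CTranslate2, which resolves unsupported compute types to the closest supported one on its own, so the MAPPING lookup shouldn't be needed:

```python
import torch
from threading import Lock
from faster_whisper import WhisperModel

# model_path and model_quantization are read from the environment elsewhere.
if torch.cuda.is_available():
    model = WhisperModel(model_path, device="cuda", compute_type=model_quantization)
else:
    model = WhisperModel(model_path, device="cpu", compute_type="int8")
model_lock = Lock()
```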

PS. My PR will definitely conflict with #117.

@ayancey self-requested a review on October 2, 2023
@ayancey (Collaborator) commented Oct 4, 2023

The diff is kinda messed up for some Git reason I don't understand. I'm using this to compare against main after the 1.2 release.

@EvilFreelancer (Contributor, Author) commented:

@ayancey hi! Thanks, yeah, it seems the codebase was updated. I'll fix the conflicts.

@ayancey (Collaborator) commented Oct 4, 2023

Here's my attempt at benchmarking @ahmetoner's latest release (1.2.0) against this PR. I bumped the CUDA version on your fork so it would be an even comparison.

Tests were done with a 23-second audio file using the small model and the faster_whisper backend. The GPU is an NVIDIA GeForce GTX 1650.

| Branch | CUDA ver | Mem before trials | GPU mem before trials | Mem after trials | GPU mem after trials | Avg time (seconds) |
|---|---|---|---|---|---|---|
| 1.1.1 | 11.8.0 | 1.743 GiB | 1142 MiB | 2.782 GiB | 1452 MiB | 1.82 |
| 1.2 | 11.8.0 | 613.2 MiB | 1144 MiB | 1.606 GiB | 1454 MiB | 1.84 |
| EvilFreelancer | 11.8.0 | 1.75 GiB | 1142 MiB | 2.808 GiB | 1452 MiB | 1.93 |
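
For reference, here is a sketch of the kind of timing loop behind the averages above (the actual benchmark script isn't in this thread; the file name and trial count are assumptions):

```python
import time
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda")
times = []
for _ in range(10):
    start = time.perf_counter()
    segments, _ = model.transcribe("sample_23s.wav")
    list(segments)  # segments are generated lazily; consume to finish decoding
    times.append(time.perf_counter() - start)
print(f"Avg time: {sum(times) / len(times):.2f} seconds")
```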

I don't fully understand it, but it looks like Ahmet fixed the problem you referred to in #127. I couldn't find any improvements in GPU or CPU memory, though.

@ayancey (Collaborator) commented Oct 4, 2023

> @ayancey hi! Thanks, yeah, it seems the codebase was updated. I'll fix the conflicts.

So it sounds like the model conversion is no longer an issue, but we still want to let users choose their quantization?

@EvilFreelancer (Contributor, Author) commented:

> So it sounds like the model conversion is no longer an issue, but we still want to let users choose their quantization?

I think it's a good idea; in any case, we can keep float16 as the default value. For example, I use an int8 version of the large-v2 model for my projects.
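
For illustration, loading large-v2 with int8 in plain faster_whisper looks like this (a usage sketch, not code from this PR):

```python
from faster_whisper import WhisperModel

# int8 roughly halves GPU memory versus float16, at some cost in accuracy.
model = WhisperModel("large-v2", device="cuda", compute_type="int8")
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```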

@ayancey (Collaborator) commented Oct 8, 2023

Awesome. Once you fix the conflicts I can bug Ahmet to merge this. I'm really excited to start playing around and comparing the different options.

@EvilFreelancer (Contributor, Author) commented:

@ayancey hi! I've updated my PR. You can try adding the ASR_QUANTIZATION env option with one of the modes described on this page: https://opennmt.net/CTranslate2/quantization.html
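
For example, here is a hedged sketch of reading and sanity-checking the variable against compute types from that page (the validation is illustrative, not part of the PR, and the list is non-exhaustive):

```python
import os

# Common CTranslate2 compute types; see the quantization page for the full list.
VALID_COMPUTE_TYPES = {"auto", "int8", "int8_float16", "int16", "float16", "float32"}

model_quantization = os.getenv("ASR_QUANTIZATION", "float16")
if model_quantization not in VALID_COMPUTE_TYPES:
    raise ValueError(f"Unsupported ASR_QUANTIZATION value: {model_quantization}")
```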

The review comment below is on these lines from the PR:

```python
model_name = os.getenv("ASR_MODEL", "base")
cache_path = os.getenv("ASR_MODEL_PATH", os.path.join(os.path.expanduser("~"), ".cache", "whisper"))
model_path = os.path.join(cache_path, model_name)
```
@ahmetoner (Owner) commented on the lines above:

This change has the potential to impact backward compatibility and may result in additional model downloads. Could you please revert to the previous lines? Once that's done, I'll proceed with the merge.
Thank you for your contribution.

@EvilFreelancer (Contributor, Author) replied on Oct 14, 2023:

Okay, I'll fix it.

@EvilFreelancer (Contributor, Author) replied:

Done

@ahmetoner merged commit 51c6ece into ahmetoner:main on Oct 14, 2023
@ayancey (Collaborator) commented Oct 14, 2023

👏 👏
