Enhance faster_whisper Engine #128
Conversation
The diff is kinda messed up for some Git reason I don't understand. Using this to compare to main after the 1.2 release.
@ayancey hi! Thanks, yeah, it seems the codebase was updated. I'll fix the conflicts.
Here's my attempt at benchmarking and comparing @ahmetoner's latest release (1.2.0) and this PR. I bumped the CUDA version on your fork so it would be an even comparison. Tests were done with a 23-second audio file using the small model and the faster_whisper backend. GPU is an NVIDIA GeForce GTX 1650.
I don't understand it, but it looks like Ahmet fixed the problem you referred to in #127. But I couldn't find any improvements in GPU or CPU memory.
So it sounds like the model conversion is no longer an issue, but we still want to let users choose their quantization? |
I think it's a good idea; in any case we may keep float16 as the default value. For example, I use the int8 version of the large-v2 model for my projects.
Awesome. Once you fix the conflicts I can bug Ahmet to merge this. I'm really excited to start playing around and comparing the different options.
@ayancey hi! I've updated my PR, you may try to add
app/faster_whisper/core.py
model_name = os.getenv("ASR_MODEL", "base")
model_path = os.getenv("ASR_MODEL_PATH", os.path.join(os.path.expanduser("~"), ".cache", "whisper"))
model_path = os.path.join(cache_path, model_name)
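The excerpt above can be sketched as a self-contained snippet. Note that `cache_path` is not defined in the lines shown, so the comment below describing the joined-path change is an assumption about the surrounding code, not the PR's exact wording:

```python
import os

# Model name and cache directory resolved from environment variables,
# as in the diff excerpt.
model_name = os.getenv("ASR_MODEL", "base")
model_path = os.getenv(
    "ASR_MODEL_PATH",
    os.path.join(os.path.expanduser("~"), ".cache", "whisper"),
)

# The change under discussion appended the model name to the cache path
# (via a cache_path variable not shown in the excerpt), roughly:
#   model_path = os.path.join(model_path, model_name)
# This is what the maintainer flagged as a backward-compatibility risk:
# existing installs would look for downloaded models in a new location.
print(model_name, model_path)
```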
This change has the potential to impact backward compatibility and may result in additional model downloads. Could you please revert to the previous lines? Once that's done, I'll proceed with the merge.
Thank you for your contribution.
Okay, I'll fix it.
Done
👏 👏
Hi, developers!
This pull request introduces the following improvements and fixes:
Updated NVIDIA CUDA container: the version of the nvidia/cuda container has been updated from 11.7.0 to 11.7.1. This was necessary because version 11.7.0 is no longer available.
Modifications in the faster_whisper engine: in core.py, the environment variable ASR_QUANTIZATION has been added. This allows users to specify various quantization levels, including float16, int16, int8, int8_float16, and int4. However, it should be noted that there seems to be an issue with the int4 mode which may require further investigation.

Regarding issue #127: if there's an inherent bug in the original codebase, I would prefer to remove the MAPPING. Instead of:

I suggest the following:
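The ASR_QUANTIZATION behavior described above could be wired up roughly as follows. This is a hypothetical sketch, not the PR's actual code: the helper name and validation logic are illustrative, while the set of quantization levels and the float16 default come from this thread. faster-whisper accepts the chosen level via the `compute_type` argument of `WhisperModel`:

```python
import os

# Quantization levels listed in the PR description; int4 is noted
# there as possibly broken, so it may fail at model-load time.
SUPPORTED_QUANTIZATION = {"float16", "int16", "int8", "int8_float16", "int4"}


def resolve_compute_type(default: str = "float16") -> str:
    """Read ASR_QUANTIZATION and validate it against the known levels."""
    value = os.getenv("ASR_QUANTIZATION", default)
    if value not in SUPPORTED_QUANTIZATION:
        raise ValueError(f"Unsupported ASR_QUANTIZATION value: {value!r}")
    return value


# Usage (assumes the faster-whisper package is installed):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("base", device="cuda",
#                        compute_type=resolve_compute_type())
```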
PS. My PR will definitely conflict with #117.