Switching between faster-whisper or openai-whisper via env seems to be broken #edit: most likely not, just not as fast as I hoped for #115

Closed
Deathproof76 opened this issue Jun 11, 2023 · 4 comments

Deathproof76 commented Jun 11, 2023

I transcribed a 22-minute mp3 file with `- ASR_ENGINE=openai_whisper` via the web UI, timed it, and it took 2:15 min. I then changed the env to `- ASR_ENGINE=faster_whisper`, recreated the container, and it took approximately the same, 2:16 min. I tried another file with the same outcome, and switched to the :debug image, also with the same outcome. VRAM consumption is also much higher than expected for faster-whisper, but in line with openai-whisper.

```yaml
services:
  whisper-asr-webservice:
    #image: onerahmet/openai-whisper-asr-webservice:debug-gpu
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    container_name: Whisper-ASR
    environment:
      - ASR_ENGINE=openai_whisper
#      - ASR_ENGINE=faster_whisper
      - ASR_MODEL=small
    ports:
      - 9007:9000
    restart: unless-stopped
```

I'd be glad if you could take a look at it, @ahmetoner, as it seems to be broken.

This is also described by other people in morpheus65535/bazarr#2144.

I don't know much, but it seems possible that the downloaded openai-whisper model isn't being converted to the CTranslate2 model format. If that is the case, wouldn't it be possible to just download the converted model directly from https://huggingface.co/guillaumekln/faster-whisper-small (as an example)?
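For what it's worth, faster-whisper can already fetch the pre-converted weights itself when given a model size instead of a local path. A minimal sketch (the model size, device, and file name here are just examples, not what the webservice actually does):

```python
from faster_whisper import WhisperModel

# Passing a model size makes faster-whisper download the pre-converted
# CTranslate2 weights (e.g. from guillaumekln/faster-whisper-small on the
# Hugging Face Hub), so no local conversion step should be needed.
model = WhisperModel("small", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```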

Or could it be that the wrong model is selected, i.e. the openai-whisper model is loaded instead of the converted one?

Another possibility could be that the default settings differ between openai-whisper and faster-whisper: depending on the GPU, fp16 can be a lot faster than fp32 (openai/whisper#391). The same goes for beam_size (SYSTRAN/faster-whisper#9, SYSTRAN/faster-whisper#172) and temperature (SYSTRAN/faster-whisper#172).
If these options are the reason for the "underperforming", would it be possible to expose them as env variables for the docker container? (@ayancey)
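Something like this, perhaps; a rough sketch where `ASR_COMPUTE_TYPE` and `ASR_BEAM_SIZE` are made-up variable names with example fallback values, not options the webservice currently reads:

```python
import os

from faster_whisper import WhisperModel

# Hypothetical env vars; the webservice does not support these today.
compute_type = os.getenv("ASR_COMPUTE_TYPE", "float16")
beam_size = int(os.getenv("ASR_BEAM_SIZE", "5"))  # 5 is the faster-whisper default

model = WhisperModel(os.getenv("ASR_MODEL", "small"), device="cuda",
                     compute_type=compute_type)
segments, info = model.transcribe("audio.mp3", beam_size=beam_size)
```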

Deathproof76 (Author) commented Jun 14, 2023

Okay, I tried some things. I modified the docker image and basically cut openai-whisper out, so that only the faster-whisper implementation was running. I then modified core.py and utils.py to also deactivate the converter, and then mounted a model downloaded from https://huggingface.co/guillaumekln/faster-whisper-small. And yeah, same performance.
I then played a little bit with the settings: I changed the compute type (float32 actually had the best performance for my GPU), added os.environ["OMP_NUM_THREADS"] = "12" (which most likely helps only with CPU), changed beam_size=5 to 1, and added best_of=1:

```python
segment_generator, info = model.transcribe(audio, beam_size=1, best_of=1, **options_dict)
```

All of that brought the time down to 1:30 min from 2:15 for the same 21-minute mp3 on my RTX 3060. The small model's VRAM usage was in line with faster-whisper float32, at 1430 MB. But the quality has most likely degraded (beam_size=5 is the recommendation), though I haven't noticed anything so far.
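Put together, the tweaks look roughly like this (the mount path and thread count are from my setup, not the webservice's actual code):

```python
import os

# Set before importing the library; mostly relevant for CPU inference.
os.environ["OMP_NUM_THREADS"] = "12"

from faster_whisper import WhisperModel

# Load the pre-converted CTranslate2 model mounted into the container
# (downloaded from guillaumekln/faster-whisper-small on the Hugging Face Hub).
model = WhisperModel(
    "/app/models/faster-whisper-small",  # example mount path
    device="cuda",
    compute_type="float32",  # fastest on my RTX 3060; float16 may win on other GPUs
)

# beam_size=1 / best_of=1 trade some accuracy for speed (the default is 5).
segments, info = model.transcribe("audio.mp3", beam_size=1, best_of=1)
```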

So yeah, I don't know, maybe it's as fast as it can be. I'm not a coder, I just poked around. Maybe it's as slow/fast as the openai implementation because the switch is broken the other way around and only faster-whisper is ever used. Or maybe openai-whisper runs with worse (faster) settings and faster-whisper with higher-quality settings by default (though from what I understood they were the same). The beam_size setting certainly added VRAM usage when upped.

This one, https://github.com/m-bain/whisperX, with the same whisper-asr-webservice UI would most likely be chef's kiss though 😅

This one, https://github.com/RomanKlimov/faster-whisper-acceleration, might actually not be so hard to integrate, but it's still above my current skill level (@ayancey, could you take a quick look, maybe? 😊)

@Deathproof76 Deathproof76 changed the title Switching between faster-whisper or openai-whisper via env seems to be broken (docker) Switching between faster-whisper or openai-whisper via env seems to be broken #edit: most likely not, just not as fast as expected Jun 14, 2023
@Deathproof76 Deathproof76 changed the title Switching between faster-whisper or openai-whisper via env seems to be broken #edit: most likely not, just not as fast as expected Switching between faster-whisper or openai-whisper via env seems to be broken #edit: most likely not, just not as fast as I hoped for Jun 14, 2023
ayancey (Collaborator) commented Jun 15, 2023

I'll take a look, but no promises 😅

RedFox134 commented
I've heard whisperX does a better job with subtitle timestamps, so I'd love to see that get added!

ayancey (Collaborator) commented Oct 9, 2023

@Deathproof76 if you have input on this, I'd love to hear it: #125

@ayancey ayancey closed this as completed Nov 27, 2023