feature: CTranslate2 Framework + Adaptive batching for custom runner #4851
-
Feature request

Hi, it would be nice to enable CTranslate2 inference within BentoML (https://github.com/OpenNMT/CTranslate2). This library implements a custom runtime that applies many performance optimization techniques, such as weight quantization, layer fusion, and batch reordering, to accelerate inference and reduce the memory usage of Transformer models on CPU and GPU. For example, for the MarianMT Transformer model the following code is used (https://opennmt.net/CTranslate2/guides/transformers.html#marianmt):
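Reproduced from the linked guide, lightly commented (the `ct2-transformers-converter` step produces the `opus-mt-en-de` directory loaded below):

```python
import ctranslate2
import transformers

# The model is converted ahead of time with:
#   ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de --output_dir opus-mt-en-de
translator = ctranslate2.Translator("opus-mt-en-de")
tokenizer = transformers.AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")

# CTranslate2 consumes token strings, so tokenize before translating.
source = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world!"))
results = translator.translate_batch([source])
target = results[0].hypotheses[0]

# Detokenize the best hypothesis back into text.
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
```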
Maybe this is already possible with a custom runner?

Motivation

CTranslate2 is a C++ and Python library for efficient inference with Transformer models.

Other

No response
Replies: 2 comments
-
Hi, I did manage to get a BentoML service working on my local machine with a custom runner. However, how can I use adaptive batching, since I cannot enable batching for the target model signature during …? The converted model consists of a directory with two files:

The tokenizer is used from a call to … I tried to specify …
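For context, a minimal sketch of a batch-enabled custom runner in BentoML's 1.x runner API, assuming the converted model lives in a local `ct2_model` directory (the class, method, and path names here are illustrative, not from the original post). With a custom `Runnable`, adaptive batching is opted into via the `batchable` flag on the method decorator rather than via the saved model's signatures:

```python
import bentoml
import ctranslate2

class CTranslate2Runnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu", "nvidia.com/gpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        # "ct2_model" is a placeholder for the converted model directory.
        self.translator = ctranslate2.Translator("ct2_model")

    # batchable=True opts this signature into adaptive batching;
    # batch_dim=0 merges concurrent requests along the list axis.
    @bentoml.Runnable.method(batchable=True, batch_dim=0)
    def translate(self, token_batch: list[list[str]]) -> list[list[str]]:
        results = self.translator.translate_batch(token_batch)
        return [res.hypotheses[0] for res in results]

ct2_runner = bentoml.Runner(CTranslate2Runnable, name="ct2_runner")
svc = bentoml.Service("ct2_translation", runners=[ct2_runner])
```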
-
This is possible, and you can use the new service APIs to build a BentoML service; read the latest docs for how to do it.
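A minimal sketch with the newer service API (BentoML 1.2+), again with placeholder model and tokenizer names; `@bentoml.api(batchable=True)` is what enables adaptive batching on the endpoint:

```python
import bentoml
import ctranslate2
import transformers

@bentoml.service(resources={"cpu": "4"})
class Translation:
    def __init__(self) -> None:
        # Placeholder paths/names; substitute your converted model and tokenizer.
        self.translator = ctranslate2.Translator("ct2_model")
        self.tokenizer = transformers.AutoTokenizer.from_pretrained(
            "Helsinki-NLP/opus-mt-en-de"
        )

    # batchable=True lets the server group concurrent requests into a
    # single translate_batch call (adaptive batching).
    @bentoml.api(batchable=True)
    def translate(self, texts: list[str]) -> list[str]:
        sources = [
            self.tokenizer.convert_ids_to_tokens(self.tokenizer.encode(t))
            for t in texts
        ]
        results = self.translator.translate_batch(sources)
        return [
            self.tokenizer.decode(
                self.tokenizer.convert_tokens_to_ids(r.hypotheses[0])
            )
            for r in results
        ]
```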