How to calculate the sequence length of a string? #373
Unanswered
RageshAntonyHM asked this question in Q&A

Replies: 1 comment
-
You can apply the tokenizer to it and count the number of tokens:

```python
# loading the model
import torch
from seamless_communication.inference import Translator

model_name = "seamlessM4T_v2_large"
vocoder_name = "vocoder_v2" if model_name == "seamlessM4T_v2_large" else "vocoder_36langs"
translator = Translator(
    model_name,
    vocoder_name,
    device=torch.device("cuda:0"),
    dtype=torch.float16,
)

text = "This is a typical single-sentence text which the Seamless model is supposed to translate well; " \
       "although this sentence is composed from multiple ones, it is still not too long, and is pretty coherent."

# evaluating the text length in tokens
tokenizer_encoder = translator.text_tokenizer.create_encoder(lang="eng")
tokens = tokenizer_encoder(text)
print(tokens)
# tensor([256022, 10257, 254, 10, 26304, 5302, 25184, 247711, 89945,
#         3657, 29568, 9451, 321, 2103, 33, 35100, 12654, 254,
#         174769, 243, 2809, 143411, 19794, 248123, 156503, 6642, 8466,
#         3657, 254, 22442, 61, 4800, 124736, 81982, 247681, 955,
#         254, 27689, 2984, 25790, 11718, 247681, 447, 254, 187056,
#         212292, 93, 247676, 3])
print(tokens.shape)
# torch.Size([49])

sequence_length = tokens.shape[0]
print(sequence_length)
# 49
```

Anyway, it is not recommended to use Seamless to translate more than one sentence at a time, because it was trained mostly with single sentences.
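Since the model was trained mostly on single sentences, one practical way to stay under the maximum sequence length is to split long input on sentence boundaries and verify each chunk's token count before translating. Below is a minimal sketch under stated assumptions: `split_into_chunks` and the `max_tokens` limit are illustrative helpers, not part of the Seamless API, and the whitespace `toy_encode` is a stand-in you would replace with the `tokenizer_encoder` created above.

```python
import re

def split_into_chunks(text, encode, max_tokens):
    """Greedily group sentences so each chunk stays within max_tokens.

    `encode` is any callable returning a sequence of token ids (e.g. the
    tokenizer_encoder from above). A single sentence that exceeds the
    limit is still emitted as its own chunk.
    """
    # naive sentence split on ., !, ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and len(encode(candidate)) > max_tokens:
            chunks.append(current)   # flush the chunk that still fits
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Stand-in tokenizer for demonstration only; swap in tokenizer_encoder.
toy_encode = lambda s: s.split()

text = "First sentence here. Second sentence follows. A third one ends it."
print(split_into_chunks(text, toy_encode, max_tokens=5))
# → ['First sentence here.', 'Second sentence follows.', 'A third one ends it.']
```

Each chunk can then be passed to `translator.predict` separately, which also matches the single-sentence recommendation above.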
-
I get a "maximum sequence length" issue when trying to do "Text to Text" translation.
I need to calculate the sequence length of a string before passing it to prediction.
How can I do this?