Bug Description:
I used the Autotiktokenizer with "Cohere/Cohere-embed-multilingual-v3.0", which raised an AttributeError because .items() is called on vocab, which is a list.
Current behaviour:
116 """Convert vocab to binary mergeable_ranks.
117
118 Args:
(...)
123 mergeable_ranks (dict): The mergeable ranks of tokens in binary format.
124 """
125 mergeable_ranks = {}
--> 126 sorted_vocab = sorted(vocab.items(), key=lambda x: x[1])
127 for rank, (token, _) in enumerate(sorted_vocab, start=0):
128 # Converting wordpiece to equivalent std BPE form
129 if tokenizer_type == 'wordpiece':
AttributeError: 'list' object has no attribute 'items'
Workaround:
The issue occurs because vocab is assumed to be a dictionary, but in the Cohere tokenizer instance vocab is a list of lists. The vocab should pass through a type check before .items() is called on it.
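The type check described in the workaround could look roughly like the sketch below. It is not the library's actual fix. The helper name normalize_vocab is hypothetical, and the sketch assumes that each inner list starts with the token string (for example [token, score]) and that the position in the outer list can serve as the rank. Neither assumption is confirmed by the traceback above.

```python
from typing import Any, Dict


def normalize_vocab(vocab: Any) -> Dict[str, int]:
    """Return a token -> rank dict for either a dict or a list-of-lists vocab.

    Hypothetical helper: assumes list entries look like [token, ...] and that
    the list index is an acceptable stand-in for the rank.
    """
    if isinstance(vocab, dict):
        # Already in the shape the conversion code expects.
        return dict(vocab)

    if isinstance(vocab, (list, tuple)):
        normalized: Dict[str, int] = {}
        for index, entry in enumerate(vocab):
            if not isinstance(entry, (list, tuple)) or not entry:
                raise TypeError(
                    f"Unsupported vocab entry at index {index}: {entry!r}"
                )
            # Keep the first index seen if a token appears more than once.
            normalized.setdefault(entry[0], index)
        return normalized

    raise TypeError(f"Unsupported vocab type: {type(vocab).__name__}")


# The failing line from the traceback, applied to a normalized vocab.
vocab_as_list = [["<s>", 0.0], ["hello", -3.2], ["world", -4.1]]
sorted_vocab = sorted(normalize_vocab(vocab_as_list).items(), key=lambda x: x[1])
```

With a check like this in front of the sorted(...) call, a dictionary vocab passes through unchanged, a list-of-lists vocab is converted first, and any other type fails with an explicit TypeError instead of an AttributeError.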