[ALL] Introduced APIs that return an ImmutableSentencePieceText struct, which holds each piece's string token, id, and UTF-8 byte offsets together. The new API is available from both C++ and Python.
[ALL] Allowed the tab character ('\t') in user-defined symbols.
[ALL] Added an NFKD normalization rule, provided as a TSV file.
[ALL] Added an option to emit the unknown symbol instead of the raw unknown characters.
[Python] Batch encode/decode requests are now processed in parallel on native threads.
[Python] Added support for passing a custom log stream during training.
[Python] Added a module-level version variable, spm.__version__.
[Python] Wheel packages are now built as macOS universal binaries.
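The NFKD rule ships as a TSV file rather than as a built-in table. Below is a minimal sketch of how such rules can be derived with Python's `unicodedata`, assuming the rule-file format described in SentencePiece's normalization documentation: source and target separated by a tab, each written as space-separated hexadecimal code points. The helper names and the U+00A0 to U+017F scan range are illustrative choices. The Unicode version bundled with your Python may also differ from the one used to build the shipped file, so this does not reproduce it.

```python
import unicodedata
from typing import Iterator, Optional


def to_hex(s: str) -> str:
    """Space-separated uppercase hex code points, e.g. 'fi' -> '66 69'."""
    return " ".join(format(ord(c), "X") for c in s)


def nfkd_rule(ch: str) -> Optional[str]:
    """Return a 'SOURCE<TAB>TARGET' rule line if NFKD changes ch, else None."""
    target = unicodedata.normalize("NFKD", ch)
    if target == ch:
        return None
    return f"{to_hex(ch)}\t{to_hex(target)}"


def nfkd_rules(start: int = 0x00A0, end: int = 0x0180) -> Iterator[str]:
    """Yield rule lines for every code point in [start, end) that NFKD rewrites."""
    for cp in range(start, end):
        line = nfkd_rule(chr(cp))
        if line is not None:
            yield line


for line in list(nfkd_rules())[:5]:
    print(line)
```

The sketch only emits single-character mappings, so canonical reordering of combining-mark sequences is not covered. A file of such lines is what the trainer's `normalization_rule_tsv` option expects, which is worth confirming against the option list of your installed version.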
Bug fixes & minor changes
The efficient encoding algorithm is now the default; removed the option to switch back to the Viterbi tokenization algorithm.
Made the output of Encode identical to the 1-best result of NBestEncode.
Switched to std::string_view wherever possible.
[Python] Removed the pip packages for the ppc64le and s390x architectures, as cibuildwheel doesn't support them.