Add Tokenizers #687
…h and without string support in OV core. Moved StringTensorUnpack and reworked it to be aligned with the new approach. Reworked sentence piece op and translation code to be compatible with several variants of string tensor representation and the plugin wrapping hack.
…ranch to contrib in form compatible with both master and the branch with string tensors support. Added CaseFoldUTF8 from that branch.
…pty constants, register StringTensorPack and StringTensorUnpack as OV operations to be able to read IRs with those operations
…den Const translator for TF to intercept string constants
…r conditional compilation based on available features in OpenVINO
…combination of WordpieceTokenizeWithOffsets and LookupTableFindV2 from TensorFlow
…ute initialization optional (needed for core.make_node)
…ean mask extra output
…n and RegexSplit based on paddle fast_tokenizer lib. Limited implementation, not all of the features of ops and TF translated ops are implemented.
… necessary steps to complete HF bert preprocessing conversion (not validated)
…dling of model name
…kenizer and main model is fixed partially (still produces topologically incorrect model)
…uts, now Bert and its tokenizer are connected together correctly
…ding, fix bugs for batch processing
…bled debug output
Azure Pipelines successfully started running 1 pipeline(s).
/azp run openvino_contrib-mac
5c3b656 to d34d401
I can see Windows compiles now. But ov_tokenizer.init_extension() fails for me:
import os
import sys
import ov_tokenizer

if hasattr(os, "add_dll_directory"):
    for path in os.environ.get("PATH", "").split(";"):
        if os.path.isdir(path):
            os.add_dll_directory(path)
ov_tokenizer.init_extension(sys.argv[1])
py llm/cpp/convert_tokenizers.py c:/Users/vzlobin/r/openvino.genai/build/thirdparty/openvino_contrib/modules/custom_operations/user_ie_extensions/Release/user_ov_extensions.dll C:\Users\vzlobin\r\tiny-llama-fast-tokenizer
Traceback (most recent call last):
  File "C:\Users\vzlobin\r\openvino.genai\llm\cpp\convert_tokenizers.py", line 9, in <module>
    ov_tokenizer.init_extension(sys.argv[1])
  File "C:\Users\vzlobin\r\openvino.genai\thirdparty\openvino_contrib\modules\custom_operations\user_ie_extensions\tokenizer\python\ov_tokenizer\node_factory.py", line 21, in init_extension
    factory.add_extension(extension_path)
  File "C:\Users\vzlobin\Downloads\w_openvino_toolkit_windows_2023.2.0.13089.cfd42bd2cb0_x86_64\python\openvino\runtime\utils\node_factory.py", line 118, in add_extension
    self.factory.add_extension(lib_path)
RuntimeError: Cannot load library 'c:/Users/vzlobin/r/openvino.genai/build/thirdparty/openvino_contrib/modules/custom_operations/user_ie_extensions/Release/user_ov_extensions.dll': 126 from cwd: C:\Users\vzlobin\r\openvino.genai
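Note on the error code: Windows error 126 is ERROR_MOD_NOT_FOUND, which is reported both when the named library is missing and when one of the DLLs it depends on cannot be resolved. The following is only a sketch of a more defensive version of the script above, assuming the failure is a dependency-resolution problem; that has not been confirmed here. The helper names `collect_dll_dirs`, `register_dll_dirs`, and `check_extension_path` are hypothetical and are not part of ov_tokenizer or OpenVINO.

```python
import os
from pathlib import Path


def collect_dll_dirs(path_value, sep=os.pathsep):
    """Return the absolute, existing directories listed in a PATH-style string.

    Empty entries, quoted entries, and duplicates are normalised away so that
    each directory is registered at most once.
    """
    dirs = []
    seen = set()
    for entry in path_value.split(sep):
        entry = entry.strip().strip('"')
        if not entry:
            continue
        resolved = os.path.abspath(entry)
        if os.path.isdir(resolved) and resolved not in seen:
            seen.add(resolved)
            dirs.append(resolved)
    return dirs


def register_dll_dirs(dirs):
    """Register directories for DLL lookup; a no-op where the API is missing.

    os.add_dll_directory exists only on Windows (Python 3.8+). The returned
    handles are kept because calling close() on one removes that directory.
    """
    if not hasattr(os, "add_dll_directory"):
        return []
    return [os.add_dll_directory(d) for d in dirs]


def check_extension_path(extension_path):
    """Fail early with a clear message if the library file itself is absent."""
    path = Path(extension_path)
    if not path.is_file():
        raise FileNotFoundError(f"extension library not found: {path}")
    return path.resolve()
```

With these helpers, the script would call `check_extension_path(sys.argv[1])` and `register_dll_dirs(collect_dll_dirs(os.environ.get("PATH", "")))` before `ov_tokenizer.init_extension(...)`. If the file check passes and the load still fails with 126, a missing dependent DLL becomes the more likely explanation.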
Replaced by #767
Details:
This PR extends OV Opset with tokenization-related operations
Ticket: