Replies: 1 comment
Hi @sakib-NSL
If you can share a simple code sample, I can look deeper into this and see whether we need to improve our WordSegmenterModel() and its models.
Hello everyone,
I am a beginner in Spark NLP. I trained a model on a Japanese dataset in Spark NLP using BERT embeddings. I tokenized the data with the spaCy tokenizer, converted it to BIO format, and then used it for training. The results on the test data are satisfactory. But when I run the same test data through the prediction pipeline, performance decreases. I have tried both Tokenizer() and WordSegmenterModel() (alternately) in the prediction pipeline, but neither worked. Can I use a different, customized tokenizer in the pipeline?
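A likely cause of the performance drop is that the prediction-time tokenizer segments text differently from the spaCy tokenizer used to create the BIO training data, so the token-level tags no longer line up. The sketch below is a minimal, hypothetical illustration of that mismatch: the two toy tokenizers and the example sentence are my own assumptions, not Spark NLP's or spaCy's actual implementations.

```python
import re

def whitespace_tokenize(text):
    # Stand-in for a simple rule-based tokenizer that splits on whitespace
    # only (punctuation stays attached to words).
    return text.split()

def punct_split_tokenize(text):
    # Stand-in for a spaCy-style tokenizer that separates punctuation
    # into its own tokens.
    return re.findall(r"\w+|[^\w\s]", text)

text = "Dr. Smith visited Tokyo."

a = whitespace_tokenize(text)   # ['Dr.', 'Smith', 'visited', 'Tokyo.']
b = punct_split_tokenize(text)  # ['Dr', '.', 'Smith', 'visited', 'Tokyo', '.']

# The two segmentations have different lengths, so BIO tags assigned
# per-token under one tokenizer cannot be applied one-to-one under the other.
assert len(a) != len(b)
```

The same effect is worse for Japanese, which has no whitespace: a whitespace-based tokenizer would emit one long token per sentence, while a word segmenter produces many. Matching the prediction-time tokenization to whatever produced the BIO labels is the key, whichever tokenizer that is.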
Here is the training pipeline:
Here is the prediction pipeline:
Questions:
Thank you in advance.