Inference on new corpus by trained alignments #46
Comments
You have to run a phrase-table extraction algorithm with the corpus and the alignments as input, e.g. steps 4, 5 and 6 of the Moses training pipeline.
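Here is a minimal sketch of the phrase extraction and scoring idea in Python, roughly what Moses steps 5 and 6 (extract phrases, score phrases) do. It assumes tokenised sentences and Pharaoh-format alignments (`0-0 1-2`, source index first), like the ones fast_align writes. The function names and the `max_len=7` limit are my own choices, not Moses options. The sketch only estimates the forward probability phi(tgt | src) by relative frequency. It skips the inverse direction, lexical weighting and the step-4 lexical table that Moses also produces.

```python
from collections import Counter


def parse_alignment(line):
    """Parse Pharaoh-style links such as '0-0 1-2' into a set of (src, tgt) index pairs."""
    return {tuple(int(k) for k in link.split("-")) for link in line.split()}


def extract_phrases(src, tgt, alignment, max_len=7):
    """Return all phrase pairs (src_phrase, tgt_phrase) consistent with the word alignment."""
    tgt_aligned = {j for _, j in alignment}
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to the current source span.
            js = [j for i, j in alignment if s_start <= i <= s_end]
            if not js:
                continue
            t_start, t_end = min(js), max(js)
            # Consistency: no target word in [t_start, t_end] may link outside the source span.
            if any(t_start <= j <= t_end and not s_start <= i <= s_end for i, j in alignment):
                continue
            src_phrase = " ".join(src[s_start:s_end + 1])
            # Extend the target span over unaligned target words on either side.
            ts = t_start
            while ts >= 0 and (ts == t_start or ts not in tgt_aligned):
                te = t_end
                while te < len(tgt) and (te == t_end or te not in tgt_aligned):
                    if te - ts + 1 <= max_len:
                        pairs.append((src_phrase, " ".join(tgt[ts:te + 1])))
                    te += 1
                ts -= 1
    return pairs


def score_phrases(corpus, max_len=7):
    """Relative-frequency estimate phi(tgt_phrase | src_phrase) over (src, tgt, alignment) triples."""
    pair_counts, src_counts = Counter(), Counter()
    for src, tgt, alignment in corpus:
        for s, t in extract_phrases(src, tgt, alignment, max_len):
            pair_counts[s, t] += 1
            src_counts[s] += 1
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}
```

To use it, pair each tokenised sentence of the new corpus with its alignment line (via `parse_alignment`) and pass the whole list to `score_phrases`.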
You can have a look at force_align.py; I guess this script is used to align a new corpus using a trained conditional probability table.
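This sketch illustrates what "aligning a new corpus with a trained conditional probability" means. It is not force_align.py itself. It Viterbi-aligns a sentence pair given a table of p(target word | source word) and a fixed diagonal prior in the spirit of fast_align. Loading fast_align's parameter file is left out, and the table is assumed to be a plain dict. `<null>` is a hypothetical name for the empty word. `p_null=0.08` and `tension=4.0` are values I believe match fast_align's defaults, so please check the source. Under this model each target word's link is chosen independently, so picking the best source position per word gives the Viterbi alignment.

```python
import math

NULL = "<null>"  # hypothetical key for the empty (NULL) source word


def align_sentence(src, tgt, t_table, p_null=0.08, tension=4.0, floor=1e-9):
    """Viterbi-align each target word to a source word (or NULL) under a fixed diagonal prior.

    t_table maps (source_word, target_word) -> p(target_word | source_word).
    Returns (src_index, tgt_index) links, 0-based; words aligned to NULL get no link.
    """
    m, n = len(tgt), len(src)
    links = []
    for j, f in enumerate(tgt, start=1):
        # Diagonal prior exp(-tension * |j/m - i/n|), normalised over source positions.
        weights = [math.exp(-tension * abs(j / m - i / n)) for i in range(1, n + 1)]
        z = sum(weights)
        best_i, best_score = None, p_null * t_table.get((NULL, f), floor)
        for i, e in enumerate(src, start=1):
            score = (1 - p_null) * weights[i - 1] / z * t_table.get((e, f), floor)
            if score > best_score:
                best_i, best_score = i, score
        if best_i is not None:
            links.append((best_i - 1, j - 1))
    return links


def to_pharaoh(links):
    """Format links as the 'i-j' strings fast_align-style tools exchange."""
    return " ".join(f"{i}-{j}" for i, j in links)
```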
Has anyone got a working demo script? Ideally for a model that supports SentencePiece tokenization.
I rewrote the source code in pure Python (I can't share it, for various reasons). I think anyone can implement fast_align after reading the source code. My suggestion is not to use statistical word alignment models with SentencePiece-based tokenization; in my view they are not compatible. That said, statistical word alignment models can still be useful, depending on your purpose.
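For anyone attempting the pure-Python route, here is a heavily simplified EM trainer in the same spirit. It is not a port of fast_align. It keeps the null probability and diagonal tension fixed instead of optimising the tension, and it has no variational-Bayes option and no speed tricks. The names and defaults are illustrative assumptions. It returns a dict mapping `(source_word, target_word)` to p(target | source).

```python
import math
from collections import defaultdict

NULL = "<null>"  # hypothetical key for the empty (NULL) source word


def train(corpus, iterations=5, p_null=0.08, tension=4.0):
    """EM for p(f | e) with a fixed diagonal prior; corpus is a list of (src_tokens, tgt_tokens)."""
    t = defaultdict(lambda: 1.0)  # flat start; the first M-step normalises it
    for _ in range(iterations):
        counts, totals = defaultdict(float), defaultdict(float)
        for src, tgt in corpus:
            m, n = len(tgt), len(src)
            for j, f in enumerate(tgt, start=1):
                weights = [math.exp(-tension * abs(j / m - i / n)) for i in range(1, n + 1)]
                z = sum(weights)
                # Prior over candidate links: NULL gets p_null, real words share the rest.
                cands = [(NULL, p_null)] + [(e, (1 - p_null) * w / z) for e, w in zip(src, weights)]
                scores = [prior * t[e, f] for e, prior in cands]
                norm = sum(scores)
                # E-step: add fractional (posterior) counts for each candidate link.
                for (e, _), s in zip(cands, scores):
                    counts[e, f] += s / norm
                    totals[e] += s / norm
        # M-step: renormalise expected counts per source word.
        t = defaultdict(float, {(e, f): c / totals[e] for (e, f), c in counts.items()})
    return dict(t)
```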
It is unclear whether the original question is about a) word-aligning a corpus with a previously trained fast_align model (nomadlx assumed this was the case). If a), then you might find these useful:
to train a fast_align model: https://gist.github.com/bricksdont/7a9ac764d874b90853eff88d53971033
to apply a trained model: https://gist.github.com/bricksdont/0d1718c7c3fc05714b582afe4c3b5005
Edge cases that can break
Same issue as #33.
I've already trained on a large parallel corpus to get word alignments. How can I then use these alignments for inference, i.e. to get translation probabilities for a new corpus?
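"Translation probability for the new corpus" might mean a score per sentence pair. Under that interpretation (mine, not something stated in this thread), one option is plain IBM Model 1 with a trained lexical table t(f | e): log p(f | e) = log epsilon - m * log(l + 1) + sum_j log sum_i t(f_j | e_i). Here e has l words plus NULL and f has m words. This ignores fast_align's diagonal prior. The table is assumed to be a dict keyed by `(source_word, target_word)`. `epsilon` and the `floor` for unseen pairs are hypothetical knobs.

```python
import math

NULL = "<null>"  # hypothetical key for the empty (NULL) source word


def model1_log_prob(src, tgt, t_table, epsilon=1.0, floor=1e-9):
    """log p(tgt | src) under IBM Model 1: log eps - m*log(l+1) + sum_j log sum_i t(f_j | e_i)."""
    src = [NULL] + list(src)
    logp = math.log(epsilon) - len(tgt) * math.log(len(src))
    for f in tgt:
        # Unseen (e, f) pairs fall back to a small floor probability to avoid log(0).
        logp += math.log(sum(t_table.get((e, f), floor) for e in src))
    return logp
```

Dividing the result by the number of target words gives a length-normalised score that is easier to compare across sentences of different lengths.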