Releases: wenet-e2e/WeTextProcessing
1.0.4
What's Changed
- [itn] whitelist 7x24小时 by @weimeng23 in #266
- [itn] fix 十三五 by @xingchensong in #267
- [install] upgrade pynini to 2.1.6 in setup.py by @lingji-yidong in #269
- [itn] fix 四s店 by @xingchensong in #270
New Contributors
- @lingji-yidong made their first contribution in #269
Full Changelog: 1.0.3...1.0.4
1.0.3
What's Changed
- [tn] english, fix crash on "" by @xingchensong in #249
- [tn] english, fix by @xingchensong in #251
- [install] upgrade pynini to 2.1.6 by @xingchensong in #252
- [tn] add whitelist for you're by @xingchensong in #253
- [doc] Update README.md by @xingchensong in #254
- [itn] fix issue#237, digit + union("百", "千", "万") + digit + unit by @weimeng23 in #255
- [tn] delete prefix space by @xingchensong in #262
- [itn] add whitelist by @xingchensong in #263
Full Changelog: 1.0.2...1.0.3
1.0.2
What's Changed
- [fix] tn chinese, add punc by @xingchensong in #242
- [tn] chinese, append traditional_to_simple by @xingchensong in #243
- [itn] fix 八百一千=>800 1000 二十一千=>20 1000, 零千 零万 by @weimeng23 in #246
Full Changelog: 1.0.1...1.0.2
1.0.1
What's Changed
- [fix] fix tn, week range by @xingchensong in #238
- [fix] fix tn, punct with space by @xingchensong in #239
- [fix] fix tn, remove useless mapping in whitelist by @xingchensong in #240
- [wheel] disable global logging config by @xingchensong in #241 (removes the global logging configuration so it no longer overrides the log levels of other programs)
Full Changelog: 1.0.0...1.0.1
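The logging change in #241 follows a common convention for Python libraries: a package should not configure the root logger when it is imported, because that can silently change the log level of the host application. A minimal sketch of that convention using only the standard library; the logger name `wetext_demo` and the helper `get_library_logger` are hypothetical and are not taken from WeTextProcessing:

```python
import logging


def get_library_logger(name: str = "wetext_demo") -> logging.Logger:
    """Return a package-level logger without touching the root logger.

    Hypothetical helper for illustration; not part of WeTextProcessing.
    """
    logger = logging.getLogger(name)
    # A NullHandler keeps the library quiet unless the application opts in.
    if not any(isinstance(h, logging.NullHandler) for h in logger.handlers):
        logger.addHandler(logging.NullHandler())
    return logger


# The application, not the library, decides the global log level.
logging.getLogger().setLevel(logging.WARNING)
root_level_before = logging.getLogger().level

lib_logger = get_library_logger()
lib_logger.info("building graph")  # dropped: the effective level is WARNING

# Creating and using the library logger leaves the root level untouched.
root_level_after = logging.getLogger().level
```

Calling `logging.basicConfig(...)` inside a library module is the pattern this avoids, since it installs handlers and a level on the root logger that every other package shares.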
1.0.0
Breaking Changes
- Support English TN (see #202). Most of the English rules were copied from NeMo, but we simplified them significantly. As a result:
  - FST size: 76M (NeMo) vs. 7M (ours), roughly 11x smaller
  - Build time when developing new rules: 777s (NeMo) vs. 41s (ours), roughly 19x faster
- Support building FSTs online (#230), so WeText can be used painlessly right after installation:
pip install wetextprocessing
from itn.chinese.inverse_normalizer import InverseNormalizer
from tn.chinese.normalizer import Normalizer as ZhNormalizer
from tn.english.normalizer import Normalizer as EnNormalizer
zh_tn_text = "你好 WeTextProcessing 1.0,船新版本儿,船新体验儿,简直666,9和10"
zh_itn_text = "你好 WeTextProcessing 一点零,船新版本儿,船新体验儿,简直六六六,九和六"
en_tn_text = "Hello WeTextProcessing 1.0, life is short, just use wetext, 666, 9 and 10"
zh_tn_model = ZhNormalizer(remove_erhua=True, overwrite_cache=True)
zh_itn_model = InverseNormalizer(enable_0_to_9=False, overwrite_cache=True)
en_tn_model = EnNormalizer(overwrite_cache=True)
print("Chinese TN (erhua removed, graph rebuilt online):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (standalone digits below 10 not converted, graph rebuilt online):\n\t{} => {}".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
print("English TN (no configurable options yet, more to come...):\n\t{} => {}\n".format(en_tn_text, en_tn_model.normalize(en_tn_text)))
zh_tn_model = ZhNormalizer(overwrite_cache=False)
zh_itn_model = InverseNormalizer(overwrite_cache=False)
en_tn_model = EnNormalizer(overwrite_cache=False)
print("Chinese TN (reusing the previously compiled graph):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (reusing the previously compiled graph):\n\t{} => {}".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
print("English TN (reusing the previously compiled graph):\n\t{} => {}\n".format(en_tn_text, en_tn_model.normalize(en_tn_text)))
zh_tn_model = ZhNormalizer(remove_erhua=False, overwrite_cache=True)
zh_itn_model = InverseNormalizer(enable_0_to_9=True, overwrite_cache=True)
print("Chinese TN (erhua kept, graph rebuilt online):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (standalone digits below 10 also converted, graph rebuilt online):\n\t{} => {}\n".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
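The example above rebuilds the graphs when `overwrite_cache=True` and reuses the previously compiled ones when it is `False`. The general build-or-load pattern behind such a flag can be sketched in plain Python. This is an illustration under stated assumptions, not WeTextProcessing's implementation: the function `build_or_load`, the stand-in `build_rules`, the on-disk location, and the use of `pickle` are all hypothetical (the library compiles FSTs, not dictionaries).

```python
import os
import pickle
import tempfile


def build_or_load(cache_path, build_fn, overwrite_cache=False):
    """Load a cached artifact, or build it and refresh the cache.

    Hypothetical helper: returns the artifact plus a label saying
    whether it was "built" or "loaded".
    """
    if not overwrite_cache and os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), "loaded"
    artifact = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(artifact, f)
    return artifact, "built"


def build_rules():
    # Stand-in for an expensive grammar compilation step.
    return {"一": "1", "二": "2"}


cache_dir = tempfile.mkdtemp()
cache_file = os.path.join(cache_dir, "rules.pkl")

_, first = build_or_load(cache_file, build_rules, overwrite_cache=True)
_, second = build_or_load(cache_file, build_rules, overwrite_cache=False)
_, third = build_or_load(cache_file, build_rules, overwrite_cache=True)
print(first, second, third)  # built loaded built
```

The trade-off is the usual one for caches: forcing a rebuild is needed whenever the options that shape the artifact change (as with `remove_erhua` or `enable_0_to_9` above), while reusing the cache avoids paying the build cost on every start.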
Minor changes
- [refactor] support building fst online by @xingchensong in #230
- [fix] remove redundant mapping in whitelist by @xingchensong in #231
- [tn] english tn, support range by @xingchensong in #233
- [fix] fix itn 三四十万 一万六七 by @xingchensong in #234
- [fix] fix itn 洞>0,拐>7 by @xingchensong in #235
- [fix] fix tn, remove useless mapping in english tn by @xingchensong in #236
Full Changelog: 0.2.1...1.0.0
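To illustrate the `洞>0,拐>7` fix in #235 listed above: 洞 and 拐 are spoken variants of 0 and 7, so an ITN system has to treat them as digits inside a digit sequence. The sketch below shows the idea with a plain dictionary lookup. It is an assumption-laden illustration, not the library's method: WeTextProcessing expresses its rules as FSTs, and the names `SPOKEN_DIGITS` and `spoken_to_digits` are hypothetical.

```python
# Standard Chinese digits plus the two spoken variants named in #235.
SPOKEN_DIGITS = {
    "零": "0", "洞": "0",
    "一": "1", "二": "2", "三": "3", "四": "4", "五": "5", "六": "6",
    "七": "7", "拐": "7",
    "八": "8", "九": "9",
}


def spoken_to_digits(text: str) -> str:
    """Convert digit by digit, but only when every character is a digit word.

    Requiring the whole string to be digit words is a crude stand-in for
    context: it avoids rewriting ordinary words that happen to contain 洞 or 拐.
    """
    if text and all(ch in SPOKEN_DIGITS for ch in text):
        return "".join(SPOKEN_DIGITS[ch] for ch in text)
    return text


print(spoken_to_digits("三洞拐"))  # 307
print(spoken_to_digits("山洞"))    # 山洞 (unchanged, not a digit sequence)
```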
0.2.1
What's Changed
- [itn] fix idcard number ends with X by @weimeng23 in #193
- fix #190 by @pengzhendong in #194
- fix #155 by @pengzhendong in #196
- [itn] 帮我导航到中关村一百零一号 by @xingchensong in #197
- [itn] 5- and 6-character license plate numbers, including zero by @weimeng23 in #198
- feat(tn): [cr_id_skip] Support english tn, cardinal and word by @xingchensong in #203
- [tn] english tn, support ordinal by @xingchensong in #204
- [tn] english tn, support date by @xingchensong in #205
- [tn] english tn, support decimal by @xingchensong in #207
- [tn] english tn, support fraction by @xingchensong in #209
- [tn] english tn, support time by @xingchensong in #210
- [tn] english tn, support measure by @xingchensong in #211
- [tn] english, support money by @xingchensong in #212
- [tn] english, support telephone by @xingchensong in #213
- [tn] english, support electronic by @xingchensong in #214
- [tn] tn english, support roman by @xingchensong in #215
- [tn] english tn, support whitelist by @xingchensong in #216
- [format] add copyright by @xingchensong in #217
- [tn] set whitelist weight = 1.0 by @xingchensong in #218
- [runtime] support english tn by @xingchensong in #219
- [runtime] fix english tn by @xingchensong in #220
- [tn] simplify tn by @xingchensong in #221
- [runtime] fix english tn order by @xingchensong in #222
- [fix] english tn by @xingchensong in #224
- [tn] support punct by @xingchensong in #225
- [fix] remove punc in decimal by @xingchensong in #226
- [fix] remove punc in measure by @xingchensong in #227
- [fix] english tn, whitelist exclude punct by @xingchensong in #228
- [cicd] update wheels by @xingchensong in #229
New Contributors
- @weimeng23 made their first contribution in #193
Full Changelog: 0.1.12...0.2.1
WeTextProcessing v0.1.12
What's Changed
- [itn] fix 三百九十八三盒 by @xingchensong in #181
- [android] revert to shared lib by @pengzhendong in #182
- [tn] fix 九零后 by @xingchensong in #183
- [tn] fix 手机尾号2349,100兆 by @xingchensong in #184
- [itn] fix 一二三四 by @xingchensong in #187
Full Changelog: 0.1.11...0.1.12