
Releases: wenet-e2e/WeTextProcessing

1.0.4

01 Aug 10:06
a8efdf7

What's Changed

New Contributors

Full Changelog: 1.0.3...1.0.4

1.0.3

04 Jul 09:27
703433b

What's Changed

Full Changelog: 1.0.2...1.0.3

1.0.2

20 Jun 08:33
053507e

What's Changed

Full Changelog: 1.0.1...1.0.2

1.0.1

06 Jun 16:24
d9f47ca

What's Changed

Full Changelog: 1.0.0...1.0.1

1.0.0

05 Jun 10:55
0f386d8

Breaking Changes

  1. Support English TN (see #202). Most of the English rules were copied from NeMo, but we simplified them significantly, which results in:
    • FST size: 76M (NeMo) vs. 7M (ours)
    • Build time (relevant when you develop new rules): 777s (NeMo) vs. 41s (ours)
[Image: NeMo vs. WeText comparison]
  2. Support online building of FSTs, so you can use WeText without pain (#230).
```shell
pip install wetextprocessing
```

```python
from itn.chinese.inverse_normalizer import InverseNormalizer
from tn.chinese.normalizer import Normalizer as ZhNormalizer
from tn.english.normalizer import Normalizer as EnNormalizer

zh_tn_text = "你好 WeTextProcessing 1.0,船新版本儿,船新体验儿,简直666,9和10"
zh_itn_text = "你好 WeTextProcessing 一点零,船新版本儿,船新体验儿,简直六六六,九和六"
en_tn_text = "Hello WeTextProcessing 1.0, life is short, just use wetext, 666, 9 and 10"

# overwrite_cache=True: rebuild the graphs online with the given options
zh_tn_model = ZhNormalizer(remove_erhua=True, overwrite_cache=True)
zh_itn_model = InverseNormalizer(enable_0_to_9=False, overwrite_cache=True)
en_tn_model = EnNormalizer(overwrite_cache=True)
print("Chinese TN (remove erhua, rebuild graph online):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (standalone digits below 10 are not converted, rebuild graph online):\n\t{} => {}".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
print("English TN (no configurable options yet, coming later...):\n\t{} => {}\n".format(en_tn_text, en_tn_model.normalize(en_tn_text)))

# overwrite_cache=False: reuse the previously compiled graphs
zh_tn_model = ZhNormalizer(overwrite_cache=False)
zh_itn_model = InverseNormalizer(overwrite_cache=False)
en_tn_model = EnNormalizer(overwrite_cache=False)
print("Chinese TN (reuse previously compiled graph):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (reuse previously compiled graph):\n\t{} => {}".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
print("English TN (reuse previously compiled graph):\n\t{} => {}\n".format(en_tn_text, en_tn_model.normalize(en_tn_text)))

# Rebuild again with the opposite options
zh_tn_model = ZhNormalizer(remove_erhua=False, overwrite_cache=True)
zh_itn_model = InverseNormalizer(enable_0_to_9=True, overwrite_cache=True)
print("Chinese TN (keep erhua, rebuild graph online):\n\t{} => {}".format(zh_tn_text, zh_tn_model.normalize(zh_tn_text)))
print("Chinese ITN (standalone digits below 10 are converted too, rebuild graph online):\n\t{} => {}\n".format(zh_itn_text, zh_itn_model.normalize(zh_itn_text)))
```
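In the example above, `overwrite_cache` decides whether the graph is rebuilt online or loaded from a previously compiled file, which is why the second group of models, created without any options, reuses whatever was built last. The pattern can be sketched in plain Python as follows. `build_graph`, `load_or_build`, and the pickle cache file are hypothetical stand-ins for illustration; WeTextProcessing itself compiles and caches pynini FSTs, and this is not its actual code.

```python
import os
import pickle
import tempfile


def build_graph(options):
    # Hypothetical stand-in for the expensive FST compilation step.
    return {"rules": "compiled", "options": dict(options)}


def load_or_build(cache_path, options, overwrite_cache=False):
    """Return (graph, rebuilt).

    With overwrite_cache=False and an existing cache file, the cached graph
    is loaded and `options` are ignored, because they were baked in at build time.
    """
    if not overwrite_cache and os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), False
    graph = build_graph(options)
    with open(cache_path, "wb") as f:
        pickle.dump(graph, f)
    return graph, True


if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "zh_tn_graph.pkl")
    g1, rebuilt1 = load_or_build(cache, {"remove_erhua": True}, overwrite_cache=True)
    g2, rebuilt2 = load_or_build(cache, {"remove_erhua": False})
    print(rebuilt1, rebuilt2, g2["options"])  # True False {'remove_erhua': True}
```

The trade-off in this sketch: reusing the cache skips compilation, but changing options only takes effect when the cache is overwritten.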


Minor changes

Full Changelog: 0.2.1...1.0.0

0.2.1

05 Jun 05:23
385e35f

What's Changed

New Contributors

Full Changelog: 0.1.12...0.2.1

WeTextProcessing v0.1.12

10 Mar 03:47
4673bba

What's Changed

Full Changelog: 0.1.11...0.1.12

WeTextProcessing v0.1.11

27 Dec 03:21
2a60045
  • fix(tn): add 摄氏度 (degrees Celsius) #166
  • fix(itn): cardinal number for ID card #161
  • feat(itn): support configuring the output format for numbers of one million and above #172
  • set pynini==2.1.5 #167
  • feat(all): format all files #174
  • [itn] fix 给xxxxxxxxxxx打电话 ("call xxxxxxxxxxx") #179
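#172 makes the ITN output format for numbers of one million and above configurable. As an illustration only, the sketch below contrasts two plausible output styles for a value already parsed from spoken Chinese: plain Arabic digits versus keeping an exact 万 or 亿 unit. The function `format_large_number` and the style names `"digits"` and `"unit"` are hypothetical and do not mirror the library's actual option.

```python
def format_large_number(value, style="digits"):
    """Format an integer ITN result (hypothetical helper).

    style="digits": always plain digits, e.g. 3000000 -> "3000000"
    style="unit":   for values >= 1,000,000 that are exact multiples of
                    亿 (10^8) or 万 (10^4), keep the unit, e.g. 3000000 -> "300万"
    """
    # Only numbers of one million and above are affected by the style.
    if style == "digits" or value < 1_000_000:
        return str(value)
    for unit_value, unit_char in ((100_000_000, "亿"), (10_000, "万")):
        if value % unit_value == 0:
            return f"{value // unit_value}{unit_char}"
    # Not an exact multiple of a unit: fall back to plain digits.
    return str(value)


if __name__ == "__main__":
    for n in (500_000, 3_000_000, 200_000_000, 1_234_567):
        print(n, format_large_number(n), format_large_number(n, "unit"))
```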

WeTextProcessing v0.1.10

15 Nov 11:20
2c0e38f
  • fix(tn): full-width digits (全角数字) #157
  • fix(tn): 300w张 ("3 million sheets"), 50000票 ("50,000 tickets") #156
  • fix(tn): add >= <= #155 #153
  • fix(tn): support Chinese full-width colons (中文冒号) #154
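Two of the fixes above (#157 full-width digits, #155/#153 comparison operators) come down to rewriting characters before verbalization. A minimal pure-Python sketch of that idea follows, assuming a digit-only full-width mapping (so full-width colons are left untouched) and a hypothetical two-entry operator table; WeTextProcessing implements these as FST rules, not like this.

```python
import re

# Map full-width digits U+FF10..U+FF19 to ASCII "0".."9"; other characters are kept.
FULLWIDTH_DIGITS = {0xFF10 + i: str(i) for i in range(10)}

# Hypothetical verbalizations for comparison operators.
OPERATORS = {">=": "大于等于", "<=": "小于等于"}
OPERATOR_RE = re.compile("|".join(re.escape(op) for op in OPERATORS))


def preprocess(text):
    """Convert full-width digits to ASCII, then verbalize >= and <=."""
    text = text.translate(FULLWIDTH_DIGITS)
    return OPERATOR_RE.sub(lambda m: OPERATORS[m.group(0)], text)


if __name__ == "__main__":
    print(preprocess("温度>=３０"))  # 温度大于等于30
```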

WeTextProcessing v0.1.9

13 Nov 10:58
1169250
  • fix(tn): singer by2 #151
  • feat(tn): remove_erhua #149
  • fix(tn): 人均200 ("200 per person") #148
  • fix(tn): "给12315打个电话" ("call 12315") #146
  • feat(tn): support xxx-xxxxxxxx #144
  • feat(tn): add args #141
  • fix(tn): 11 -> 十一 #140
  • fix(itn): IP address #138
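Several of these fixes hinge on choosing between reading a number as a cardinal (#140: 11 → 十一) and reading it digit by digit (#146: the service number 12315; IP addresses). A minimal sketch of the two readings for small numbers follows. The helper names are hypothetical, the cardinal reading only covers 0 to 99, and digit strings read 1 as 一 (real TN systems often use 幺 for phone numbers). This is not WeTextProcessing's implementation.

```python
DIGITS = "零一二三四五六七八九"


def read_digits(s):
    """Digit-by-digit reading, e.g. for phone numbers and IP addresses ("." -> 点)."""
    out = []
    for c in s:
        if c.isdigit():
            out.append(DIGITS[int(c)])
        elif c == ".":
            out.append("点")
        else:
            out.append(c)
    return "".join(out)


def read_cardinal(n):
    """Cardinal reading for 0 <= n < 100; 10..19 drop the leading 一 (11 -> 十一)."""
    if not 0 <= n < 100:
        raise ValueError("this sketch only handles 0..99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    prefix = "" if tens == 1 else DIGITS[tens]
    return prefix + "十" + (DIGITS[ones] if ones else "")


if __name__ == "__main__":
    print(read_cardinal(11))          # 十一
    print(read_digits("12315"))       # 一二三一五
    print(read_digits("192.168.1.1")) # 一九二点一六八点一点一
```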