
Identify numbers that should not be converted #28

Open
Ailln opened this issue Sep 23, 2020 · 2 comments

Comments

@Ailln
Owner

Ailln commented Sep 23, 2020

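1.
Input: 原价都是全国统一零售价它是幺三八 ("the original price is the nationwide uniform retail price; it's one-three-eight")
Output: 原价都是全国统10售价它是138
Shouldn't "统一零售价" (uniform retail price) be left unconverted?
2.
Input: 卖到几十块钱 ("sells for a few dozen yuan")
Output: 卖到几10块钱
As I understand it, "几十块钱" (a few dozen yuan) shouldn't be converted either.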

Originally posted by @mengxifeng in #26 (comment)

@Ailln
Owner Author

Ailln commented Sep 23, 2020

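The approaches I can think of so far are:

  1. Word segmentation. This is a relatively simple approach, but in my tests the segmenter sometimes fails to split the numbers correctly.
  2. NER (named entity recognition). This approach is more complex: it might require pulling in a framework like Torch, which is around 600 MB (large enough to make installation difficult for users), and I haven't yet found a suitable public dataset for it...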
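A minimal sketch of the segmentation idea, assuming a hand-maintained lexicon of ordinary words that happen to contain numeral characters. The `PROTECTED_WORDS` set and both function names are hypothetical and not part of any existing tool. It uses forward maximum matching, a simple dictionary-based segmentation technique, to find spans that a numeral converter should skip.

```python
from typing import List, Set, Tuple

# Hypothetical lexicon: ordinary words that contain numeral characters
# and therefore must not be converted to digits.
PROTECTED_WORDS: Set[str] = {"统一", "零售", "零售价", "一起", "万一"}


def protected_spans(text: str, lexicon: Set[str]) -> List[Tuple[int, int]]:
    """Return (start, end) spans of lexicon words, found by forward maximum matching."""
    max_len = max((len(w) for w in lexicon), default=0)
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in lexicon:
                spans.append((i, i + size))
                i += size
                break
        else:
            # No lexicon word starts here; advance by one character.
            i += 1
    return spans


def split_protected(text: str, lexicon: Set[str]) -> List[Tuple[str, bool]]:
    """Split text into (segment, is_protected) pieces."""
    pieces: List[Tuple[str, bool]] = []
    last = 0
    for start, end in protected_spans(text, lexicon):
        if start > last:
            pieces.append((text[last:start], False))
        pieces.append((text[start:end], True))
        last = end
    if last < len(text):
        pieces.append((text[last:], False))
    return pieces
```

For the first example, `split_protected("原价都是全国统一零售价它是幺三八", PROTECTED_WORDS)` yields `统一` and `零售价` as protected pieces, so only `它是幺三八` would reach the converter. Greedy matching can still cut words in the wrong place, which is the same weakness noted above for real segmenters.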

@Beants
Contributor

Beants commented Jul 6, 2021

I suggest handling approximate numbers and words that contain numerals directly with regular expressions.
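A minimal sketch of the regex suggestion, using only Python's standard `re` module. The approximate-number pattern and the word list are illustrative assumptions, not an exhaustive rule set. Protected matches are swapped for Private Use Area placeholder characters before conversion and restored afterwards, so the converter never sees them. `naive_convert` is a hypothetical digit-by-digit stand-in for a real converter.

```python
import re
from typing import Callable, List

# Illustrative (not exhaustive) approximate numbers: 几十, 十几, 数十, 上百, ...
APPROX = r"几[十百千万亿]|[十百千万]几|数[十百千万]|上[百千万]"
# Illustrative words that merely contain numeral characters.
WORDS = ["统一", "零售", "一起", "万一"]

PROTECT_RE = re.compile("|".join([APPROX] + [re.escape(w) for w in WORDS]))
PUA_BASE = 0xE000  # assumes the input text contains no Private Use Area characters

# Hypothetical stand-in converter: maps single numeral characters to digits.
DIGITS = str.maketrans("零〇一二三四五六七八九幺", "001234567891")


def naive_convert(s: str) -> str:
    return s.translate(DIGITS)


def convert_with_protection(text: str, convert: Callable[[str], str]) -> str:
    """Mask protected spans, run `convert` on the rest, then restore the spans."""
    saved: List[str] = []

    def mask(m: "re.Match") -> str:
        saved.append(m.group(0))
        return chr(PUA_BASE + len(saved) - 1)

    converted = convert(PROTECT_RE.sub(mask, text))
    # Replace each placeholder character with the original protected text.
    return "".join(
        saved[ord(c) - PUA_BASE] if 0 <= ord(c) - PUA_BASE < len(saved) else c
        for c in converted
    )
```

Applied without protection, `naive_convert` reproduces the bad output from the first example (`原价都是全国统10售价它是138`). With `convert_with_protection`, `统一` and `零售` are preserved and only `幺三八` becomes `138`.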

Labels: None yet
Projects: Status: To do
Development: No branches or pull requests
2 participants