Possible areas for further optimization #4

Open
xmflswood opened this issue Jan 25, 2018 · 4 comments

Comments

@xmflswood

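1. Respecting syllable boundaries when matching pinyin would fit search habits better.
For example, for 糖饼 (tang bing), typing gb currently also produces a match. This could be handled in participle().

2. The matching algorithm as a whole (mainly the step that builds combinations, which is effectively a Cartesian product?) probably needs optimizing. I don't have a good idea for this yet; I only noticed that the WeChat app handles this very well.
pinyin-engine has problems with long runs of polyphonic characters, for example:
‘曾大曾大曾大曾大曾大曾大曾大曾大曾大曾大曾大曾大’ (zeng ceng, da dai tai). There are 20 polyphonic characters in total here, giving 6^10 combinations, which exhausts memory and freezes the browser. In a test with 16 characters, processing took nearly one second (Chrome 61).

I suggest temporarily limiting the number of polyphonic characters that are processed.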
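
To make the blow-up and the suggested cap concrete, here is a minimal sketch. It is not pinyin-engine's actual code: `countCombinations`, `combineReadings`, and the `maxPolyphones` parameter are hypothetical names, and the reading lists are hard-coded from the example above (10 repetitions of 曾大, i.e. 20 polyphonic characters).

```javascript
// Hypothetical helper: number of reading combinations, computed
// without materializing the Cartesian product.
function countCombinations(readings) {
  return readings.reduce((total, options) => total * options.length, 1);
}

// Hypothetical helper: builds the Cartesian product of readings, but
// expands only the first `maxPolyphones` polyphonic characters. Later
// polyphonic characters fall back to their first reading (an assumption
// made for this sketch, not a behavior of any real library).
function combineReadings(readings, maxPolyphones = 8) {
  let seen = 0;
  let results = [''];
  for (const options of readings) {
    let usable = options;
    if (options.length > 1) {
      seen += 1;
      if (seen > maxPolyphones) usable = options.slice(0, 1);
    }
    const next = [];
    for (const prefix of results) {
      for (const option of usable) next.push(prefix + option);
    }
    results = next;
  }
  return results;
}

const zeng = ['zeng', 'ceng'];
const da = ['da', 'dai', 'tai'];
const readings = [];
for (let i = 0; i < 10; i++) readings.push(zeng, da);

console.log(countCombinations(readings));         // 60466176, i.e. 6 ** 10
console.log(combineReadings(readings, 4).length); // 36, i.e. 2 * 3 * 2 * 3
```

The cap trades recall for bounded memory: readings beyond the limit are no longer searchable, so the limit would need tuning against real data.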

@aui
Owner

aui commented Jan 25, 2018

Thanks for the suggestions. I'll think about how to optimize this.

@xmflswood
Author

xmflswood commented Jan 25, 2018

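Maybe syllable-boundary handling could work like this?
In participle, produce ['tangbing', 'bing', 'tb', 'b'],
then process that into a string:

s = '-tangbing-bing-tb-b'
At query time, use s.indexOf('-' + keyword) to decide whether there is a match.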
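
The idea above can be sketched as follows. This is only an illustration of the proposal, not code from pinyin-engine: `buildIndex` and `matches` are hypothetical names, and the input is assumed to be an array of already-split syllables.

```javascript
// Hypothetical helper: for ['tang', 'bing'] this returns
// '-tangbing-bing-tb-b'. Every entry starts at a syllable boundary,
// and each entry is preceded by '-'.
function buildIndex(syllables) {
  const full = [];
  const initials = [];
  for (let i = 0; i < syllables.length; i++) {
    const rest = syllables.slice(i);
    full.push(rest.join(''));                      // e.g. 'tangbing', 'bing'
    initials.push(rest.map((s) => s[0]).join('')); // e.g. 'tb', 'b'
  }
  return '-' + full.concat(initials).join('-');
}

// A keyword matches only if it is a prefix of some entry, because the
// search string always begins with the '-' separator.
function matches(index, keyword) {
  return index.indexOf('-' + keyword) !== -1;
}

const index = buildIndex(['tang', 'bing']);
console.log(index);                  // '-tangbing-bing-tb-b'
console.log(matches(index, 'tb'));   // true
console.log(matches(index, 'bing')); // true
console.log(matches(index, 'gb'));   // false
```

One limitation of this sketch: a keyword that mixes initials and full syllables, such as tbing, does not match, since only all-full and all-initial entries are indexed.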

@lomo1

lomo1 commented Mar 2, 2018

Try xi'an :(

@xmflswood
Author

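I've published the pinyin-match module, which solves the segmentation problem and the long polyphonic string problem. Please support it too~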
pinyin-match
