Stop word setting has no effect! #58
Comments
Update:
What line endings does the file use? I recommend LF line endings; you can check this with a text editor.
It was a line-ending problem, thanks.
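For anyone hitting the same problem, the line-ending check suggested above can also be done from R. This is a minimal base R sketch, not part of jiebaR: `has_cr` and `to_lf` are hypothetical helper names, and the demo runs on a temporary file rather than the real `stop.txt`. Note that `readLines()` accepts LF, CRLF, or CR as the line terminator, so its printed output looks the same either way and cannot reveal the problem; the check below inspects the raw bytes instead.

```r
# Hypothetical helpers (not part of jiebaR); base R only.

# TRUE if the file contains any carriage return byte (0x0d),
# i.e. it uses CRLF (Windows) or CR line endings.
has_cr <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  any(bytes == as.raw(0x0d))
}

# Rewrite the file with LF-only line endings.
to_lf <- function(path, out = path) {
  # readLines() strips LF, CRLF, or CR terminators while reading.
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  # Binary mode, so "\n" is not translated back to CRLF on Windows.
  con <- file(out, open = "wb")
  on.exit(close(con))
  writeLines(lines, con, sep = "\n", useBytes = TRUE)
  invisible(out)
}

# Demo on a temporary file written with CRLF endings.
tmp <- tempfile(fileext = ".txt")
writeBin(charToRaw("stop\r\n"), tmp)
has_cr(tmp)   # TRUE
to_lf(tmp)
has_cr(tmp)   # FALSE
```

With the real file, the equivalent would presumably be `if (has_cr("stop.txt")) to_lf("stop.txt")` before calling `worker(stop_word = "stop.txt")`.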
1. Runtime environment:
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_1.13.3 jiebaR_0.9.1 jiebaRD_0.1 ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.9 withr_2.0.0 digest_0.6.12 assertthat_0.1 R6_2.2.0 grid_3.4.1
[7] plyr_1.8.4 gtable_0.2.0 git2r_0.19.0 scales_0.4.1 httr_1.2.1 curl_2.8.1
[13] lazyeval_0.2.0 tools_3.4.1 munsell_0.4.3 compiler_3.4.1 colorspace_1.3-2 memoise_1.1.0
[19] tibble_1.2
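2. I ran into a problem while reproducing the following part of the jiebaR Chinese word segmentation documentation (https://qinwenfeng.com/jiebaR/section-3.html#-workerstop_word):
3.0.5 Adding stop words: worker(stop_word = “…”)
!!!! For word segmentation, please do not modify the stop word file that is loaded by default, i.e. jiebaR::STOPPATH; use a custom stop word path instead.
The directory contains a stop.txt file with the following content: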
readLines("stop.txt")
#> [1] "停止"
分词器 = worker(stop_word = "stop.txt")
segment("这是一个停止词", 分词器)
#> [1] "这是" "一个" "词"
3. Below is my code. stop.txt contains only one word, was saved as UTF-8, and sits in my working directory.
However, the stop word was not removed. Why is that?