fix setting pause symbol for non-kana symbol #8

Open
wants to merge 1 commit into
base: 1.11

@sophiefy commented Sep 18, 2023

Maybe this is more of a problem with the dictionary...

njd_set_pronunciation sets read, pron, and other features for symbols whose mora size is 0. Specifically, non-kana symbols are set as 読点 (the Japanese comma 、, which acts as a pause).

In the following example, ~ is incorrectly parsed as 名詞 (noun) by MeCab with naist-jdic, whereas it should be 助詞 (particle).

1933年~1937年
1933	名詞,数,*,*,*,*,*
年	名詞,接尾,助数詞,*,*,*,年,ネン,ネン,1/2,C3
~	名詞,サ変接続,*,*,*,*,*
1937	名詞,数,*,*,*,*,*
年	名詞,接尾,助数詞,*,*,*,年,ネン,ネン,1/2,C3
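To make the problem easier to see, here is a minimal Python sketch that splits the MeCab output above into surface forms and feature lists. It is only an illustration: parse_mecab is a hypothetical helper, not part of MeCab or Open JTalk, and it assumes the usual tab-separated "surface, then comma-separated features" layout shown in the output.

```python
# Illustrative only: parse_mecab is a hypothetical helper, not a MeCab API.
MECAB_OUTPUT = (
    "1933\t名詞,数,*,*,*,*,*\n"
    "年\t名詞,接尾,助数詞,*,*,*,年,ネン,ネン,1/2,C3\n"
    "~\t名詞,サ変接続,*,*,*,*,*\n"
    "1937\t名詞,数,*,*,*,*,*\n"
    "年\t名詞,接尾,助数詞,*,*,*,年,ネン,ネン,1/2,C3\n"
)


def parse_mecab(text):
    """Split each output line into (surface, [feature, ...])."""
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, features = line.split("\t", 1)
        tokens.append((surface, features.split(",")))
    return tokens


tokens = parse_mecab(MECAB_OUTPUT)

# Tokens whose feature list stops after the seventh field carry no
# reading or pronunciation in this output.
without_reading = [surface for surface, feats in tokens if len(feats) <= 7]

for surface, feats in tokens:
    print(surface, feats[0], feats[1])
print(without_reading)
```

In this output the ~ token is tagged 名詞,サ変接続 and, like the two numbers, has no reading or pronunciation fields.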

Since its mora size is 0, its read and pron are set to 、 and its pos is set to 記号 (symbol). Consequently, its features become the following, which is weird because pos_group still says サ変接続:

~,記号,サ変接続,*,*,*,*,~,、,、,0,0,*,0

So I think pos_group, ctype, and cform should also be updated, so that its features become:

~,記号,読点,*,*,*,*,~,、,、,0,0,*,0
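The difference between the current and the proposed behavior can be sketched in Python as follows. This is not the Open JTalk implementation: the 14 field names are my assumption about the order of the comma-separated features shown above, mark_as_pause is a hypothetical helper, the "before" line is a made-up starting state, and the sketch only checks mora_size (it skips the non-kana check).

```python
# Assumed order of the 14 comma-separated fields; labels are illustrative.
FIELDS = [
    "string", "pos", "pos_group1", "pos_group2", "pos_group3",
    "ctype", "cform", "orig", "read", "pron",
    "acc", "mora_size", "chain_rule", "chain_flag",
]
TOUTEN = "、"  # Japanese comma, used as the pause symbol


def parse(line):
    """Turn one feature line into a dict keyed by the assumed field names."""
    values = line.split(",")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def to_line(node):
    """Serialize a node dict back into a comma-separated feature line."""
    return ",".join(node[name] for name in FIELDS)


def mark_as_pause(node, reset_subfields):
    """Return a copy of node rewritten as a pause if its mora size is 0.

    reset_subfields=False mimics the behavior described in this PR;
    reset_subfields=True also rewrites pos_group, ctype and cform.
    """
    out = dict(node)
    if out["mora_size"] != "0":
        return out  # nodes with a real pronunciation are left alone
    out["pos"] = "記号"
    out["read"] = TOUTEN
    out["pron"] = TOUTEN
    if reset_subfields:
        out["pos_group1"] = "読点"
        for name in ("pos_group2", "pos_group3", "ctype", "cform"):
            out[name] = "*"
    return out


# Hypothetical state of the node before the pronunciation step.
before = parse("~,名詞,サ変接続,*,*,*,*,~,*,*,0,0,*,0")

current = to_line(mark_as_pause(before, reset_subfields=False))
proposed = to_line(mark_as_pause(before, reset_subfields=True))
print(current)
print(proposed)
```

With reset_subfields=False the sketch reproduces the inconsistent 記号,サ変接続 line, and with reset_subfields=True it reproduces the 記号,読点 line proposed here.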

tsukumijima pushed a commit to tsukumijima/open_jtalk that referenced this pull request Jul 29, 2024
* Add: add -q to dictionary compilation

* Delete: remove the quiet option from the functions

* Fix: remove a leftover quiet option

* Fix: stop using the quiet argument

* Fix: add a fallback for thread_local

* Fix: fix behavior on MSVC

* Update src/mecab/src/dictionary_generator.cpp

Co-authored-by: Hiroshiba <[email protected]>

* Delete: remove thread_local

* Change: make CHECK_DIE print even in quiet mode

* Fix: suppress output that was still being printed

* Restore cerr in two places

---------

Co-authored-by: Hiroshiba <[email protected]>