Add new phrase cut into multiple phrases #207

Open
qas612820704 opened this issue Apr 9, 2017 · 13 comments

@qas612820704

Similar to #98.

Adding a new phrase splits it into more than one phrase, and some of the resulting entries contain bopomofo.

For example, when I add this:

Phrase 歐你媽個頭
Bopomofo ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ

it is split into multiple phrases, as shown on the right side of the figure below.

[Screenshot from 2017-04-09 15-16-37]

In addition, the new phrases contain bopomofo.

Is this the correct behavior, or did something go wrong?

@jserv
Member

jserv commented Apr 9, 2017

@qas612820704, use chewing-editor -d to dump and analyze the log. Always attach the messages as text.

@qas612820704
Author

@jserv

Debug: Add "歐你媽個頭" ( "ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ" ) ((null) :0)
Debug: [chewingio.c:1996 chewing_userphrase_add] API call:  ((null) :0)
Warning: chewing_userphrase_add() returns 0 ((null) :0)
Debug: [chewingio.c:1859 chewing_userphrase_enumerate] API call:  ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: 歐 ㄡ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: 你媽 ㄋㄧˇ ㄇㄚ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: 個頭 ㄍㄜ˙ ㄊㄡˊ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄡ ㄡ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄋ一ˇ ㄋ ㄧ ˇ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄚ ㄚ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄇㄚ ㄇ ㄚ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄜ ㄜ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄍㄜ˙ ㄍ ㄜ ˙ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄊㄡˊ ㄊ ㄡ ˊ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: Total userphrase 10 ((null) :0)
Debug: 10 ((null) :0)

It looks like the same issue as #206.

I will trace the source code.

@samwhelp

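This failure most likely happens because the bopomofo symbol "ㄧ" was entered as the Chinese character "一".

You can double-check: the "一" in "ㄋ一ˇ" above is the Chinese character.

Bopomofo: ㄧ
U+3127
http://www.fileformat.info/info/unicode/char/3127/index.htm

Chinese character: 一
U+4e00
http://www.fileformat.info/info/unicode/char/4e00/index.htm

Hope this helps.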

:-)
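
One way to spot this kind of mix-up before calling the library is to scan the bopomofo string for U+4E00. The following is only a minimal sketch: findCjkOne is a hypothetical helper, not part of chewing-editor or libchewing, and it assumes the input has already been decoded to UTF-32.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The CJK ideograph 一, which looks like the bopomofo medial ㄧ (U+3127).
constexpr char32_t kCjkOne = U'\u4E00';

// Hypothetical helper: returns the indices at which 一 (U+4E00) appears
// in a string that is supposed to contain only bopomofo.
std::vector<std::size_t> findCjkOne(const std::u32string &bopomofo)
{
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < bopomofo.size(); ++i) {
        if (bopomofo[i] == kCjkOne) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

For the beginning of the string in this issue, "ㄡ ㄋ一ˇ", the sketch reports index 3, which is where the ideograph sits.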

@david50407
Member

david50407 commented Apr 10, 2017

@samwhelp The issue you mentioned should be solved after PR #169 which replaced all into (also replaced all into )

@david50407
Member

Oops, @samwhelp, you're right: U+4E00 (一) was mistyped in the bopomofo, and #169 didn't catch it.

I re-checked #169. It matches the wrong character: it replaces U+3127 with U+3127 (yes, the same character).

This issue should also be related to #108.

@samwhelp

samwhelp commented Apr 10, 2017

To add some detail, here is my test environment:

  • Xubuntu 16.04 amd64

Running

$ dpkg -l '*chewing*'

shows

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                            Version              Architecture         Description
+++-===============================-====================-====================-====================================================================
ii  chewing-editor                  0.0.1-3              amd64                user dictionary editor for the chewing input method
ii  fcitx-chewing                   0.2.2-1              amd64                Fcitx wrapper for Chewing library
ii  hime-chewing:amd64              0.9.10+git20150916+d amd64                support library to use Chewing in HIME
un  libchewing                      <none>               <none>               (no description available)
un  libchewing-data                 <none>               <none>               (no description available)
un  libchewing-dev                  <none>               <none>               (no description available)
un  libchewing1-dev                 <none>               <none>               (no description available)
un  libchewing2-dev                 <none>               <none>               (no description available)
ii  libchewing3:amd64               0.4.0-4              amd64                intelligent phonetic input method library
ii  libchewing3-data                0.4.0-4              all                  intelligent phonetic input method library - data files
ii  libchewing3-dev                 0.4.0-4              amd64                intelligent phonetic input method library (developer version)
un  scim-chewing                    <none>               <none>               (no description available)

I tested with "chewing-editor -d".

Entering

phrase = "歐你媽個頭"
bopomofo = "ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ"

gives the following result:

Debug: Add "歐你媽個頭" ( "ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ" ) ((null) :0)
Debug: [chewingio.c:1998 chewing_userphrase_add] API call:  ((null) :0)
Warning: chewing_userphrase_add() returns 0 ((null) :0)

Entering

phrase = "歐你媽個頭"
bopomofo = "ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ"

gives the following result:

Debug: Add "歐你媽個頭" ( "ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ" ) ((null) :0)
Debug: [chewingio.c:1998 chewing_userphrase_add] API call:  ((null) :0)
Debug: [userphrase-sql.c:179 LogUserPhrase] userphrase 歐你媽個頭, phone = 0x0040 0x0e83 0x0608 0x1219 0x0c42 , orig_freq = 1, max_freq = 1, user_freq = 1, recent_time = 58958 ((null) :0)
Debug: "歐你媽個頭 (ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ)" ((null) :0)
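
The five phone values in the LogUserPhrase line above are consistent with a packed 16-bit layout per syllable: initial in the top bits, then a 2-bit medial, a 4-bit final, and a 3-bit tone. That layout is an assumption inferred from the logged values, not something stated in this thread, and Syllable and unpackPhone are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded form of one logged phone value.
struct Syllable {
    unsigned initial;  // 1-based index into ㄅㄆㄇㄈㄉㄊㄋ..., 0 = none
    unsigned medial;   // ㄧ = 1, ㄨ = 2, ㄩ = 3, 0 = none
    unsigned final_;   // 1-based index into ㄚㄛㄜㄝㄞㄟㄠㄡ..., 0 = none
    unsigned tone;     // 0 = first tone, 1 = ˙, 2 = ˊ, 3 = ˇ, 4 = ˋ
};

// Splits a 16-bit phone value under the assumed layout:
// initial << 9 | medial << 7 | final << 3 | tone.
Syllable unpackPhone(std::uint16_t phone)
{
    return Syllable{
        static_cast<unsigned>(phone >> 9),
        static_cast<unsigned>((phone >> 7) & 0x3),
        static_cast<unsigned>((phone >> 3) & 0xF),
        static_cast<unsigned>(phone & 0x7),
    };
}
```

Under this reading, 0x0e83 unpacks to initial 7 (ㄋ), medial 1 (ㄧ), no final, and tone 3 (ˇ), which matches the second syllable ㄋㄧˇ. The ideograph 一 has no slot in such a table, which would explain why the mistyped input is rejected.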

Then I downloaded the chewing-editor source package to take a look:

$ apt-get source chewing-editor

Running

$ grep 'checkBopomofo' chewing-editor-0.0.1/* -R

prints nothing.

Running

$ grep 'UserphraseModel::add' chewing-editor-0.0.1/* -R -A 18

shows

chewing-editor-0.0.1/src/model/UserphraseModel.cpp:void UserphraseModel::add(std::shared_ptr<QString> phrase, std::shared_ptr<QString> bopomofo)
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-{
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    add(*phrase.get(), *bopomofo.get());
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-}
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-void UserphraseModel::importUserphrase(std::shared_ptr<UserphraseImporter> importer)
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-{
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    size_t old_count = userphrase_.size();
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    if (!importer.get()->isSupportedFormat()) {
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        emit importCompleted(false, importer.get()->getPath(), 0, old_count);
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        return;
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    }
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    auto result = importer.get()->getUserphraseSet();
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    for (auto& i: result) {
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        add(i.phrase_, i.bopomofo_);
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    }
--
chewing-editor-0.0.1/src/model/UserphraseModel.cpp:void UserphraseModel::add(const QString &phrase, const QString &bopomofo)
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-{
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    auto ret = chewing_userphrase_add(
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        ctx_.get(),
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        phrase.toUtf8().constData(),
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        bopomofo.toUtf8().constData());
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    if (ret > 0) {
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        emit beginResetModel();
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        userphrase_.insert(Userphrase{
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-            phrase,
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-            bopomofo
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        });
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        emit endResetModel();
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        emit addNewPhraseCompleted(userphrase_[userphrase_.size()-1]);
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    } else {
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        qWarning() << "chewing_userphrase_add() returns" << ret;
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    }
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-}

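So the chewing-editor version I am using, 0.0.1-3, appears to predate the fix.

I also tested libchewing3 directly, and the result is the same.

Entering

phrase = "歐你媽個頭"
bopomofo = "ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ"

makes chewing_userphrase_add return 0.

Entering

phrase = "歐你媽個頭"
bopomofo = "ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ"

makes chewing_userphrase_add return 1.

I also tested #206, and it appears to be the same situation.

With the Chinese character (一):

phrase = "鞭數十"
bopomofo = "ㄅ一ㄢ ㄕㄨˋ ㄕˊ"

With the bopomofo symbol (ㄧ):

phrase = "鞭數十"
bopomofo = "ㄅㄧㄢ ㄕㄨˋ ㄕˊ"

End of report.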

:-)

@david50407
Member

After #210, this issue should be solved. @qas612820704, can you try again?

And thanks for the help, @samwhelp. The auto-conversion was released after 0.1.1.

BTW, we still need a good solution to #108.

@qas612820704
Author

Hi @david50407, @samwhelp is right.
I mistyped ㄧ as 一.

Changing

phrase = 歐你媽個頭
bopomofo = ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ

into

phrase = 歐你媽個頭
bopomofo = ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ

works fine. Thx.

@qas612820704
Author

qas612820704 commented Apr 11, 2017

@david50407, you're right about #169.
Both arguments are "ㄧ" (U+3127) in

// src/model/UserphraseModel.cpp:197
QString UserphraseModel::checkBopomofo(const QString &bopomofo) const
{
    ...
    replaceBopomofo.replace(QString::fromUtf8("ㄧ"),QString::fromUtf8("ㄧ"));
    ...
}

which needs to change to

// src/model/UserphraseModel.cpp:197
QString UserphraseModel::checkBopomofo(const QString &bopomofo) const
{
    ...
    replaceBopomofo.replace(QString::fromUtf8("一"),QString::fromUtf8("ㄧ"));
    ...
}

That is, change the first "ㄧ" (U+3127) into "一" (U+4E00).

Should I make a pull request to fix it?
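
The same replacement can be sketched without Qt on a UTF-8 std::string, which makes it easy to see that the search and replacement byte sequences really differ. This is only an illustration under that assumption: normalizeBopomofo is a hypothetical name, not the project's checkBopomofo.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every UTF-8 encoded 一 (U+4E00) with the
// bopomofo medial ㄧ (U+3127) and returns the normalized copy.
std::string normalizeBopomofo(std::string text)
{
    const std::string cjkOne = "\xE4\xB8\x80";     // U+4E00 in UTF-8
    const std::string bopomofoI = "\xE3\x84\xA7";  // U+3127 in UTF-8
    std::string::size_type pos = 0;
    while ((pos = text.find(cjkOne, pos)) != std::string::npos) {
        text.replace(pos, cjkOne.size(), bopomofoI);
        pos += bopomofoI.size();  // continue after the inserted symbol
    }
    return text;
}
```

Writing the two characters as explicit byte escapes avoids the original pitfall, where both arguments looked identical in the editor.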

@jserv
Member

jserv commented Apr 11, 2017

@qas612820704, the idea behind your preliminary work is to implement fuzzy-match logic, which is worth sending as pull request(s). Can you improve it by accepting more characters such as ?

@qas612820704
Author

@jserv, are there other characters like ? I only know of Y in English, which was already fixed in #169.
I have no idea about other bopomofo-like characters.

@david50407
Member

@qas612820704 @jserv, I already fixed that in #210 (merged yesterday), and I don't think the English character Y is that easily mistaken here.

 and look the same as and in the IME input box under some fonts, so I think handling just these two cases is fine.

@jserv
Member

jserv commented Apr 11, 2017

I defer to @david50407 on not taking the letter Y into consideration.
