Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #652: process whole conversion chain #656

Closed
wants to merge 6 commits into from
Closed
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion src/rime/gear/simplifier.cc
Original file line number Diff line number Diff line change
Expand Up @@ -250,7 +250,13 @@ bool Simplifier::Convert(const an<Candidate>& original,
if (forms[i] == original->text()) {
result->push_back(original);
} else {
PushBack(original, result, forms[i]);
string simplified;
const bool success_converted = opencc_->ConvertText(original->text(), &simplified);
if (success_converted && original->text().compare(simplified) != 0) {
PushBack(original, result, simplified);
} else {
PushBack(original, result, forms[i]);
}
Copy link
Author

@amorphobia amorphobia Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested your suggestion yet. I have a question. Would it lose results if a character can be converted into multiple ones?

For example, 「才」 would be converted to 「才」 and 「纔」, but it will use ConvertText first, get result 「才」 (or 「纔」, not test yet), and go to success branch. Then the result 「纔」 will lose.

Copy link
Contributor

@groverlynn groverlynn Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested your suggestion yet. I have a question. Would it lose results if a character can be converted into multiple ones?

For example, 「才」 would be converted to 「才」 and 「纔」, but it will use ConvertText first, get result 「才」 (or 「纔」, not test yet), and go to success branch. Then the result 「纔」 will lose.

fair point. I did a bit testing, and realize the problem is at the logic of the function. The chained conversion is never realized. It's not about one-to-one vs one-to-multi, but even the definition of success is wrong.

ConvertWord only converts at most one level in the chain, which is based on the assumption that rime would convert from zh-hant, and it wouldn't need a second or third instance of (chained) conversion, which is not true if one were to convert from zh-hant to zh-hans-CN / zh-hans-SG / zh-hans-MY (there are minor differences between standard variants, in addition to the concurrent use of zh-hans-MY and zh-hant-MY.

ConvertText only returns the first chained conversion, if any. However, more fatal than fail to get possible conversions is fail to keep correct conversions. ConvertText wrongly treats successful conversions A->B->A as failed conversions (e.g. zh-hans 才 -> zh-hant 纔 -> zh-hant-TW / zh-hant-HK 才). In this case, ConvertWord stops at zh-hant 纔, failing to convert it back to zh-hant-TW 才 or zh-hant-HK 才.

}
}
} else {
Expand Down