latin-camelcase feature make wrong segmentation #289
Comments
Hello @hamano, what do you expect in terms of segmentation? Thank you!
Failing to segment words that are not in the dictionary is a characteristic of Japanese. Please ignore that.
The word
Hello @hamano,
I'm concerned not just about the term "OpenSSL" but about countless similar terms.
I don't think all these terms should be added to the dictionary. New terms emerge one after another.
If you say this is the expected behavior of the latin-camelcase feature, then that's fine; I'll disable it. However, it is too inconvenient as a default feature, especially for technical documentation, which is why I reported it.
Understood, we may disable it from the default features.

    let should_group = if last_char_was_lowercase && char.is_letter_uppercase() {
        false
    } else {
        true
    };
    last_char_was_lowercase = char.is_letter_lowercase();
    should_group

or even

    let should_group = !(last_char_was_lowercase && char.is_letter_uppercase());
    last_char_was_lowercase = char.is_letter_lowercase();
    should_group

Should solve your issue. However, the word
I am unsure how "OpenSSLError" should be segmented. Perhaps, when a program constant name appears in documentation, it is expected not to be segmented at all.
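To make the question concrete, here is a minimal standalone sketch of the lowercase-to-uppercase boundary rule from the snippet above. It is not the library's code. The helper name `split_camel_case` is hypothetical, the standard library's `char::is_lowercase` and `char::is_uppercase` stand in for `is_letter_lowercase` and `is_letter_uppercase`, and it assumes that a `false` value of `should_group` starts a new segment.

```rust
/// Hypothetical helper: starts a new segment whenever a lowercase letter is
/// immediately followed by an uppercase letter, and never splits elsewhere.
fn split_camel_case(word: &str) -> Vec<&str> {
    let mut segments = Vec::new();
    let mut start = 0;
    let mut last_char_was_lowercase = false;

    for (index, ch) in word.char_indices() {
        // Same condition as in the snippet: lowercase followed by uppercase.
        // Assumption: this is where grouping stops and a new segment begins.
        if last_char_was_lowercase && ch.is_uppercase() {
            segments.push(&word[start..index]);
            start = index;
        }
        last_char_was_lowercase = ch.is_lowercase();
    }

    // Push the trailing segment, if any.
    if start < word.len() {
        segments.push(&word[start..]);
    }
    segments
}

fn main() {
    for word in ["OpenSSL", "OpenSSLError", "camelCase"] {
        println!("{word} -> {:?}", split_camel_case(word));
    }
}
```

Under these assumptions the sketch yields `["Open", "SSL"]` for "OpenSSL" and `["Open", "SSLError"]` for "OpenSSLError". A rule based only on letter case cannot tell a proper noun or constant name from an ordinary camelCase identifier, so keeping such terms whole would require disabling the feature or some other mechanism.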
Hey @hamano,
The default feature has an issue with the segmentation of proper nouns such as OpenSSL.
main.rs:
default feature:
disable default feature: