fix: o.o.c.segmentation.SRX to load conf and save srx in more robust way and remove warning message #1159

Merged
merged 7 commits into from
Nov 12, 2024


miurahr
Member

@miurahr miurahr commented Oct 12, 2024

Pull request type

  • Bug fix

Which ticket is resolved?

What does this PR change?

  • Update SRXTest to run with locale JA, DE and EN
  • fix: LanguageCodes to handle German localized names written by older OmegaT versions.
  • feat: make SRX.saveToSrx robust against localized language names by detecting the language from the pattern
  • feat: LanguageCodes to detect language from pattern.

Other information

#1158

@miurahr miurahr changed the title refactor: segmentation codes fix: segmentation rule parser without localized name Oct 31, 2024
@omegat-org omegat-org deleted a comment from github-actions bot Oct 31, 2024
@miurahr miurahr force-pushed the topic/miurahr/srx/unkown-langauge-error branch from f6951c9 to c0defb6 October 31, 2024 11:54
@miurahr miurahr changed the title fix: segmentation rule parser without localized name fix: segmentation.conf rule parser to migrate to srx file Oct 31, 2024
@omegat-org omegat-org deleted a comment from github-actions bot Oct 31, 2024
@miurahr
Member Author

miurahr commented Oct 31, 2024

There are two test failures.

  • SRXTest$SRXMigrateJaTest.testSrxMigration
11:57:15.890: Warning: Unknown language code カタルーニャ語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.900: Warning: Unknown language code チェコ語(Czech) specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.960: Warning: Unknown language code ドイツ語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.985: Warning: Unknown language code 英語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.987: Warning: Unknown language code スペイン語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.989: Warning: Unknown language code フィンランド語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.990: Warning: Unknown language code フランス語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.991: Warning: Unknown language code イタリア語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.996: Warning: Unknown language code 日本語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.997: Warning: Unknown language code オランダ語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:16.012: Warning: Unknown language code ポーランド語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:16.047: Warning: Unknown language code ロシア語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:16.048: Warning: Unknown language code スウェーデン語 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:16.054: Warning: Unknown language code スロバキア語(Slovak) specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:16.102: Warning: Unknown language code 中国語(Chinese) specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:16.103: Warning: Unknown language code 初期値 specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:16.104: Warning: Unknown language code テキストファイル specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:16.105: Warning: Unknown language code HTML, XHTML, ODF と Infix ファイル specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:16.106: Info: using segmentation rules from test/data/segmentation/migrate/locale_ja/segmentation.conf (SRX_RULE_FROM) 
  • SRXTest$SRXMigrateOldDeTest.testSrxMigration
11:57:15.521: Warning: Unknown language code Katalanisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.614: Warning: Unknown language code Deutsch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.650: Warning: Unknown language code Englisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.653: Warning: Unknown language code Spanisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.657: Warning: Unknown language code Finnisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.659: Warning: Unknown language code Französisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.662: Warning: Unknown language code Italienisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.672: Warning: Unknown language code Japanisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.673: Warning: Unknown language code Niederländisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.704: Warning: Unknown language code Polnisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.753: Warning: Unknown language code Russisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.754: Warning: Unknown language code Schwedisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.806: Warning: Unknown language code Chinesisch specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.808: Warning: Unknown language code Standard specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE) 	
11:57:15.810: Warning: Unknown language code Segmentierung der Textdateien specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE)
11:57:15.810: Warning: Unknown language code Segmentierung von HTML-, XHTML-, ODF- und Infix-Dateien specified (CORE_SRX_RULES_UNKNOWN_LANGUAGE_CODE)
11:57:15.811: Info: using segmentation rules from test/data/segmentation/migrate/locale_de_54/segmentation.conf (SRX_RULE_FROM)

@miurahr miurahr force-pushed the topic/miurahr/srx/unkown-langauge-error branch 2 times, most recently from 176fa88 to 16ab78e November 2, 2024 02:10
@omegat-org omegat-org deleted a comment from github-actions bot Nov 2, 2024
@miurahr miurahr added bug and removed refactoring labels Nov 2, 2024
@miurahr miurahr force-pushed the topic/miurahr/srx/unkown-langauge-error branch from cda89d4 to 6cae905 November 2, 2024 03:03
@miurahr miurahr marked this pull request as ready for review November 2, 2024 03:03
- refactor SRXTest class
- Add a German-locale conf file generated by OmegaT 5.4.0 as test data
- refactor SRX class to ease testing
- Load the resource bundle in the specified test locale

Signed-off-by: Hiroshi Miura <[email protected]>
- Harden the save method to be robust against localized language names.
- Even when a MapRule holds a localized language name, the language is detected from its language pattern and the standard name is written.

Signed-off-by: Hiroshi Miura <[email protected]>
- The German rule name for text files was changed in v5.5.
- When reading a "segmentation.conf" generated by v5.4 or earlier,
  migration failed.
- Add a workaround to detect the old rule name.

Signed-off-by: Hiroshi Miura <[email protected]>
@miurahr miurahr force-pushed the topic/miurahr/srx/unkown-langauge-error branch from 6cae905 to a69a418 November 2, 2024 04:05
Member Author

@miurahr miurahr left a comment

There are several points that could be improved. I have left some comments for the reviewers.

src/org/omegat/core/segmentation/MapRule.java (outdated)
/** Language Name */
private String languageCode;

public MapRule(Languagemap languagemap, List<Rule> rules) {
Member Author

I removed the constructor that takes a Languagemap argument. This reduces coupling between classes: the caller now reads the fields of the Languagemap and passes them in.
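A minimal sketch of this decoupling, assuming a rule holder that needs only a name, a pattern and a rule list. LanguageMapEntry and RuleSet are hypothetical stand-ins for Languagemap and MapRule, not the OmegaT classes, and rules are plain strings to keep the example self-contained.

```java
import java.util.Collections;
import java.util.List;

public class DecoupledCtorSketch {

    // Hypothetical stand-in for a parsed <languagemap> element.
    static final class LanguageMapEntry {
        final String ruleName;
        final String pattern;

        LanguageMapEntry(String ruleName, String pattern) {
            this.ruleName = ruleName;
            this.pattern = pattern;
        }
    }

    // Hypothetical stand-in for MapRule: its constructor depends only on plain values.
    static final class RuleSet {
        final String language;
        final String pattern;
        final List<String> rules;

        RuleSet(String language, String pattern, List<String> rules) {
            this.language = language;
            this.pattern = pattern;
            this.rules = rules;
        }
    }

    // The caller, not the constructor, unpacks the language map entry.
    static RuleSet fromEntry(LanguageMapEntry entry, List<String> rules) {
        return new RuleSet(entry.ruleName, entry.pattern, rules);
    }

    public static void main(String[] args) {
        RuleSet rs = fromEntry(new LanguageMapEntry("English", "EN.*"), Collections.emptyList());
        System.out.println(rs.language + " " + rs.pattern); // English EN.*
    }
}
```

Because the constructor only sees strings, it can be exercised in tests without building a parsed SRX object first.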

public MapRule(String language, String pattern, List<Rule> rules) {
this.setLanguage(language);
String code = LanguageCodes.getLanguageCodeByPattern(pattern);
this.setLanguage(code != null ? code : language);
Member Author

We get the language from the pattern: if the pattern is "EN.*", we know the rule is for English.
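The idea in this comment can be sketched in plain Java. This is a minimal illustration, assuming language patterns of the simple form "XX.*" (as in "EN.*"); PatternLanguageDetector and its methods are hypothetical and are not OmegaT's actual LanguageCodes.getLanguageCodeByPattern implementation. The fallback in resolveLanguage mirrors the code != null ? code : language expression in the constructor shown above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not OmegaT's LanguageCodes: derive a language code
// from an SRX language pattern of the simple form "XX.*" (e.g. "EN.*").
public class PatternLanguageDetector {

    // A two- or three-letter code followed by ".*"; anything else counts as unknown.
    private static final Pattern SIMPLE_CODE = Pattern.compile("^([A-Za-z]{2,3})\\.\\*$");

    // Returns the upper-case code, or null when the pattern is not of the form "XX.*".
    public static String getLanguageCodeByPattern(String pattern) {
        if (pattern == null) {
            return null;
        }
        Matcher m = SIMPLE_CODE.matcher(pattern.trim());
        return m.matches() ? m.group(1).toUpperCase(Locale.ROOT) : null;
    }

    // Prefer the code detected from the pattern; otherwise keep the given (possibly localized) name.
    public static String resolveLanguage(String language, String pattern) {
        String code = getLanguageCodeByPattern(pattern);
        return code != null ? code : language;
    }

    public static void main(String[] args) {
        System.out.println(resolveLanguage("Englisch", "EN.*"));  // EN
        System.out.println(resolveLanguage("Japanisch", "ja.*")); // JA
        System.out.println(resolveLanguage("Standard", ".*"));    // Standard
    }
}
```

A catch-all pattern such as ".*" carries no language information, so the sketch falls back to the stored name in that case.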

src/org/omegat/core/segmentation/MapRule.java (outdated)
Use a non-localized message for debug-level logging
- Update LanguageCodes.getLanguageCodeByName
    - add a null check first
    - move the migration heuristics code from MapRule
- Update MapRule javadoc descriptions

Signed-off-by: Hiroshi Miura <[email protected]>
Signed-off-by: Hiroshi Miura <[email protected]>
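The getLanguageCodeByName changes listed in this commit (null check first, then migration heuristics) might look roughly like the following. This is a hypothetical sketch: the class LanguageNameLookup, the contents of both lookup tables, and the TEXT key are illustrative assumptions; only the old German rule name "Segmentierung der Textdateien" comes from the migration log in this PR.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not OmegaT's LanguageCodes data or code.
public class LanguageNameLookup {

    // Standard rule names mapped to codes (illustrative subset only).
    private static final Map<String, String> STANDARD = new HashMap<>();
    // Names written by older, localized versions (illustrative subset only).
    private static final Map<String, String> LEGACY = new HashMap<>();

    static {
        STANDARD.put("english", "EN");
        STANDARD.put("german", "DE");
        STANDARD.put("japanese", "JA");
        // Old German rule name from the migration log; "TEXT" is a hypothetical key.
        LEGACY.put("segmentierung der textdateien", "TEXT");
    }

    public static String getLanguageCodeByName(String name) {
        // Null check first, so callers may pass unvalidated input.
        if (name == null) {
            return null;
        }
        String key = name.trim().toLowerCase(Locale.ROOT);
        String code = STANDARD.get(key);
        if (code != null) {
            return code;
        }
        // Migration heuristics: fall back to names produced by older versions.
        return LEGACY.get(key);
    }

    public static void main(String[] args) {
        System.out.println(getLanguageCodeByName("English"));                       // EN
        System.out.println(getLanguageCodeByName("Segmentierung der Textdateien")); // TEXT
        System.out.println(getLanguageCodeByName(null));                             // null
    }
}
```

Keeping the legacy table separate from the standard one means the heuristics are only consulted after a normal lookup fails.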
public static class SRXMigrateJaTest {

@org.junit.Rule
public final LocaleRule localeRule = new LocaleRule(new Locale("ja"));
Member Author

This is a hack that changes the system default locale so that the unit test cases run under that locale.

Because the LocaleRule utility class is unrelated to OmegaT functionality, reviewers who are not familiar with it can ignore it.
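For readers unfamiliar with such helpers, the core of a locale-switching test fixture is small: save the JVM default locale, set the test locale, run the body, and restore the saved locale in a finally block. WithLocale is a hypothetical sketch, not OmegaT's LocaleRule; a JUnit 4 TestRule could wrap the same save, set and restore steps around a Statement.

```java
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical helper, not OmegaT's LocaleRule.
public final class WithLocale {

    private WithLocale() {
    }

    // Runs body with the given default locale and always restores the previous one.
    public static <T> T call(Locale locale, Supplier<T> body) {
        Locale saved = Locale.getDefault();
        Locale.setDefault(locale);
        try {
            return body.get();
        } finally {
            // Restore even if body throws, so later tests see the original locale.
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        String inside = call(Locale.forLanguageTag("ja"), () -> Locale.getDefault().getLanguage());
        System.out.println(inside); // ja
        // The original default language is back in effect here.
        System.out.println(Locale.getDefault().getLanguage());
    }
}
```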

@miurahr miurahr changed the title fix: segmentation.conf rule parser to migrate to srx file fix: o.o.c.segmentation.SRX to load conf and save srx in more robust way and remove warning message Nov 2, 2024
@miurahr miurahr merged commit 2469dd8 into master Nov 12, 2024
9 checks passed