Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some dictionaries have a double quote in their stylesheet link. #1585

Closed
cuongha0701 opened this issue Jun 15, 2024 · 7 comments · Fixed by #1586
Closed

Some dictionaries have a double quote in their stylesheet link. #1585

cuongha0701 opened this issue Jun 15, 2024 · 7 comments · Fixed by #1586

Comments

@cuongha0701
Copy link

cuongha0701 commented Jun 15, 2024

Describe the bug
Some dictionaries cannot load css because of double quote

Affected Dictionaries
Stardict dictionaries

Expected behavior
Remove the double quote in stylesheet link, and everything works again.

Screenshots
Screenshot_20240615_095930

OS and software versions
KDE Plasma 6.0.5
Goldendict-ng 24.05.13.4ac44e9a at 2024-06-14T21:18:58Z
Qt 6.7.1 GCC 14.1.1 20240522 arch linux 6.9.4-arch1-1 x86_64-little_endian-lp64
Flags: MAKE_ZIM_SUPPORT USE_ICONV MAKE_CHINESE_CONVERSION_SUPPORT

@shenlebantongying
Copy link
Collaborator

I don't think this related code is wrong.

QRegularExpression linkRe( R"((<\s*link\s+[^>]*href\s*=\s*["']?)(?!(?:data|https?|ftp):))",
QRegularExpression::CaseInsensitiveOption
| QRegularExpression::InvertedGreedinessOption );
articleText.replace( imgRe, "\\1bres://" + QString::fromStdString( getId() ) + "/" )
.replace( linkRe, "\\1bres://" + QString::fromStdString( getId() ) + "/" );

https://regex101.com/r/gVtCyO/1

Remove the double quote in stylesheet link, and everything works again.

Upload or provide a download link for this dict. The content before replacement might be malformed.

@cuongha0701
Copy link
Author

cuongha0701 commented Jun 15, 2024

@shenlebantongying
Here you are: OxfordThesaurusofEnglish.zip

@shenlebantongying
Copy link
Collaborator

shenlebantongying commented Jun 15, 2024

Seems a combination of

InvertedGreedinessOption converts * to *? and ? to ??.

For <link href="style.css">,
with InvertedGreedinessOption:
(<\s*link\s+?[^>]*href\s*?=\s*["']*)(?!(?:data|https?|ftp):)   < correct
(<\s*link\s+?[^>]*href\s*?=\s*["']?)(?!(?:data|https?|ftp):)   < error as in this issue
(<\s*link\s+?[^>]*href\s*?=\s*["']??)(?!(?:data|https?|ftp):)  < correct, because `??` converted back to `?` by InvertedGreedinessOption

Without any extra option, and for text abc"
["]? matches abc"
["]?? matches abc

@xiaoyifang
Copy link
Owner

remove QRegularExpression::InvertedGreedinessOption seems ok.

image

@shenlebantongying
Copy link
Collaborator

shenlebantongying commented Jun 15, 2024

There are performance implications

InvertedGreedinessOption or Ungreedy only needs 96 steps

while cancel this option needs 144 steps

I think changing the single ? to ?? is a better fix.

@xiaoyifang
Copy link
Owner

xiaoyifang commented Jun 15, 2024

the syntax ?? is not easy to understand.

As for performance issue, we should not take it too far , the regex match may occupy only 1% of the whole process,

@shenlebantongying
Copy link
Collaborator

I agree that InvertedGreedinessOption is 💩💩 that shouldn't exist.

For performance, I have no opinion. Making the code reliable first is ok to me too. Maybe I can revise all regex for performance later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants