Error when trying to access some strings with glossary terms #10002
Comments
Strange, I've never seen such an error. I don't think it is related to project size; the performance improvements in 5.0.x were actually tuned on a bigger project (https://weblate.eso-spolszczenie.eu/projects/eso-spolszczenie/#information). Exception info from the e-mail:
Related code: weblate/weblate/glossary/models.py Lines 77 to 85 in 5f4dede
Based on the error, it seems that So most likely ahocorasick returns wrong offsets here. What operating system and architecture are you using? |
This is Docker 24.0.5 on Debian 11 (x86_64), Linux kernel version - 5.10.0-25-amd64. |
I've asked ahocorasick_rs maintainer about this, it might be an issue in that library, see G-Research/ahocorasick_rs#83 |
Can you please run the following Python script? It will output more information to help diagnose this.
from itertools import chain

import ahocorasick_rs

from weblate.trans.models import Unit
from weblate.trans.models.component import prefetch_glossary_terms
from weblate.trans.util import PLURAL_SEPARATOR

unit = Unit.objects.get(pk=12519736)

parts = []
for text in unit.get_source_plurals():
    text = text.lower().strip()
    if text:
        parts.append(text)
source = PLURAL_SEPARATOR.join(parts)

project = unit.translation.component.project
prefetch_glossary_terms(project.glossaries)
terms = set(
    chain.from_iterable(glossary.glossary_sources for glossary in project.glossaries)
)

# Build automaton for efficient Aho-Corasick search
automaton = ahocorasick_rs.AhoCorasick(
    terms,
    implementation=ahocorasick_rs.Implementation.ContiguousNFA,
    store_patterns=False,
)

print("TERMS:")
print(terms)
print("STRING:")
print(repr(source))
print("MATCHES:")
print(automaton.find_matches_as_indexes(source, overlapping=True))
|
I think the crash is caused by blank terms in a glossary (see G-Research/ahocorasick_rs#83 (comment)). The above snippet should confirm that, or there might be a different issue as well. I will fix the issue with a blank term and close this issue as that is the most likely cause. |
The issue you have reported is now resolved. If you don’t feel it’s right, please follow its labels to get a clue for further steps.
|
Here is the output of the script:
And yes, we do have some blank terms in glossaries, as in w/ source but w/o translation. |
I guess this is with disabled glossaries, so that doesn't expose the bug... |
Oh yeah, makes sense. I'll provide a new one in a bit. |
The output is too big, so I've put it in a file. |
Thanks, that confirms what I expected, so this issue should be fixed. You can work around it by removing empty terms from a glossary. |
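For illustration only, here is a minimal sketch of the idea behind that workaround: drop blank entries from the set of terms before it is handed to the automaton. The helper name `drop_blank_terms` and the sample terms are hypothetical, and treating whitespace-only terms as blank is an assumption; this is not the actual fix that went into Weblate.

```python
def drop_blank_terms(terms):
    """Return only the terms that are non-empty after stripping whitespace.

    Hypothetical helper for illustration; not part of Weblate.
    """
    return {term for term in terms if term and term.strip()}


# Made-up glossary sources, including an empty and a whitespace-only entry.
terms = {"quest", "", "   ", "aether"}
clean_terms = drop_blank_terms(terms)
print(sorted(clean_terms))
```

In the diagnostic script above, the cleaned set would be passed to `ahocorasick_rs.AhoCorasick` in place of `terms`.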
Gotcha, thanks for the help! |
Describe the issue
We've recently upgraded our self-hosted Weblate from 4.18.2 to 5.0.2. Since the upgrade, translators can't access some of the strings in components - the server throws an internal error. By 'access' I mean going to the string page (/translate/<project>/<component>/<language>/<id_or_some_other_query>) or going into Zen-mode (/zen/<project>/<component>/<language>/).
We figured from the error that it has something to do with the glossaries, specifically with strings having glossary terms in them, though not all of those strings were affected. Disabling all of the glossaries 'fixed' the issue as in it's now possible to access those problematic strings, but without glossary highlights of course.
I should also mention that this is a large project - 4m+ strings across 6k+ components. The glossaries aren't big at all - roughly 500-1000 terms in total across all glossaries. The latest 5.x updates did bring a lot of huge performance improvements, and we're very thankful for that.
I already tried
Steps to reproduce the behavior
Expected behavior
The string page should successfully open OR The Zen-mode should successfully open.
Screenshots
The user sees a generic Internal Server Error message.
Exception traceback
No response
How do you run Weblate?
Docker container
Weblate versions
Weblate deploy checks
Additional context
Admin receives an email describing the error. Here is one of the emails:
[Weblate] ERROR (EXTERNAL IP)_ Internal Server Error_ translate_ffxiv-translation_quest-040-luckmk105_04062_ru.zip