Not accurate source language autodetection #74
Thanks for reporting this! This is strange: in my case, even the GoogleTranslate class doesn't recognize the language correctly. The problem seems to be on Google's server side.
➜ translate git:(main) ipython3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import translatepy
In [2]: translatepy.translators.google.GoogleTranslateV1().language("casa")
Out[2]: LanguageResult(service=Google, source=casa, result=eng)
➜ translate git:(main) ipython3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import translatepy
In [2]: translatepy.translators.google.GoogleTranslateV2().language("casa")
Out[2]: LanguageResult(service=Google, source=casa, result=eng)
➜ translate git:(main) ipython3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import translatepy
In [2]: translatepy.translators.google.GoogleTranslate().language("casa")
Out[2]: LanguageResult(service=Google, source=casa, result=eng)
|
Thanks for the response!
But, if you use Reverso first, then the result will be correct when using Google Translate:
Could this be related to the cache mechanism? |
Yes, I would guess the same !
This is normal, because some translators, such as Google Translate, already return the source language from their translation endpoint, while others need to call the language endpoint first. So even if you called the language endpoint first with Google Translate, the source language would be the one returned by the translation endpoint. The weirdest thing, though, is that Google Translate returned Spanish. Looking at the official website, we see that the detected language is indeed English |
Now this is weird, because it shouldn't, lol. This is the part where the translate/translatepy/utils/request.py Lines 179 to 181 in 490767c
For the translator cache, here is the part where it gets the cache: translate/translatepy/translators/base.py Lines 318 to 320 in 490767c
But that's weird, because we clearly see that you are creating two different instances of the Translator class
|
Wow, now that's interesting... I guess it might be a feature to better guess the expected result. |
But then it might change the result based on the service URL used 🤔 |
Just confirmed it:
>>> from translatepy.translators.google import GoogleTranslate
>>> g = GoogleTranslate(service_url="translate.google.es")
>>> g.language("casa")
LanguageResult(service=Google, source=casa, result=spa)
>>> g = GoogleTranslate(service_url="translate.google.fr")
>>> g.language("casa")
LanguageResult(service=Google, source=casa, result=spa)
>>> g.clean_cache()
>>> g.language("casa")
LanguageResult(service=Google, source=casa, result=por)
And yes, something is happening with the caches |
Well, that is something lol. I tried checking the source code before, but my Python skills are not that sharp 😅 Maybe you have a better eye to catch what's going on lol |
It's not a bug, it's a feature. When I designed the V2 translatepy architecture, I made a single cache instance available to all BaseTranslate class instances. In practice, that doesn't seem to have been a good idea. If required, I can make a PR to fix this and integrate the new LRU cache logic (#58). translate/translatepy/translators/base.py Lines 51 to 62 in 490767c
The caches are initialized as class attributes, not instance attributes. More info: https://stackoverflow.com/a/207128/13452914 |
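To illustrate the class-attribute point above, here is a minimal, self-contained sketch. It does not use translatepy: the class names, the `_cache` attribute, and the fake "detection" string are all assumptions made up for the demonstration. It only shows how a dict defined on the class is shared by every instance, while a dict assigned in `__init__` is not.

```python
class SharedCacheTranslator:
    # Class attribute: a single dict object shared by every instance.
    _cache = {}

    def __init__(self, service_url):
        self.service_url = service_url

    def language(self, text):
        if text not in self._cache:
            # Stand-in for a network call whose result depends on the service URL.
            self._cache[text] = f"detected-by-{self.service_url}"
        return self._cache[text]


class PerInstanceCacheTranslator(SharedCacheTranslator):
    def __init__(self, service_url):
        super().__init__(service_url)
        # Instance attribute: shadows the class attribute with a fresh dict.
        self._cache = {}


es = SharedCacheTranslator("translate.google.es")
fr = SharedCacheTranslator("translate.google.fr")
shared_es = es.language("casa")
shared_fr = fr.language("casa")  # served from the entry that `es` stored

es2 = PerInstanceCacheTranslator("translate.google.es")
fr2 = PerInstanceCacheTranslator("translate.google.fr")
own_es = es2.language("casa")
own_fr = fr2.language("casa")  # computed by `fr2` itself

print(shared_fr)
print(own_fr)
```

With the shared dict, the second instance returns whatever the first one stored, which matches the behavior seen in this thread where a new instance with a different `service_url` returned the earlier result until the cache was cleaned.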
Yes, I think this should be changed because people using translators separately expect different results from each instance. Moreover, if they want a shared cache, they might just use the Translate class. Also yea you can PR the new LRU logic anytime you want ! |
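For reference, a per-instance LRU cache could look roughly like the sketch below. This is not the logic from #58 or from translatepy; `LRUCache`, `FakeTranslator`, and `cache_size` are hypothetical names, and the "detection" is a placeholder string instead of a real request.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache (hypothetical helper, not translatepy's)."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def clear(self):
        self._data.clear()


class FakeTranslator:
    """Stand-in translator that owns its cache instead of sharing one."""

    def __init__(self, service_url, cache_size=2):
        self.service_url = service_url
        self.cache = LRUCache(cache_size)  # created per instance
        self.calls = 0  # counts simulated network calls

    def language(self, text):
        cached = self.cache.get(text)
        if cached is not None:
            return cached
        self.calls += 1
        result = f"{text}@{self.service_url}"  # placeholder for a real request
        self.cache.put(text, result)
        return result


t = FakeTranslator("translate.google.es")
for word in ("casa", "hola", "casa", "gato"):
    t.language(word)
print(t.calls)
```

Because each instance builds its own `LRUCache` in `__init__`, two translators with different service URLs can no longer see each other's entries, and anyone who wants sharing can pass the same cache object around explicitly.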
Thank you all guys for the help 🙌🙌 |
New PR done: #76
➜ translate git:(main) ipython
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from translatepy.translators.google import GoogleTranslate
In [2]: g = GoogleTranslate(service_url="translate.google.es")
In [3]: g.language("casa")
Out[3]: LanguageResult(service=Google, source=casa, result=spa)
In [4]: g = GoogleTranslate(service_url="translate.google.fr")
In [5]: g.language("casa")
Out[5]: LanguageResult(service=Google, source=casa, result=por) |
Hi!
First of all, I wanted to say that I love the project; I have been using it for a while now.
I came across some bizarre behavior that maybe you could check or explain to me (I tried checking the source code for the functions but did not see anything relevant that could be causing this).
It seems that source language autodetection is a bit off when it is given short, single words. I reproduced it with Spanish, but I don't know whether it happens in other languages too.
If you give it the words "casa" or "hola", for example, it will detect the source language as English instead of Spanish.
For example using the base translator:
Then I tried using the translators explicitly, in this case Reverso and Google, then using the base translator again, and it worked correctly (I guess because of the cache, but I may be wrong):
But interestingly enough, then, in the same session, using the base translator with the method translate(), the detection was off again:
Any idea why this could be happening? I guess the workaround for now would be to run the GoogleTranslate().language() method, and then the Translator().translate() method to get accurate results, like so:
Anyway, I wanted to ask about this and see if there is any reasoning behind it.
Sorry for the long message and thanks in advance!