GoogleTranslateV1 does not return the same results as translate.google.com (the website) #22

Open
Animenosekai opened this issue Sep 4, 2021 · 27 comments

Comments

@Animenosekai
Owner

Hey guys, have you tested https://github.com/Animenosekai/translate/blob/main/translatepy/translators/google.py#L89 with the same text you want to translate to English, and then tested it on the actual site to see if it returns the same result? Am I doing something wrong, or does it do that for you too?

text:

كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!

actual site:

Life there, apparently a dream, was about to turn into a nightmare. (dramatic pause) Hey guys! I just found a great new way to shave my tresses!

python module:

Life was there, apparently a dream, about to turn into a nightmare."(Dramatic pause)" Hello you guys!I just found a wonderful new way to shave my hair!

Originally posted by @NawtJ0sh in #21 (comment)

@NawtJ0sh

NawtJ0sh commented Sep 4, 2021

my bad sorry lol

@Animenosekai changed the title from "Hey guys, have you tested https://github.com/Animenosekai/translate/blob/main/translatepy/translators/google.py#L89 with the same text you want to translate to english, then tested it on the actual site, to see if it returns the same result?? Am I doing something wrong or does it do that for you too?" to "Hey guys, have you tested GoogleTranslateV1 with the same text you want to translate to english, then tested it on the actual site, to see if it returns the same result?? Am I doing something wrong or does it do that for you too?" on Sep 4, 2021
@Animenosekai
Owner Author

Animenosekai commented Sep 4, 2021

@NawtJ0sh Alright, so I tested what you were talking about on my machine, and it seems that translate.google.com actually returns the same results as GoogleTranslateV2 for some reason (which is weird, considering that GoogleTranslateV1 uses the batchexecute API, just like translate.google.com does).

Test results:

>>> from translatepy.translators.google import GoogleTranslateV1, GoogleTranslateV2
>>> GoogleTranslateV1().translate('كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!', "eng")
TranslationResult(service=Google, source=كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!, source_language=ara, destination_language=eng, result=Life was there, apparently a dream, about to turn into a nightmare."(Dramatic pause)" Hello you guys!I just found a wonderful new way to shave my hair!)
##### python was restarted to lose the caches
>>> GoogleTranslateV2().translate('كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!', "eng")
TranslationResult(service=Google, source=كانت الحياة هناك ، على ما يبدو حلما ، على وشك أن تتحول إلى كابوس. "(وقفة درامية)" مرحبًا يا رفاق! لقد وجدت للتو طريقة جديدة رائعة لحلاقة شعر خصلاتي!, source_language=ara, destination_language=eng, result=Life there, apparently a dream, was about to turn into a nightmare. (dramatic pause) Hey guys! I just found a great new way to shave my tresses!)
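
To see exactly where the two results diverge, a word-level diff can help. This is only a minimal sketch using Python's standard `difflib`; the helper name `word_diff` is made up for illustration, and the two strings are the results pasted above.

```python
from difflib import SequenceMatcher

# The two English results quoted in the test output above.
V1_RESULT = ('Life was there, apparently a dream, about to turn into a nightmare.'
             '"(Dramatic pause)" Hello you guys!I just found a wonderful new way '
             'to shave my hair!')
V2_RESULT = ('Life there, apparently a dream, was about to turn into a nightmare. '
             '(dramatic pause) Hey guys! I just found a great new way to shave '
             'my tresses!')


def word_diff(a, b):
    """Return (tag, words_from_a, words_from_b) for every span that differs."""
    a_words, b_words = a.split(), b.split()
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    return [
        (tag, a_words[i1:i2], b_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    for tag, old, new in word_diff(V1_RESULT, V2_RESULT):
        print(f"{tag:8} V1={' '.join(old)!r} V2={' '.join(new)!r}")
```

Splitting on whitespace is a deliberate simplification: the V1 output is missing spaces after some punctuation, so those spots show up as differing tokens too.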

Results from the website

[screenshot: gtranslatev1]

@Animenosekai changed the title from "Hey guys, have you tested GoogleTranslateV1 with the same text you want to translate to english, then tested it on the actual site, to see if it returns the same result?? Am I doing something wrong or does it do that for you too?" to "GoogleTranslateV1 does not return the same results as translate.google.com (the website)" on Sep 4, 2021
@ZhymabekRoman
Contributor

ZhymabekRoman commented Sep 6, 2021

I did some experiments and found out that the web app uses a JSON RPC interface, while the mobile app uses a regular API (HTTP?) interface to communicate with the server. In translatepy, the JSON RPC interface is implemented in the GoogleTranslateV1 class and the regular API in the GoogleTranslateV2 class.

And now back to the problem: why doesn't GoogleTranslateV1 return the same result as the web application? After all, they work through the same interface. The answer is simple: Google has implemented a special mechanism to prevent abuse of the free text translation loophole. Previously this was done with the TKK token; now (in JSON RPC) it is done with an encrypted header, x-goog-batchexecute-bgr. I was able to find this out after several hours of experimentation. If the x-goog-batchexecute-bgr header is correct, the server returns a neater translation (like in the web app), and if not, it returns an alternative, less neat translation of the text (lol). Now look at the screenshots below and compare them with the result from the GoogleTranslateV1 class. Are they similar?

IMG_20210906_223026_328.jpg
IMG_20210906_223029_844.jpg
IMG_20210906_223031_756.jpg

It would seem that solving the problem is simple: just insert the x-goog-batchexecute-bgr header value into the GoogleTranslateV1 implementation and everything would work fine. I thought so too, so I inserted the value of x-goog-batchexecute-bgr, but the server still returned the alternate translation. I don't even know where to dig here, so I'll leave my working curl command for translating text here; it works fine.

curl 'https://translate.google.ru/_/TranslateWebserverUi/data/batchexecute?rpcids=MkEWBc&bl=boq_translate-webserver_20210901.06_p&soc-app=1&soc-platform=1&soc-device=1&rt=c' -H $'x-goog-batchexecute-bgr: ["\u00210tGl0ZzNAAZPKEfETyhCDHMiHOB9Utc7ACkAIwj8RiHUydhHz-81xQF2psngtfZAtJ_ELhihW-E27jZN0N0HFiMwSwIAAAS3UgAAAJBoAQeZArkSN_k7ys6RaYIMMgUwh2g-lS4RWFAMMjM94g7mIIAshFQzOBNHyCSBtkdj_5geKG3wgAly0QuecLToWWZ9EyS8qU5OmzeGBoEuVsXwycTYI_xg-E0fLotkpZKakgPpmbKafS-vFlWvw_YRkEhlyyV0ezKCX0Jl5HRj74llXP-jsFm3SZ9Gb8svzKma14A9hIL1TqRfoNmFMCQlzCdZDb_76mx36mSuT2oJJjev45ZOvpbf5JNzjdkC2074R1K7xghMfz6cclEuO-zqTnPmAopLN2AgWmMZaFgWsB70lFiUx-Tx_Xeap51u_oDJKxtNyEEQ1hKretBPu1IPd4xx9tROaHuTwyRwOnQJLZdgjE3AGkQBbQFQN-gUPTeqXGJt3yyD174mnDPf-tr4hqLH5HTDtRxY5xyiFnZtXmEzNilIVKRba1sttfsuDxyPHugxG4YUEbx8TmfDInJIkf7Zq-G3eRmgK9hkfiTPMh7yVJaVfssIq0DdJaCbg1cVMMYLUkEruqNNiL53IXpEJRCQJZYPQ49Fj6n2B67B_L0Ak0aa55KOJfviMe0n1k_5Hk6fGvVvhHQKheSTW1T0THTEB1mNYhsBsLuKMaUxeTUQ5gqs40X0bwiWOYKrSQjV70tHELUMKBRTdo84OFeRRc756WRdSsvoecT3BnTizAkqyFqJLom3tazTcJuJGmGerb8-bi0JIJt_4k1vBUm6NopFtdfTSx7WKa4hpP8oUsQN_le8VU1CRtFuepHn6_-NosCf2BGnXbrxcQs1q5Ayh-SnXeQ1EDzgmyp-Ahs_od6mj78mPm965Y8tsvytrFGcpTUBERWVTW5nyy6Sfl1KqWxmEw6sgomdJgs3rQUQ8Nw2BpQ7NWAKRzhKpZ3dn_3W6vgEMEcNBlAZGFElvz3hVhcGcDgJu92JEbvaFlNe",null,null,33212,2235,null,null,0]' -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36' --data-raw $'f.req=%5B%5B%5B%22MkEWBc%22%2C%22%5B%5B%5C%22hell%5C%22%2C%5C%22auto%5C%22%2C%5C%22en%5C%22%2Ctrue%5D%2C%5Bnull%5D%5D%22%2Cnull%2C%22generic%22%5D%5D%5D' --compressed
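
For anyone who wants to vary the text in that request, the `f.req` body can be rebuilt in Python. This is a sketch derived only from decoding the `--data-raw` value in the curl command above: the helper names `build_f_req` and `parse_f_req` are made up for illustration, and the meaning of the trailing `true`, `[null]` and `"generic"` fields is unknown to me, so they are copied as-is. It does not generate the x-goog-batchexecute-bgr header.

```python
import json
from urllib.parse import quote, unquote

RPC_ID = "MkEWBc"  # the rpcids value from the curl command above


def build_f_req(text, source="auto", target="en"):
    """Build the form body in the same shape as the curl --data-raw value."""
    # Inner payload: a JSON array that is itself embedded as a string.
    inner = json.dumps([[text, source, target, True], [None]],
                       separators=(",", ":"), ensure_ascii=False)
    # Outer envelope: [[[rpc_id, inner_json_string, null, "generic"]]]
    outer = json.dumps([[[RPC_ID, inner, None, "generic"]]],
                       separators=(",", ":"), ensure_ascii=False)
    return "f.req=" + quote(outer, safe="")


def parse_f_req(body):
    """Inverse of build_f_req: return (rpc_id, decoded inner payload)."""
    encoded = body.split("=", 1)[1]
    envelope = json.loads(unquote(encoded))
    rpc_id, inner = envelope[0][0][0], envelope[0][0][1]
    return rpc_id, json.loads(inner)


if __name__ == "__main__":
    body = build_f_req("hell")
    print(body)
    print(parse_f_req(body))
```

`build_f_req("hell")` reproduces the body used in the curl command character for character, which is a quick way to check the encoding before sending anything.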

@Animenosekai
Owner Author

@ZhymabekRoman Wait so is the JSON RPC result "less good" (GoogleTranslateV1) than the normal API one (GoogleTranslateV2) without the x-goog-batchexecute-bgr header?

@ZhymabekRoman
Contributor

ZhymabekRoman commented Sep 7, 2021

@ZhymabekRoman Wait so is the JSON RPC result "less good" (GoogleTranslateV1) than the normal API one (GoogleTranslateV2) without the x-goog-batchexecute-bgr header?

Yes.

See also: vitalets/google-translate-api#70 vitalets/google-translate-api#79 UlionTse/translators#35 vitalets/google-translate-api#71
Saravananslb/py-googletranslation#27

@GeniusBroccoli

Hi guys. GoogleV2 doesn't work for me: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0). What's wrong...
Google and V1 work.
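
For context, that exact message is what Python's `json` module raises when the text it is given does not start with a JSON value, for example an empty string or an HTML page. Whether that is what happened here is an assumption, since no reproduction was posted. A minimal sketch of a parser that reports what the body actually looked like (the helper name `parse_json_body` is made up for illustration):

```python
import json


def parse_json_body(body):
    """Return (data, error). On a non-JSON body, error describes it instead of raising."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as err:
        # Show the start of the body so an empty or HTML response is obvious.
        preview = body[:80].strip() or "<empty body>"
        return None, f"{err.msg} at char {err.pos}; body starts with: {preview!r}"


if __name__ == "__main__":
    print(parse_json_body(""))
    print(parse_json_body("<html>blocked</html>"))
    print(parse_json_body('{"ok": true}'))
```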

@ZhymabekRoman
Contributor

ZhymabekRoman commented Sep 19, 2021

@nnolex, I tested GoogleTranslateV2, and it works fine:

ubuntu@ip:~/translate$ python3
Python 3.8.10 (default, Jun  2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from translatepy.translators.google import GoogleTranslateV2
>>> dl = GoogleTranslateV2()
>>> dl.translate("hello", "ru")
TranslationResult(service=Google, source=hello, source_language=eng, destination_language=rus, result=Привет)
>>>

@Animenosekai
Owner Author

Animenosekai commented Sep 19, 2021

@nnolex

Hi guys. GoogleV2 doesn't work for me: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0). What's wrong...
Google and V1 work.

Please open a new issue ~

(also, could you provide some example to reproduce your issue)

@ZhymabekRoman
Contributor

I will try to start investigating how the tokens are generated. (God, please save me). If there are any new updates, I will post them here.

@ZhymabekRoman
Contributor

Seems like this guy tried to make some drafts, but nada: https://github.com/lzy1960/google-translate/blob/main/packages/src/translate.ts

@Animenosekai
Owner Author

I will try to start investigating how the tokens are generated. (God, please save me). If there are any new updates, I will post them here.

Good luck lmao I tried before but didn't have time to finish (the scripts I copied are in the playground I think)

@ZhymabekRoman
Contributor

Probably Google Translate uses this toolkit to minify code: https://github.com/google/closure-compiler

@Animenosekai
Owner Author

Probably Google Translate uses this toolkit to minify code: https://github.com/google/closure-compiler

Yup maybe, I already used it before

@ZhymabekRoman
Contributor

ZhymabekRoman commented Oct 15, 2023

I'm still alive after trying to debug Google Translate lol. I spent about 120 hours trying to understand how exactly the tokens are generated, and unfortunately it's really hard.

First I tried to debug it manually, and that was too "fast" (sarcasm). Because Google Translate is built on top of some framework (can anyone guess which framework is used?) and the code is minified, tracing the value manually is not possible. So I tried to automate the process: set a debugger breakpoint, step through everything while recording all the values, and at the end search the recorded values to check which function generates the token.

I rented a VDS server with the maximum RAM the service offered (16 GB) and ran Chrome with a Python script that presses F9 to step in, plus mitmproxy to capture all the Chrome debugger values. And... Chrome ate all the memory and crashed. I tried adding a 100 GB pagefile/swapfile: same result on both Linux and Windows. Idk how we can debug this, probably by editing the V8 JS engine lol, or by trying Firefox; I think it can properly use such a big swap/pagefile to its full size.

Any other ideas?

@Animenosekai
Owner Author

Lmaoooo what how did it crash

@ZhymabekRoman
Contributor

ZhymabekRoman commented Oct 15, 2023

Something like "render process gone". I'm f... tired. On Linux the kernel sends SIGKILL. Why Linux didn't give it the empty 90 GB of swap space to work with, idk :/ The swap file was on the fastest NVMe I have ever had lol (I still use an HDD on my main setup btw).

PS. Ahhh, I didn't mention in my previous message that Chrome (or, more correctly, the kernel) didn't give the browser the full swap/pagefile space, only about 2-3% of it. The kernel (on both Windows and Linux) just kills the browser process for lack of RAM while about 95 GB of swap/pagefile is still empty :/

New article in the biggest newspaper: How to make Chrome eat up all your PC's RAM.

PS 1. I also tried JIT-less mode for the V8 engine: same result :/

@Animenosekai
Owner Author

I was working on Google's batchexecute and everything was going smoothly until I saw a big chunk of code which I guess is actually generating the token...

@Animenosekai
Owner Author

This is what I need to reverse engineer now 🎐

function(r) {
                switch (r.g) {
                case 1:
                    c = c.trim();
                    c.length > f.i && (_.RF("translateText query over character limit. Length: " + c.length + " Limit: " + f.i),
                    c = c.substring(0, f.i).trim());
                    var u = new _.Eq;
                    u = _.Id(u, 2, a);
                    u = _.Id(u, 3, b);
                    g = _.Dc(u, 4, _.Ob(d), !1).Tb(c);
                    e && (u = new _.DX,
                    u = _.zj(u, 1, e),
                    _.H(g, _.DX, 5, u));
                    u = new _.Oq;
                    u = _.H(u, _.Eq, 1, g);
                    var v = f.g;
                    var w = new DZ;
                    v = v.g;
                    w = _.Aj(w, 1, Zfb(v.W ? 3 : v.s));
                    k = _.H(u, DZ, 2, w);
                    _.Rf(r, 2, 3);
                    m = Date.now();
                    return _.C(r, f.j.g(_.Xja.qb(k)), 5);
                case 5:
                    n = r.i;
                    f.g.g.qa = _.I(n, 3);
                    u = f.g;
                    w = {
                        kp: Date.now() - m,
                        a2: _.BC(n),
                        S3: _.Nq(n)
                    };
                    w = void 0 === w ? {} : w;
                    v = w.kp;
                    var x = w.a2
                      , E = w.S3;
                    w = _.aN(u, 338);
                    if (v) {
                        var D = new _.iO;
                        v = _.wj(D, 1, v);
                        _.H(w, _.iO, 82, v)
                    }
                    if (x) {
                        v = new HZ;
                        D = _.Lq(x);
                        D = _.A(D);
                        for (var K = D.next(); !K.done; K = D.next())
                            K = $fb(K.value),
                            _.hj(v, 1, FZ, K);
                        (x = _.AC(x)) && (_.kj(w, 16) !== x.Qa() || _.kj(w, 1) !== _.I(x, 3) || _.kj(w, 52).trim() !== x.Ua()) && _.vj(v, 2, !0);
                        if (E) {
                            x = [];
                            E = _.A(E);
                            for (D = E.next(); !D.done; D = E.next())
                                if (K = _.Lq(D.value),
                                0 !== K.length) {
                                    D = new GZ;
                                    K = _.A(K);
                                    for (var T = K.next(); !T.done; T = K.next())
                                        T = $fb(T.value),
                                        _.hj(D, 1, FZ, T);
                                    x.push(D)
                                }
                            _.gj(v, 3, x)
                        }
                        _.H(w, HZ, 115, v)
                    }
                    _.bN(u, w);
                    if (u = !_.Pg(c))
                        a: {
                            if (_.dj(n, _.Kq, 2))
                                for (u = 0; u < _.Lq(_.BC(n)).length; u++)
                                    if (w = _.Lq(_.BC(n))[u],
                                    !_.Pg(_.dJ(w))) {
                                        u = !1;
                                        break a
                                    }
                            u = !0
                        }
                    u && (u = f.g,
                    w = c,
                    v = _.I(n, 3),
                    E = _.fN(u, 166),
                    _.WM(_.VM(_.GM(_.FM(E, a), b), v), w),
                    _.YM(u.i, 166),
                    _.bN(u, E));
                    return r.return(n);
                case 3:
                    _.Vf(r);
                    u = f.g;
                    w = _.ZM(u.g, {
                        Cz: !0,
                        uG: !0
                    });
                    w = _.zj(w, 31, 1);
                    _.bN(u, w);
                    f.g.g.g = 0;
                    _.Wf(r, 0);
                    break;
                case 2:
                    throw q = _.Uf(r),
                    _.RF("Error getting translation", q),
                    q;
                }
            }
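
One possible reading, offered as a guess rather than a finding: the `switch (r.g)` with numbered cases looks like the state machine that Closure Compiler (mentioned earlier in this thread) emits when it lowers an async function or generator. Under that assumption, case 1 runs up to the awaited RPC call, case 5 continues after it, case 2 is the catch block and case 3 is the finally block. The Python sketch below mirrors only that control flow. Every name in it (`translate_text`, `rpc`, `state`, `CHAR_LIMIT`) is hypothetical, and the logging calls in the original are left out. On this reading the chunk would be the request/response handling around the RPC call rather than the token generation itself, but that is unverified.

```python
import asyncio
import time

CHAR_LIMIT = 5000  # hypothetical value; the real limit is the minified field f.i


async def translate_text(rpc, text, source, target, state=None, log=print):
    """Hypothetical de-minified shape of the handler; rpc is any async callable."""
    state = {} if state is None else state
    # case 1: trim, enforce the character limit, build the request.
    text = text.strip()
    if len(text) > CHAR_LIMIT:
        log(f"translateText query over character limit. "
            f"Length: {len(text)} Limit: {CHAR_LIMIT}")
        text = text[:CHAR_LIMIT].strip()
    request = {"source": source, "target": target, "text": text}
    started = time.monotonic()
    try:
        response = await rpc(request)  # the yield between case 1 and case 5
        # case 5: record latency, then return the response.
        state["latency"] = time.monotonic() - started
        return response
    except Exception as err:  # case 2: log and rethrow
        log(f"Error getting translation: {err}")
        raise
    finally:  # case 3: reset some counter (f.g.g.g = 0 in the original)
        state["pending"] = 0


if __name__ == "__main__":
    async def fake_rpc(request):
        return {"echo": request["text"]}

    print(asyncio.run(translate_text(fake_rpc, "  hello  ", "auto", "en")))
```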

@ZhymabekRoman
Contributor

ZhymabekRoman commented Oct 15, 2023

Wow, this is... really bad. If we had a working debugger, we might be able to trace all the values, but for now... Probably Firefox can save us....

@Animenosekai
Owner Author

I'm manually tracing all the values using Chromium (Arc)

@v1993

v1993 commented Dec 23, 2023

@ZhymabekRoman Regarding memory issues - this might be related to a vm.overcommit_memory kernel setting, which you'd want to set to 1 in this scenario. I'm very much interested in whatever comes out of this investigation - I'd join, but I'm not exactly a JS pro.
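
For anyone retrying this on Linux, here is a small sketch for checking the current mode before starting Chrome. It reads the standard procfs entry for this setting. The helper names are made up for illustration, and whether changing the mode actually prevents the crash described above is untested.

```python
from pathlib import Path

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")

# The three modes documented for vm.overcommit_memory.
MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(raw):
    """Turn the raw file content (e.g. '0\\n') into a readable description."""
    value = int(raw.strip())
    return MODES.get(value, f"unknown mode {value}")


def current_overcommit(path=OVERCOMMIT_PATH):
    """Return the description for this machine, or None if it cannot be read."""
    try:
        return describe_overcommit(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(current_overcommit())
```

Switching to mode 1 is normally done as root with `sysctl vm.overcommit_memory=1`.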

@ZhymabekRoman
Contributor

@NawtJ0sh, Probably... I'm just burnt out after that and waiting for the best moment to do some reverse engineering. Probably modify Chrome code (lol)

@parmodrana

@ZhymabekRoman @Animenosekai Did anyone succeed in reverse engineering this?

@ZhymabekRoman
Contributor

@parmodrana No result...

@Animenosekai
Owner Author

@ZhymabekRoman @Animenosekai Did anyone succeed in reverse engineering this?

Not for now. I know that with enough will it is possible, but after hours of research I always feel like I wasted time lol.

But I'll still try to find a way.

@whode

whode commented May 7, 2024

This is what I need to reverse engineer now 🎐

It's one of the easiest code samples you can find when reverse-engineering Google :)

@bropines

I assume that no one has found success?
