Catch 500/503 response from 2captcha? #162

Closed
iwasherefirst2 opened this issue Apr 7, 2022 · 11 comments

Comments

@iwasherefirst2

iwasherefirst2 commented Apr 7, 2022

I have been using this script for a while now, and from time to time I get 500/503 responses from 2captcha.

This happens to others as well: #153 and #157

The problem is that on a 500/503 response the response text is not "CAPCHA_NOT_READY", so the while-loop at https://github.com/flathunters/flathunter/blob/main/flathunter/abstract_crawler.py#L173 exits
and the script then fails at https://github.com/flathunters/flathunter/blob/main/flathunter/abstract_crawler.py#L185

How should we solve it? I could imagine that we check the response status, and if it's a 5xx status, we just keep trying, i.e. we do not exit the loop.
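A minimal sketch of that idea, not the actual flathunter code: `fetch` is a hypothetical callable that performs one poll request and returns the status code and body text, and `poll_captcha_result` is an illustrative name. Only the `CAPCHA_NOT_READY` text is taken from this issue.

```python
import time
from typing import Callable, Tuple

NOT_READY = "CAPCHA_NOT_READY"  # response text quoted in this issue


def poll_captcha_result(
    fetch: Callable[[], Tuple[int, str]],
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the solver answers with something other than "not ready".

    `fetch` is a hypothetical stand-in for one HTTP poll request; it
    returns (status_code, body_text).
    """
    while True:
        status, text = fetch()
        if 500 <= status < 600:
            # Server-side error: treat it like "not ready" and poll again.
            sleep(poll_interval)
            continue
        if text == NOT_READY:
            sleep(poll_interval)
            continue
        return text
```

As written, this loop has no upper bound: if the service keeps answering with 5xx, it polls forever, which is the concern raised in the next paragraph.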

However, I am not sure whether it would be better to stop trying, raise an exception, skip the immobilienscout page, and check again in 10 min. Sometimes my script tries to solve the geetest for more than 40 min in a row and then fails with a 500 (see also #160, where a lot of users had the issue for a short period of time). So if the geetest challenge can't be solved within a specific time, or because 2captcha responds with 500, maybe we should just leave it and try again later, as 2captcha is probably under too high a load.
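The give-up-and-retry-later variant could look roughly like this. It is a sketch under assumptions: `CaptchaUnavailable`, `poll_with_deadline`, and the default `max_wait` of 120 seconds are illustrative, and `fetch` is again a hypothetical callable returning the status code and body text of one poll request.

```python
import time
from typing import Callable, Tuple


class CaptchaUnavailable(Exception):
    """Hypothetical exception: the solver did not deliver in time."""


def poll_with_deadline(
    fetch: Callable[[], Tuple[int, str]],
    max_wait: float = 120.0,   # assumed budget in seconds
    poll_interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a result arrives or `max_wait` seconds have passed."""
    deadline = clock() + max_wait
    while clock() < deadline:
        status, text = fetch()
        retryable = 500 <= status < 600 or text == "CAPCHA_NOT_READY"
        if not retryable:
            return text
        sleep(poll_interval)
    # Out of time: let the caller skip this site and try again later.
    raise CaptchaUnavailable(f"no captcha result within {max_wait} seconds")
```

The caller would catch `CaptchaUnavailable`, skip the current crawl of that site, and pick it up again on the next scheduled run.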

The problem is that if the script fails often like this, one has to clean up the chromedriver and chrome instances; otherwise you run into a ton of problems (#155, #148, #145). It's also a bit cumbersome to keep track of the daemon.
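One way to avoid leftover browser processes is to tie the driver's lifetime to a context manager. This is only a sketch: `managed_driver` and `create_driver` are hypothetical names, and the only thing assumed about the driver object is that it has a `quit()` method, as Selenium WebDriver instances do.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def managed_driver(create_driver: Callable[[], Any]) -> Iterator[Any]:
    """Yield a driver and always call quit() on it afterwards.

    `create_driver` is a hypothetical factory, for example a function
    that builds a Selenium Chrome driver.
    """
    driver = create_driver()
    try:
        yield driver
    finally:
        # Runs even if the crawl raises, so the browser and chromedriver
        # processes are shut down instead of piling up.
        driver.quit()
```

Usage would be `with managed_driver(make_driver) as driver: ...`, where `make_driver` is whatever function currently creates the browser.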

Quick note: The 2captcha problem was not solved in #158 (see my comment there for an explanation).

Maybe we should also think about implementing the API of an alternative captcha-solver provider.

@codders

codders commented Apr 8, 2022

Merged in #161 from @ozeidan which might fix this. Can you try the latest?

@ozeidan

ozeidan commented Apr 8, 2022

Regarding support for a different captcha solver: I already have an implementation for imagetyperz locally. I can push it soon if there is interest.

@ozeidan

ozeidan commented Apr 9, 2022

Also, the captcha solver only does 3 tries before it fails, so that the other scrapers don't get blocked indefinitely.
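For illustration only, a bounded retry of this kind might be sketched as follows. This is not the code merged in #161; `solve_with_retries` and `solve_once` are hypothetical names, and only the limit of 3 tries comes from the comment above.

```python
from typing import Callable, Optional


def solve_with_retries(solve_once: Callable[[], str], max_attempts: int = 3) -> str:
    """Call `solve_once` up to `max_attempts` times, then give up.

    `solve_once` is a hypothetical callable that returns the solved
    captcha token or raises on failure (for example on a 500/503).
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return solve_once()
        except Exception as error:
            # Remember the failure and move on to the next attempt.
            last_error = error
    # All attempts failed: surface the error so other scrapers can proceed.
    raise RuntimeError(
        f"captcha not solved after {max_attempts} attempts"
    ) from last_error
```

Capping the number of attempts keeps a failing solver from holding up the remaining crawlers in the same run.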

@markuswestphal

markuswestphal commented Apr 11, 2022

@ozeidan A pull request with your imagetyperz implementation would be highly appreciated. 🙏 Currently, I am once again struggling with the captchas not being solved, as previously discussed in #153 and #160.

@iwasherefirst2
Author

Yes, I also haven't been able to crawl immobilienscout24 for days now. I have a long chat history with 2captcha, but they are really not helpful. They just say the fault must be on my side.

Anyway, I am also interested in imagetyperz!

@panoptikum

The same here.

@ozeidan

ozeidan commented Apr 12, 2022

Done, PR is here: #166.

@iwasherefirst2 I got the same response from 2captcha. It must be an error on their side. It's possible to set one's 2captcha account to sandbox mode, which means that you can download the software their workers use to solve captchas and receive your own captcha-solving requests there. When solving one's own captchas, their service works just fine.

Anyway, although the imagetyperz website/API doesn't make a technically good impression, it has worked reliably for me so far.

@codders

codders commented Apr 12, 2022

#166 is merged now. Thanks for the great contribution @ozeidan !

@markuswestphal

Many thanks from my side also! 🙏🏼 @ozeidan

@panoptikum

Thanks. I'll try it out later today. This morning it worked for a short period of time.

@iwasherefirst2
Author

Fixed by #166.
