Catch 500/503 response from 2captcha? #162
Comments
Regarding support for a different captcha solver: I already have an implementation of imagetyperz locally. I can push it soon if there is interest.
Also, the captcha solver only makes 3 attempts before it fails, so that the other scrapers don't get blocked indefinitely.
Yes, I also haven't been able to crawl immobilienscout24 for days now. I have a long chat history with 2captcha, but they are really not helpful. They just say the fault must be on my side. Anyway, I am also interested in imagetyperz!
The same here.
Done, the PR is here: #166. @iwasherefirst2 I got the same response from 2captcha. It must be an error on their side. It's possible to put one's 2captcha account into sandbox mode, which means that you can download the software their workers use to solve captchas and receive your own captcha-solving requests there. When solving one's own captchas, their service works just fine. Anyway, although the imagetyperz website and API don't make a technically good impression, the service has worked reliably for me so far.
Many thanks from my side also! 🙏🏼 @ozeidan
Thanks. I'll try it out later today. This morning it worked for a short period of time.
Fixed by #166.
I have been using this script for a while now, and from time to time I get 500/503 responses from 2captcha.
This happens to others as well: #153 and #157.
The problem is that on a 500/503 response, the response text is not "CAPCHA_NOT_READY", so the while-loop at https://github.com/flathunters/flathunter/blob/main/flathunter/abstract_crawler.py#L173
exits and the code then fails at https://github.com/flathunters/flathunter/blob/main/flathunter/abstract_crawler.py#L185
How should we solve this? I could imagine checking the response status and, if it's a 5xx status, simply continuing to poll, i.e. not exiting the loop.
However, I am not sure whether it would be better to stop trying, raise an exception, skip the immobilienscout page, and check again in 10 minutes. Sometimes my script tries to solve the geetest for more than 40 minutes in a row and then fails with a 500 (see also #160, where a lot of users had the issue for a short period of time). So if the geetest can't be solved within a specific time, or because they respond with 500, maybe we should just leave it and try again later, as 2captcha is probably under too high a load.
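A rough sketch of how both ideas could be combined: keep polling on a 5xx response, but give up after a few server errors or once a time limit has passed. This is not the code in abstract_crawler.py. The names `poll_captcha_result`, `CaptchaUnavailable`, and the `fetch` callable (assumed to return a status code and the response text) are hypothetical, and the default limits are arbitrary.

```python
import time
from typing import Callable, Tuple


class CaptchaUnavailable(Exception):
    """Hypothetical exception: the solver kept failing or took too long."""


def poll_captcha_result(
    fetch: Callable[[], Tuple[int, str]],
    timeout_s: float = 120.0,
    poll_interval_s: float = 5.0,
    max_server_errors: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the solver returns something other than "not ready".

    `fetch` is an assumed callable returning (status_code, response_text);
    in practice it would wrap the HTTP request to the solver.
    """
    deadline = clock() + timeout_s
    server_errors = 0
    while clock() < deadline:
        status, text = fetch()
        if 500 <= status < 600:
            # Server-side failure: tolerate a few, then stop trying.
            server_errors += 1
            if server_errors >= max_server_errors:
                raise CaptchaUnavailable(
                    f"solver returned {status}; gave up after {server_errors} server errors"
                )
        elif text != "CAPCHA_NOT_READY":
            # Any other text is treated as the final answer here.
            return text
        sleep(poll_interval_s)
    raise CaptchaUnavailable("captcha was not solved within the time limit")
```

The caller could catch `CaptchaUnavailable`, skip the immobilienscout page for this run, and let the next scheduled run try again instead of crashing.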
The problem is that if the script fails often like this, one has to clean up the chromedriver and chrome instances, otherwise you run into a ton of problems (#155, #148, #145). It's also a bit cumbersome to keep track of the daemon.
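One possible way to reduce the manual cleanup, sketched under assumptions: wrap driver creation in a context manager so that `quit()` is always called, even when the captcha step raises. `managed_driver` and `create_driver` are hypothetical names, not part of flathunter; the only thing assumed about the driver is that it has a `quit()` method, as Selenium's WebDriver does.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def managed_driver(create_driver: Callable[[], Any]) -> Iterator[Any]:
    """Yield a driver and always call its quit() afterwards, even on errors.

    `create_driver` is an assumed factory, e.g. a function that builds a
    Selenium Chrome driver.
    """
    driver = create_driver()
    try:
        yield driver
    finally:
        # Runs on success and on exceptions, so the browser session
        # is shut down instead of being left behind.
        driver.quit()
```

With Selenium this could be used as `with managed_driver(webdriver.Chrome) as driver: ...`. It would not help if the Python process itself is killed, so leftover processes from hard crashes would still need to be cleaned up separately.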
Quick note: the 2captcha problem was not solved in #158 (see my comment there for an explanation).
Also, maybe we should think about implementing the API of an alternative captcha-solver provider.