Captcha failing repeatedly after successful first run #153
Comments
We had this reported by another user, that now reports that it is fixed for them (#158). Are you still having the problem? |
"Another user" here 🙋🏼♂️: the issue has been present again for the last 4 days. |
I tinkered with what the 2Captcha support suggested and tried simple GET requests instead of accessing driver.page_source. In the process I learnt a lot about the abstract_crawler.py code. Though I now understand that it won't be that easy, I think there might be something to what they suggest. Do you think this could be something, @codders? |
Sounds interesting. Can you be more concrete about what they suggest and
what you've tried and any results you had?
Thanks!
Markus Westphal ***@***.***> wrote on Wed, 30 March 2022, 01:26:
… I tinkered with what the 2Captcha support suggested and tried simple
GET requests instead of accessing driver.page_source. In the process I
learnt a lot about the abstract_crawler.py code. Though I now understand
that it won't be that easy, I think there might be something to what they
suggest. Do you think this could be something, @codders
<https://github.com/codders>?
|
I will investigate a little further and post my thoughts and findings. In the meantime I checked and found that 2Captcha changed their API to incorporate Geetest V4 support on March 24, 2022. https://2captcha.com/de/2captcha-api#recent_changes I just checked my search history and found that this is exactly the date that the Immoscout crawler stopped working for me. This might be a hot clue. The old Geetest request seems to be constructed as before but the timing here is really odd. |
Could be that immobilienscout24 switched to geetest_4 ? In this case we would need to change flathunter/flathunter/abstract_crawler.py Line 169 in a3c948c
However, I can't test it at the moment, as I get either session timeouts (#145) or a new error, RemoteDisconnected: "urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))". It really looks like setting up Python with Selenium on my VPS is just not working for me. |
IM24 did not switch to geetest_4, I checked. |
The problem is on 2Captcha's side. Sometimes they can't handle the server load and then they just can't solve the Geetests. We should think about setting a time limit for Geetest attempts and definitely handle 5xx responses from their API. I would suggest closing this issue and continuing the discussion at #162.
|
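To make the time-limit idea concrete, here is a minimal sketch of a polling loop that gives up after a deadline and treats a 5xx reply as "not ready yet" instead of crashing. It is not flathunter's actual code: `solve_with_deadline`, `TransientServerError`, `CaptchaUnsolvedError` and the `poll` callable are hypothetical names, and `poll` stands in for whatever function asks the solving service for a result.

```python
import time
from typing import Callable, Optional


class CaptchaUnsolvedError(Exception):
    """Raised when no solution arrives before the deadline."""


class TransientServerError(Exception):
    """Hypothetical marker for a 5xx reply from the solving service."""

    def __init__(self, status: int) -> None:
        super().__init__(f"server error {status}")
        self.status = status


def solve_with_deadline(
    poll: Callable[[], Optional[str]],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call poll() until it returns a solution or the deadline passes.

    poll() is assumed to return the solution string, return None while
    the captcha is still being solved, or raise TransientServerError
    when the service answers with a 5xx status.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            solution = poll()
        except TransientServerError:
            solution = None  # a 5xx reply is treated like "not ready yet"
        if solution is not None:
            return solution
        # Stop if another wait would take us past the deadline.
        if clock() + interval_s > deadline:
            raise CaptchaUnsolvedError(f"no solution within {timeout_s} s")
        sleep(interval_s)
```

With a bound like this, a cycle in which the service is overloaded would fail after `timeout_s` seconds rather than retrying for most of an hour. The `clock` and `sleep` parameters exist only so the loop can be tested without real waiting.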
I've recently restarted using flathunter with 2Captcha. I'm running with a 1hr cycle, and usually the first run works perfectly - captcha takes 15-20 seconds to resolve and the script continues. Mostly on the 2nd attempt (but not always), the captcha will not get solved which results in what looks like multiple retries. If I check my 2Captcha account there are dozens of charges - but accordingly also refunds processed for them.
Below is an example log from my last run. You can see that at 19:13 and at 20:14 everything worked flawlessly, but then on the next cycle (for some reason about half an hour late) - the captcha fails and is retried for about 50 minutes until the script fails.
I contacted the 2Captcha support in the hope that they could tell me whether the problem is on their side. This was their response:
2022-02-08 flathunter.txt
Any help appreciated!