Captcha failing repeatedly after successful first run #153

Closed
namnoops opened this issue Feb 9, 2022 · 9 comments

Comments

@namnoops

namnoops commented Feb 9, 2022

I've recently restarted using flathunter with 2Captcha. I'm running with a 1hr cycle, and usually the first run works perfectly - captcha takes 15-20 seconds to resolve and the script continues. Mostly on the 2nd attempt (but not always), the captcha will not get solved which results in what looks like multiple retries. If I check my 2Captcha account there are dozens of charges - but accordingly also refunds processed for them.

Below is an example log from my last run. You can see that at 19:13 and at 20:14 everything worked flawlessly, but then on the next cycle (for some reason about half an hour late) - the captcha fails and is retried for about 50 minutes until the script fails.

I contacted the 2Captcha support in hope that they could reveal if the problem is on their side. This was their response:

Have you ever received a GeeTest token from our API?
I suppose you are taking the challenge value of a rendered GeeTest widget. In such case you will always get ERROR_CAPTCHA_UNSOLVABLE as GeeTest widget can not be rendered twice with the same challenge.
https://github.com/flathunters/flathunter/blob/main/flathunter/abstract_crawler.py#L164
I'm not good at Python and Selenium, but I'm pretty sure that driver.page_source returns the source of the page already rendered in a browser.
You can simply make a GET request with the requests library, parse the page source and find the challenge value that can be used to solve the GeeTest.
So, just make sure you use the challenge value that was never used to render a GeeTest widget.
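A minimal sketch of what the support reply describes: fetch the raw HTML without a browser and pull the `gt` and `challenge` values out of it, so the challenge has not yet been consumed by a rendered widget. The regular expressions assume the page initialises the widget with an inline `initGeetest({...})` call; that shape, and the helper names `extract_geetest_params` and `fetch_unrendered_html`, are assumptions for illustration and not taken from flathunter. The sketch uses the standard library's `urllib.request` rather than `requests` to stay dependency-free.

```python
import re
import urllib.request
from typing import Optional, Tuple

# Assumed shape of the inline script that initialises the widget. The real
# page may differ, so treat these patterns as placeholders.
INIT_BLOCK = re.compile(r"initGeetest\(\s*\{(.*?)\}", re.DOTALL)
GT_VALUE = re.compile(r"""gt\s*:\s*["'](.*?)["']""")
CHALLENGE_VALUE = re.compile(r"""challenge\s*:\s*["'](.*?)["']""")


def extract_geetest_params(html: str) -> Optional[Tuple[str, str]]:
    """Return (gt, challenge) found in raw HTML, or None if either is missing."""
    block = INIT_BLOCK.search(html)
    if block is None:
        return None
    gt = GT_VALUE.search(block.group(1))
    challenge = CHALLENGE_VALUE.search(block.group(1))
    if gt is None or challenge is None:
        return None
    return gt.group(1), challenge.group(1)


def fetch_unrendered_html(url: str, timeout_s: float = 30.0) -> str:
    """Fetch the page without a browser, so no widget renders the challenge."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A plain GET does not carry the browser session's cookies or headers, so the server may return a different page than the one Selenium sees; whether this approach works against the real site is untested here.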

2022-02-08 flathunter.txt

Any help appreciated!

@namnoops changed the title from "Captcha failing after long wait times" to "Captcha failing repeatedly after successful first run" on Feb 9, 2022
@codders

codders commented Mar 10, 2022

We had this reported by another user, who now reports that it is fixed for them (#158). Are you still having the problem?

@markuswestphal

markuswestphal commented Mar 29, 2022

"Another user" here 🙋🏼‍♂️: the issue has been present again for the last 4 days.
Other users now report the same issue, as in #160

@markuswestphal

I tinkered with what the 2Captcha support suggested and tried simple GET requests instead of accessing driver.page_source. In the process I learnt a lot about the abstract_crawler.py code. Though I now understand that it won't be that easy, I think there might be something to what they suggest. Do you think this could be something, @codders?

@codders

codders commented Mar 30, 2022 via email

@markuswestphal

markuswestphal commented Mar 30, 2022

I will investigate a little further and post my thoughts and findings. In the meantime I checked and found that 2Captcha changed their API to incorporate Geetest V4 support on March 24, 2022. https://2captcha.com/de/2captcha-api#recent_changes I just checked my search history and found that this is exactly the date that the Immoscout crawler stopped working for me. This might be a hot clue. The old Geetest request seems to be constructed as before but the timing here is really odd.

@iwasherefirst2

Could it be that immobilienscout24 switched to geetest_4? In that case we would need to change the method name in

f"http://2captcha.com/in.php?key={api_key}&method=geetest&gt={gt}&challenge={challenge}&api_server=api.geetest.com&pageurl={urllib.parse.quote_plus(driver.current_url)}"
to

f"http://2captcha.com/in.php?key={api_key}&method=geetest_4&gt={gt}&challenge={challenge}&api_server=api.geetest.com&pageurl={urllib.parse.quote_plus(driver.current_url)}"

However, I can't test it at the moment, as I either get session timeouts (#145) or a new error, RemoteDisconnected: "urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))". It really looks like setting up Python with Selenium on my VPS is just not working for me.
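To make the proposed change easier to try, here is a small sketch that builds the same `in.php` submit URL with the method name as a parameter. The query parameters are copied from the URL quoted above; the function name `build_submit_url` is hypothetical, and whether `geetest_4` is the right method value (or whether a v4 request needs different parameters) is not verified here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_submit_url(api_key: str, gt: str, challenge: str, page_url: str,
                     method: str = "geetest",
                     api_server: str = "api.geetest.com") -> str:
    """Build the 2Captcha submit URL with a configurable method name."""
    # urlencode percent-encodes every value (quote_plus by default), so the
    # page URL no longer needs a separate quote_plus call.
    query = urlencode({
        "key": api_key,
        "method": method,
        "gt": gt,
        "challenge": challenge,
        "api_server": api_server,
        "pageurl": page_url,
    })
    return f"http://2captcha.com/in.php?{query}"


if __name__ == "__main__":
    # Placeholder values only; none of these are real credentials.
    url = build_submit_url("API_KEY", "GT_VALUE", "CHALLENGE_VALUE",
                           "https://example.com/search?a=1&b=2")
    print(url)
    print(parse_qs(urlsplit(url).query)["pageurl"])
```

Keeping the method name as an argument means both variants can be tried without editing the string template.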

@markuswestphal

IM24 did not switch to geetest_4, I checked.

@iwasherefirst2

The problem is on 2Captcha's side. Sometimes they can't handle the server load, and then they just can't solve the GeeTests. We should think about setting a time limit for GeeTest attempts and definitely handle 500-level responses from their API. I would suggest closing this issue and continuing the discussion in #162.
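One possible shape for that time limit and error handling, as a sketch only: a polling loop that gives up after a deadline and tolerates a bounded number of 5xx responses. The `fetch` callable is a hypothetical stand-in for whatever performs the HTTP request to the result endpoint and returns `(status_code, body)`; the exception names and default values are assumptions, not flathunter's code. `CAPCHA_NOT_READY` is the pending marker as spelled by 2Captcha.

```python
import time
from typing import Callable, Tuple


class CaptchaTimeout(Exception):
    """Raised when no solution arrives before the deadline."""


class CaptchaServiceError(Exception):
    """Raised on an API error or too many server-side failures."""


def poll_until_solved(fetch: Callable[[], Tuple[int, str]],
                      timeout_s: float = 120.0,
                      interval_s: float = 5.0,
                      max_server_errors: int = 3,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch() until it yields a solution, an error, or the deadline."""
    deadline = clock() + timeout_s
    server_errors = 0
    while clock() < deadline:
        status, body = fetch()
        if status >= 500:
            # Server-side failure: retry a few times, then give up.
            server_errors += 1
            if server_errors > max_server_errors:
                raise CaptchaServiceError(f"{server_errors} responses with status >= 500")
        elif body == "CAPCHA_NOT_READY":
            pass  # still being solved, wait and poll again
        elif body.startswith("ERROR"):
            # e.g. ERROR_CAPTCHA_UNSOLVABLE: retrying the same request will not help
            raise CaptchaServiceError(body)
        else:
            return body
        sleep(interval_s)
    raise CaptchaTimeout(f"no solution within {timeout_s} seconds")
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a caller would catch both exceptions and skip the current crawl cycle instead of retrying for most of an hour.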

@alexanderroidl

The problem is on 2Captcha's side. Sometimes they can't handle the server load, and then they just can't solve the GeeTests. We should think about setting a time limit for GeeTest attempts and definitely handle 500-level responses from their API. I would suggest closing this issue and continuing the discussion in #162.
