Hi, thanks for your work, it is very useful. Why do you make a second request if the first one works?
```python
import requests
from requests.exceptions import (ConnectionError, HTTPError, MissingSchema,
                                 Timeout, TooManyRedirects)

try:
    res = requests.get(url, timeout=timeout, headers=headers)
except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    raise URLUnreachable("The URL does not exist.")
except MissingSchema:
    # no schema: add http:// as the default and retry
    url = "http://" + url
    # raise URLUnreachable if the retry fails as well
    try:
        res = requests.get(url, timeout=timeout, headers=headers)
    except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
        raise URLUnreachable("The URL is unreachable.")
```
Also, you can reduce the failure rate of the first block if you check for a schema before any request is made (with a regex, for example), which would let you merge the two blocks into one...
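A minimal sketch of what that merged version could look like, assuming the same url, timeout, and headers variables and the package's URLUnreachable exception; the regex-based schema check is only an illustration, not the package's actual code:

```python
import re
import requests
from requests.exceptions import ConnectionError, HTTPError, Timeout, TooManyRedirects

# Add a default schema up front instead of reacting to MissingSchema later.
if not re.match(r"^https?://", url, re.IGNORECASE):
    url = "http://" + url

try:
    res = requests.get(url, timeout=timeout, headers=headers)
except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    raise URLUnreachable("The URL is unreachable.")
```

With the schema fixed before the call, MissingSchema can no longer be raised, so a single try/except and a single request are enough.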
I'm a little late to the party, coming four years after the original question, but that's a good point. The first request can also be avoided when the content of the page is supplied. And we can check for the missing schema using a regex or the urlparse method, without making any request.
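A small sketch of the urlparse variant; the add_default_schema helper is hypothetical, shown only to illustrate the idea:

```python
from urllib.parse import urlparse

def add_default_schema(url: str) -> str:
    """Prepend http:// when the URL has no schema, without making a request."""
    # hypothetical helper, not part of this package's API
    if not urlparse(url).scheme:
        return "http://" + url
    return url

print(add_default_schema("example.com"))          # http://example.com
print(add_default_schema("https://example.com"))  # https://example.com
```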
If you are looking for a version of the package that avoids the duplicated request and adds slightly improved parsing on top, I can recommend looking into a fork I made, web2preview. It is fully compatible with this package as well, so the same API can be used.
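As a rough illustration of that compatibility claim (this assumes web2preview keeps this package's classic web_preview(url) helper returning a (title, description, image) tuple; check the fork's README for the actual API):

```python
# assumption: web2preview re-exports the classic web_preview helper
from web2preview import web_preview

title, description, image = web_preview("https://example.com")
print(title, description, image)
```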