Hi, thanks for your work, it is very useful. Why do you make a second request if the first one works?
```python
import requests
from requests.exceptions import (ConnectionError, HTTPError, MissingSchema,
                                 Timeout, TooManyRedirects)

try:
    res = requests.get(url, timeout=timeout, headers=headers)
except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    raise URLUnreachable("The URL does not exist.")
except MissingSchema:
    # no schema: add http:// as the default and retry
    url = "http://" + url
    # raise URLUnreachable if the retry fails as well
    try:
        res = requests.get(url, timeout=timeout, headers=headers)
    except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
        raise URLUnreachable("The URL is unreachable.")
```
Also, you can reduce the failure rate of the first block if you check for a schema before any request is made (with a regex, for example), which would let you merge the two blocks into one...
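A minimal sketch of what that merged version could look like, assuming the same url, timeout, and headers variables and the package's URLUnreachable exception; the regex-based schema check is only an illustration, not the package's actual code:

```python
import re
import requests
from requests.exceptions import ConnectionError, HTTPError, Timeout, TooManyRedirects

# Add a default schema up front instead of reacting to MissingSchema later.
if not re.match(r"^https?://", url, re.IGNORECASE):
    url = "http://" + url

try:
    res = requests.get(url, timeout=timeout, headers=headers)
except (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    raise URLUnreachable("The URL is unreachable.")
```

With the schema fixed before the call, MissingSchema can no longer be raised, so a single try/except and a single request are enough.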
I'm a little late to the party, coming four years after the original question, but that's a good point. The first request can also be avoided when the content of the page is supplied. And we can check for the missing schema using a regex or the urlparse method, without making any request.
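A small sketch of the urlparse variant; the add_default_schema helper is hypothetical, shown only to illustrate the idea:

```python
from urllib.parse import urlparse

def add_default_schema(url: str) -> str:
    """Prepend http:// when the URL has no schema, without making a request."""
    # hypothetical helper, not part of this package's API
    if not urlparse(url).scheme:
        return "http://" + url
    return url

print(add_default_schema("example.com"))          # http://example.com
print(add_default_schema("https://example.com"))  # https://example.com
```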
If you are looking for a version of the package that avoids the duplicated request and adds slightly improved parsing on top, I can recommend looking into a fork I made, web2preview. It is fully compatible with this package as well, so the same API can be used.
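As a rough illustration of that compatibility claim (this assumes web2preview keeps this package's classic web_preview(url) helper returning a (title, description, image) tuple; check the fork's README for the actual API):

```python
# assumption: web2preview re-exports the classic web_preview helper
from web2preview import web_preview

title, description, image = web_preview("https://example.com")
print(title, description, image)
```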