
Certain websites return 4xx HTTP Codes due to User Agent not being explicitly set #33

Open
PsukheDelos opened this issue Nov 27, 2017 · 2 comments

Comments

@PsukheDelos

Certain websites return 4xx HTTP codes because the request's User Agent header is not explicitly set. As a result, these sites appear in the Broken Links Report even though they exist and are accessible.

Here are a few examples:
https://www.picscheme.org/
http://www.conflictarm.com

One suggestion would be to set a User Agent on our curl request:

curl_setopt($handle, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0) SilverStripe-BrokenLinksReport-1.0.11 [https://github.com/silverstripe/silverstripe-externallinks]");

We could also consider making this configurable.

Ref: https://github.com/silverstripe/silverstripe-externallinks/blob/1.0.10/code/tasks/CurlLinkChecker.php#L37-L43
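A minimal sketch of what a configurable User Agent could look like in the checker class. The `user_agent` config variable, its default value, and the `initialiseCurl` method name are assumptions for illustration, not existing API:

```php
<?php

class CurlLinkChecker
{
    // Hypothetical config variable: site owners could override this
    // default via SilverStripe's YAML config.
    private static $user_agent = 'SilverStripe-BrokenLinksReport';

    protected function initialiseCurl($href)
    {
        $handle = curl_init($href);
        curl_setopt($handle, CURLOPT_NOBODY, true);
        curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
        // Send an explicit User-Agent so servers that reject UA-less
        // requests with a 4xx don't show up as broken links.
        curl_setopt(
            $handle,
            CURLOPT_USERAGENT,
            Config::inst()->get(__CLASS__, 'user_agent')
        );
        return $handle;
    }
}
```

A project could then override the string with something like `CurlLinkChecker: { user_agent: 'MySite-LinkChecker/1.0' }` in its YAML config.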

@alex-dna

Links to LinkedIn return HTTP code 999, which breaks the report, as it throws "InvalidArgumentException: Unrecognised HTTP status code '999'".
A user agent would probably prevent that from happening.
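Independently of the User Agent fix, the report could also guard against non-standard codes so a single odd response doesn't throw. A hypothetical helper (the function name and the 400 fallback are assumptions, not the module's behaviour):

```php
<?php

// Hypothetical guard: map non-standard HTTP codes (e.g. LinkedIn's 999)
// onto a standard code before handing them to the report, so unrecognised
// values don't raise InvalidArgumentException.
function normaliseHttpCode($code)
{
    $code = (int) $code;
    // Standard HTTP status codes fall in the 100-599 range; treat
    // anything else as a generic client error.
    if ($code < 100 || $code > 599) {
        return 400;
    }
    return $code;
}
```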

@NightJar

Thanks @alex-dna - do you feel like the suggested solution (configurable UA) is something you might have time to submit a PR for?
