Crawler: Ignore Avatars via robots.txt #10

Sapd · 2016-02-09T17:36:58Z

While working with a scalable web crawler (Apache Nutch) to scan my server list and its outgoing links, I noticed that you don't forbid crawlers from fetching the avatars.

I would suggest disallowing them in robots.txt to avoid unnecessary traffic from other crawlers. The problem is that the avatar URLs don't use a query syntax like ?avatar=, so crawlers treat each avatar as an individual page.
For example:

User-agent: *
Disallow: /avatars/*
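Modern crawlers generally treat a Disallow value as a path prefix, with `*` matching any run of characters; this is how RFC 9309 specifies it. Older or simpler parsers may not support wildcards, in which case `Disallow: /avatars/` (without the `*`) is the more portable form, since prefix matching already covers everything under that directory. The following is a minimal Python sketch of RFC 9309-style matching, not any particular crawler's implementation. It approximates "most specific match" by pattern length, and the sample paths such as `/server/42` are hypothetical.

```python
from __future__ import annotations

import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate an RFC 9309-style path pattern into a compiled regex.

    '*' matches any sequence of characters, and a trailing '$' anchors the end.
    Everything else is matched literally, as a prefix of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under (directive, pattern) rules.

    The longest matching pattern wins, and Allow wins ties.
    Paths that match no rule are allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow value means "allow everything"
        if rule_to_regex(pattern).match(path):  # re.match anchors at the start
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# The rule suggested above, for the "User-agent: *" group.
RULES = [("Disallow", "/avatars/*")]

for path in ["/avatars/1234.png", "/avatars/", "/server/42", "/"]:
    print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

Under these rules, both avatar paths come out blocked while `/server/42` and `/` stay crawlable. Swapping in `("Disallow", "/avatars/")` gives the same result for the avatar URLs.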


LukeHandle · 2016-02-15T00:42:12Z

I guess the toss-up is whether we see any benefit from being indexed at the image level. Probably not, but I'm open to opinions on that.
