Releases: bee4/robots.txt
Official RFC support
Now, with the help of @ranvis, the lib fully supports the official RFC.
All the details are accessible here: http://www.robotstxt.org/norobots-rfc.txt
This release includes:
A great move!
All issues have been fixed \o/
The test suite is now run with atoum, with better coverage and some integration tests on real-world examples.
This lib now follows all the official robots.txt guidelines 😄:
- Multiple User-Agent definition
- Case insensitive User-Agent rules
- Pattern building using `$` and `*`
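As a rough, library-agnostic illustration of the `$` and `*` semantics (this is not the library's own code), a robots.txt path pattern can be translated into a regular expression — a minimal Python sketch:

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a regex.

    '*' matches any sequence of characters; a trailing '$'
    anchors the pattern to the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*'
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def path_matches(pattern: str, path: str) -> bool:
    return pattern_to_regex(pattern).match(path) is not None

print(path_matches("/*.php$", "/index.php"))      # True
print(path_matches("/*.php$", "/index.php?x=1"))  # False: '$' anchors the end
print(path_matches("/fish*", "/fishing/rods"))    # True
```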
This release also introduces dedicated exceptions to identify errors more precisely:
- InvalidUrlException
- InvalidContentException
Hope this helps!
Hotfix release
This release includes fixes for the parsing of empty robots.txt files.
Some rules were not properly handled in v0.0.0:
- If the file is empty, it is treated as an Allow All directive
- If the file contains two rules for the same User-Agent, a specific exception is thrown: `Bee4\RobotsTxt\Exception\DuplicateRuleException`
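The empty-file behavior matches the standard interpretation of robots.txt. For illustration only (using Python's stdlib `urllib.robotparser`, not this library's API):

```python
from urllib.robotparser import RobotFileParser

# An empty robots.txt places no restrictions on crawlers:
# every URL is fetchable by every user agent.
parser = RobotFileParser()
parser.parse([])  # empty file: no groups, no rules

print(parser.can_fetch("AnyBot", "https://example.com/private/page"))  # True
```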
First release!
For the implemented rules, see the details from Google here: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
It includes a simple API to manipulate robots.txt files.
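As a quick, library-agnostic sketch of the kind of checks such rules enable — per-agent groups with case-insensitive User-Agent matching (shown with Python's stdlib parser, not this library's API):

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# User-agent matching is case-insensitive.
print(parser.can_fetch("googlebot", "/private/data"))    # False
# Other agents fall back to the '*' group.
print(parser.can_fetch("AnyOtherBot", "/private/data"))  # True
print(parser.can_fetch("AnyOtherBot", "/tmp/x"))         # False
```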