All notable changes to `spatie/crawler` will be documented in this file.
- only crawl links that are completely parsed
- fix curl streaming responses (#295)
- add `setParseableMimeTypes()` (#293)
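A minimal usage sketch (the mime types and URL are example values; the fluent API matches the package's README):

```php
use Spatie\Crawler\Crawler;

// Only parse links out of responses with these content types;
// other responses are left unparsed.
Crawler::create()
    ->setParseableMimeTypes(['text/html', 'text/plain'])
    ->startCrawling('https://example.com');
```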
- fix LinkAdder not receiving the updated DOM (#292)
- allow tightenco/collect 7 (#282)
- respect maximum response size when checking Robots Meta tags (#281)
- allow Guzzle 7
- allow symfony 5 components
- allow tightenco/collect 6.0 and up (#261)
- fix crash when `CrawlRequestFailed` receives an exception other than `RequestException`
- case-insensitive user agent bugfix (#249)
- fix bugs in `hasAlreadyBeenProcessed`
THIS VERSION CONTAINS A CRITICAL BUG, DO NOT USE
- added `ArrayCrawlQueue`; this is now the default queue
- deprecated `CollectionCrawlQueue`
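Passing the new default explicitly looks like this (a sketch; in recent releases the class lives under `Spatie\Crawler\CrawlQueues`, older releases kept it elsewhere under `Spatie\Crawler`):

```php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlQueues\ArrayCrawlQueue;

// ArrayCrawlQueue is the default as of this release, so this is
// equivalent to calling Crawler::create() on its own.
Crawler::create()
    ->setCrawlQueue(new ArrayCrawlQueue())
    ->startCrawling('https://example.com');
```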
- Make user agent configurable (#246)
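For example (the user agent string is a placeholder):

```php
use Spatie\Crawler\Crawler;

// Send a custom User-Agent header with every request.
Crawler::create()
    ->setUserAgent('my-crawler/1.0')
    ->startCrawling('https://example.com');
```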
- `delayBetweenRequests` now uses `int` instead of `float` everywhere
- remove incorrect docblock
- handle relative paths after redirects correctly
- add `getUrls` and `getPendingUrls`
- Respect maximumDepth in combination with robots (#181)
- Properly handle `noindex,follow` urls.
- added the ability to crawl links with `rel="next"` or `rel="prev"`
- add `setDelayBetweenRequests`
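A sketch of the new method (assuming the delay is given in milliseconds, as an `int` per the later change noted above):

```php
use Spatie\Crawler\Crawler;

// Pause 150 ms between consecutive requests.
Crawler::create()
    ->setDelayBetweenRequests(150)
    ->startCrawling('https://example.com');
```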
- fix an issue where the node in the depth tree could be null
- improve performance by only building the depth tree when needed
- handlers will get html after JavaScript has been processed
- refactor to improve extendability
- always add links to pool if robots shouldn't be respected
- refactor of internals
- make it possible to override `$defaultClientOptions`
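A sketch, assuming options passed to `Crawler::create()` are used for the underlying Guzzle client in place of `$defaultClientOptions` (the values shown are examples):

```php
use GuzzleHttp\RequestOptions;
use Spatie\Crawler\Crawler;

// Override the default Guzzle client options.
Crawler::create([
    RequestOptions::TIMEOUT => 30,
    RequestOptions::ALLOW_REDIRECTS => true,
])->startCrawling('https://example.com');
```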
- Bump minimum required version of `spatie/robots-txt` to `1.0.1`.
- Respect robots.txt
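Respecting robots rules is the default from here on; a sketch of opting out (method name as documented in recent releases):

```php
use Spatie\Crawler\Crawler;

// Crawl everything, disregarding robots.txt, robots meta tags
// and X-Robots-Tag headers.
Crawler::create()
    ->ignoreRobots()
    ->startCrawling('https://example.com');
```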
- improved extensibility by removing PHP native type hints from the url, queue and crawler pool closures
- do not follow links that have attribute `rel` set to `nofollow`
- Support both `Illuminate`'s and `Tighten`'s `Collection`.
- fix bugs when installing into a Laravel app
- the `CrawlObserver` and `CrawlProfile` are upgraded from interfaces to abstract classes
- don't crawl `tel:` links
- fix endless loop
- add `setCrawlObservers`, `addCrawlObserver`
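Registration looks roughly like this (a sketch; the observer's namespace and the exact method signatures vary across major versions, and `CrawlObserver` only became an abstract class in the release noted above):

```php
use GuzzleHttp\Exception\RequestException;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\UriInterface;
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlObserver;

// A minimal observer that logs every result to stdout.
class LoggingObserver extends CrawlObserver
{
    public function crawled(UriInterface $url, ResponseInterface $response, ?UriInterface $foundOnUrl = null)
    {
        echo "crawled: {$url}\n";
    }

    public function crawlFailed(UriInterface $url, RequestException $requestException, ?UriInterface $foundOnUrl = null)
    {
        echo "failed: {$url}\n";
    }
}

Crawler::create()
    ->addCrawlObserver(new LoggingObserver())
    ->startCrawling('https://example.com');
```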
- fix `setMaximumResponseSize` (someday we'll get this right)
CONTAINS BUGS, DO NOT USE THIS VERSION
- fix `setMaximumResponseSize`
CONTAINS BUGS, DO NOT USE THIS VERSION
- fix `setMaximumResponseSize`
CONTAINS BUGS, DO NOT USE THIS VERSION
- add `setMaximumResponseSize`
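A sketch (the 3 MB cap mirrors the package's README example):

```php
use Spatie\Crawler\Crawler;

// Stop downloading a response body once it exceeds 3 MB;
// oversized responses are not parsed for links.
Crawler::create()
    ->setMaximumResponseSize(1024 * 1024 * 3)
    ->startCrawling('https://example.com');
```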
- fix for exception being thrown when encountering a malformed url
- use `\Psr\Http\Message\UriInterface` for all urls
- use Puppeteer
- drop support for PHP 7.0
- allow symfony 4 crawler
- added the ability to change the crawl queue
- more performance improvements
- performance improvements
- add `CrawlSubdomains` profile
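Profiles are registered via `setCrawlProfile`; a sketch (in recent releases the profile classes live under `Spatie\Crawler\CrawlProfiles`, in the releases below they sat directly under `Spatie\Crawler`):

```php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlProfiles\CrawlSubdomains;

// Only follow links that stay on example.com or one of its
// subdomains; CrawlInternalUrls (further down) is used the same way.
Crawler::create()
    ->setCrawlProfile(new CrawlSubdomains('https://example.com'))
    ->startCrawling('https://example.com');
```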
- add crawl count limit
- add depth limit
- add JavaScript execution
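The three knobs above combine in one fluent chain (a sketch with example values; `executeJavaScript()` requires the optional Browsershot/Puppeteer dependencies):

```php
use Spatie\Crawler\Crawler;

// Visit at most 500 urls, never follow links more than 3 levels
// deep, and render each page in a headless browser before
// extracting links.
Crawler::create()
    ->setMaximumCrawlCount(500)
    ->setMaximumDepth(3)
    ->executeJavaScript()
    ->startCrawling('https://example.com');
```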
- fix deps for PHP 7.2
- add `EmptyCrawlObserver`
- refactor to make use of Symfony Crawler's `link` function
- fix bugs around relative urls
- add `CrawlInternalUrls`
- make sure the passed client options are being used
- second attempt to fix detection of redirects
- fix detection of redirects
- fix the default timeout of 5 seconds
- set a default timeout of 5 seconds
- fix for non responding hosts
- fix for the accidental crawling of mailto-links
- improve performance by concurrent crawling
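A sketch of the concurrency knob as it exists in current releases (10 is the documented default pool size):

```php
use Spatie\Crawler\Crawler;

// Keep up to 10 requests in flight at once.
Crawler::create()
    ->setConcurrency(10)
    ->startCrawling('https://example.com');
```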
- make it possible to determine on which url a url was found
- Ignore `tel:` links when crawling
- Added `path`, `segment` and `segments` functions to `Url`
- Updated the required version of Guzzle to a secure version
- Fixed a bug where the crawler would not take query strings into account
- Fixed a bug where the crawler tries to follow JavaScript links
- Add support for DomCrawler 3.x
- Fix for normalizing relative links when using non-80 ports
- Add support for custom ports
- Lower required php version to 5.5
- Make urls case sensitive
- First release