maxUrls config not honored #25
Comments
It should honor it. How did you set the maxUrlsPerSchemeAuthority parameter?
@boldip I left
OK, but this is not the meaning of the parameter. Can you send us the property file, a complete log at INFO level, and the list of crawled URLs from a crawl of this kind?
It would also be important to know how many of the records are duplicates, as duplicate records are not part of the
Er... it's a bit embarrassing, but we just realized that at some point we deleted the code that was performing the check and never reinstated it. So you're entirely right: presently,
My last comment above was probably misleading. I didn't expect the change to
My current understanding (when everything works), from what I gathered above, is:
That's awesome. My current config:
I have tried the crawler and everything runs fine, except that the maxUrls parameter does not seem to get honored correctly. Admittedly, I set it to a rather low value of 10K. Is there something I am missing?
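For readers following the thread, here is a minimal sketch of the kind of stop check being discussed: a crawl loop that ends once a global cap on stored records is reached. This is not the crawler's actual code. The names `crawl`, `fetch`, and `max_urls` are hypothetical, and leaving duplicate records out of the count is an assumption based on the remark about duplicates above.

```python
from typing import Callable, Iterable, List


def crawl(urls: Iterable[str], fetch: Callable[[str], str], max_urls: int) -> List[str]:
    """Fetch URLs until max_urls non-duplicate records have been stored."""
    stored: List[str] = []   # URLs whose content was kept
    seen_bodies = set()      # contents already stored, used to detect duplicates
    for url in urls:
        # The cap check: without it, the loop would run through every URL.
        if len(stored) >= max_urls:
            break
        body = fetch(url)
        if body in seen_bodies:
            # Assumption: duplicate records do not count toward the cap.
            continue
        seen_bodies.add(body)
        stored.append(url)
    return stored
```

With a check like this, setting the cap to 10K would end the crawl after 10,000 distinct records, even though more than 10,000 URLs may have been fetched along the way if some of them were duplicates.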