Issue: Some domains only return one / a few links back. #18

Closed
jacqueline-chan opened this issue Dec 10, 2020 · 5 comments

@jacqueline-chan
Contributor

Needs investigation and a solution

To see the list of domains that have this issue, visit http://199.241.167.146:80 or look at the database on the server.

@jacqueline-chan
Contributor Author

One suspicion: paywalls.

@kstapelfeldt
Member

kstapelfeldt commented Dec 15, 2020

Sometimes only one link gets crawled for a domain.
http://972mag.com/ returns only one link, but it doesn't have a paywall.
We need to investigate this issue.

@jacqueline-chan
Contributor Author

@RaiyanRahman to investigate

@kstapelfeldt
Member

Most likely this happens because the crawl is so large, with so many links in the queue. No testing has been done on this issue yet to verify that.
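
To make the queue hypothesis concrete, here is a minimal sketch of how it could play out. It assumes a single shared FIFO queue and a global page budget, which may not match how the project's crawler actually works. The function `crawl_with_shared_queue`, the `links_by_page` map, and the `big.example` / `small.example` domains are all hypothetical and used only for illustration.

```python
from collections import Counter, deque
from urllib.parse import urlparse


def crawl_with_shared_queue(seeds, links_by_page, budget):
    """Simulate a breadth-first crawl using one FIFO queue and a global page budget.

    links_by_page is a hypothetical stand-in for fetching a page and
    extracting its links: it maps a URL to the URLs found on that page.
    Returns the number of pages crawled per domain.
    """
    queue = deque(seeds)
    seen = set(seeds)
    crawled_per_domain = Counter()
    while queue and budget > 0:
        url = queue.popleft()
        budget -= 1
        crawled_per_domain[urlparse(url).netloc] += 1
        # Newly discovered links go to the back of the shared queue.
        for link in links_by_page.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled_per_domain


# Hypothetical data: one link-heavy domain and one small domain.
links_by_page = {
    "http://big.example/": [f"http://big.example/p{i}" for i in range(50)],
    "http://small.example/": [f"http://small.example/p{i}" for i in range(5)],
}
counts = crawl_with_shared_queue(
    ["http://big.example/", "http://small.example/"], links_by_page, budget=10
)
print(dict(counts))
```

In this toy setup the small domain's links sit behind 50 links from the larger domain, so the budget runs out after only its seed page has been crawled. Whether the real crawler behaves this way would still need to be confirmed by tests.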

@kstapelfeldt
Member

We are writing tests for this as part of the crawl review ticket (#19). Closing this issue, since the discussion is happening there.
