Reader doesn't extract any content from this page even though it's quite simple? #105
Comments
Okay, this is weird: I get the same empty result. However, if I use pageshot mode, it does return the full webpage. Could you look at it, @nomagick?
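For anyone who wants to reproduce the comparison between the default mode and pageshot mode, here is a minimal sketch. It assumes Reader is reached by prefixing the target URL with `https://r.jina.ai/` and that the output mode is chosen with an `X-Respond-With` request header; neither detail is stated in this thread, so check both against the Reader documentation. The helper name `build_reader_request` is hypothetical, and it only builds the request so nothing is sent until you choose to.

```python
# Assumption: Reader's public endpoint prefix (not stated in this thread).
READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(target_url, respond_with=None):
    """Return (url, headers) for a Reader request.

    respond_with is an assumed value for the X-Respond-With header,
    e.g. "pageshot"; leave it as None for the default output.
    """
    headers = {}
    if respond_with:
        # Assumption: header name used to select the output mode.
        headers["X-Respond-With"] = respond_with
    return READER_PREFIX + target_url, headers


if __name__ == "__main__":
    page = (
        "https://www.canada.ca/en/women-gender-equality/"
        "gender-based-violence/gender-based-violence-glossary.html"
    )
    for mode in (None, "pageshot"):
        url, headers = build_reader_request(page, respond_with=mode)
        print(mode or "default", url, headers)
        # To actually send it, something like:
        #   import urllib.request
        #   req = urllib.request.Request(url, headers=headers)
        #   body = urllib.request.urlopen(req, timeout=60).read()
```

Comparing the two responses side by side should show whether only the text extraction path comes back empty.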
Thanks, @hanxiao. Just wanted to bring this to your attention! I will keep following the thread and help out if I can.
Hi @oscar-o-oneill, did you see the same issue on other pages? It seems that something in this specific webpage makes the browser treat the page as not fully loaded until it hits the timeout, which is 30s by default in this case. I'm still trying to identify what in the page causes this. It would be helpful if you have more bad cases, so that I can find the common pattern.
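To illustrate the behavior described above, here is a generic sketch of a load that is raced against a timeout, with a fallback to whatever is available when the timeout fires. This is not Reader's actual implementation; `load_with_timeout`, `load_page`, and `take_snapshot` are hypothetical names, and the 30s default simply mirrors the number mentioned above.

```python
import asyncio


async def load_with_timeout(load_page, take_snapshot, timeout_s=30.0):
    """Wait for load_page() up to timeout_s seconds.

    If the page never signals that it has finished loading, fall back to
    take_snapshot(), which returns whatever content exists at that moment.
    Both arguments are hypothetical zero-argument coroutine functions.
    """
    try:
        return await asyncio.wait_for(load_page(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The load event never arrived within the time budget.
        return await take_snapshot()


if __name__ == "__main__":
    async def never_finishes():
        # Stands in for a page that keeps the browser waiting.
        await asyncio.sleep(3600)
        return "full page"

    async def snapshot():
        return "whatever is in the DOM right now"

    print(asyncio.run(load_with_timeout(never_finishes, snapshot, timeout_s=0.1)))
```

With a page like this one, every request would pay the full timeout before any content comes back, and the content captured at that point may not be the real page.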
Hi @mapleeit, no, I have not found this issue on many other pages. Reader usually works really well! I will definitely report any issues I may find with other web pages in the future. Thank you for making Jina AI Reader.
It looks like some kind of bot-prevention mechanism from "edgesuite". It seems to replace the DOM contents within a fraction of a second, causing Reader to capture its warning messages instead of the page.
Hi, I love Reader! It's so useful. I am playing around with it, and I noticed it isn't able to extract any content from this URL:
https://www.canada.ca/en/women-gender-equality/gender-based-violence/gender-based-violence-glossary.html
On navigating to the reader page for it, I just get this response:
What's going on? It's a fairly simple page.