No explicit header set for the DOWNLOAD_SESSION #483
Comments
Hey, is this issue still open? I would like to work on this.
Absolutely @nikkuAg - I will assign you, thanks for volunteering!
Hi @nikkuAg, are you still planning to work on this?
Hey, is this issue still open? I would like to work on this.
Hi @manic012, I have assigned it to you. Thanks for contributing.
Hi @rtibbles, if this issue is still unresolved I would love to take part and work on it :)
Does Ricecooker have its own separate development guide?
Hi @Keshav123454, here is a starting point: https://ricecooker.readthedocs.io/en/latest/developer/index.html
Hi @adoo100, I wanted to mention that Learning Equality will be closed from December 23 to January 5.
Hi, is this issue still open to work on? I would like to contribute to it!
Hey @adoo100, are you still working on this? Or can we reassign?
Unassigning due to lack of activity in several months. I will assign this to you @Divyanshi750! Please let us know if you have any questions :)
I have fixed the issue and also added test cases for it. Please review @AlexVelezLl @rtibbles
Observed behavior
The DOWNLOAD_SESSION that is used to download resources sets no explicit User-Agent header. This is a problem, for example, when downloading from Wikimedia sites, because of their User-Agent policy: https://meta.wikimedia.org/wiki/User-Agent_policy
Expected behavior
Ideally, we would follow the kind of User-Agent that the Wikimedia policy spells out. We already retrieve from Studio the email of the user whose API token we are running with, so we should reuse it to set the header.
With that in place, we would then do the following for the User Agent:
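A minimal sketch of what this could look like, not the project's actual implementation. It assumes DOWNLOAD_SESSION behaves like a requests.Session (an object with a mutable `headers` mapping) and follows the general `client/version (contact)` shape described in the Wikimedia policy. The function names `build_user_agent` and `apply_user_agent`, the `"ricecooker"` client name, and the default version string are all hypothetical, chosen only for illustration.

```python
def build_user_agent(email, version="0.0", client="ricecooker"):
    """Return a User-Agent string of the form 'client/version (contact)'.

    The client name and default version are illustrative assumptions.
    """
    contact = email.strip()
    if not contact:
        # Without contact information the header would not identify the user.
        raise ValueError("a contact email is required")
    return f"{client}/{version} ({contact})"


def apply_user_agent(session, email, version="0.0"):
    """Set the User-Agent header on any session-like object.

    `session` only needs a dict-like `headers` attribute, which is what
    requests.Session provides; DOWNLOAD_SESSION is assumed to be one.
    """
    session.headers["User-Agent"] = build_user_agent(email, version)
    return session


# Hypothetical usage, assuming the email has already been fetched from Studio:
#   apply_user_agent(DOWNLOAD_SESSION, studio_user_email, version=ricecooker_version)
```

Keeping the string construction separate from the session update makes the header format easy to unit test without making any network requests.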
User-facing consequences
Attempts to scrape without setting these headers may be treated as malicious.
Steps to reproduce
Attempt to download any file from Wikimedia.
Context
Ricecooker develop branch