No explicit header set for the DOWNLOAD_SESSION #483

Open
rtibbles opened this issue Mar 13, 2024 · 14 comments · May be fixed by #563

Comments

@rtibbles
Member

rtibbles commented Mar 13, 2024

Observed behavior

The DOWNLOAD_SESSION used to download resources sets no explicit User-Agent header. This is a problem when downloading from Wikimedia sites, for example, because of their User-Agent policy: https://meta.wikimedia.org/wiki/User-Agent_policy

Expected behavior

Ideally, we would send the kind of User-Agent that the Wikimedia policy spells out. We already retrieve from Studio the email of the user whose API token we are running with, so we should reuse it to set the header.

With that in place, we would then do the following for the User Agent:

f"Ricecooker/{ricecooker.__version__} bot ({user_email})"
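
A minimal sketch of how this could be wired up, assuming DOWNLOAD_SESSION behaves like a `requests.Session` (that is, it exposes a mutable `headers` mapping). The helper names `build_user_agent` and `apply_user_agent` are hypothetical and not part of ricecooker, and the version and email values are placeholders. A stand-in object is used in place of the real session so the snippet runs without network access:

```python
from types import SimpleNamespace


def build_user_agent(version, user_email):
    """Format the User-Agent string proposed in this issue."""
    return f"Ricecooker/{version} bot ({user_email})"


def apply_user_agent(session, version, user_email):
    """Set the User-Agent header on a session-like object.

    Assumes `session.headers` is a mutable mapping, as on requests.Session.
    """
    session.headers["User-Agent"] = build_user_agent(version, user_email)
    return session


# Stand-in for the real download session (hypothetical, for illustration only).
fake_session = SimpleNamespace(headers={})

# Placeholder values; the real ones would come from ricecooker.__version__
# and the email retrieved from Studio.
apply_user_agent(fake_session, "0.7.0", "user@example.org")
print(fake_session.headers["User-Agent"])
```

In practice the same call would presumably be made once, after the user's email has been fetched from Studio and before any downloads start.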

User-facing consequences

Attempts to scrape without setting these headers may be treated as malicious.

Steps to reproduce

Attempt to download any file from a Wikimedia site.

Context

Ricecooker develop branch

@nikkuAg
Contributor

nikkuAg commented Mar 26, 2024

Hey, is this issue still open? I would like to work on this

@rtibbles
Member Author

Absolutely @nikkuAg - I will assign you, thanks for volunteering!

@MisRob
Member

MisRob commented Jun 11, 2024

Hi @nikkuAg, are you still planning to work on this?

@ghost

ghost commented Nov 8, 2024

Hey, is this issue still open? I would like to work on this

@AllanOXDi
Member

Hi @manic012 I have assigned it to you. Thanks for contributing

AllanOXDi assigned ghost Nov 8, 2024
@adoo100

adoo100 commented Nov 16, 2024

Hi @rtibbles, if this issue is still unresolved, I would love to take part and work on it :)

@akolson
Member

akolson commented Nov 18, 2024

Hi @adoo100! Thank you! I have assigned it to you! Please feel free to reach out to @rtibbles in case it is not clear what needs to be done to resolve the issue.

@Keshav123454

Keshav123454 commented Dec 10, 2024

Does Ricecooker have its own separate development guide?

@AllanOXDi
Member

@MisRob
Member

MisRob commented Dec 17, 2024

Hi @adoo100, I wanted to mention that Learning Equality will be closed from December 23 to January 5.

@Divyanshi750

Hi, is this issue still open to work on? I would like to contribute to it!

@AlexVelezLl
Member

Hey @adoo100, are you still working on this? Or can we reassign?

@AlexVelezLl
Member

Unassigning due to a lack of activity over several months. I will assign this to you, @Divyanshi750! Please let us know if you have any questions :)

AlexVelezLl assigned Divyanshi750 and unassigned adoo100 Jan 24, 2025
Divyanshi750 linked a pull request Jan 24, 2025 that will close this issue
@Divyanshi750

I have fixed the issue and added test cases for it. Please review, @AlexVelezLl @rtibbles.
