WIP Bad file urls in OAI results for theses #1201
Comments
See related issue at ualbertalib/metadata#457.
The OAI instructions sent by LAC on 2019-12-20 still include the no-space requirement for links to files.
Here is a list of triple-encoded file urls based on an OAI harvest done on 2020-02-27.
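A list like this could be produced by percent-decoding the last path segment of each harvested file url until it stops changing and counting the rounds; a correctly encoded url needs at most one. The sketch below is a heuristic with hypothetical function names, not the script used for the 2020-02-27 harvest, and a file name that legitimately contains `%` followed by two hex digits would be over-counted. The two urls are the broken OAI link and the working View link from this issue.

```python
from urllib.parse import unquote, urlsplit


def encoding_depth(segment: str) -> int:
    """Count how many rounds of percent-decoding change `segment`.

    Heuristic: a name that legitimately contains '%' followed by two hex
    digits is over-counted.
    """
    depth = 0
    # Each decoding round that changes the string shortens it, so this terminates.
    while (decoded := unquote(segment)) != segment:
        segment = decoded
        depth += 1
    return depth


def flag_multiply_encoded(file_urls: list[str]) -> list[str]:
    """Return the urls whose last path segment is percent-encoded more than once."""
    flagged = []
    for url in file_urls:
        last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
        if encoding_depth(last_segment) > 1:
            flagged.append(url)
    return flagged


bad = ("https://era.library.ualberta.ca/items/03d12bd1-3559-4d03-927c-dc7c7e7b8106/view/"
       "a40f5a31-ee78-42dc-8bc9-8e8648acf596/Hashemi_Seyed_Fall%2525202013.pdf%252520")
good = ("https://era.library.ualberta.ca/items/03d12bd1-3559-4d03-927c-dc7c7e7b8106/view/"
        "a40f5a31-ee78-42dc-8bc9-8e8648acf596/Hashemi_Seyed_Fall-202013.pdf")

# Prints a list containing only the triple-encoded url.
print(flag_multiply_encoded([bad, good]))
```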
This particular set of encoding issues is now fixed with the launch of OAISys.
Describe the bug
The file urls in ETDMS and ORE responses for items where the file name contains a space are triple-urlencoded, and lead to a 404. Items whose filename does not contain a space are ok.
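The triple encoding can be reproduced by applying percent-encoding three times. Working backwards from the broken url in this issue, the sketch assumes the stored file name is `Hashemi_Seyed_Fall 2013.pdf ` (a space before `2013` and a trailing space after `.pdf`). It only shows that three rounds of encoding produce the observed url; it does not say where in the stack the extra rounds are applied.

```python
from urllib.parse import quote, unquote

# Assumed stored file name, recovered by decoding the broken url three times.
file_name = "Hashemi_Seyed_Fall 2013.pdf "

once = quote(file_name)  # correct single encoding: space -> %20
twice = quote(once)      # '%' itself is now encoded as '%25'
thrice = quote(twice)

print(once)    # Hashemi_Seyed_Fall%202013.pdf%20
print(thrice)  # Hashemi_Seyed_Fall%2525202013.pdf%252520

# Decoding three times recovers the original name.
assert unquote(unquote(unquote(thrice))) == file_name
```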
To Reproduce
Steps to reproduce the behavior:
etd_ms:identifier field that points to a pdf: https://era.library.ualberta.ca/items/03d12bd1-3559-4d03-927c-dc7c7e7b8106/view/a40f5a31-ee78-42dc-8bc9-8e8648acf596/Hashemi_Seyed_Fall%2525202013.pdf%252520
https://era.library.ualberta.ca/items/03d12bd1-3559-4d03-927c-dc7c7e7b8106/view/a40f5a31-ee78-42dc-8bc9-8e8648acf596/Hashemi_Seyed_Fall-202013.pdf
Here the space has been replaced with a hyphen rather than being encoded. The link works.
Expected behavior
The link in the OAI record should lead to a successful download of the pdf. It should also meet LAC's harvesting requirements, which are still being clarified (hence the WIP in the title of this issue).
Additional context
This is related to LAC's requirement that file names in download urls presented in OAI records should not contain "illegal characters", including spaces. We need to investigate further to see whether we have special characters other than spaces, and whether the View urls handle them correctly. @sfarnel is investigating based on @leahvanderjagt's email forwarding LAC's requirements.
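To survey which file names might fall foul of the requirement, a minimal sketch could flag characters outside a conservative safe set and collapse them into hyphens, similar in spirit to what the View url does with spaces. The safe set (ASCII letters, digits, `.`, `_`, `-`) is an assumption, since LAC's full list of illegal characters is still being clarified; the function names are hypothetical, and this is not ERA's own View-url logic.

```python
import re
import string

# Assumption: LAC's exact list of "illegal characters" is not yet known, so this
# treats anything outside ASCII letters, digits, '.', '_' and '-' as unsafe.
SAFE_CHARACTERS = frozenset(string.ascii_letters + string.digits + "._-")
UNSAFE_RUN = re.compile(r"[^A-Za-z0-9._-]+")


def unsafe_characters(file_name: str) -> set[str]:
    """Return the characters in file_name that fall outside the safe set."""
    return {ch for ch in file_name if ch not in SAFE_CHARACTERS}


def sanitize_file_name(file_name: str) -> str:
    """Trim surrounding whitespace, then collapse each run of unsafe characters into one hyphen."""
    return UNSAFE_RUN.sub("-", file_name.strip())


print(unsafe_characters("Hashemi_Seyed_Fall 2013.pdf "))   # {' '}
print(sanitize_file_name("Hashemi_Seyed_Fall 2013.pdf "))  # Hashemi_Seyed_Fall-2013.pdf
```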
This requirement may be related to a problem we worked on with them in 2017, when we found that their harvester url-decodes file urls before requesting them, causing a single-encoded space to become a literal space and break the HTTP request. We don't know the current status of this bug. I've emailed the history of our 2017 investigation to them and will update this issue when we have more complete information.
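The 2017 failure mode can be modelled in a few lines. The sketch assumes, as described above, that the harvester url-decodes a link exactly once before requesting it; the function names are hypothetical and this is not LAC's code. A correctly single-encoded space turns into a literal space, which cannot appear in an HTTP request line, while the hyphenated View url from this issue passes through unchanged.

```python
from urllib.parse import quote, unquote

BASE = ("https://era.library.ualberta.ca/items/03d12bd1-3559-4d03-927c-dc7c7e7b8106"
        "/view/a40f5a31-ee78-42dc-8bc9-8e8648acf596/")


def harvester_request_target(file_url: str) -> str:
    # Hypothetical model of the behaviour seen in 2017: the harvester
    # url-decodes the link once before building its HTTP request.
    return unquote(file_url)


def survives_harvester(file_url: str) -> bool:
    # An HTTP/1.1 request line is "METHOD SP request-target SP HTTP-version",
    # so a literal space in the target breaks the request.
    return " " not in harvester_request_target(file_url)


encoded_once = BASE + quote("Hashemi_Seyed_Fall 2013.pdf")  # ...Fall%202013.pdf
view_url = BASE + "Hashemi_Seyed_Fall-202013.pdf"           # working View url

print(survives_harvester(encoded_once))  # False: %20 becomes a literal space
print(survives_harvester(view_url))      # True: nothing to decode
```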