[SVCS-488] Improve docx rendering - MFR Part #304
base: develop
Adds support for a `public_file` query param so the OSF can request a public renderer. Adds `office365`, a public renderer that uses Office Online to do `.docx` file conversions.
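A minimal sketch of how the optional `public_file` param could be validated and used to pick a renderer, following the rules in this PR's description (`True` means public, `False` means private, anything else raises). The function names, the error class, the renderer labels, the exact-case string matching, and the choice to treat a missing param as private are all assumptions made for illustration, not MFR's actual code.

```python
from typing import Optional


class InvalidPublicFileError(ValueError):
    """Hypothetical error type; the PR only says invalid values raise errors."""


def parse_public_file(raw: Optional[str]) -> bool:
    """Interpret the optional public_file query param.

    Assumption: a missing param means "private", mirroring the
    is_public default of False described in the PR.
    """
    if raw is None:
        return False
    if raw == 'True':
        return True
    if raw == 'False':
        return False
    raise InvalidPublicFileError(
        'public_file must be "True" or "False", got {!r}'.format(raw)
    )


def choose_renderer(extension: str, is_public: bool) -> str:
    """Pick a renderer label (labels are illustrative, not MFR's registry)."""
    if extension == '.docx' and is_public:
        return 'office365'  # public renderer added by this PR
    return 'unoconv'        # existing conversion container


print(choose_renderer('.docx', parse_public_file('True')))
print(choose_renderer('.docx', parse_public_file(None)))
```

Rejecting unknown values instead of coercing them keeps a typo in the query string from silently sending a private file to a public service.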
e2d6404 to 16c92e3
84b7b7d to d6cb8f8
Note: this PR cannot be tested locally since the Office rendering service requires a publicly available file.
Here is what shows up locally, and sandboxing does not cause any issue.
Here is what shows up locally when hard-coding a staging download URL with iframe sandboxing turned on.
Here is what shows up locally when hard-coding a staging download URL with iframe sandboxing turned off.
d6cb8f8 to 7b3a984
95c2eec to 2b04f43
2b04f43 to 2eec820
from mfr.core.metrics import MetricsRecord
from mfr.server import settings as server_settings
Is settings shadowed somewhere?
Good catch. This is six-month-old code and needs an update. It no longer makes sense to me.
from mako.lookup import TemplateLookup

from mfr.core import extension
from mfr.extensions.office365 import settings as office365_settings
why does this need to be qualified?
Same as the previous comment. We no longer alias imports when there is no shadowing issue. Needs update.
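To illustrate the shadowing concern behind these two comments, here is a small self-contained sketch. The `fake_server` and `fake_provider` packages and their values are invented stand-ins registered at runtime; they are not MFR modules. It shows that two unaliased `settings` imports rebind the same name, which is the only case where an alias is needed.

```python
import sys
import types


def register_fake_package(name, **values):
    """Register a throwaway package exposing a `settings` submodule."""
    package = types.ModuleType(name)
    settings_module = types.ModuleType(name + '.settings')
    for key, value in values.items():
        setattr(settings_module, key, value)
    package.settings = settings_module
    sys.modules[name] = package
    sys.modules[name + '.settings'] = settings_module


# Hypothetical stand-ins; the names and values are invented for this sketch.
register_fake_package('fake_server', DEBUG=True)
register_fake_package('fake_provider', MFR_IDENTIFYING_HEADER='X-Fake-Header')

# Unaliased: the second import rebinds `settings`, hiding the first module.
from fake_server import settings
from fake_provider import settings

shadowed = not hasattr(settings, 'DEBUG')  # server settings are no longer reachable

# Aliased: both modules stay reachable under distinct names.
from fake_server import settings as server_settings
from fake_provider import settings as provider_settings

headers = {provider_settings.MFR_IDENTIFYING_HEADER: '1'}
print(shadowed, server_settings.DEBUG, headers)
```

When a file imports only one `settings` module, the plain name is unambiguous and the alias adds nothing, which matches the convention described in the reply.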
}
</style>

<iframe src=${url} frameborder='0'></iframe>
It looks like the only thing in here is another iframe. Is it possible to return a 302 here?
I get your point, but I'm not sure how a 302 would work here. The OSF creates the parent iframe and we shouldn't modify it from MFR. Is an embedded iframe bad practice? MFR itself should use an iframe here since it loads content from an untrusted external web service.
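As a rough illustration of the nested-iframe approach discussed here, the sketch below builds an embed URL for a public download link and renders it into an `iframe` tag with the attribute value quoted and escaped. The `OFFICE_EMBED_BASE` endpoint is an assumption about the Office Online viewer URL (the PR does not name it), and both helper functions are hypothetical.

```python
from html import escape
from urllib.parse import quote

# Assumption: the Office Online embed endpoint; not stated anywhere in this PR.
OFFICE_EMBED_BASE = 'https://view.officeapps.live.com/op/embed.aspx?src='


def office_embed_url(download_url: str) -> str:
    """Percent-encode the whole download URL so it survives as one query value."""
    return OFFICE_EMBED_BASE + quote(download_url, safe='')


def iframe_html(url: str) -> str:
    """Render an iframe tag with the src attribute quoted and HTML-escaped."""
    return '<iframe src="{}" frameborder="0"></iframe>'.format(
        escape(url, quote=True)
    )


# Hypothetical public download link used only for demonstration.
public_url = 'https://example.com/a b.docx?x=1&y=2'
print(iframe_html(office_embed_url(public_url)))
```

Encoding with `safe=''` matters because the download URL carries its own `?` and `&`, which would otherwise be read as parameters of the embed endpoint.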
  response = await self._make_request('GET', download_url, allow_redirects=False, headers=headers)

- if response.status >= 400:
+ if response.status >= HTTPStatus.BAD_REQUEST:
Dig it.
  async def download(self):
      """Download file from WaterButler, returning stream."""
      download_url = await self._fetch_download_url()
-     headers = {settings.MFR_IDENTIFYING_HEADER: '1'}
+     headers = {provider_settings.MFR_IDENTIFYING_HEADER: '1'}
Again here, wondering why this needs to be qualified
Similarly, this should be `pd_settings`, as we have been using it for both WB and MFR.
          'Unable to download the requested file, please try again later.',
          download_url=download_url,
          response=resp_text,
          provider=self.NAME,
      )

  self.metrics.add('download.saw_redirect', False)
- if response.status in (302, 301):
+ if response.status in (HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.FOUND):
should this be a list rather than a tuple?
Not sure. I had the same question. However, it was already a tuple, and the code base uses tuples.
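On the tuple-versus-list question: `http.HTTPStatus` is an `IntEnum`, so its members compare equal to plain integers, and `in` behaves the same for a tuple or a list. The sketch below shows both status checks from the diff in isolation; the `classify` helper and the `REDIRECT_STATUSES` name are hypothetical and are not part of MFR.

```python
from http import HTTPStatus

# A tuple signals a fixed, read-only set of values; a list would also work with `in`.
REDIRECT_STATUSES = (HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.FOUND)


def classify(status: int) -> str:
    """Classify a raw integer status the way the diff's two checks do."""
    if status >= HTTPStatus.BAD_REQUEST:  # IntEnum members compare with ints
        return 'error'
    if status in REDIRECT_STATUSES:       # 301 or 302
        return 'redirect'
    return 'ok'


for code in (200, 301, 302, 404):
    print(code, classify(code))
```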
@birdbrained Thanks for the review. This PR is very old, and we haven't decided whether this is the right approach yet. I will rebase onto the latest `develop` and fix the obsolete import style.
More If You Are Interested
The blocking issues for using Microsoft's rendering service are:
- The service is not reliable.
- The service caches its results for a few hours.
- The service may be rate limited.
- We can only render public files through this service.
More discussion and investigation are needed before development continues.
Ticket
https://openscience.atlassian.net/browse/SVCS-488
OSF side PR: CenterForOpenScience/osf.io#8002
Purpose
This ticket replaces #282. Credit goes to @AddisonSchiller 🎆🎆.
`.docx` rendering is very intensive on the OSF. By using Microsoft's online rendering service to render publicly available `.docx` files, we can remove a lot of pressure from the `unoconv` container.
Changes
- Added a `public_file` query param. This query param is optional. `public_file=True` denotes that the file is public (the project it belongs to is public), while `public_file=False` denotes that it is private. All other values for `public_file` cause errors to be raised.
- `ProviderMetadata` now has an `is_public` flag, with default value set to `False`.
Side Effects
- `iframe` sandboxing may cause issues; this needs to be verified on staging.
QA Notes
The Office365 renderer does not use the `.pdf` renderer like `unoconv` used to, so the PDFs that get made by this renderer may not display exactly the same. More QA notes to come. There is also a `README.md` in the renderer with more information about testing.
Deployment Notes