Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitizing localization files #1354

Merged
merged 2 commits into from
Dec 23, 2024
Merged

Sanitizing localization files #1354

merged 2 commits into from
Dec 23, 2024

Conversation

berhalak
Copy link
Contributor

Context

Sanitizing localization files.

Proposed solution

Added a script that is executed during build time that will sanitize every key in every translations resource file.

Related issues

#1350

Has this been tested?

Tested manually.

@berhalak berhalak changed the title Sanitizing lolication files Sanitizing localization files Dec 23, 2024
Copy link
Contributor

@georgegevoian georgegevoian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @berhalak!

@berhalak berhalak merged commit e280ece into main Dec 23, 2024
12 checks passed
@berhalak berhalak deleted the sanitize-localization branch December 23, 2024 15:38

function purify(inputString) {
// This removes any html tags from the string
return DOMPurify.sanitize(inputString);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that DOMPurify.sanitize in fact removes all html tags from a string? Examples at https://github.com/cure53/DOMPurify?tab=readme-ov-file#some-purification-samples-please contract that, e.g.:

DOMPurify.sanitize('<img src=x onerror=alert(1)//>'); // becomes <img src="x">

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@berhalak, I'm wondering if it's better to revert this change.

This PR care of the XSS attack vector discussed here, but I think style elements can still be injected (if strings are not escaped). Furthermore, I don't think we explicitly set innerHTML to a translation string anywhere in our code, do we? Sanitizing here certainly doesn't hurt, but I don't think it fully replaces the need to escape HTML, and it seems like that's all that #1247 needs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that DOMPurify.sanitize in fact removes all html tags from a string?

@dsagal Indeed, it does not. Currently we don't localize texts with any html tags.
In the future, we might support some of them.

Would it make sense to whitelist html tags through ALLOWED_TAGS?
https://github.com/cure53/DOMPurify/blob/f41b45df18a9666a50c1ad2662cee259230cfef4/src/config.ts#L57

Suggested change
return DOMPurify.sanitize(inputString);
return DOMPurify.sanitize(inputString, { ALLOWED_TAGS: [ /* Place here tags you want to whitelist */ ] }););

This PR care of the XSS attack vector discussed here, but I think style elements can still be injected (if strings are not escaped).

@georgegevoian I made some tests, it looks like the <style> tags are removed:

$ node
> const createDOMPurify = require('dompurify');
undefined
> const { JSDOM } = require('jsdom');
undefined
> const window = new JSDOM('').window;
undefined
> const DOMPurify = createDOMPurify(window);
undefined
> DOMPurify.sanitize('<style>hello</style>');
''

I'm wondering if it's better to revert this change. [...] Furthermore, I don't think we explicitly set innerHTML to a translation string anywhere in our code, do we?

I would like to advocate for keeping this work. Actually even if we suppose the current state of the work would actually use textContent everywhere, preventing the XSS attack, we would have the charge to prove that now and most of all in the future.
Having this tool would remove any doubts and save us time not worrying about that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's good to keep sanitization, but I think it makes sense to actually fully strip HTML tags (i.e. maybe just what you suggested @fflorent , with ALLOWED_TAGS: []), since keeping them creates the impression that they might be supported, or a temptation for someone to use innerHTML in the future. In reality, there is no example currently that uses HTML tags, and (hopefully) no use of innerHTML. We already support markdown, which has been enough (useful for links and emphasis) without actually needing any html tags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no current use, what do you think about just failing and refusing to build if material is added that would require sanitization @dsagal ? Might be better than accepting it and trying to sanitize it and not sounding an alarm. (came up in #1362)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 That would be even better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants