Sanitizing localization files #1354

Merged
merged 2 commits into from
Dec 23, 2024

1 change: 1 addition & 0 deletions buildtools/build.sh
@@ -18,6 +18,7 @@ if [[ -e ext/buildtools/webpack.config.js ]]; then
fi

set -x
node buildtools/sanitize_translations.js
tsc --build $PROJECT
buildtools/update_type_info.sh app
webpack --config $WEBPACK_CONFIG --mode production
76 changes: 76 additions & 0 deletions buildtools/sanitize_translations.js
@@ -0,0 +1,76 @@
// This file should be run during build. It will go through all the translations in the static/locales
// directory, and pass every key and value through the sanitizer.

const fs = require('fs');
const path = require('path');
// Initialize purifier.
const createDOMPurify = require('dompurify');
const { JSDOM } = require('jsdom');
const window = new JSDOM('').window;
const DOMPurify = createDOMPurify(window);
DOMPurify.addHook('uponSanitizeAttribute', handleSanitizeAttribute);
function handleSanitizeAttribute(node) {
  if (!('target' in node)) { return; }
  node.setAttribute('target', '_blank');
}

const directoryPath = readDirectoryPath();

const fileStream = fs.readdirSync(directoryPath)
  .map((file) => path.join(directoryPath, file))
  // Make sure it's a file
  .filter((file) => fs.lstatSync(file).isFile())
  // Make sure it is json file
  .filter((file) => file.endsWith(".json"))
  // Read the contents and put it into an array [path, json]
  .map((file) => [file, JSON.parse(fs.readFileSync(file, "utf8"))]);

console.debug(`Found ${fileStream.length} files to sanitize`);

const sanitized = fileStream.map(([file, json]) => {
  return [file, json, sanitizedJson(json)];
});

const onlyDifferent = sanitized.filter(([file, json, sanitizedJson]) => {
  return JSON.stringify(json) !== JSON.stringify(sanitizedJson);
});

console.debug(`Found ${onlyDifferent.length} files that need sanitizing`);

// Write the sanitized json back to the files
onlyDifferent.forEach(([file, json, sanitizedJson]) => {
  console.info(`Sanitizing ${file}`);
  fs.writeFileSync(file, JSON.stringify(sanitizedJson, null, 4) + "\n");
});

console.info("Sanitization complete");

function sanitizedJson(json) {
  // This is recursive function as some keys can be objects themselves, but all values are either
  // strings or objects.
  return Object.keys(json).reduce((acc, key) => {
    const value = json[key];
    if (typeof value === "string") {
      acc[key] = purify(value);
    } else if (typeof value === "object") {
      acc[key] = sanitizedJson(value);
    }
    return acc;
  }, {});
}


function readDirectoryPath() {
  // Directory path is optional, it defaults to static/locales, but can be passed as an argument.
  const args = process.argv.slice(2);
  if (args.length > 1) {
    console.error("Too many arguments, expected at most 1 argument.");
    process.exit(1);
  }
  return args[0] || path.join(__dirname, "../static/locales");
}

function purify(inputString) {
  // This removes any html tags from the string
  return DOMPurify.sanitize(inputString);
Member

Are you sure that DOMPurify.sanitize in fact removes all html tags from a string? The examples at https://github.com/cure53/DOMPurify?tab=readme-ov-file#some-purification-samples-please contradict that, e.g.:

DOMPurify.sanitize('<img src=x onerror=alert(1)//>'); // becomes <img src="x">

Contributor

@berhalak, I'm wondering if it's better to revert this change.

This PR takes care of the XSS attack vector discussed here, but I think style elements can still be injected (if strings are not escaped). Furthermore, I don't think we explicitly set innerHTML to a translation string anywhere in our code, do we? Sanitizing here certainly doesn't hurt, but I don't think it fully replaces the need to escape HTML, and it seems like that's all that #1247 needs.

Collaborator

> Are you sure that DOMPurify.sanitize in fact removes all html tags from a string?

@dsagal Indeed, it does not. Currently, we don't localize any text that contains html tags.
In the future, we might support some of them.

Would it make sense to whitelist html tags through ALLOWED_TAGS?
https://github.com/cure53/DOMPurify/blob/f41b45df18a9666a50c1ad2662cee259230cfef4/src/config.ts#L57

Suggested change
- return DOMPurify.sanitize(inputString);
+ return DOMPurify.sanitize(inputString, { ALLOWED_TAGS: [ /* Place here tags you want to whitelist */ ] });

> This PR takes care of the XSS attack vector discussed here, but I think style elements can still be injected (if strings are not escaped).

@georgegevoian I ran some tests, and it looks like the <style> tags are removed:

$ node
> const createDOMPurify = require('dompurify');
undefined
> const { JSDOM } = require('jsdom');
undefined
> const window = new JSDOM('').window;
undefined
> const DOMPurify = createDOMPurify(window);
undefined
> DOMPurify.sanitize('<style>hello</style>');
''

> I'm wondering if it's better to revert this change. [...] Furthermore, I don't think we explicitly set innerHTML to a translation string anywhere in our code, do we?

I would like to advocate for keeping this work. Even if we suppose the current code uses textContent everywhere, which would prevent the XSS attack, the burden would be on us to prove that now and, above all, in the future.
Having this tool would remove any doubt and save us the time spent worrying about it.

Member

I agree it's good to keep sanitization, but I think it makes sense to actually fully strip HTML tags (i.e. maybe just what you suggested @fflorent , with ALLOWED_TAGS: []), since keeping them creates the impression that they might be supported, or a temptation for someone to use innerHTML in the future. In reality, there is no example currently that uses HTML tags, and (hopefully) no use of innerHTML. We already support markdown, which has been enough (useful for links and emphasis) without actually needing any html tags.
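
A minimal sketch of the "fully strip HTML tags" option, assuming the sanitizer is passed in rather than required directly. The helper name `makeStripAllTags` and the stub object are hypothetical and not part of this PR; only the `ALLOWED_TAGS: []` option comes from the suggestion above.

```javascript
// Hypothetical helper: `domPurify` is any object exposing sanitize(input, config),
// for example the DOMPurify instance created in sanitize_translations.js.
function makeStripAllTags(domPurify) {
  // An empty ALLOWED_TAGS list means no element is allowed through.
  const config = { ALLOWED_TAGS: [] };
  return (inputString) => domPurify.sanitize(inputString, config);
}

// Stub used only to show which arguments reach sanitize(); it is not a sanitizer.
const calls = [];
const stub = {
  sanitize: (input, config) => {
    calls.push({ input, config });
    return input;
  },
};

const purify = makeStripAllTags(stub);
purify("<b>Hello</b>");
console.log(JSON.stringify(calls[0].config));
```

In the build script, the real DOMPurify instance would take the place of the stub, so that `purify` keeps its current call sites while applying the stricter configuration.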

Member

If there's no current use, what do you think about just failing and refusing to build if material is added that would require sanitization @dsagal ? Might be better than accepting it and trying to sanitize it and not sounding an alarm. (came up in #1362)

Member

👍 That would be even better.
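
The "fail the build instead of rewriting" idea could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that landed in #1367: the function name `findEntriesNeedingSanitization`, the key-path format, and the regex-based stand-in sanitizer are all hypothetical. One caveat, stated as an assumption rather than verified here: a sanitizer that returns serialized HTML may also rewrite characters such as `&`, in which case the comparison would flag strings that contain no tags.

```javascript
// Hypothetical check: collect the key path of every string that `sanitize` would change.
function findEntriesNeedingSanitization(json, sanitize, prefix = "") {
  const found = [];
  for (const [key, value] of Object.entries(json)) {
    const keyPath = prefix ? `${prefix}.${key}` : key;
    if (typeof value === "string") {
      if (sanitize(value) !== value) {
        found.push(keyPath);
      }
    } else if (value !== null && typeof value === "object") {
      found.push(...findEntriesNeedingSanitization(value, sanitize, keyPath));
    }
  }
  return found;
}

// Stand-in sanitizer for the demo only: it drops anything that looks like a tag.
// A real build would pass something like
// (s) => DOMPurify.sanitize(s, { ALLOWED_TAGS: [] }).
const stripTags = (s) => s.replace(/<[^>]*>/g, "");

const translations = {
  greeting: "Hello",
  menu: { help: "<img src=x onerror=alert(1)//>Help" },
};

const problems = findEntriesNeedingSanitization(translations, stripTags);
console.log(problems.join(", ")); // prints "menu.help"
```

A build script could then print the offending key paths and call `process.exit(1)` whenever the list is non-empty, so that unsafe material is rejected loudly rather than silently rewritten.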

Contributor Author

Included all comments in the new PR #1367

}
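
The `sanitizedJson` function in the diff assumes every value is a string or an object. As written, a `null` value would make `Object.keys` throw, an array would come back as a plain object with index keys, and numbers or booleans would be dropped from the output. A more defensive traversal might look like the sketch below; the name `sanitizeTree` and the injected `sanitize` parameter are hypothetical and not part of this PR.

```javascript
// Hypothetical variant of sanitizedJson: `sanitize` is any string -> string function.
function sanitizeTree(value, sanitize) {
  if (typeof value === "string") {
    return sanitize(value);
  }
  if (Array.isArray(value)) {
    // Keep arrays as arrays instead of turning them into index-keyed objects.
    return value.map((item) => sanitizeTree(item, sanitize));
  }
  if (value !== null && typeof value === "object") {
    const result = {};
    for (const [key, child] of Object.entries(value)) {
      result[key] = sanitizeTree(child, sanitize);
    }
    return result;
  }
  // Numbers, booleans and null are passed through unchanged.
  return value;
}

// Stand-in transform for the demo only; a real build would pass the purify function.
const upper = (s) => s.toUpperCase();
const demo = sanitizeTree({ a: "x", nested: { b: "y", n: null, list: ["z", 1] } }, upper);
console.log(JSON.stringify(demo));
// prints {"a":"X","nested":{"b":"Y","n":null,"list":["Z",1]}}
```

Passing the sanitizer as a parameter keeps the traversal testable without loading `jsdom`, at the cost of one extra argument at each call site.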