-
-
Notifications
You must be signed in to change notification settings - Fork 338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitizing localization files #1354
Merged
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
// This file should be run during build. It will go through all the translations in the static/locales | ||
// directory, and pass every key and value through the sanitizer. | ||
|
||
const fs = require('fs'); | ||
const path = require('path'); | ||
// Initialize purifier. | ||
const createDOMPurify = require('dompurify'); | ||
const { JSDOM } = require('jsdom'); | ||
const window = new JSDOM('').window; | ||
const DOMPurify = createDOMPurify(window); | ||
DOMPurify.addHook('uponSanitizeAttribute', handleSanitizeAttribute); | ||
function handleSanitizeAttribute(node) { | ||
if (!('target' in node)) { return; } | ||
node.setAttribute('target', '_blank'); | ||
} | ||
|
||
const directoryPath = readDirectoryPath(); | ||
|
||
const fileStream = fs.readdirSync(directoryPath) | ||
.map((file) => path.join(directoryPath, file)) | ||
// Make sure it's a file | ||
.filter((file) => fs.lstatSync(file).isFile()) | ||
// Make sure it is json file | ||
.filter((file) => file.endsWith(".json")) | ||
// Read the contents and put it into an array [path, json] | ||
.map((file) => [file, JSON.parse(fs.readFileSync(file, "utf8"))]); | ||
|
||
console.debug(`Found ${fileStream.length} files to sanitize`); | ||
|
||
const sanitized = fileStream.map(([file, json]) => { | ||
return [file, json, sanitizedJson(json)]; | ||
}); | ||
|
||
const onlyDifferent = sanitized.filter(([file, json, sanitizedJson]) => { | ||
return JSON.stringify(json) !== JSON.stringify(sanitizedJson); | ||
}); | ||
|
||
console.debug(`Found ${onlyDifferent.length} files that need sanitizing`); | ||
|
||
// Write the sanitized json back to the files | ||
onlyDifferent.forEach(([file, json, sanitizedJson]) => { | ||
console.info(`Sanitizing ${file}`); | ||
fs.writeFileSync(file, JSON.stringify(sanitizedJson, null, 4) + "\n"); | ||
}); | ||
|
||
console.info("Sanitization complete"); | ||
|
||
function sanitizedJson(json) { | ||
// This is recursive function as some keys can be objects themselves, but all values are either | ||
// strings or objects. | ||
return Object.keys(json).reduce((acc, key) => { | ||
const value = json[key]; | ||
if (typeof value === "string") { | ||
acc[key] = purify(value); | ||
} else if (typeof value === "object") { | ||
acc[key] = sanitizedJson(value); | ||
} | ||
return acc; | ||
}, {}); | ||
} | ||
|
||
|
||
function readDirectoryPath() { | ||
// Directory path is optional, it defaults to static/locales, but can be passed as an argument. | ||
const args = process.argv.slice(2); | ||
if (args.length > 1) { | ||
console.error("Too many arguments, expected at most 1 argument."); | ||
process.exit(1); | ||
} | ||
return args[0] || path.join(__dirname, "../static/locales"); | ||
} | ||
|
||
function purify(inputString) { | ||
// This removes any html tags from the string | ||
return DOMPurify.sanitize(inputString); | ||
} |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that
DOMPurify.sanitize
in fact removes all html tags from a string? Examples at https://github.com/cure53/DOMPurify?tab=readme-ov-file#some-purification-samples-please contract that, e.g.:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@berhalak, I'm wondering if it's better to revert this change.
This PR care of the XSS attack vector discussed here, but I think
style
elements can still be injected (if strings are not escaped). Furthermore, I don't think we explicitly setinnerHTML
to a translation string anywhere in our code, do we? Sanitizing here certainly doesn't hurt, but I don't think it fully replaces the need to escape HTML, and it seems like that's all that #1247 needs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsagal Indeed, it does not. Currently we don't localize texts with any html tags.
In the future, we might support some of them.
Would it make sense to whitelist html tags through
ALLOWED_TAGS
?https://github.com/cure53/DOMPurify/blob/f41b45df18a9666a50c1ad2662cee259230cfef4/src/config.ts#L57
@georgegevoian I made some tests, it looks like the
<style>
tags are removed:I would like to advocate for keeping this work. Actually even if we suppose the current state of the work would actually use
textContent
everywhere, preventing the XSS attack, we would have the charge to prove that now and most of all in the future.Having this tool would remove any doubts and save us time not worrying about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's good to keep sanitization, but I think it makes sense to actually fully strip HTML tags (i.e. maybe just what you suggested @fflorent , with
ALLOWED_TAGS: []
), since keeping them creates the impression that they might be supported, or a temptation for someone to useinnerHTML
in the future. In reality, there is no example currently that uses HTML tags, and (hopefully) no use ofinnerHTML
. We already support markdown, which has been enough (useful for links and emphasis) without actually needing any html tags.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's no current use, what do you think about just failing and refusing to build if material is added that would require sanitization @dsagal ? Might be better than accepting it and trying to sanitize it and not sounding an alarm. (came up in #1362)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 That would be even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Included all comments in the new PR #1367