change: sanitize markdown to prevent JS execution in documentation output #692
Comments
Hi again @MaddyGuthridge, thanks for the request.
I often use

To update a few things in my insiders pages:

    <script src="../js/insiders.js"></script>
    <script>updateInsidersPage('pawamoy');</script>

To make some SVGs zoomable in a specific page:

    <script>
    document.addEventListener("DOMContentLoaded", function(){
        const divs = document.getElementsByClassName("interactiveSVG");
        for (let i = 0; i < divs.length; i++) {
            if (!divs[i].firstElementChild.id) {
                divs[i].firstElementChild.id = `interactiveSVG-${i}`
            }
            svgPanZoom(`#${divs[i].firstElementChild.id}`, {});
        }
    });
    </script>

Generally speaking, Markdown is a superset of HTML, so I don't find it surprising that HTML (including

However I agree that it could make sense to provide an option at the level of mkdocstrings, to escape HTML (or just

So the solution I suggest here is to use a Griffe extension that will escape HTML

    import bleach
    import griffe

    # I checked bleach's docs, but unfortunately it doesn't look
    # like there's a way to only escape script elements.
    class EscapeHTMLExtension(griffe.Extension):
        def on_instance(self, obj: griffe.Object, **kwargs) -> None:
            if obj.docstring:
                obj.docstring.value = bleach.clean(obj.docstring.value)

WDYT?
If you are using bleach to sanitize Markdown output, I might suggest yourcelf/bleach-allowlist, which contains a comprehensive set of bleach configs.
Thanks @waylan! One thing I'm unsure of, is whether bleach is able to escape Markdown, without breaking it. For example, will the following be left intact, or will

    a = 0
    """Something about `<script>`."""
Bleach does not understand Markdown at all. One should only pass the HTML output of Markdown to Bleach. Of course, if valid HTML code blocks (or spans) contain Markdown, Bleach will understand that those blocks are not HTML and not try to parse them, but I'm not sure what escaping it will do in that scenario. However, if Bleach is passed Markdown which is not in a valid HTML code block or span, then all bets are off. Assume Bleach will not handle it in a sensible way.
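To make the ordering concern concrete, here is a minimal sketch of one plausible failure mode when text is escaped before it has been rendered. It does not use Bleach: the standard library's `html.escape` stands in for a generic HTML escaper, and `render_code_span` is a hypothetical helper that mimics how a Markdown renderer escapes the content of a code span. Both are assumptions made for illustration only.

```python
import html


def render_code_span(text: str) -> str:
    """Hypothetical stand-in for a Markdown renderer handling one code span:
    escape the span's content and wrap it in <code>."""
    return f"<code>{html.escape(text, quote=False)}</code>"


# Content of the code span in the docstring example above.
code_span = "<script>"

# Intended order: the renderer escapes the code span exactly once.
rendered = render_code_span(code_span)

# Escaping the raw Markdown first means the renderer escapes it a second time.
pre_escaped = html.escape(code_span, quote=False)
double_escaped = render_code_span(pre_escaped)

print(rendered)        # <code>&lt;script&gt;</code>
print(double_escaped)  # <code>&amp;lt;script&amp;gt;</code>
```

In the second case a reader would see the literal text `&lt;script&gt;` on the page instead of `<script>`, which is why sanitizing belongs after rendering rather than before.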
Thanks. Escaping docstrings with Bleach is probably not a good idea then. Maybe mkdocstrings could allow hooking into its
Is your change request related to a problem? Please describe.
I recently discovered in this mind-bending issue that most Markdown implementations don't sanitize their output, with Python-Markdown being no exception. This means that (contrary to popular belief), Markdown text cannot be trusted to be safe.
While developers should be careful to only accept merges after thoroughly reviewing code, there are lots of ways to subtly embed JS into documents, which can be easily overlooked (e.g. using `onerror` in an `<img>` tag). I cannot think of any non-malicious reason to embed executing JS code within documentation Markdown, when it is so much easier to bundle additional JS using `mkdocs` (which reviewers would be much more suspicious of, and therefore much more careful with).
Describe the solution you'd like
Sanitize the Markdown output within documentation to prevent executable JS code from being embedded in the output. Perhaps mozilla/bleach can be used.
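As a rough illustration of what sanitizing the rendered output could look like, the sketch below post-processes HTML using only Python's standard library. It is not Bleach and not a vetted sanitizer: `ScriptStripper` and `strip_scripts` are hypothetical names, and the filter only drops `<script>` elements and `on*` event-handler attributes. Other vectors (for example `javascript:` URLs) are deliberately not covered, and comments and doctypes are dropped as a side effect.

```python
from html import escape
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Illustrative filter (not a vetted sanitizer): drops <script> elements
    and on* event-handler attributes, and re-emits everything else."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._in_script = False

    def _format_tag(self, tag, attrs, self_closing=False):
        kept = []
        for name, value in attrs:
            # The parser lowercases attribute names: onerror, onclick, onload, ...
            if name.startswith("on"):
                continue
            # Attribute values arrive unescaped, so escape them again on output.
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        attr_text = "".join(f" {item}" for item in kept)
        return f"<{tag}{attr_text}{' /' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        else:
            self.parts.append(self._format_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag != "script":
            self.parts.append(self._format_tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside <script> is discarded; other text is re-escaped.
        if not self._in_script:
            self.parts.append(escape(data, quote=False))


def strip_scripts(rendered_html: str) -> str:
    """Apply the filter to HTML that a Markdown renderer already produced."""
    parser = ScriptStripper()
    parser.feed(rendered_html)
    parser.close()
    return "".join(parser.parts)


sample = '<p>Hello <img src="x.png" onerror="alert(1)"> <script>alert(2)</script>world</p>'
print(strip_scripts(sample))  # <p>Hello <img src="x.png"> world</p>
```

Note that the filter runs on the rendered HTML, not on the docstring itself, so escaped code spans such as `<code>&lt;script&gt;</code>` pass through unchanged. A real deployment would more likely rely on an allowlist-based library than on a blocklist like this one.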
Describe alternatives you've considered
Do nothing, but document that docstrings cannot be blindly trusted to be safe, as JS can be embedded within them.
Additional context