A Post-Unicode Normalization Vulnerability

Summary

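The following code snippet is vulnerable to a post-Unicode-normalization bypass, classified as CWE-176 (Improper Handling of Unicode Encoding).
This class of vulnerability arises when security checks are performed before Unicode normalization, so the string that was checked is not the string that is finally used.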

    /**
     * Sanitises a string so that it can be used as a div id
     *
     * @param name
     * @return Returns sanitized string
     */
    public static String cleanName(String name) {
        return Normalizer.normalize(HtmlUtil.encode(name.replace(" ", "_").replace("&", "").replace("(", "")
                .replace(")", "").replace(",", "").replace("+", "_"), HtmlUtil.ENCODE_TEXT), Normalizer.Form.NFC);
    }

As can be seen, cleanName() replaces spaces and + with underscores, removes the &, (, ) and , characters, and then HTML-encodes the result.
However, the NFC normalization runs last, so it can introduce characters that were not present when those checks ran.

Impact

This is a low-severity vulnerability. A mitigation is to apply Unicode normalization first and only then remove or replace the unwanted characters.
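
The mitigation could be sketched as follows. This is only an illustration of the reordering, not the project's actual patch: the class name CleanNameSketch and the helper encodeText() are hypothetical, and encodeText() is a simplified stand-in for HtmlUtil.encode(..., HtmlUtil.ENCODE_TEXT), whose exact behaviour is not shown in this advisory.

```java
import java.text.Normalizer;

public class CleanNameSketch {

    // Hypothetical stand-in for HtmlUtil.encode(..., HtmlUtil.ENCODE_TEXT).
    // Assumption: the real method escapes HTML metacharacters in a similar way.
    static String encodeText(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static String cleanName(String name) {
        // 1. Normalize first, so every later step sees the final characters.
        String normalized = Normalizer.normalize(name, Normalizer.Form.NFC);

        // 2. Apply the same replacements as the original cleanName().
        String stripped = normalized.replace(" ", "_")
                .replace("&", "")
                .replace("(", "")
                .replace(")", "")
                .replace(",", "")
                .replace("+", "_");

        // 3. Encode last; nothing transforms the string after the checks.
        return encodeText(stripped);
    }

    public static void main(String[] args) {
        System.out.println(cleanName("a b&(c),d+e"));
    }
}
```

With this ordering, no transformation runs after the filtering and encoding steps, so their output is what reaches the caller.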

For example, when NFC normalization is applied to U+1FEF (GREEK VARIA), the resulting character is U+0060 (GRAVE ACCENT, `). Other characters with a singleton canonical decomposition behave the same way, such as U+037E (GREEK QUESTION MARK), which becomes U+003B (;).
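
The mapping described above can be checked with the standard java.text.Normalizer class. The class name NfcDemo and the helper nfc() are made up for this illustration.

```java
import java.text.Normalizer;

public class NfcDemo {

    // Returns the NFC-normalized form of the input string.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String varia = "\u1FEF"; // GREEK VARIA
        String out = nfc(varia);

        // Print the code point before and after normalization.
        System.out.printf("U+%04X -> U+%04X%n",
                (int) varia.charAt(0), (int) out.charAt(0));
    }
}
```

Running it prints `U+1FEF -> U+0060`, showing that a character absent from the input to the earlier checks appears once normalization has run.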

References

https://sim4n6.beehiiv.com/p/unicode-characters-bypass-security-checks
