The unicode normalization step of the python interpreter can be abused #4

Open
wasi-master opened this issue Dec 30, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

wasi-master commented Dec 30, 2021

Basically the suggestion in this Reddit comment.

From this article:

Python always applies NFKC normalization to characters. Therefore, two distinct characters may actually produce the same variable name. For example:

>>> ª = 1 # FEMININE ORDINAL INDICATOR
>>> a # LATIN SMALL LETTER A (i.e., ASCII lowercase 'a')
1
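
The same effect can be checked outside the REPL. This is only a minimal sketch using the standard `unicodedata` module; the helper name `nfkc` and the `namespace` dict are made up for illustration.

```python
import unicodedata


def nfkc(name):
    """Return the NFKC-normalized form of a name (hypothetical helper)."""
    return unicodedata.normalize("NFKC", name)


# Compile a tiny program that assigns through U+00AA and reads back through ASCII 'a'.
source = "ª = 1\nresult = a\n"
namespace = {}
exec(source, namespace)

print(nfkc("ª") == "a")      # both spellings normalize to the same string
print(namespace["result"])   # the value assigned through 'ª'
print("ª" in namespace)      # only the normalized name is stored as a key
```

The parser normalizes the identifier before it is stored, so the namespace only ever contains the ASCII spelling.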

I've generated a mapping of these characters, taken from this URL.
The mapping can be found here. Beware that some characters may not be supported in Python, because I haven't tested every one of them.
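
One way to test every candidate, instead of trusting an external list, is to scan the code points directly. The sketch below is not the mapping linked above; `build_lookalike_map` is a hypothetical helper, and it assumes that a character is usable if it normalizes to a single ASCII letter and passes `str.isidentifier()` on its own.

```python
import sys
import unicodedata
from collections import defaultdict


def build_lookalike_map():
    """Map each ASCII letter to non-ASCII characters that NFKC-normalize to it.

    Hypothetical helper: keeps only characters that are valid identifiers by
    themselves, so every entry can appear at any position in a name.
    """
    mapping = defaultdict(list)
    for code_point in range(0x80, sys.maxunicode + 1):
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # skip surrogates, they are not real characters
        char = chr(code_point)
        folded = unicodedata.normalize("NFKC", char)
        if len(folded) == 1 and folded.isascii() and folded.isalpha():
            if char.isidentifier():
                mapping[folded].append(char)
    return dict(mapping)


lookalikes = build_lookalike_map()
print(len(lookalikes["a"]), "alternative spellings of 'a'")
```

Characters such as the circled letters normalize to ASCII but are rejected by the `isidentifier()` check, which is the kind of unsupported entry the warning above is about.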

I suggest adding a flag to enable this behaviour.

I would have done it myself and opened a PR, but I am too busy at the moment.

@LeviBorodenko
Owner

That sounds very promising! I like it. I am not sure I will find the time to implement it, but I am open to PRs.

LeviBorodenko added the enhancement label Dec 30, 2021

MonliH commented Dec 30, 2021

I actually implemented this in uglier, which was pretty much a copy of this project. In addition to abusing Unicode normalization, it also uses Cyrillic characters (which look a lot like Latin characters) to make all variables look like they have the same identifier.

This:

def add_values(n1, n2):
    return n1 + n2


def add_10_to_string(n):
    return str(add_values(int(n), 10))


num = add_10_to_string("10")
print(num)

turns into:

def ADDVALUES(хxxх, хxхх):


    return хxxх + хxхх



def ADDTOSTRING(НННН):
    return st𝓇(𝕬𝔇𝔇𝔙𝕬𝕷𝓤𝔈𝔖(𝒾𝕟𝑡(НННН), 10))

НННH = 𝕬𝕯𝕯𝕿𝕺𝔖𝕿𝕽𝕴𝕹𝔊('10')
𝓅𝓇𝒾𝕟𝑡(НННH)

(Notice that it also abuses normalization for built-ins, using something like 𝒾𝕟𝑡 for the built-in int function.)
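
The two tricks behave differently, which a short check makes visible. This is only a sketch, not code from uglier; `same_identifier` is a hypothetical helper. Styled mathematical letters collapse to ASCII under NFKC, while Cyrillic lookalikes stay distinct names that merely look the same.

```python
import unicodedata


def same_identifier(first, second):
    """True if Python would treat both spellings as one identifier (hypothetical helper)."""
    return unicodedata.normalize("NFKC", first) == unicodedata.normalize("NFKC", second)


print(same_identifier("𝒾𝕟𝑡", "int"))   # styled letters fold to the built-in's name
print(same_identifier("\u0445", "x"))   # CYRILLIC SMALL LETTER HA vs Latin 'x'

# Run a tiny program that uses both tricks at once.
namespace = {}
exec("\u0445 = 'cyrillic'\nx = 'latin'\nvalue = 𝒾𝕟𝑡('10') + 10\n", namespace)
print(namespace["\u0445"], namespace["x"], namespace["value"])
```

So an obfuscator can freely restyle built-ins and its own names through normalization, but Cyrillic characters have to be used consistently, because they create genuinely separate variables.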


3 participants