Add typosquatting lint check #383

shonfeder · 2024-10-23T19:28:43Z

As per #378, we removed the part of name collision detection that used a Levenshtein distance. We found this was not a helpful metric, and it only gave false positives.

@punchagan did some research and found prior work focused specifically on detecting typosquatting. We think this would be a helpful replacement for the removed Levenshtein distance check.

From #378 (comment)

There's some prior work done on other package archives (like PyPI, npm and Rust's crates) in this [paper](https://arxiv.org/pdf/2003.03471), and the packages based on / related to it: typogard and typomania.

The paper (and the packages) focus primarily on malicious typosquatting, and the package repositories they cover are much larger than opam's. Still, we could adapt the Typosquatting Signals (Sec 3.3) explored in the paper for our use case. They use a concept of popular (and unpopular) packages for detecting malicious typosquatting, but we probably don't need that for our use case, given that we aren't checking strictly for malicious typosquatting, our repository is small, and package additions and updates go through a manual approval process.
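To make the idea concrete, here is a rough sketch of what a few name-based signals could look like. It is only an illustration: the signal set, the function names and the example package names are assumptions made for this sketch, not taken verbatim from the paper, typogard or typomania, and a real check would live in the opam linter rather than in Python.

```python
import re

# Characters treated as interchangeable word delimiters (an assumption for this sketch).
DELIMS = re.compile(r"[-_.]")


def delimiter_change(candidate, existing):
    """Names differ only in which delimiters they use, or whether they use any."""
    return candidate != existing and DELIMS.sub("", candidate) == DELIMS.sub("", existing)


def omitted_char(candidate, existing):
    """Candidate equals the existing name with exactly one character removed."""
    if len(candidate) + 1 != len(existing):
        return False
    return any(existing[:i] + existing[i + 1:] == candidate for i in range(len(existing)))


def repeated_char(candidate, existing):
    """Candidate equals the existing name with exactly one character doubled."""
    if len(candidate) != len(existing) + 1:
        return False
    return any(existing[:i] + existing[i] + existing[i:] == candidate for i in range(len(existing)))


def swapped_chars(candidate, existing):
    """Candidate equals the existing name with two adjacent characters swapped."""
    if len(candidate) != len(existing) or candidate == existing:
        return False
    return any(
        existing[:i] + existing[i + 1] + existing[i] + existing[i + 2:] == candidate
        for i in range(len(existing) - 1)
    )


def swapped_words(candidate, existing):
    """Same delimiter-separated words, in a different order."""
    a, b = DELIMS.split(candidate), DELIMS.split(existing)
    return len(a) > 1 and a != b and sorted(a) == sorted(b)


SIGNALS = [
    ("delimiter change", delimiter_change),
    ("omitted character", omitted_char),
    ("repeated character", repeated_char),
    ("swapped characters", swapped_chars),
    ("swapped words", swapped_words),
]


def typosquat_signals(candidate, existing_names):
    """Return (existing_name, signal) pairs for every signal the candidate triggers."""
    hits = []
    for existing in existing_names:
        if candidate == existing:
            continue  # an exact match is a plain name collision, not a typo
        for label, check in SIGNALS:
            if check(candidate, existing):
                hits.append((existing, label))
    return hits
```

For example, `typosquat_signals("ppx-deriving", ["ppx_deriving", "lwt"])` would flag `ppx_deriving` with a delimiter change. Unlike a raw Levenshtein threshold, each hit names the specific kind of confusion, which should make a lint warning easier to act on or dismiss.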

shonfeder assigned punchagan Oct 23, 2024
