Check for new typos in CI #1984

Merged
merged 6 commits into from
Dec 7, 2023

Member

@LilithHafner LilithHafner commented Dec 5, 2023

This is a close copy of JuliaLang/julia#51704

The design now is...

Look at each file that was edited and neither created nor deleted. Scan all of these for typos longer than 3 characters using the Rust typos crate and compute the union. This is our baseline of "false positives". Compile this list into a Python set.

Look at each file that was edited and not deleted. Scan all of these for typos longer than 3 characters using the Rust typos crate, skipping any typos that are in the set of false positives. For each remaining typo, provide a GitHub annotation and, if the typo is longer than 5 characters (at least 88% true positivity rate for typos that long), mark the CI job as a failure but continue to report all remaining novel typos.
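
The two passes above can be sketched roughly as follows. This is an illustration, not the script in this PR: it assumes the scanner output is one JSON object per line with `type`, `path`, `line_num`, `typo` and `corrections` fields (the shape I'd expect from `typos --format json`), and the function names and the choice of `warning` versus `error` annotation levels are made up for the example.

```python
import json

# Thresholds from the description: report typos longer than 3 characters,
# fail the job for typos longer than 5 characters.
REPORT_LONGER_THAN = 3
FAIL_LONGER_THAN = 5


def parse_typos(json_lines):
    """Yield typo records long enough to report (assumed JSON-lines input)."""
    for line in json_lines.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") == "typo" and len(record["typo"]) > REPORT_LONGER_THAN:
            yield record


def baseline_set(baseline_output):
    """Pass 1: union of typos found in the baseline scan ("false positives")."""
    return {record["typo"] for record in parse_typos(baseline_output)}


def check(new_output, false_positives):
    """Pass 2: annotate novel typos and decide whether the job should fail."""
    annotations = []
    failed = False
    for record in parse_typos(new_output):
        word = record["typo"]
        if word in false_positives:
            continue  # already in the baseline, so not a new typo
        is_failure = len(word) > FAIL_LONGER_THAN
        failed = failed or is_failure  # keep going so every novel typo is reported
        level = "error" if is_failure else "warning"  # hypothetical level choice
        suggestion = ", ".join(record.get("corrections", []))
        annotations.append(
            f"::{level} file={record['path']},line={record['line_num']}::"
            f"possible typo: {word} -> {suggestion}"
        )
    return annotations, failed
```

Printing each returned string to stdout inside a GitHub Actions step produces an annotation on the given file and line; the caller would exit nonzero at the end if `failed` is true.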

This achieves:

  • Close to the full sensitivity of the typos crate
  • Almost never erroneously fails CI
  • Few false positives reported (hopefully)
  • When there is a false positive, it is safe to simply ignore it and press merge, even if the typos CI check was failing.

Runtime on is 14s total, 5s of which is doing the actual checks. typos goes through only the files that have been changed (and goes through them twice), bash handles lists of length equal to the number of files edited, and Python runs in O(number of reported typos). Runtime here should be negligibly shorter. There is a 4 minute timeout.


See JuliaLang/julia#51704 (comment) for false positivity stats

I've used lower (by one character) thresholds here than on JuliaLang/julia because

  • A higher proportion of PRs actually edit English prose (the prior is different)
  • It is more important to catch typos
  • It is less bad to have a nonzero false positivity rate because there are fewer PRs
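
To make the effect of the one-character shift concrete, here is a small sketch. The `Thresholds` class and both preset names are hypothetical; the `ONE_HIGHER` values are simply this PR's thresholds plus one, inferred from the sentence above rather than read from the JuliaLang/julia workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    report_longer_than: int  # typos at or below this length are ignored
    fail_longer_than: int    # typos above this length fail the CI job

    def classify(self, typo):
        """Return "fail", "annotate" or "ignore" based only on typo length."""
        n = len(typo)
        if n > self.fail_longer_than:
            return "fail"
        if n > self.report_longer_than:
            return "annotate"
        return "ignore"


# Thresholds stated in this PR.
THIS_REPO = Thresholds(report_longer_than=3, fail_longer_than=5)
# Assumed: the same thresholds shifted up by one character.
ONE_HIGHER = Thresholds(report_longer_than=4, fail_longer_than=6)
```

Under these assumptions a 4-character typo is annotated here but ignored with the higher thresholds, and a 6-character typo fails CI here but is only annotated there.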

Because this only checks for new typos, there is no need to wait for #1983 to merge before merging this.


github-actions bot commented Dec 5, 2023

Once the build has completed, you can preview your PR at this URL: https://julialang.netlify.app/previews/PR1984/ in ~15 minutes

@LilithHafner LilithHafner merged commit 98db220 into main Dec 7, 2023
2 checks passed
@LilithHafner LilithHafner deleted the lh/typos-ci branch December 7, 2023 14:12