[SwiftParser] Improve diagnostics for misspelled keywords #2794

Draft: AppAppWorks wants to merge 2 commits into main

AppAppWorks
Contributor

Fixes #2198 and #2180.

This is preliminary work on emitting better diagnostics for misspelled keywords. More discussion is needed to determine the scope of misspelling corrections.

@AppAppWorks AppAppWorks marked this pull request as draft August 9, 2024 07:33
Member

@ahoppen ahoppen left a comment

Great changes! One thought after skimming over the PR:

I think it would be great if we could make misspelling correction even easier to use. What do you think of the following:

  • Every keyword has a list of possible misspellings + maybe an implicit list of single-character typos (the latter is probably more involved and could be a separate PR)
  • canRecover always takes misspellings into account and automatically generates a TokenConsumptionHandle that generates a missing token if it discovers a misspelling.
  • expect takes a parameter to decide whether it should take misspellings into account.
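
The proposal above can be sketched with a toy model. This is only an illustration under stated assumptions: `Keyword`, `knownMisspellings`, `ExpectResult`, and this free-standing `expect` function are hypothetical stand-ins, not SwiftParser's real types or signatures, and the misspelling entries are made-up examples.

```swift
// Hypothetical toy model; none of these names are SwiftParser's actual API.
enum Keyword: String {
  case `func`, `struct`, `return`

  /// Hand-written misspellings per keyword (illustrative entries only).
  var knownMisspellings: Set<String> {
    switch self {
    case .func: return ["fun", "function", "fn"]
    case .struct: return ["strcut", "stuct"]
    case .return: return ["retrun", "retun"]
    }
  }
}

/// Outcome of trying to consume a keyword at the current token.
enum ExpectResult: Equatable {
  /// The token is exactly the keyword.
  case matched
  /// The token is a known misspelling; a diagnostic with a Fix-It could be attached.
  case misspelled(found: String, expected: Keyword)
  /// The token is unrelated; the keyword would be synthesized as missing.
  case missing(Keyword)
}

/// Sketch of an `expect`-style check with an opt-in flag for misspellings.
func expect(_ keyword: Keyword, at tokenText: String, allowMisspellings: Bool) -> ExpectResult {
  if tokenText == keyword.rawValue {
    return .matched
  }
  if allowMisspellings && keyword.knownMisspellings.contains(tokenText) {
    return .misspelled(found: tokenText, expected: keyword)
  }
  return .missing(keyword)
}
```

A `canRecover`-style lookahead could consult the same table unconditionally, while callers of `expect` decide per call site whether a misspelling is acceptable there.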

@mateusrodriguesxyz
Contributor

mateusrodriguesxyz commented Aug 9, 2024

FWIW, a few months ago I implemented a general solution for keyword misspelling using Levenshtein distance:
https://github.com/mateusrodriguesxyz/swift-syntax/tree/keywords-correction

MisspelledKeywordsTest

@AppAppWorks
Contributor Author

  • Every keyword has a list of possible misspellings

I think we could store the mapping of keyword misspellings in some resource files and make use of code generation to create boilerplate code (there would be a lot of them). Resource files would be friendly for crowdsourcing too.
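
As a rough sketch of the resource-file idea, assuming a hypothetical line format of `keyword: misspelling, misspelling, ...` (the format, the sample entries, and the `parseMisspellings` name are all invented for illustration and are not taken from this PR), a generator could first load the file into a lookup table before emitting any boilerplate:

```swift
// Hypothetical resource contents; the real file format is not specified in this thread.
let resource = """
  func: fun, function, fn
  return: retrun, retun
  """

/// Parses the assumed `keyword: a, b, c` format into a misspelling-to-keyword table.
func parseMisspellings(_ text: String) -> [String: String] {
  var keywordForMisspelling: [String: String] = [:]
  for line in text.split(separator: "\n") {
    // Split only at the first colon: left side is the keyword, right side the list.
    let parts = line.split(separator: ":", maxSplits: 1)
    guard parts.count == 2 else { continue }  // Skip malformed lines.
    let keyword: String = String(parts[0]).filter { !$0.isWhitespace }
    // Entries are separated by commas and/or whitespace.
    for entry in parts[1].split(whereSeparator: { $0 == "," || $0.isWhitespace }) {
      keywordForMisspelling[String(entry)] = keyword
    }
  }
  return keywordForMisspelling
}

let table = parseMisspellings(resource)
```

A code generator would then walk `table` and emit the per-keyword boilerplate, so contributors only ever edit the plain-text file.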

maybe an implicit list of single-character typos (the latter is probably more involved and could be a separate PR)

@mateusrodriguesxyz has done some great work on correcting keywords with single-character typos based on Levenshtein distance, but I'm not sure whether we should precompute typo permutations statically, since computing the Levenshtein distance between strings of lengths m and n takes O(mn) time.
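
For reference, the O(mn) cost comes from filling an (m+1) by (n+1) table of edit distances between prefixes. The following is a minimal textbook sketch that keeps only two rows of that table; it is not the code from the linked branch, and the function name is an assumption.

```swift
/// Textbook dynamic-programming Levenshtein distance (insert, delete, substitute).
/// Runs in O(m * n) time and O(n) extra space for inputs of lengths m and n.
func levenshteinDistance(_ a: String, _ b: String) -> Int {
  let source = Array(a)
  let target = Array(b)
  if source.isEmpty { return target.count }
  if target.isEmpty { return source.count }

  // previous[j] = distance between the first i-1 characters of source
  // and the first j characters of target.
  var previous = Array(0...target.count)
  var current = [Int](repeating: 0, count: target.count + 1)

  for i in 1...source.count {
    current[0] = i  // Deleting i characters turns the prefix into "".
    for j in 1...target.count {
      let substitutionCost = source[i - 1] == target[j - 1] ? 0 : 1
      current[j] = min(
        previous[j] + 1,  // deletion
        current[j - 1] + 1,  // insertion
        previous[j - 1] + substitutionCost  // substitution or match
      )
    }
    swap(&previous, &current)
  }
  return previous[target.count]
}
```

With this definition, `levenshteinDistance("fun", "func")` is 1, while a transposition such as `"fucn"` versus `"func"` counts as 2, which matters when deciding how large a distance threshold to accept.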

  • canRecover always takes misspellings into account and automatically generates a TokenConsumptionHandle that generates a missing token if it discovers a misspelling.

Would this lead to a performance regression, since the parser's search space might explode?

@AppAppWorks
Contributor Author

AppAppWorks commented Aug 10, 2024

FWIW, a few months ago I implemented a general solution for keyword misspelling using Levenshtein distance: https://github.com/mateusrodriguesxyz/swift-syntax/tree/keywords-correction

MisspelledKeywordsTest

Great work! It'll take some time for me to digest it.

@mateusrodriguesxyz
Contributor

Great work! It'll take some time for me to digest it.

Thanks! Most of the relevant code is in TokenSpec. I do quite a few checks before even trying to find the correct keyword, to avoid false diagnostics and unnecessary distance computation.
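
To illustrate the general idea of cheap checks before any distance work, here is a hedged sketch. The specific checks, the function names, and the linear-time one-edit test (used here in place of a full Levenshtein computation) are assumptions for illustration; they are not taken from the linked branch's TokenSpec code.

```swift
/// Returns true if `a` and `b` differ by at most one insertion, deletion,
/// or substitution. Runs in linear time, with no distance table.
func isWithinOneEdit(_ a: [Character], _ b: [Character]) -> Bool {
  let (short, long) = a.count <= b.count ? (a, b) : (b, a)
  if long.count - short.count > 1 { return false }
  var i = 0, j = 0, edits = 0
  while i < short.count && j < long.count {
    if short[i] == long[j] {
      i += 1
      j += 1
      continue
    }
    edits += 1
    if edits > 1 { return false }
    if short.count == long.count { i += 1 }  // Substitution: advance both.
    j += 1  // Otherwise skip the extra character in `long`.
  }
  // A trailing unmatched character in `long` counts as one more edit.
  return edits + (long.count - j) <= 1
}

/// Suggests a keyword for a probable typo, or nil when no suggestion is safe.
func suggestKeyword(for identifier: String, among keywords: [String]) -> String? {
  // Check 1: very short identifiers are close to too many keywords.
  guard identifier.count >= 3 else { return nil }
  // Check 2: an exact keyword is not a typo.
  guard !keywords.contains(identifier) else { return nil }

  let typed = Array(identifier)
  var candidates: [String] = []
  for keyword in keywords {
    let chars = Array(keyword)
    // Check 3: a length gap above one can never be a single-character typo.
    guard abs(chars.count - typed.count) <= 1 else { continue }
    if isWithinOneEdit(typed, chars) { candidates.append(keyword) }
  }
  // Check 4: only suggest when exactly one keyword matches, to avoid false diagnostics.
  return candidates.count == 1 ? candidates[0] : nil
}
```

Rejecting ambiguous matches trades some recall for fewer misleading Fix-Its; a real implementation would also need to consider the parsing context in which the identifier appears.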

(cherry picked from commit 2a3e108bf7b36539930298a9b78333983eb61e76)
@AppAppWorks
Contributor Author

AppAppWorks commented Aug 15, 2024

With the latest commit, I've created a kitchen sink to facilitate crowdsourcing of keyword "false friends" from popular programming languages, along with other general misspellings that won't be captured by one-character Levenshtein distance permutations.

Truth be told, it might be infeasible to keep this in sync with the evolution of all these languages in the future, but at this stage let's just brainstorm :)

Member

@ahoppen ahoppen left a comment

That’s a great list. I don’t think it needs to be 100% complete; just having the infrastructure for typo Fix-Its is amazing because it makes the list very easy to extend.

I think the next big step is to integrate this into the main parsing infrastructure so typo-correction is taken care of automatically in a variety of places, as I described in #2794 (review).

3 participants