lint: add typos check #1888
I've noticed a considerable number of areas in the diff where correct names are made incorrect ("rela" stands for "relative" and I don't think there are any occurrences where it should be changed to "real", and there are some others). This is not limited to the GPG signature and project-name cases that you've identified. In addition, I'm not sure any changes should be made in files in However, I've also noticed that you've marked this as a draft, and maybe you are already aware of the other issues. If you think it would be helpful for me to leave a review with comments on the individual problematic cases, I'd be pleased to do so. Otherwise, I will assume that, as long as this is a draft, such a review might be more of a distraction than a help, and I will refrain from it. There are also some areas where at least the fixes are clearly a huge improvement, particularly in
To make sure it is not lost track of, and also to report the results of some manual testing (because the affected xfail markings cover some things not produced on CI), I've opened #1893 for the bug you've discovered in Although those are definitely not the only typos found here that should be fixed, it seems to me that their elevated importance and relationship to the correctness of the tests justify a separate PR to fix them, especially if such a PR would result in their being fixed sooner (and then they would no longer have to be worried about here). If you are amenable to this idea, then I suggest opening that, as you deserve the credit for it. But I would be pleased to open that PR instead if you prefer (I would list you in the While another option may be to wait for the change to come in with this PR, I think it is better that it not be delayed while figuring out if and how automated spell checking can be added safely and with an acceptably low rate of false positives.
Thanks for sharing this draft; I am happy it could already find a genuine issue (#1893) despite a high rate of false positives. Thank you!
With my other projects, I have been using several typo-checking tools, and this one seemed at first to be lower effort, but as mentioned, it produces a significant number of false positives, and the next version could produce even more (I just opened crate-ci/typos#966 and crate-ci/typos#969). So I'll open a separate PR for the fixes and most likely pivot this PR to another typo-checking tool :)
Pivoting to https://github.com/codespell-project/codespell.
@EliahKagan @Byron, would you mind having a look at the updated version?
Thanks a lot for making it happen!
Now it looks like the tool is usable, and it's nice to see that it caught a couple of real errors.
I will wait for @EliahKagan's approval before merging, though, in case I am missing some more obscure aspects of the tool, as it's integrated into GitPython's tooling.
I think the keyword argument spelling fixes in test/test_index.py justify adding automated spell-checking, even though various cases remain where spell-checking seems to have led to incorrect or suboptimal changes.
These can be fixed, and the risk that spell-checking would lead to such cases being introduced later is, in my opinion, outweighed by the benefits of catching misspellings that, due to the dynamic nature of Python and its idiomatic uses, may affect the behavior of GitPython or its tests.
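A minimal sketch of why a misspelled keyword argument can go unnoticed in Python. The functions add_items and add_items_strict and the write parameter are hypothetical, not GitPython's API; the point is only that a function accepting **kwargs silently absorbs a typo in a keyword name, whereas a function without it raises TypeError.

```python
def add_items(items, write=True, **kwargs):
    """Hypothetical function that accepts extra options via **kwargs.

    A misspelled keyword is collected into kwargs instead of raising
    TypeError, so the intended option silently keeps its default.
    """
    return {"items": list(items), "write": write, "unused": kwargs}


def add_items_strict(items, write=True):
    """Same hypothetical function without **kwargs: typos raise TypeError."""
    return {"items": list(items), "write": write}


# The caller meant write=False but typed "wirte"; the default True is used.
result = add_items(["a.txt"], wirte=False)
print(result["write"])   # True
print(result["unused"])  # {'wirte': False}

try:
    add_items_strict(["a.txt"], wirte=False)
except TypeError as exc:
    print("caught:", exc)
```

In a test, the same kind of typo can make an assertion exercise a different code path than intended while still passing, which is why spelling fixes in keyword arguments can matter for correctness.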
I've looked at each change and commented about the ones that I think should not be done or otherwise still need improvement. Some comments cover multiple changes, so the absence of a comment on a specific change does not mean that I think it is correct.
I recommend that this PR be marked as fixing #1893.
Edit: If Cygwin tests fail with "dubious ownership" errors when more commits are pushed to this pull request, that is not any fault of this PR, but also happens without the changes here. I've opened #1916 to fix it. If that pull request is merged, then merging from main or (perhaps better) rebasing this PR feature branch onto main should allow new Cygwin runs to pass here too.
git/index/base.py (outdated)
@@ -439,9 +439,9 @@ def raise_exc(e: Exception) -> NoReturn:
     # END glob handling
     try:
         for root, _dirs, files in os.walk(abs_path, onerror=raise_exc):
-            for rela_file in files:
+            for relative_fpath in files:
Why is the name relative_fpath better than rela_file? Why was this chosen instead of relative_path? If the f in fpath is unimportant, then it should be removed. If it is important, then it should be spelled out. If explicitness is not required, then presumably rela_file is also okay, in which case it should not be changed just to make the spell checker happy.
This applies to most occurrences of relative_fpath, including in other files.
My guess is that this should be relative_path. The nonpublic _items_to_rela_paths method was renamed to _items_to_relative_paths. Assuming that change is good, which I think it is, it seems like relative_fpath should just be relative_path.
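For context on the naming question, a minimal sketch, assuming only standard os.walk behavior: the entries in files are bare file names, and a value only becomes a relative path once it is joined with root and made relative to some base. The helper iter_relative_paths is hypothetical and is not the GitPython code under review.

```python
import os
import tempfile


def iter_relative_paths(base_dir):
    """Yield every file under base_dir as a path relative to base_dir.

    Hypothetical helper for illustration. os.walk yields bare file names
    in ``files`` (for example "nested.txt"), not paths; joining each name
    with ``root`` and applying os.path.relpath gives a relative path.
    """
    for root, _dirs, files in os.walk(base_dir):
        for file_name in files:  # bare name, no directory component
            absolute_path = os.path.join(root, file_name)
            relative_path = os.path.relpath(absolute_path, base_dir)
            yield relative_path


# Usage: build a small tree in a temporary directory and list it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ("top.txt", os.path.join("sub", "nested.txt")):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("x")
    # On POSIX systems this prints ['sub/nested.txt', 'top.txt'].
    print(sorted(iter_relative_paths(tmp)))
```

Here file_name and relative_path name two genuinely different things, which is the kind of distinction a name like relative_fpath leaves unclear.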
rela was marked as a typo, so I found it easier to use the full name without affecting the API.
I don't think relative_fpath is a full name. It looks like a typo that is meant to be relative_path without the f.
Maybe it is not a typo. Maybe f is an abbreviation for something that should be spelled out (if important) or omitted (if unimportant).
The key point is that I do not know what relative_fpath means in the places where this PR has introduced it, and I have not been able to figure that out. (I have been able to guess that the f stands for "file," but I am not certain of this, and without knowing the old variable name rela_file, I would likely not even have been able to guess this.) I expect that other current or future readers may also not know what it means.
I recommend changing it, probably to relative_path.
git/remote.py (outdated)
- # uptodate encoded in control character
+ # up-to-date encoded in control character
This change may seem at first glance to be obviously correct, but I think it actually may be wrong, and that if it is to be kept then it requires a specific technical justification.
I think uptodate is a specific technical term in Git. In the Git source code, it often appears capitalized, but it also appears lowercase in multiple places, which also seems to be intentional. As one example, in fetch.c:
/* uptodate lines are only shown on high verbosity level */
if (verbosity <= 0 && oideq(&ref->peer_ref->old_oid, &ref->old_oid))
continue;
It seems like that specific technical meaning is the one relevant here. If this has been verified not to be the case, then the change here is okay. Otherwise, either the change should be undone and uptodate added as a correct spelling, or it should be investigated.
Although this feels minor, making technical terms harder to search for can accumulate and make a codebase difficult to work with. That is both potentially relevant to this specific change, and a potential risk of automated spell-checking.
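To illustrate what "added as a correct spelling" means, here is a toy sketch in the spirit of codespell, which flags words found in a dictionary of known misspellings. The dictionary entries, IGNORE_WORDS, and find_typos are hypothetical and are not codespell's actual data or API; codespell itself offers an ignore-words-list setting for this purpose.

```python
import re

# Hypothetical misspelling dictionary (misspelling -> suggestion);
# not codespell's actual data.
MISSPELLINGS = {
    "steram": "stream",
    "uptodate": "up-to-date",
}

# Words that are correct in this project, e.g. Git's "uptodate" term.
IGNORE_WORDS = {"uptodate"}

WORD_RE = re.compile(r"[A-Za-z]+")


def find_typos(text, ignore_words=frozenset()):
    """Return (word, suggestion) pairs for known misspellings not ignored."""
    hits = []
    for match in WORD_RE.finditer(text):
        word = match.group(0)
        key = word.lower()
        if key in MISSPELLINGS and key not in ignore_words:
            hits.append((word, MISSPELLINGS[key]))
    return hits


line = "# uptodate encoded in control character"
print(find_typos(line))                # [('uptodate', 'up-to-date')]
print(find_typos(line, IGNORE_WORDS))  # []
```

Allowlisting the term keeps it searchable alongside the same spelling in Git's own sources, rather than rewriting the comment to satisfy the checker.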
I think in this case making it a command associate would be better
pyproject.toml (outdated)
@@ -79,3 +79,9 @@ lint.unfixable = [
 "test/**" = [
     "B018", # useless-expression
 ]
+
+[tool.codespell]
+skip = 'test/fixtures/reflog_*'
I don't think any of the files in fixtures that represent test input or expected output should be spell-checked. I think the *.py files in fixtures, which are actually run as code, should be spell-checked, and that other files should not.
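A minimal sketch of the policy proposed here, compared with the skip pattern in the pyproject.toml hunk. should_spell_check is a hypothetical helper, the fixture names reflog_master and example.py are made up for illustration, and Python's fnmatchcase only approximates how codespell matches its skip globs.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Skip pattern taken from the pyproject.toml hunk in this PR.
CURRENT_SKIP = "test/fixtures/reflog_*"


def should_spell_check(path):
    """Hypothetical policy: inside test/fixtures/, only *.py files
    (which are run as code) are spell-checked; everything else is."""
    pure = PurePosixPath(path)
    if pure.parts[:2] != ("test", "fixtures"):
        return True
    return pure.suffix == ".py"


# "reflog_master" and "example.py" are hypothetical fixture names.
for path in [
    "git/remote.py",
    "test/fixtures/reflog_master",
    "test/fixtures/diff_mode_only",
    "test/fixtures/example.py",
]:
    skipped_now = fnmatchcase(path, CURRENT_SKIP)
    print(f"{path}: skipped by current glob={skipped_now}, "
          f"checked under proposed policy={should_spell_check(path)}")
```

Under the current glob, test/fixtures/diff_mode_only is still checked, while the proposed policy would exclude it and keep checking fixture code.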
It seems to me that the question to ask is, if code appearing inside a fixture were found to have a logic error, should that bug be fixed? A number of fixture files have Ruby code or diffs thereof, but these are just test data. If logic errors in that code (which isn't run) shouldn't be fixed, then either the same files should not be spell-checked, or the justification for spell-checking them should be made clear. The issues with these kinds of changes are:
- Churn in test data may make it so that changes to test data that are actually done to improve the tests are hard to identify.
- Changes in test data need to be reviewed to evaluate whether they could have any impact on the tests. It is possible, in general, for a change to test data to keep a test passing, while preventing it from catching regressions that it would have caught before the test data changed.
The first concern is minor and may well be overcome by the slight readability improvement of avoiding typos. The second concern is less minor and it seems to me that this is not worth the risk, even if small. Tests can assert things that are affected by the presence or absence of specific strings or that involve specific lengths.
This also applies to all changes in test/fixtures/diff_mode_only. I have not posted separate comments there.
Edit: See also #1920 (review).
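A small sketch of how a spelling "fix" to test data can change what a test checks. Everything here (contains_exact, the fixture strings, check_rejects_near_miss) is hypothetical and only mirrors the concern above: an exact-content assertion changes meaning, while a length-based assertion cannot detect the change at all.

```python
def contains_exact(text, word):
    """Hypothetical function under test: exact, case-sensitive substring search."""
    return word in text


# Hypothetical fixture that deliberately contains the near-miss "steram".
ORIGINAL_FIXTURE = "data from the steram"
# What an automated spelling fix would turn it into.
CORRECTED_FIXTURE = ORIGINAL_FIXTURE.replace("steram", "stream")


def check_rejects_near_miss(fixture):
    """The test's intent: a near-miss spelling must NOT count as a match."""
    return not contains_exact(fixture, "stream")


print(check_rejects_near_miss(ORIGINAL_FIXTURE))   # True: intended case is exercised
print(check_rejects_near_miss(CORRECTED_FIXTURE))  # False: the "fix" changed the test

# A length-based assertion cannot tell the two fixtures apart.
print(len(ORIGINAL_FIXTURE) == len(CORRECTED_FIXTURE))  # True
```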
Yes, all of that sounds reasonable to me, so how about splitting this into two PRs?
- add a typos check that excludes fixtures
- revisit typos in fixtures and eventually stop excluding that folder from the typo check
@EliahKagan Did you see this message?
This seems like a good approach to me.
I am now officially setting this PR to a state that indicates that some modifications are needed.
@@ -52,7 +52,7 @@
 _streams_n_substrings = (
     None,
-    "steram",
+    "stream",
This is probably linked to fixtures / test data.
@EliahKagan @Byron I reverted most of my additional changes, so this now just adds the check and fixes all flagged issues, while excluding fixtures.
Thanks a lot for slimming down the PR; it looks good to me!
Thanks!
Some of the CI tests use WSL. This switches the WSL distribution from Debian to Alpine, which might be slightly faster. For the way it is being used here, the main expected speed improvement would be in how long the image takes to download, as Alpine is smaller.
(The reason for this is thus unrelated to the reason for the Alpine docker CI test job added in gitpython-developers#1826. There, the goal was to test on a wider variety of systems and environments, and that job runs the whole test suite in Alpine. This just changes the WSL distro, used by a few tests on Windows, from Debian to Alpine.)
Two things have changed that, taken together, have unblocked this:
- Vampire/setup-wsl#50 was fixed, so the action we are using is able to install Alpine Linux. See: gitpython-developers#1917 (review)
- gitpython-developers#1893 was fixed in gitpython-developers#1888. So if switching the WSL distro from Debian to Alpine breaks any tests, including by making them fail in an unexpected way that raises the wrong exception, we are likely to find out.
Just a suggestion to add a check for typos, and maybe fix some of them without breaking the API.