Add pre-commit hook for codespell #307

Open: weiji14 wants to merge 6 commits into base: main

weiji14
Contributor

@weiji14 weiji14 commented Jul 23, 2024

Catch misspellings using codespell.

Added a pyproject.toml file with a [tool.codespell] section to configure a list of words to ignore (e.g. LINZ).
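
As a rough sketch of what such a section might look like (not the exact file from this pull request): `ignore-words-list` is codespell's option for words to leave alone, and LINZ is the one ignored word named here. The casing of the entry and the comma-separated string form are assumptions; check them against the codespell version in use.

```toml
# pyproject.toml (illustrative sketch, assuming codespell reads its
# settings from the [tool.codespell] table)
[tool.codespell]
# Comma-separated words that codespell should not report as misspellings.
# "LINZ" comes from the description above; its casing here is an assumption.
ignore-words-list = "LINZ"
```

Further words can be appended to the same string, separated by commas, as more false positives turn up.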

Original list of typos from https://results.pre-commit.ci/run/github/698210830/1721775297.Gdhue1HnRoC75KjYK2T9Eg (note that some of these are false positives from Jupyter Notebook binary outputs):

codespell................................................................Failed
- hook id: codespell
- exit code: 65

LICENSE-MODEL.md:40: therefrom ==> there from
docs/clay-v0/clay-v0-location-embeddings.ipynb:226: dimentional ==> dimensional
docs/clay-v0/clay-v0-location-embeddings.ipynb:234: Preform ==> Perform
docs/clay-v0/specification-v0.md:22: trainning ==> training
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:245: iNH ==> in
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:687: te ==> the, be, we, to
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:916: WEe ==> we
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:985: Nd ==> And, 2nd
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:985: FO ==> OF, FOR, TO, DO, GO
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:985: OT ==> TO, OF, OR, NOT, IT
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:985: bu ==> by, be, but, bug, bun, bud, buy, bum
docs/finetune/regression.md:160: addtion ==> addition
docs/release-notes/changelog-v1.0.md:40: wavelenghts ==> wavelengths
docs/tutorials/clay-v1-wall-to-wall.ipynb:18: Analyise ==> Analyse
docs/tutorials/clay-v1-wall-to-wall.ipynb:336: formate ==> format
docs/tutorials/clay-v1-wall-to-wall.ipynb:377: formate ==> format
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: oly ==> only
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: teH ==> the
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: FO ==> OF, FOR, TO, DO, GO
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: tRU ==> through, true
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: AAs ==> ass, as
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
finetune/classify/classify.py:22: inteface ==> interface
finetune/regression/regression.py:22: inteface ==> interface
CODE_OF_CONDUCT.md:9: socio-economic ==> socioeconomic
docs/clay-v0/data_labels.md:29: bechmark ==> benchmark
docs/clay-v0/model_finetuning.md:5: relevent ==> relevant
docs/clay-v0/model_finetuning.md:24: identifiying ==> identifying
docs/clay-v0/model_finetuning.md:153: evalution ==> evaluation, evolution
docs/clay-v0/partial-inputs.ipynb:394: oT ==> to, of, or, not, it
docs/clay-v0/partial-inputs.ipynb:394: fPr ==> for, far, fps
docs/clay-v0/patch_level_cloud_cover.ipynb:701: searchs ==> searches
docs/clay-v0/run_region.md:6: strucutre ==> structure
docs/clay-v0/run_region.md:90: discontinous ==> discontinuous
docs/finetune/classify.md:23: recieves ==> receives
docs/finetune/classify.md:75: Classifcation ==> Classification
docs/finetune/finetune-on-embeddings.ipynb:368: shoudl ==> should
finetune/segment/segment.py:22: inteface ==> interface
trainer.py:22: inteface ==> interface

Automate typo-finding using [codespell](https://github.com/codespell-project/codespell/tree/v2.3.0?tab=readme-ov-file#pre-commit-hook). Added LINZ to the ignore list as a start.
@weiji14 weiji14 added the documentation Improvements or additions to documentation label Jul 23, 2024
@weiji14 weiji14 self-assigned this Jul 23, 2024
Ignore a couple of words that appear in these legal documents.
Binary outputs in Jupyter Notebooks are not skipped by codespell (xref codespell-project/codespell#2138), so these files are skipped manually after the true-positive typos have been fixed.
Comment on lines +7 to +11
skip = [
    "docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb",
    "docs/clay-v0/partial-inputs.ipynb",
    "docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb",
]
Contributor Author

Codespell flags some false-positive misspellings in Jupyter Notebook binary outputs (see also codespell-project/codespell#2138), so these files are skipped now that the true-positive misspellings have been fixed:

docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:245: iNH ==> in
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:687: te ==> the, be, we, to
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:916: WEe ==> we
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:985: Nd ==> And, 2nd
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:985: FO ==> OF, FOR, TO, DO, GO
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:985: OT ==> TO, OF, OR, NOT, IT
docs/clay-v0/tutorial_digital_earth_pacific_patch_level.ipynb:985: bu ==> by, be, but, bug, bun, bud, buy, bum
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: oly ==> only
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: teH ==> the
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: FO ==> OF, FOR, TO, DO, GO
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: tRU ==> through, true
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: AAs ==> ass, as
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/tutorials/v1-inference-simsearch-naip-stacchip.ipynb:708: ALo ==> also
docs/clay-v0/partial-inputs.ipynb:394: oT ==> to, of, or, not, it
docs/clay-v0/partial-inputs.ipynb:394: fPr ==> for, far, fps

@weiji14 weiji14 marked this pull request as ready for review July 24, 2024 00:56
@weiji14 weiji14 removed their assignment Nov 6, 2024