-
Notifications
You must be signed in to change notification settings - Fork 777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linting with Ruff #2033
Linting with Ruff #2033
Conversation
I was just reading through #1978 when I saw this PR, thanks for the work! As soon as that PR is merged, I'll take the time to look through this one. |
#1978 is merged. :) |
@afuetterer Thanks for the reminder! Note that I typically work on my open-source packages in the evening/weekend, so sometimes it might take a while for me to answer. Having said that, pinging generally helps 😉
Agreed, for these some manual check is important since this is the first passthrough of Ruff. After that, we can optimize/automate stuff but for now it is indeed important to understand which changes are pushed through automatically and how they might affect the codebase.
I think disabling would be preferred here as that would require rewriting part of the docstrings which I feel are quite clear at the moment. Also, even if rewriting those would be helpful I think the impact would still be small considering the amount of work that is required (which could be spend better on other features/issues/tasks instead).
Although most could be moved below module-level imports, I believe (but I'm not sure) that the warnings needed to be before some of the imports. I added a while back ago so I'm not sure whether it is still needed but Huggingface's Transformers had many irrelevant warnings back then.
Agreed. Some of these were intended to broad to catch many different issues but others were either just because of a lack of time. For now, ignoring these make sense until a separate PR goes into the specifics of this. I'm worried if we will go over all of them now, it might be a bit out of scope. But it is indeed important to go over them at some point. |
Yep, these need to stay as they are at the moment and noqa comments would be sufficient here.
Cool, definitely helps! How do you want to continue? Because of the merge of the PR I mentioned previously there are some conflicts and the ruff.toml needs to be moved to the pyproject.toml. Other than that, if I start merging other PRs that are currently open, that will continue to conflict with this one I think. Having said that, there are a couple of others that I actually want to merge first since they were already open for a while and nearing completion (specifically #1929 and #1894). |
Those all sound like good suggestions. I should be able to look at writing an actions workflow and reducing/ignoring errors this evening and weekend. I'll update this branch just to get the Once merged this will cause two problems, but there are ways around them:
|
Awesome, thanks! Take your time, there's no rush.
Hmmm, although a very nice solution we'll have to consider the experience of users that opened some of the PRs. Looking at the link you provided it seems to be a degree of complexity not everyone is familiar. Especially those who opened up minor changes/typos. Although there are currently 12 PRs open (ignoring this PR), there are 6 PRs that have recent changes/updates (#2039, #2013, #2002, #1985, #1929, and #1894). Of those, I believe the latter 4 PRs would be the main difficulty here if this PR is merged before those. How about I merge those latter 4 PRs (when they are done) and then come back to this? It would minimize the number of issues and I think they can be merged this week.
That's nice. I typically already squash all commits when merging with the main branch so fortunately that already is similar to current processes thus far. |
Hi @MaartenGr, sorry. I did not want to rush you in any way, I just wanted to ping @freddyheppell, that the other PR was merged and that, if desired, the config could move to pypyroject.toml, as suggested. |
Don't be sorry! Just wanted to be transparent as to when you can (sort of) expect a response from me and why things might take a bit longer. Also, in all transparency, I didn't even notice that it wasn't you who created the PR. I just kinda assumed, so definitely miscommunication from my side 😉 |
I think in practice it'll be relatively straightforward - in fact what I linked to is probably unnecessary in practice. A very simple PR may be mergeable anyway, or produce an easily resolvable merge conflict. In more complex PRs, the inverse of what I'll have to do in this PR can be done, i.e. accept changes from the PR branch in favour of
Just to clarify with this - it's only this initial linting PR which causes issues with git blame (as it changes the vast majority of lines as far as git's concerned) so will need to go in the |
I've now rebased to get the |
@freddyheppell P.S. If you have extra time, a quick note on Ruff in the CONTRIBUTING.md file here would be great. |
FYI enabling dependency caching on |
Is that within the same run or also between runs? Aside from here I cannot find a quick answer to this. If it is between runs, it might not use newer versions of dependencies. Using the latest version of dependencies generally helps in quickly finding compatibility issues. |
@@ -75,6 +75,7 @@ spacy = [ | |||
test = [ | |||
"pytest>=5.4.3", | |||
"pytest-cov>=2.6.1", | |||
"ruff~=0.4.7", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason you used the ~= operator here? I can imagine using it likely makes dependency management more stable but I like using >= to quickly find out when new major/minor versions are released and things break 😅 That way, I do not have to make changes to support new releases and only change when something breaks (which people are more likely to communicate rather than when things work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up going with this version constraint because if, in a PR, someone were to install a newly released version, it may introduce new rules which reformat/start erroring on code unrelated to the PR.
Arguably the linter version can be kept constant forever, but it can be safely incremented in a separate PR every so often to avoid this.
In the keybert repository there is a pre-ommit-config defined. Should this be done here as well? |
The cache key used by the setup-python action is a combination of the hash of the dependency file, OS and Python version. Assuming CI is used frequently enough to keep the cache, it will keep using this forever (until the dependency file is changed). I suppose what's important is what gets cached. If it just caches the downloaded packages/built wheels then packages will still get upgraded. If it caches the actual installed packages directory then they won't. I think it's the former, based on the fact when it installs from cache it states it's using a cached download, rather than that there's an installed package meeting the dependency constraints already. This would suggest that dependency resolution is still happening. I can look into this more after this PR though - I can leave it in just for the lint or take it out for now. Alternatively, the cache could be modified to include the PR branch in the key. That way it would be cached for repeated runs in the same PR, but each PR would be freshly installed. |
I did think about this but I'm not too familiar with precommit, and I'm not quite sure how to handle the fact we have auto-formatting, autofixable checks and non-autofixable checks. And I've never been a huge fan of doing it automatically, as sometimes the changes are suboptimal (e.g. no-op code is replaced with more explicitly no-op code, rather than removed), and doing it in a precommit hook hides that somewhat. Also it's probably more necessary when you're using all the separate packages instead of Ruff as an all-in-one. I will add a Make task to use locally (and put this in the contributing doc). In CI, we need to run a different command anyway, as we want to fail instead of doing any auto-fixes. |
Now the basics are working, do we want to enable additional rulesets? Ruff has some related to additional packages, and some that enforce extra standards. Some I think are worth considering - these are all relatively small and focused (and have good auto fix coverage) so even enabling all of them won't be overly strict:
This one is useful but is a bit stricter so likely to flag up a fair bit:
|
@freddyheppell Thanks for all your work on this! Went through your questions/comments:
I think leaving it out for now would be best as we can look into this in a separate PR more in-depth. I want to prevent increasing the scope of this PR as much as possible.
I typically like to have all linting done within the PR itself and use the workflows to check whether the formatting was done properly. As such, using pre-commit does allow new users to quickly and easily format their code without too much hassle. That said, running
Yep, definitely don't want auto-fixes in the workflow.
I believe this one might be a bit tricky since I'm suppressing some warning before and after imports to disable warnings specific to certain imports. BERTopic/bertopic/_bertopic.py Lines 1 to 11 in 0a28916
Not sure how isort handles these cases as the order cannot be changed here. Do note though that this was implemented a couple of years ago, so it might not be necessary anymore.
Not familiar with this one but does seem like a nice addition. I am, however, quite hesitant to add packages as dependency conflicts is something that starts creeping up.
This would be nice. Numpy changes are important and detecting them early on would be helpful.
Pandas might actually be replaced with something else in the future, so let's skip this for now.
Although nice features, the size/popularity/downloads of these packages make it difficult for me to rely on at the moment since I cannot be confident these will be maintained for years on end.
Ah, that's definitely something I want to decide myself as part of the discussion on for loops vs. comprehensions are related to readability and that's a decision that I want to make myself. |
Okay thanks for that, will address later. FYI - all rulesets are included with Ruff already. They come from various packages originally but Ruff has to reimplement them in Rust, the names of the original packages are just for convenience. |
You can use ruff as pre-commit hook. Instead of fixing anything the hook can fail in CI (--exit-non-zero-on-fix), which is the behavior you want, I guess. I use it like this: - repo: https://github.com/astral-sh/ruff-pre-commit
rev: f42857794802b6a77b0e66f08803575aa80d3c8f # frozen: v0.4.7
hooks:
- id: ruff
args: [--fix, --show-fixes, --exit-non-zero-on-fix]
- id: ruff-format |
Oh, and here I thought there were a bunch of additional dependencies... 😅 |
In the interests of getting this merged, I think we should leave precommit and extra rules for now. I did test some of the extra rulesets and they generally had a very small number of issues (e.g. the NPY ruleset was only to replace I've added the Make rules and documented in the contributing file. So I think this is now ready to merge. |
Remember that this needs to be squashed down to a single commit and the revision added to I could squash all the commits in the PR and add this file myself, but then we'd lose the commit history in this PR. Perhaps it would be better for you to squash merge it, then add the ignore revs file yourself? |
@freddyheppell Thank you for all the work on this! Just did a last check and everything looks great. I squashed the commits so that should be good to go. Whilst I answer some other issues, would you mind adding the ignore revs? If not, I'll likely do it somewhere next week. |
I think the top level .flake8 config file can be deleted now? |
@afuetterer Yes, it can be deleted! |
WIP linting with ruff (fixes #2022).
Still to-do:
ruff.toml
intopyproject.toml
make decision about dependency cachingdecide if we want to enable any additional rulesOld discussion about enabling/disabling lint rules
## Current Lint StateCurrent outstanding issues reported by
ruff check
, which are not auto-fixable.Those marked with [*] can be fixed automatically by a rule marked as unsafe. These fixes are not enabled by default since they are currently not well implemented and/or tested enough to be sure they won't modify the behaviour of code. However, they're fine to use one-off so long as their results are checked.
Most of these are relatively small issues which I can work on fixing, the bigger ones are:
D205 - 1 blank line required between summary line and description
Majority appear to be caused by functions not having a short single-line summary, instead opening with a longer explanation, e.g.
Really the issue here is the first line (logical line, not literal line in the code file) is overly long, but there's not actually a rule that captures that. Perhaps this rule could be disabled.
E402 - Module level import not at top of file
There's a lot of these but they're all caused by the main bertopic file mixing top-level imports with conditional imports. The conditional imports can be moved to below all the module-level imports if possible. Some (like the py3.8 conditional) could potentially be removed.
E722 - Do not use bare
except
These may be hard to fix if the exception if the specific exception it's intended to catch isn't obvious. But arguably that's a problem in general, not just something that needs to be fixed to appease the linter.
F821 - Undefined name ...
There are some variables in a loop in
BERTopic.topics_over_time
which it thinks are unassigned:I think this is because they're assigned later in the loop and these parts of the code won't run before then, but of course a static analyser can't determine this. These may need to be specifically ignored with
# noqa
comments.It's also flagged up a mistake from my PR #1984 where
self
is used in a@classmethod
! I'll PR a fix for that too...