Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lint: handle multiline jinja2 code #6552

Open
wants to merge 4 commits into
base: 8.4.x
Choose a base branch
from

Conversation

wxtim
Copy link
Member

@wxtim wxtim commented Jan 11, 2025

Closes #6348
Closes #6507

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@wxtim wxtim self-assigned this Jan 11, 2025
@wxtim
Copy link
Member Author

wxtim commented Jan 11, 2025

@oliver-sanders Please could you have a look at this proof of concept before I bother testing it?

@wxtim wxtim requested a review from oliver-sanders January 11, 2025 11:26
@wxtim wxtim marked this pull request as draft January 11, 2025 11:26
@oliver-sanders oliver-sanders removed their request for review January 13, 2025 10:59
@wxtim wxtim force-pushed the lint.state_object branch from 7284e7c to 8335bd2 Compare January 13, 2025 11:00
@wxtim wxtim marked this pull request as ready for review January 13, 2025 11:01
@wxtim wxtim force-pushed the lint.state_object branch from 4477ce9 to becc52e Compare January 13, 2025 11:02
@wxtim wxtim changed the base branch from master to 8.4.x January 13, 2025 11:02
@wxtim wxtim force-pushed the lint.state_object branch from becc52e to 9191eae Compare January 13, 2025 11:05
@wxtim wxtim force-pushed the lint.state_object branch from 9191eae to 7437cfa Compare January 13, 2025 11:07
@oliver-sanders oliver-sanders changed the title Lint.state object lint: handle multiline jinja2 code Jan 13, 2025
@oliver-sanders
Copy link
Member

I think this is going to be the quickest way to get over the state barrier, but going statefull is a good opportunity to review our approach here before we lay down a pattern which ends up biting us later.

I haven't really given much thought to this so far, I don't have working knowledge of modern linters, a bit of reading would be a good idea...

Mid to long term, it depends how far we want to go with the linter. The initial idea was a quick and dirty line-by-line regex tool, but we thought we would leave it there (we decided it would not be statefull). Now we are clearly at the point where we want to develop this into something more serious.

I think the optimal solution for this sort of problem is something more tokeniser'ey. I.e, a program that reads the file into labeled bits rather than lines, allowing rules to be run on each "bit" in a more structured way than regex would allow by itself.

E.g, here's an example of how a tokeniser'ey type thing might see the file:

>>> for type, content, state in tokeniser(file):
...    print((type, content))
('shebang', '#!Jinja2', <file:flow.cylc, lines:[0], preproc:jinja2, path:>)
('blankline', '', <file:flow.cylc, lines:[1], preproc:jinja2, path:>)
('section', '[scheduling]', <file:flow.cylc, lines:[2], preproc:jinja2, path:[scheduling]>)
('config', '    initial cycle point = 2000', <file:flow.cylc, lines:[3], preproc:jinja2, path:[scheduling]initial cycle point>)
('cycle point', '2000', <file:flow.cylc, lines:[3], preproc:jinja2, path:[scheduling]initial cycle point>)
('section', '    [[graph]]', <file:flow.cylc, lines:[4], preproc:jinja2, path:[scheduling][graph]>)
('config', '        R1 = """\nfoo => bar\nbar => baz\n    """', <file:flow.cylc, lines:[5,6,7,8], path:[scheduling][graph]>)
('graph', 'foo => bar\nbar => baz', <file:flow.cylc, lines:[6,7,8], path:[scheduling][graph]R1>)
('jinja2', '{% if ... \n ... %}', <file:flow.cylc, lines:[9,10], path:[scheduling][graph]>)
('jinja2', '{% include ... %}', <file:flow.cylc, lines:[11], path:?>)

It's similar (but less detailed & more annotated) to what our syntax highlighters see. Note, we are carrying the Cylc config "path" where the information is available allowing us to apply specific rules to particular sections or configs and each entry could represent multiple lines. This would allow us, for example, to apply rules, only to graph sections, and to do clever things, like read a few lines ahead to check if different task configs are compatible.

It might be relatively simple to get a paired-down version of this working, e.g something that has only awareness of "plain" Cylc lines and Jinja2 expressions.

There might also be the option of building a linter into one of the Cylc grammars we have already developed? I wouldn't be surprised if this is becoming common in the LSP era.

I mention these things largely to curtail the scope of what we do now. It might be that the Cylc linter never gets much further than it is now, in which case this is a pretty decent pattern. But it is also possible that we end up developing this much further and the community starts contributing rulesets. So we should be cautious of how far we push this pattern, if it's starting to look like this utill might balloon, we should be ready to change tack rather than bang our heads against the pattern we've got. If there's a pattern out there we could easily adopt, this might be a good opportunity to do so before we start trying to get clever with our line reader. And if we do go ahead with this (and I don't think that would be a bad option), we should be cautious over how far we push it.

@hjoliver
Copy link
Member

Much as I like the linter (I do!) we should think carefully, and maybe look at how widely it is used, before sinking a lot of effort into making it significantly more general than it is now (I think this gels with what Oliver is saying, but in more general terms). (Priorities vs resources, as ever...)

@wxtim
Copy link
Member Author

wxtim commented Jan 15, 2025

Much as I like the linter (I do!) we should think carefully, and maybe look at how widely it is used, before sinking a lot of effort into making it significantly more general than it is now (I think this gels with what Oliver is saying, but in more general terms). (Priorities vs resources, as ever...)

We already have validate, and I'm against trying too hard with lint. If that means that eventually there are things we can't deal with, so be it. It's already a goodish way from "a sed script to identify 7 to 8 changes".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cylc lint is catching things in multi-line strings cylc lint: errors raised for multi-line Jinja2 blocks
3 participants