-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Libraries and "Standardization" #25
Comments
Collaborating more seems like a great idea to me. I think that the tools that we're making are used together so often that striving for some shared documentation and consistency across the tools makes sense to me. Plus, as you say, we're aiming for the same audience. One thing to include is that we could give a comparison of each tool. What is their aim and what are they better at compared to the other tools? This would help users think about which tools they want to use for their purpose, and would also help us, as developers, to think more carefully about "is this feature appropriate for this tool" and "is this feature better suited in another tool"? There's a certain amount of overlap in what each tool provides (in terms of the messages that they output at least) and either reducing that or at least documenting it seems like a good idea to me. If someone is using a combination of PyCQA tools, they'll want to know what messages to turn off so that they aren't getting duplicates. |
I think it would also be helpful for us to share more general learnings too. For example, I was talking to @ambv about black configuration and shared the pain that has been managing the precedence/merging/discovery of files for pycodestyle and flake8. We could generate advice to new PyCQA (or PyCQA-alike) projects to restrain themselves to 1 project file that they read to manage things. It's also looking like the community is moving towards tooling being configured in |
This definitely sounds like a good idea, and would enforce more the concept of PyCQA as a community rather than a loose group with tangentially related interests on code quality. I also agree with Ashley on having a comparison for each tool, so that the users could pick the right tool for the job. I also have a theory that overall it would be better for Python to just bring everything together under the same umbrella, with the user deciding what to needs from a code quality tool, be it formatting checks or more advanced checks. We have now both For something actionable, we could decide as a group what libraries do we actually want to reuse? At least from |
Right. I think I was hesitant to encourage that to start with because OpenStack started that way and became a clumsy, lumbering group with a lot of politics. I wanted us to avoid those politics but in reality, we share a lot of things anyway.
I agree. Something like
I agree. I've been thinking about that. I've also been thinking about the fact that Python suffers from the fact that we don't have a tool like
By tree, you mean something to provise the AST right? I think it's hard to find a good one. There's
It also limits whether Flake8 would be able to refactor/reformat comments and other things that aren't included in an AST by default
That's a good question. While Flake8 runs pylint against itself, I don't franky know pylint well enough to know what it could use. |
Especially for the AST part, also see the discussion at davidhalter/jedi#630 - i.e. parso which was pulled out from jedi by @davidhalter. |
I'd be happy to adopt that in Flake8. It would possibly require some effort to make PyFlakes happy using that as an AST though. |
parso was pulled out of jedi and is now a separate project. The parser is a fork of lib2to3 with most of the parser generator's code base still matching lib2to3's parser. I guess the idea was to give it a better API and some more features. The only thing that's lacking there at the moment is that it's a bit annoying to refactor comments. Since whitespace/comments are preserved, it's of course possible, but not as convenient as it could be. parso is pretty much the only tool around that works with refactoring and Python 2 & 3 support (it's debatable though if Python 2 support is still that important at this point for code analysis tools in the future). parso is by the way also able to show you multiple syntax errors in your files. This could come in really handy for something like pyflakes. The only real issue I see with it is that it's significantly slower than |
Do you keep a list of changes from lib2to3? Are you still looking at development that happens on lib2to3? Black also has a vendored version of lib2to3 but I only do that because my patches to lib2to3 often will only be present in 3.6.6 or 3.7.0 or sometimes go straight to 3.8 which puts them years away from users' hands. But Black does upstream all patches. Currently we have a few things that aren't yet but only since they don't apply cleanly so it's a bit more work (like Unicode identifiers support, eh?). So it seems basing off of lib2to3 is a common thing that we could standardize. I would encourage you to not deviate from from the version in cpython master too much so that you can at least accept upstream changes to the grammar and the tokenizer. I hate it when third-party parsers are unable to parse latest bits of Python grammar because the authors weren't aware of them. As for me, I am planning to document lib2to3 for Python 3.7.0 so that parts of its API at least will become "stable" and more than just a 2to3 implementation detail. I'll be maintaining lib2to3 in the standard library going forward. Would be great for parso to upstream some of its improvements and start tracking lib2to3's updates to the grammar and the tokenizer. @sigmavirus24, lib2to3's pytree is a CST, not an AST. While you can totally write a linter on top of it, it's going to be more annoying for people to use because low-level implementation details bleed through to the tree. You will see naked grammar's nodes and not the abstractions that an AST gives you. This might or might not be worthwhile. On the one hand, using this CST will unbundle the version of Python you are able to analyze from the version of Python flake8 runs within, which was always my biggest complaint. You will also be able to warn on things related to dangerous usage of commas (like one-tuples without parentheses or missing commas in multiline lists of strings). On the other hand, this is not backwards compatible. The tree is way more verbose. The parser is slower. pyflakes would need to change. Plugins would need to change. Authors would for sure cry out for docs. |
No I don't. It was a hard fork. However the parser generator was written by Guido and is pretty much finished. It's not that I copied all of lib2to3. I just copied pgen2. That's the part that doesn't need changes. The only big change is probably how I parse f-strings (however if you guys don't like it we can fall back to the old way). But then again this was not scope for lib2to3, f-strings are simply tokens and only relevant in the Python 3 part. Also lib2to3 is using a kind of weird grammar that allows both Python 2 and Python 3 code to be parsed IIRC. parso either parses Python 2.7 code or one of the Python 3 releases. Another thing that was added to parso is error recovery. This is necessary for completions and other IDE stuff. It's pretty neat, you can turn it on or off and it's not really a modification of pgen2.
Well I think that has already happened. I think the only changes that are a bit hard to port are the tokenizer changes. But then again, I don't really care, the changes there are usually so minimal and only happen every 1.5 years (or even less). parso still uses pretty much the original grammar files so we're fine there.
The problem is probably especially the tokenizer. parso's tokenizer allows tokenizing different Python versions in one version. I have removed the potential errors that the tokenizer generates. It's now all error tokens and there's no exceptions anymore. I think this is way better and Python should probably also move into that direction. Have a look here: https://github.com/davidhalter/parso/blob/fa0bf4951c329a2ebdd60dbeabe070ce24f4d272/parso/python/tokenize.py (I'm intentionally showing you an older version, to avoid the f-string shenanigans parso's now doing there). |
In fact, I would prefer f-strings to be proper nodes in lib2to3. I made them a stupid string prefix to bandaid parse failures after 3.6's release. For a CST it would make a lot of sense to return proper nodes for formatted values within f-strings. This is unfortunately too late for Python 3.7 but would be a worthwhile contribution for 3.8. All in all, I'm pretty impressed with Parso and some of its behavior should find its way to lib2to3. Given a higher-level "AST-like" API over it will be a good match for Flake8 and enable writing back. |
100% agreed. The ast.c implementation is just suboptimal. I was thinking about contributing this back to CPython. I'm just a bit scared of patching the C code of the tokenizer. It's quite a big modification (being more aware of the context). Do you think a PEP would be needed or is this a modification that could just happen?
Thanks! I really hope we can get some changes back. IMO I especially would like to replace the tokenizer with parso's, but that might be a bit difficult considering backwards compatibility and stuff. I have had quite a few discussions with people using the tokenizer that were unhappy with both the "StringIO.readline" API and the errors raised in some rare cases when code was invalid. So consider this an offer. I'm happy to contribute things back, I'm just not really well connected with Core Devs and then there's also my "own" projects that use quite a bit of my time. We could also have a quick call. |
No PEP needed, just a bugs.python.org issue. Please consider doing it! I'll help you with the implementation and review. I patched the current C implementation's lineno, column reporting so I'm aware it's a big hack. What I meant also is that lib2to3 implements f-strings simply as a string prefix. At the moment the tree doesn't include formatted values at all. Replacing this behavior with what you're doing in Parso would make a lot of sense for Python 3.8. @davidhalter, speaking of contributing the new tokenizer wholesale, yeah, it's going to be problematic due to backwards compatibility and that would require a bit more discussion. I know you're in Switzerland but will you be at PyCon US by any chance? In any case, I'm @llanga on Twitter, let's schedule further collaboration there. I hijacked this issue enough (sorry @sigmavirus24!). |
I'd be happy to create a AST/CST PyCQA team so you could take advantage of GitHub discussions if you'd all like or you could take this to [email protected] |
Hi all,
We recently had PyCQA/pydocstyle#307 and PyCQA/pycodestyle#739 opened on two projects looking for a new feature that already exists in
pylint
. It got me thinking. We're a loose coalition of folks with the same mission building similar tools and with more-or-less the same audience. Is there value in us producing some amount of documentation (a little or a lot, I don't really mind either way) around best practices for command-line options, and other behaviours? Should we be aggressively creating libraries to help each other deal with things like this? For example, pycodestyle, pydocstyle, and Flake8 all handle# noqa
and they all do so a little differently. Should that be a distinct library or should it be part of a bigger one? Are the subtle differences there fine? Pylint has a far more advanced way of disabling checks, should we all be moving to that? Do people think this conversation is even worth having?cc @PyCQA/pylint-dev @PyCQA/pydocstyle-dev @PyCQA/pep8-dev
The text was updated successfully, but these errors were encountered: