Disabling bytecode generation #7

Open
dsegan opened this issue Jun 15, 2023 · 2 comments
dsegan commented Jun 15, 2023

FWIW, I've seen significant speedups, especially with pytest, on large codebases when bytecode generation is enabled.

The nitty gritty details are that:

  • Pytest does assertion rewriting for all the modules being imported: with bytecode generation enabled, rewritten bytecode is stored in .pyc files.
  • Bytecode generation is fast in itself, but much slower when pytest needs to rewrite all the assertions
  • If you have a large number of test files, and especially poorly factored code where running the tests imports a large part of the codebase, not having to regenerate the .pyc files is a significant boost: IIRC, pytest --collect-only went from 45s to 15s for this particular codebase.

So if you are after fast pytest runs, especially for single test runs, I'd revisit the advice on PYTHONDONTWRITEBYTECODE.
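
To check whether rewritten bytecode is actually being cached in a checkout, something like the sketch below may help. It assumes pytest stores its rewritten modules in `__pycache__` directories with a `-pytest-` tag in the file name (for example `test_foo.cpython-311-pytest-7.4.0.pyc`); that naming is an assumption about pytest's behaviour and may differ between versions. The helper name `find_pytest_pycs` is made up for this illustration.

```python
import sys
from pathlib import Path


def find_pytest_pycs(root):
    """Return sorted paths of cached .pyc files that look pytest-rewritten.

    Assumption: pytest tags its rewritten bytecode with "-pytest-" in the
    file name and keeps it inside __pycache__ directories.
    """
    return sorted(
        p
        for p in Path(root).rglob("*.pyc")
        if p.parent.name == "__pycache__" and "-pytest-" in p.name
    )


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    # True when PYTHONDONTWRITEBYTECODE is set (or python was run with -B).
    print("sys.dont_write_bytecode =", sys.dont_write_bytecode)
    hits = find_pytest_pycs(root)
    print(f"{len(hits)} pytest-tagged .pyc files under {root}")
```

If the count stays at zero after a test run, bytecode writing is probably disabled and pytest would be redoing the assertion rewriting on every invocation.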

zupo (Owner) commented Jun 15, 2023

Oh, wow, that sounds incredible! Could you tell me more about the specific use case? Ideally, we'd have a large repo of dummy/generated code that we can run pytest against, with PYTHONDONTWRITEBYTECODE enabled or disabled, and compare the results. This repo could then be linked as a source for the claim to enable/disable PYTHONDONTWRITEBYTECODE.

dsegan (Author) commented Jun 15, 2023

It's not that hard to reproduce. I've tried to think of a decent large Python project using pytest, and sqlalchemy came to mind:

$ git clone https://github.com/sqlalchemy/sqlalchemy.git && cd sqlalchemy
$ python3 -m venv venv
$ . venv/bin/activate
$ pip3 install pytest
$ pytest --collect-only test
...
======================================= 30386 tests collected in 12.13s ========================================

real	0m14,501s
user	0m13,827s
sys	0m0,335s
$ pytest --collect-only test
...
======================================== 30386 tests collected in 6.21s ========================================

real	0m7,938s
user	0m7,664s
sys	0m0,151s
$ find . -name '*.pyc' -delete  # simulate no .pyc files being written
$ pytest --collect-only test
...
======================================= 30386 tests collected in 12.03s ========================================

real	0m14,388s
user	0m14,045s
sys	0m0,252s

Basically, with PYTHONDONTWRITEBYTECODE set, you'd always pay that 14s startup cost before any run, even of a single test. With the .pyc files in place you are down to 8s, since a typical change touches only a few files and only those get rewritten.
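
For anyone who wants to repeat the comparison in a scripted way, here is a minimal sketch that times the same command with the variable set and unset. The helper name `timed_run` and the `test` directory argument are assumptions for illustration; it expects pytest to be installed in the current environment and is not a rigorous benchmark (no warm-up, no repeated runs).

```python
import os
import subprocess
import sys
import time


def timed_run(cmd, dont_write_bytecode):
    """Run cmd and return (elapsed_seconds, returncode).

    PYTHONDONTWRITEBYTECODE is set to "1" or removed from the child's
    environment depending on the dont_write_bytecode flag.
    """
    env = dict(os.environ)
    if dont_write_bytecode:
        env["PYTHONDONTWRITEBYTECODE"] = "1"
    else:
        env.pop("PYTHONDONTWRITEBYTECODE", None)
    start = time.perf_counter()
    proc = subprocess.run(
        cmd, env=env, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, proc.returncode


if __name__ == "__main__":
    # Hypothetical invocation: collect only, against a "test" directory.
    cmd = [sys.executable, "-m", "pytest", "--collect-only", "-q", "test"]
    for flag in (True, False, False):
        elapsed, code = timed_run(cmd, dont_write_bytecode=flag)
        print(f"dont_write_bytecode={flag}: {elapsed:.2f}s (exit {code})")
```

The second run with the flag off is the interesting one: the first such run writes the .pyc files, and the next one gets to reuse them.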

Note that there is a workaround: instead of using pytest pattern matching for test names, you can pass path/to/test_file.py::TestClass::test_function directly, so that pytest does not have to go through the whole collection step.
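
As a small illustration of that workaround, the sketch below assembles such a node ID and the matching command line. The helper name `build_pytest_command` is hypothetical, and the path and names are the placeholders from the comment above, not real files.

```python
import sys


def build_pytest_command(path, *names):
    """Return an argv list that runs pytest on a single node ID.

    A node ID is the file path followed by class/function names,
    joined with "::".
    """
    node_id = "::".join((path, *names))
    return [sys.executable, "-m", "pytest", node_id]


if __name__ == "__main__":
    cmd = build_pytest_command(
        "path/to/test_file.py", "TestClass", "test_function"
    )
    print(" ".join(cmd))
```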
