Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static typing #1408

Open
4 tasks done
kir0ul opened this issue Sep 21, 2021 · 2 comments
Open
4 tasks done

Static typing #1408

kir0ul opened this issue Sep 21, 2021 · 2 comments
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s)

Comments

@kir0ul
Copy link

kir0ul commented Sep 21, 2021

Is your feature request related to a problem? Please describe.
I'm trying to get a fulling working type checking pipeline on my codebase with Mypy and I'm getting the following kind of errors:

error: Skipping analyzing "pynwb.base": found module but no type hints or library stubs
error: Skipping analyzing "pynwb.image": found module but no type hints or library stubs
error: Skipping analyzing "pynwb.ophys": found module but no type hints or library stubs

Describe the solution you'd like
Not a top priority but I was wondering if you would consider adding static typing or stub files at some point?

Describe alternatives you've considered
At the moment the alternative is to completely ignore PyNWB regarding type checking.

Checklist

  • Have you ensured the feature or change was not already reported?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you checked our Contributing document?
@oruebel oruebel added category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s) labels Sep 22, 2021
@oruebel
Copy link
Contributor

oruebel commented Sep 22, 2021

PyNWB uses the docval decorator from HDMF for type checking. Using docval allows us to enforce types and standardize documentation of types (also type hints didn't exist in Python at that time). This will require some further investigation, but all the type information is already in the docval definitions for the functions and I think the easiest and most reliable way to add type hints in PyNWB would be to update the docval decorator to add that information to the functions. The source code for the docval decoratory is here https://github.com/hdmf-dev/hdmf/blob/695bd87dcb37fd8969ca510e8e35f893a6bddc61/src/hdmf/utils.py#L458 If this can be done by updating docval, then I think the code necessary to add this should not be too bad. An alternative approach could also be to have a function that looks at the docval of functions and generates the stubfiles for mypy. However, I think the more useful approach would be if the type hints can be added by docval.

@ajtritt @rly thoughts?

@sneakers-the-rat
Copy link
Contributor

sneakers-the-rat commented Jul 17, 2024

checking to see if this had been raised -

everything that docval can do can now be done by python typing, and beartype is a super fast drop in validator for function signatures.

couple reasons why deprecating docval would be a good idea imo

  • as OP said, the function signatures are opaque, which is bad for everyday usage and development. signatures could be forwarded by assigning to __annotation__ and __signature__, but this would require one to construct the python type annotations, so if that's being done you might as well just hook it up to a script that replaces @docval
  • after this one deepcopy call, doctest is the single largest contributor to bad performance: in a 396s run of the integration tests, fully 289.8s cumulative time (72%) is spent in _check_args. Note that that cumulative time does not include the time spent calling the function it wraps. Another way of putting that is to say that @docval wrapping singlehandedly makes pynwb 4x slower. Each individual call is fast-ish, but when every function is wrapped in docval, and the call trees are as deep as hdmf's, it adds up to be enormous (eg. in this run that i cancelled bc it was taking too long, there are 6.5m calls to __parse_args, 24m calls to check_type, 98m calls to isinstance and so on. Most of that is from the body of __parse_args, so there isn't really a cheap way to make that faster, there are just a lot of operations done and objects created in there.
  • no function can be memoized/cached - hdmf and pynwb both have a lot of functions that are called many times with the same arguments (sort of the nature of data format libraries), but since every argument takes **kwargs it can't be cached with eg. functools.lru_cache.
  • static analysis tools won't work - ruff and i believe pyright both are literal static analysis tools - they don't run the code, so even if function signatures were added dynamically when wrapping, this would be invisible to type checkers, which makes development with pynwb much harder than it otherwise could be
  • lots and lots of boilerplate - every pynwb and hdmf function starts by unpacking kwargs
  • double the size of tracebacks: reading tradebacks is a pain because func_call happens between every actual call.

imo it would be a huge improvement to both pynwb and hdmf, developer quality of life is a big determinant of contributorship, and when i can't use static analysis tools and running the tests takes 35 minutes and i spend a whole day trying to make a 20 line contribution, i'm less likely to do so, which means all the labor ends up shunted over to y'all.

fortunately it would be a pretty easy swap from what i can tell - beartype is fast and requires no additional code, and if you want to validate shapes during function calls good news because i also wrote that for ya, and that can still be memoized/cached as long as the arguments are hashable, which is the usual requirement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s)
Projects
None yet
Development

No branches or pull requests

3 participants