Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shortcuts for long typing.Annotated type hints #103

Open
lecode-official opened this issue Nov 20, 2023 · 3 comments
Open

Add shortcuts for long typing.Annotated type hints #103

lecode-official opened this issue Nov 20, 2023 · 3 comments

Comments

@lecode-official
Copy link

Some frameworks, such as FastAPI make heavy use of typing.Annotated to add metadata to function parameters. This becomes unwieldy very fast, if the entire annotation has to be added in the docstring. For example, consider the following pseudo-code for a FastAPI function that gets invoked for an HTTP GET request to "/{id}":

@app.get(path='/{id}')
async def get_entity_by_id(
    entity_id: Annotated[
        int,
        Path(
            alias='id',
            title='Entity ID',
            description='The ID of the entity that is to be retrieved.',
            gt=0,
            examples=[1, 123]
        )
    ]
) -> Entity:
    ...

The metadata can become arbitrarily large and I would bet there are more frameworks out there that also use typing.Annotated in this way.

I want to propose to add an exception for typing.Annotated. I have read Cases that pydoclint is not designed to handle and Notes on writing type hints and I understand that using the AST module does not allow us to derive any semantic meaning about type annotations, but I think it would still be beneficial to make an exception on matching names alone, i.e., when the name of the type starts with Annotated[ or typing.Annotated[, then the user can substitute the actual type hint with a short hand.

Since the actual type that results from the annotation is the type specified as the first argument to typing.Annotated, I would propose that the actual type should be allowed as a short hand. So instead of adding the entire type to the docstring, it would suffice to specify the first argument of Annotated. For example, for the function definition above, PyDocLint should not raise a validation error, when the following docstring would be added:

@app.get(path='/{id}')
async def get_entity_by_id(
    entity_id: Annotated[
        int,
        Path(
            alias='id',
            title='Entity ID',
            description='The ID of the entity that is to be retrieved.',
            gt=0,
            examples=[1, 123]
        )
    ]
) -> Entity:
    """Retrieve the entity with the specified ID.

    Args:
        entity_id (int): The ID of the entity that is to be retrieved.

    Returns:
        Entity: The entity with the specified ID.
    """
    ...

I have had a look at PyDocLint's code and I believe (besides updating tests and docs), the only thing that needs to be changed is in the Arg._typeHintsEq method in the pydoclint/utils/arg.py file. I believe the following heuristic would be enough to implement this:

@classmethod
def _typeHintsEq(cls, hint1: str, hint2: str) -> bool:
    try:
        hint1_: str = unparseAnnotation(ast.parse(stripQuotes(hint1)))
    except SyntaxError:
        hint1_ = hint1

    try:
        hint2_: str = unparseAnnotation(ast.parse(stripQuotes(hint2)))
    except SyntaxError:
        hint2_ = hint2

    return (
        hint1_ == hint2_ or
        hint1_.startswith(f'typing.Annotated[{hint2_},') or hint1_.startswith(f'Annotated[{hint2_},') or
        hint2_.startswith(f'typing.Annotated[{hint1_},') or hint2_.startswith(f'Annotated[{hint1_},')
    )

Before submitting a pull request, I wanted to get feedback from you. Would you be willing to include an exception for typing.Annotated?

@jsh9
Copy link
Owner

jsh9 commented Nov 21, 2023

Yes, sure! Contributions are very welcome.

@lecode-official
Copy link
Author

Great, I will fork the repository and send a merge request

@lecode-official
Copy link
Author

While working on the new feature, I came along the stripQuotes function. I do not quite understand the regular expression r'Literal\[[^\]]+\]|[^L]+', which either matches "Literal[...]" or all strings that do not contain an uppercase "L". Why an uppercase "L"? It does not seem to be a problem, for example, in the type List['str'] it matches the sub-string "ist['str']" and removes the quotes, so it still works as intended. I just don't understand why the "L" was chosen.

There is, however, a more pressing issue with the stripQuotes approach: type annotations are allowed to contain function invocations as arguments, which in turn can have string arguments. This is, for example, used by Pydantic, where the Field function can be used with Annotated, and by FastAPI, where the Path, Query, Header, etc. functions can be used with Annotated. Stripping quotes here can result in syntax errors, which are gracefully handled by the try: ... except SyntaxError: ... block in the _typeHintsEq method, but this also means that the type hints are now compared without the unparsing. For example, the following code leads to an error, as the type annotation in the function definition gets parsed and therefore its double quotes are normalized to single quotes. The parsing of the return type fails, because the quotes were stripped, which results in the string Annotated[str, annotate(This is a parameter.)] that is syntactically incorrect. Therefore, the return type hint keeps its double quotes, which makes the two type hints unequal:

def annotate(name: str) -> str:
    """A field.

    Args:
        name (str): The name of the field.

    Returns:
        str: Returns the name of the field.
    """

    return name

def test(a: Annotated[str, annotate("This is a parameter.")]) -> str:
    """Performs a test.

    Args:
        a (Annotated[str, annotate("This is a parameter.")]): The parameter a.

    Returns:
        str: Returns the value of the parameter a.
    """

    return a

This is obviously wrong, since both type hints are the same to the dot. "Annotated" could be added to the regular expression in stripQuotes, but this would only be a band aid, since users can create custom types that allow for string arguments. And trying to find out whether the string is a quoted type or an actual string is impossible, because we are lacking the type information.

I was thinking that the quotes could be stripped after the unparsing, i.e., two type hints are considered equal if they only differ in quotes. In this case all quotes could be removed without having to deal with special cases like Literal and function invocations. This would solve the problem of false positives, such as in the example above. It would, however, cause false negatives, because the type hints Literal['a', 'b', 'c'] and Literal['a, b', 'c'] would also be considered the same. But I don't think that this would actually happen in real world code and even if it happens, it is still better to not get an error for unequal type hints than getting an error for equal type hints.

Also, this is definitely an edge case that will probably never happen, but the stripping of the single backticks in the stripQuotes function should test if the string is longer than 2 not 3. Otherwise, types that only have a single character would not get their backticks stripped. For example, the following code would result in an error:

T = TypeVar('T')

def test2(t: T) -> None:
    """Performs a test.

    Args:
        t (`T`): The parameter.
    """

    pass

Finally, I have noticed that unlike the Arg class, the ReturnArg class just compares the types directly without stripping quotes and unparsing. This can result in similar false positives, as in the first example, e.g., the following code would result in an error, although the return type hints are exactly the same:

def do_work() -> Literal["Success", "Error"]:
    """Performs work.

    Returns:
        Literal["Success", "Error"]: Returns "Success" or "Error".
    """

    return "Success"

The reason is that the return type in the function declaration was parsed, which results in the double quotes being normalized to single quotes, while return type in the docstring does not get parsed, so the double quotes remain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants