Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type annotation for Index/MultiIndex.names is incorrect #804

Open
tswast opened this issue Oct 26, 2023 · 11 comments
Open

type annotation for Index/MultiIndex.names is incorrect #804

tswast opened this issue Oct 26, 2023 · 11 comments
Labels
Blocked Requires external dependency before progressing good first issue Index Related to the Index class or subclasses

Comments

@tswast
Copy link

tswast commented Oct 26, 2023

Describe the bug

type annotation for Index/MultiIndex.names is incorrect

To Reproduce

  1. Provide a minimal runnable pandas example that is not properly checked by the stubs.
In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"a": ["a", "b", "c"], "i": [10, 11, 12]}, index=pd.Index([5, 10, 15], name="idx"))

In [4]: df.index.names
Out[4]: FrozenList(['idx'])

In [5]: df.index.names = (None,)

In [6]: df.index.names
Out[6]: FrozenList([None])
  1. Indicate which type checker you are using (mypy or pyright).

mypy

  1. Show the error message received from that type checker while checking your example.

bigframes/core/blocks.py:436: error: Incompatible types in assignment (expression has type "tuple[None]", variable has type "list[str]") [assignment]

Please complete the following information:

  • OS: [e.g. Windows, Linux, MacOS] Linux
  • OS Version [e.g. 22]

Distributor ID: Debian
Description: Debian GNU/Linux rodete
Release: n/a
Codename: rodete

  • python version Python 3.10.9
  • version of type checker mypy==1.5.1
  • version of installed pandas-stubs pandas-stubs==2.0.3.230814

Additional context
Add any other context about the problem here.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Oct 26, 2023

There are a few things going on here, and I don't think we can do much about them.

First, from a static typing perspective, we can't track the type of DataFrame.index. It could be a regular Index, or a MultiIndex. So if you know your DataFrame is backed by a MultiIndex, you'd have to cast df.index to the MultiIndex type.

Second, whether the underlying index is single-dimensional or a MultiIndex, df.index.names will return a list of strings. It is also possible to return a list of None if you clobber the names, but if we use static typing to declare that Index.names returns a list[str | None], that will force more people to cast the result of Index.names.

Finally, you reported a mypy error on your own code, but we'd prefer an example that is self-contained and can be run directly through the type checker. ipython code can't be used that way.

I'm going to close this, but am willing to reopen it if you can convince me otherwise.

@Dr-Irv Dr-Irv closed this as completed Oct 26, 2023
@tswast
Copy link
Author

tswast commented Oct 26, 2023

names is available on both Index and MultiIndex. I think that's a moot point in regards to this issue.

Isn't list[str] incorrect though? Unnamed indexes are incredibly common. Here's a standalone example:

(dev-3.10-pip) ➜  pandas-stubs-804 cat sample.py
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c"], "i": [10, 11, 12]})
print(df.index.names)

# OK
df.index.names = ["idx"]
print(df.index.names)

# Not OK, but works
df.index.names = ("idx2",)
print(df.index.names)
df.index.names = [None]
print(df.index.names)
df.index.names = (None,)
print(df.index.names)
(dev-3.10-pip) ➜  pandas-stubs-804 mypy sample.py
sample.py:11: error: Incompatible types in assignment (expression has type "tuple[str]", variable has type "list[str]")  [assignment]
sample.py:13: error: List item 0 has incompatible type "None"; expected "str"  [list-item]
sample.py:15: error: Incompatible types in assignment (expression has type "tuple[None]", variable has type "list[str]")  [assignment]
Found 3 errors in 1 file (checked 1 source file)

@tswast
Copy link
Author

tswast commented Oct 26, 2023

@Dr-Irv , I've added a standalone sample demonstrating the issue.

@tswast
Copy link
Author

tswast commented Oct 26, 2023

Output of sample.py showing the default name is None:

(dev-3.10-pip) ➜  pandas-stubs-804 python sample.py
[None]
['idx']
['idx2']
[None]
[None]

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Oct 27, 2023

Thanks @tswast for the example. I will reopen.

The property for names should be updated here:

in two ways:

  1. The "getter" should return list[str | None]
  2. The "setter" should allow any SequenceNotStr[str] (but not Sequence[str])

PR with tests welcome.

@Dr-Irv Dr-Irv reopened this Oct 27, 2023
@Dr-Irv Dr-Irv added good first issue Index Related to the Index class or subclasses labels Oct 27, 2023
@twoertwein
Copy link
Member

using str | None sounds reasonable. Technically, any type seems to be accepted at runtime:

>>> df.index.names = [None]
>>> df.index.names
FrozenList([None])

>>> df.index.names = [1]
>>> df.index.names
FrozenList([1])

The pandas-internal annotations declare name as any hashable object:
https://github.com/pandas-dev/pandas/blob/e86ed377639948c64c429059127bcf5b359ab6be/pandas/core/indexes/base.py#L1657C5-L1657C32

I couldn't find names, but the doc-string here says that the elements have to be hashable https://github.com/pandas-dev/pandas/blob/e86ed377639948c64c429059127bcf5b359ab6be/pandas/core/indexes/base.py#L1753

@loicdiridollou
Copy link
Contributor

Thanks @tswast for the example. I will reopen.

The property for names should be updated here:

in two ways:

  1. The "getter" should return list[str | None]
  2. The "setter" should allow any SequenceNotStr[str] (but not Sequence[str])

PR with tests welcome.

Actually I think the getter should also return SequenceNotStr because if you set names as a tuple, you will never be able to return a list[str | None] (since there is no process when we set the names to convert them into anything else than their original name but please correct me if I am missing something.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Nov 13, 2024

Actually I think the getter should also return SequenceNotStr because if you set names as a tuple, you will never be able to return a list[str | None] (since there is no process when we set the names to convert them into anything else than their original name but please correct me if I am missing something.

Can you provide an example?

This may be one of the cases where if we declare the getter to return anything that could possibly be returned at runtime, then the most typical usage of returning a list of strings would force people to do a cast on the result. So I lean towards supporting the typical usage that lets people avoid using cast in their pandas code.

@loicdiridollou
Copy link
Contributor

My bad I did not try it at runtime and thought the type would carry over in the getter but indeed it is a FrozenList that gets returned, let me write a quick test to make sure but I believe you are correct!

@loicdiridollou
Copy link
Contributor

Added a PR to see it in action, there still seems to be an issue with the setter type not getting recognize (mypy complains that it expects a list while we allow the setter to receive a SequenceNotStr.
Open to ideas on this one.

@loicdiridollou loicdiridollou added the Blocked Requires external dependency before progressing label Nov 15, 2024
@loicdiridollou
Copy link
Contributor

Still waiting on mypy with support for getters and setters support from @Dr-Irv's comment #1031 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocked Requires external dependency before progressing good first issue Index Related to the Index class or subclasses
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants