
API: astype method fails to raise errors for category data type #59899

Open
3 tasks done
noahblakesmith opened this issue Sep 26, 2024 · 4 comments
Labels: API Design, Categorical


@noahblakesmith

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

col = pd.Series(["a", "b", "c"], dtype=str)
cat = pd.api.types.CategoricalDtype(categories=["a", "b"])

col = col.astype(dtype=cat, errors="raise")
print(col)

0      a
1      b
2    NaN
dtype: category
Categories (2, object): ['a', 'b']

Issue Description

No error is raised when casting to the categorical dtype, even though the value "c" is not among the defined categories. Instead, "c" is silently coerced to NaN.

This behavior appears inconsistent with that of other data types, such as int.

Expected Behavior

I believe an error should be raised.
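Until the behavior changes, a caller who wants the error can check membership before casting. This is a minimal sketch using a hypothetical helper, `astype_category_strict` (not part of pandas), which raises `ValueError` when any non-missing value is absent from the target categories and otherwise defers to `Series.astype`.

```python
import pandas as pd


# Hypothetical helper (not part of pandas): a strict categorical cast.
def astype_category_strict(s: pd.Series, dtype: pd.CategoricalDtype) -> pd.Series:
    """Cast ``s`` to ``dtype``, raising instead of coercing unknown values to NaN."""
    # Non-missing values that are absent from the target categories are
    # exactly the ones astype would silently turn into NaN.
    present = s.notna()
    unknown = s[present & ~s.isin(dtype.categories)]
    if not unknown.empty:
        raise ValueError(
            f"values not in categories: {sorted(unknown.unique().tolist())}"
        )
    # Every value is either missing or a known category, so astype is lossless.
    return s.astype(dtype)


col = pd.Series(["a", "b", "c"], dtype=str)
cat = pd.CategoricalDtype(categories=["a", "b"])

try:
    astype_category_strict(col, cat)
except ValueError as exc:
    print(exc)  # values not in categories: ['c']
```

Values that are already missing are let through, so only genuinely unknown entries trigger the error.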

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.10.14.final.0
python-bits : 64
OS : Darwin
OS-release : 23.6.0
Version : Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.2.2
numpy : 2.0.1
pytz : 2024.2
dateutil : 2.9.0.post0
setuptools : 72.1.0
pip : 24.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.26.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.9.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.4
pandas_gbq : None
pyarrow : 17.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

noahblakesmith added the Bug and Needs Triage labels Sep 26, 2024
@asishm
Contributor

asishm commented Sep 26, 2024

Thanks for the report. Could you please update the title to include a description?

That said, based on #51074 (comment), this is expected behavior.
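For illustration, the same coercion happens when a `Categorical` is constructed directly; the pandas `Categorical` documentation states that values not in `categories` are replaced with NaN. A minimal sketch, assuming pandas 2.x as in the report, showing that the missing entries are stored with code -1.

```python
import pandas as pd

# Direct construction: "c" is not a category, so it becomes missing (code -1).
arr = pd.Categorical(["a", "b", "c"], categories=["a", "b"])
print(arr.codes.tolist())  # [0, 1, -1]
print(arr.isna().tolist())  # [False, False, True]

# Series.astype with an equivalent CategoricalDtype gives the same codes.
col = pd.Series(["a", "b", "c"]).astype(pd.CategoricalDtype(categories=["a", "b"]))
print(col.cat.codes.tolist())  # [0, 1, -1]
```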

asishm added the Categorical label Sep 26, 2024
@rhshadrach
Member

> This behavior appears inconsistent with that of other data types, such as int.

Can you give an example that demonstrates the inconsistency?

rhshadrach added the Needs Info label and removed the Needs Triage label Sep 28, 2024
@noahblakesmith
Author

Sure thing @rhshadrach. Here is an example using int, which throws an error. I also tested float, "Int64", and "int64[pyarrow]", which produced similar errors.

import pandas as pd

col = pd.Series(["a", "b", "c"])
col = col.astype(dtype=int, errors="raise")

Traceback (most recent call last):
  File "./test.py", line 4, in <module>
    col = col.astype(dtype=int, errors="raise")
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/generic.py", line 6643, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 430, in astype
    return self.apply(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 363, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 758, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 237, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 182, in astype_array
    values = _astype_nansafe(values, dtype, copy=copy)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/test/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 133, in _astype_nansafe
    return arr.astype(dtype, copy=True)
ValueError: invalid literal for int() with base 10: 'a'
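A side-by-side sketch of the comparison, assuming the pandas 2.2 setup from the report. It covers only `int`, `float`, and the categorical dtype; the exception types raised for "Int64" and "int64[pyarrow]" are not checked here.

```python
import pandas as pd

col = pd.Series(["a", "b", "c"])
targets = {
    "int": int,
    "float": float,
    "category": pd.CategoricalDtype(categories=["a", "b"]),
}

# Record whether each cast raises, and with which exception type.
results = {}
for name, dtype in targets.items():
    try:
        col.astype(dtype=dtype, errors="raise")
        results[name] = "no error"
    except (TypeError, ValueError) as exc:
        results[name] = type(exc).__name__

print(results)
# pandas 2.2: {'int': 'ValueError', 'float': 'ValueError', 'category': 'no error'}
```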

noahblakesmith changed the title from "BUG:" to "BUG: astype method fails to raise errors for category data type" Sep 29, 2024
@rhshadrach
Member

Thanks @noahblakesmith. I would not call this inconsistent, since the categorical dtype has its own specialized semantics, as @asishm mentioned. This is well-established and purposeful behavior, so it is also not a bug.

That said, there is agreement that this behavior is undesirable. It is very closely related to, and may even be fixed by, #40996.

rhshadrach added the API Design label and removed the Bug label Sep 29, 2024
rhshadrach changed the title from "BUG: astype method fails to raise errors for category data type" to "API: astype method fails to raise errors for category data type" Sep 29, 2024
rhshadrach removed the Needs Info label Sep 29, 2024