Incomplete pandas metadata #427

Closed
2 of 3 tasks
troyraen opened this issue Nov 5, 2024 · 0 comments · Fixed by #429
troyraen commented Nov 5, 2024

Bug report

The small_sky_object_catalog has pandas metadata that is missing the hats-specific columns (_healpix_29, Norder, Dir, Npix), so the column list in the pandas metadata differs from the actual schema. This causes some of the verification unit tests to fail. The two matched before the hipscat -> hats transition. I'm not yet sure whether this affects only small_sky_object_catalog or other catalogs as well.

Names in each schema:

>>> import pyarrow.dataset
>>> schema = pyarrow.dataset.dataset('tests/data/small_sky_object_catalog').schema
>>>
>>> [col['name'] for col in schema.pandas_metadata['columns']]
['id', 'ra', 'dec', 'ra_error', 'dec_error']
>>>
>>> schema.names
['_healpix_29',
 'id',
 'ra',
 'dec',
 'ra_error',
 'dec_error',
 'Norder',
 'Dir',
 'Npix']

Full schemas:

>>> schema
_healpix_29: int64
id: int64
ra: double
dec: double
ra_error: int64
dec_error: int64
Norder: uint8
Dir: uint64
Npix: uint64
-- schema metadata --
pandas: '{"index_columns": [], "column_indexes": [], "columns": [{"name":' + 616
>>>
>>> schema.pandas_metadata
{'index_columns': [],
 'column_indexes': [],
 'columns': [{'name': 'id',
   'field_name': 'id',
   'pandas_type': 'int64',
   'numpy_type': 'int64',
   'metadata': None},
  {'name': 'ra',
   'field_name': 'ra',
   'pandas_type': 'float64',
   'numpy_type': 'float64',
   'metadata': None},
  {'name': 'dec',
   'field_name': 'dec',
   'pandas_type': 'float64',
   'numpy_type': 'float64',
   'metadata': None},
  {'name': 'ra_error',
   'field_name': 'ra_error',
   'pandas_type': 'int64',
   'numpy_type': 'int64',
   'metadata': None},
  {'name': 'dec_error',
   'field_name': 'dec_error',
   'pandas_type': 'int64',
   'numpy_type': 'int64',
   'metadata': None}],
 'creator': {'library': 'pyarrow', 'version': '17.0.0'},
 'pandas_version': '2.2.3'}

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.
@troyraen troyraen added the bug Something isn't working label Nov 5, 2024
@troyraen troyraen mentioned this issue Nov 5, 2024
11 tasks
@delucchi-cmu delucchi-cmu self-assigned this Nov 7, 2024
@delucchi-cmu delucchi-cmu moved this to Todo in HATS / LSDB Nov 7, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in HATS / LSDB Nov 8, 2024

2 participants