Unfriendly failure on non-overlapping catalogs #512

Closed
2 of 3 tasks
delucchi-cmu opened this issue Nov 25, 2024 · 1 comment · Fixed by #537
@delucchi-cmu

Bug report

If you take two catalogs that are totally non-overlapping and try to crossmatch them, LSDB fails in a way that would be confusing to end-users.

ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

with stack trace:

src/lsdb/catalog/catalog.py:214: in crossmatch
    ddf, ddf_map, alignment = crossmatch_catalog_data(
src/lsdb/dask/crossmatch_catalog_data.py:117: in crossmatch_catalog_data
    left_pixels, right_pixels = get_healpix_pixels_from_alignment(alignment)
src/lsdb/dask/merge_catalog_functions.py:212: in get_healpix_pixels_from_alignment
    left_pixels = make_pixel(
../../../.virtualenvs/demo/lib/python3.12/site-packages/numpy/lib/function_base.py:2372: in __call__
    return self._call_as_normal(*args, **kwargs)
../../../.virtualenvs/demo/lib/python3.12/site-packages/numpy/lib/function_base.py:2365: in _call_as_normal
    return self._vectorize_call(func=func, args=vargs)
../../../.virtualenvs/demo/lib/python3.12/site-packages/numpy/lib/function_base.py:2450: in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
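
The error itself can be reproduced with NumPy alone, which may help narrow down where to catch it. This is a minimal sketch, not the LSDB code: `make_pair` is a hypothetical stand-in for whatever function `make_pixel` vectorizes. It shows that `np.vectorize` raises on size-0 inputs when `otypes` is not set, and returns an empty array when it is.

```python
import numpy as np


# Hypothetical stand-in for the per-element function being vectorized;
# the real helper in LSDB may look different.
def make_pair(order, pixel):
    return (int(order), int(pixel))


# An empty alignment yields empty order/pixel arrays.
orders = np.array([], dtype=np.int64)
pixels = np.array([], dtype=np.int64)

# Without otypes, NumPy has to call the function once to infer the output
# dtype, which it cannot do for size-0 inputs.
try:
    np.vectorize(make_pair)(orders, pixels)
    error_message = None
except ValueError as err:
    error_message = str(err)

# With otypes set, the same call succeeds and returns an empty object array.
safe_make_pair = np.vectorize(make_pair, otypes=[object])
empty_result = safe_make_pair(orders, pixels)

print(error_message)
print(empty_result.shape)
```

Setting `otypes` only avoids the crash at this call site; whether the caller should then return an empty catalog or raise a clearer error is a separate decision.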

Attaching a dataset suitable for unit tests (100 points randomly generated inside HealpixPixel(0,0), with some silly string identifier). The pixels and MOC are totally disjoint from the small_sky* catalogs.

disjoint.csv

I'm not sure exactly where this condition should be caught, or whether we should raise an error or simply return an empty catalog.
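
As one way to frame the two options, here is a rough sketch of an early guard. Everything in it is an assumption for illustration: `check_alignment_not_empty`, `DisjointCatalogsError`, and the `on_empty` parameter are hypothetical names, and a plain pandas DataFrame stands in for the pixel alignment table. It does not reflect how the fix in LSDB is implemented.

```python
import pandas as pd


class DisjointCatalogsError(ValueError):
    """Hypothetical error type for this sketch; not part of LSDB."""


def check_alignment_not_empty(pixel_mapping: pd.DataFrame, on_empty: str = "raise") -> bool:
    """Return True if there is at least one overlapping pixel pair.

    pixel_mapping is a stand-in for the alignment table (one row per
    overlapping pair of pixels). With on_empty="raise" an empty table
    produces a readable error; with any other value the function returns
    False so the caller can build an empty result instead.
    """
    if len(pixel_mapping) > 0:
        return True
    if on_empty == "raise":
        raise DisjointCatalogsError(
            "The catalogs have no overlapping pixels, so there is nothing to crossmatch."
        )
    return False


# Usage: an empty table either raises a friendly error or signals "empty".
empty_mapping = pd.DataFrame({"left_pixel": [], "right_pixel": []})
has_overlap = check_alignment_not_empty(empty_mapping, on_empty="empty")
print(has_overlap)
```

Running the check before any per-pixel work means the user sees a message about disjoint catalogs rather than a NumPy internals error.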

Before submitting
Please check the following:

  • I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
  • I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
  • If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.
delucchi-cmu added the bug label Nov 25, 2024
smcguire-cmu self-assigned this Jan 6, 2025
smcguire-cmu moved this to In Progress in HATS / LSDB Jan 6, 2025
@hombit

hombit commented Jan 6, 2025

I have the same issue with my pipeline, where I cross-match a sparse Gaia variable catalog with PS1 (paths on PSC):

import lsdb
from dask.distributed import Client
from lsdb.core.search.pixel_search import PixelSearch

search_filter = PixelSearch([(1, 33)])
ps1_otmo = lsdb.read_hats(
    "/ocean/projects/phy210048p/shared/hats/catalogs/ps1/ps1_otmo",
    margin_cache="/ocean/projects/phy210048p/shared/hats/catalogs/ps1/ps1_otmo_10arcs",
    columns=['objID', 'raMean', 'decMean'],
    search_filter=search_filter,
)

gaia_var = lsdb.read_hats(
    "/ocean/projects/phy210048p/malanche/zubercal-filtering/hats/gaia_dr3_vcep",
    search_filter=search_filter,
)

result = gaia_var.crossmatch(
    ps1_otmo,
    radius_arcsec=1.0,
    suffixes=["", ""],
)
