Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more detail about available HATS properties on import. #474

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions docs/catalogs/arguments.rst
Original file line number Diff line number Diff line change
Expand Up @@ -342,3 +342,17 @@ using the LSST nomenclature:

Some data providers split detection-level data into a separate catalog, to make object
catalogs smaller, and reflects a relational data model.

Additional catalog properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The HATS format allows for many additional key-values in the high-level ``properties``
file. Many of these values are automatically set by the import process itself, but
catalog providers may want to set additional fields for data provenance.

This full set of properties is outlined on a separate page (:doc:`properties`),
but you can pass these key-value sets to the import process with the ``addl_hats_properties``
argument, and they will appear in the final ``properties`` file:

.. code-block::

addl_hats_properties={"hats_cols_default": "id, mjd", "obs_regime": "Optical"},
193 changes: 193 additions & 0 deletions docs/catalogs/properties.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@

HATS properties
===============================================================================

The HATS format allows for many additional key-values in the high-level ``properties``
file. Many of these values are automatically set by the import process itself, but
catalog providers may want to set additional fields for data provenance.

Properties from Arguments
-------------------------------------------------------------------------------

In some cases, the argument name was created before the HATS properties were settled,
so we have different names for arguments/properties. Below is a list of properties
that we will set automatically in the import process, based on the arguments provided.

.. list-table::
:header-rows: 1

* - **HATS Keyword**
- **Argument**
- **Format**
- **Notes**
* - ``obs_collection``
- ``output_artifact_name``
- one word (e.g. ``"2MASS"``) [#f1]_
- Short name of original data set
* - ``dataproduct_type``
- ``catalog_type`` (defaults to ``"object"``)
- one word (object, nested, margin, association, index, ...) [#f1]_
- Variety of HATS catalog generated.
* - ``hats_col_ra``
- ``ra_column``
- one word [#f1]_
- Column name of the ra coordinate. Used for partitioning and default cross-matching.
* - ``hats_col_dec``
- ``dec_column``
- one word [#f1]_
- Column name of the dec coordinate. Used for partitioning and default cross-matching.
* - ``hats_cols_sort``
- ``sort_columns``
- one or more words [#f2]_
- At catalog creation time, the columns used to sort the data, in addition to `healpix_29` column.
* - ``hats_max_rows``
- ``pixel_threshold``
- positive integer
- At catalog creation time, the maximum number of rows per file before breaking into 4 new
files at higher order.

Constructed Properties
-------------------------------------------------------------------------------

Below is a list of properties that we will set automatically in the import process,
based on the execution environment and output of the import pipeline.

.. list-table::
:header-rows: 1

* - **HATS Keyword**
- **Format**
- **Notes**
* - ``hats_nrows``
- positive integer
- Number of rows of the HATS catalog
* - ``hats_order``
- positive integer
- Deepest HATS order of catalog
* - ``moc_sky_fraction``
- real between 0 and 1
- Fraction of the sky covered by the MOC associated with the catalog
* - ``hats_estsize``
- positive integer, with unit ``KB``
- HATS on-disk size estimation
* - ``hats_builder``
- free text (e.g. ``"hats-import v0.6.4"``)
- Name and version of the tool used for building the HATS catalog
* - ``hats_creation_date``
- ISO 8601 ``YYYY-mm-ddTHH:MMZ``
- HATS first creation date
* - ``hats_release_date``
- ISO 8601 ``YYYY-mm-ddTHH:MMZ``
- Last update to the HATS format
* - ``hats_version``
- semantic version (e.g. ``"0.1"``)
- Number of HATS specification document version

User-provided Properties
-------------------------------------------------------------------------------

In addition, catalog providers can set provenance-related fields in the
``properties`` file by setting the ``addl_hats_properties`` argument on import.
This will carry key-value pairs into the final ``properties`` file.

All of these fields are optional.

.. code-block:: python

args = ImportArguments(
...
addl_hats_properties={"hats_cols_default": "id, mjd", "obs_regime": "Optical"},
...
)

.. list-table::
:header-rows: 1

* - **HATS Keyword**
- **Format**
- **Notes**
* - ``hats_cols_default``
- one or more words [#f2]_
- Which columns should be read from parquet files, when user doesn't otherwise specify.
Useful for wide tables.
* - ``hats_cols_survey_id``
- one or more words [#f2]_
- The primary key used in the original survey data. May be multiple columns if
the survey uses a composite key (e.g. ``object_id MJD`` for detections)
* - ``hats_coordinate_epoch``
- one word [#f1]_
- For the default ra and dec (hats_col_ra, hats_col_dec), the measurement epoch
* - ``hats_frame``
- one word ("equatorial" (ICRS), "galactic", or "ecliptic") [#f1]_
- Coordinate frame reference
* - ``creator_did``
- IVOID (e.g. ``ivo://CDS/P/2MASS/J``)
- Unique ID of the HATS
* - ``addendum_did``
- IVOID
- If content has been added after initial catalog creation, ``creator_did``
of any added data
* - ``bib_reference``
- free text
- Bibliographic reference
* - ``bib_reference_url``
- URL
- URL to bibliographic reference
* - ``data_ucd``
- IVOA UCD
- UCD describing data contents
* - ``hats_copyright``
- free text
- Copyright mention associated to the HATS
* - ``hats_creator``
- free text
- Institute or person who built the HATS
* - ``hats_progenitor_url``
- URL
- URL to an associated progenitor HATS
* - ``hats_service_url``
- URL
- HATS access url
* - ``hats_status``
- one or more words (("private" or "public"), ("main", "mirror", or "partial"),
("clonable", "unclonable" or "clonableOnce")) [#f2]_
- HATS catalog status
* - ``obs_ack``
- free text
- Acknowledgment mention.
* - ``obs_copyright``
- free text
- Copyright mention associated to the original data
* - ``obs_copyright_url``
- URL
- URL to a copyright mention
* - ``obs_description``
- free text, longer free text description of the dataset
- Data set description
* - ``obs_regime``
- one word ("Radio", "Millimeter", "Infrared", "Optical", "UV", "EUV",
"X-ray", "Gamma-ray") [#f1]_
- General wavelength
* - ``obs_title``
- free text, one line. (e.g. "HST F110W observations")
- Data set title
* - ``prov_progenitor``
- free text
- Provenance of the original data
* - ``publisher_id``
- IVOID (e.g. "ivo://CDS")
- Unique ID of the HATS publisher
* - ``t_min``
- real, MJD
- Start time of the observations
* - ``t_max``
- real, MJD
- Stop time of the observations

.. rubric:: Notes on "Words"

.. [#f1] "one word" implies a single name, consisting of alphanumerics, underscore or hyphens.
the field may act either as an identifier, column name, or an enumeration of valid values.
.. [#f2] "one or more words" generally means a set of values, taken from some valid
enumeration. These can either be separated by spaces (``"value1 value2"``),
or commas (``"value1,value2"``).
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ threading issues with dask:
catalogs/arguments
guide/hipscat_conversion
catalogs/temp_files
catalogs/properties
catalogs/public/index

.. toctree::
Expand Down
Loading