Merge pull request #41 from kthyng/var_updates
Var updates
kthyng authored Jan 9, 2023
2 parents 5b8b60f + 2d4ba36 commit 7589bcb
Showing 18 changed files with 626 additions and 1,555 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -51,7 +51,7 @@ repos:
    rev: v0.982
    hooks:
      - id: mypy
        additional_dependencies: [types-setuptools]
        additional_dependencies: [types-setuptools, types-PyYAML]
        exclude: docs/source/conf.py
        args: [--ignore-missing-imports]

178 changes: 0 additions & 178 deletions docs/Demo-AK.ipynb

This file was deleted.

80 changes: 0 additions & 80 deletions docs/Demo-AK.md

This file was deleted.

219 changes: 0 additions & 219 deletions docs/Demo-CA.ipynb

This file was deleted.

883 changes: 0 additions & 883 deletions docs/Demo_workflows.ipynb

This file was deleted.

50 changes: 36 additions & 14 deletions docs/cli.md
@@ -84,7 +84,7 @@ Make a catalog from datasets available from an ERDDAP server using `intake-erdda

#### Examples

Select a box and time range over which to search catalog:
Select a spatial box and time range over which to search catalog:

```{code-cell} ipython3
!omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name example_erddap_catalog --description "Example ERDDAP catalog description" --kwargs server=https://erddap.sensors.ioos.us/erddap --kwargs_search min_lon=-170 min_lat=53 max_lon=-165 max_lat=56 min_time=2022-1-1 max_time=2022-1-2
@@ -96,6 +96,40 @@ Input model output to use to create the space search range, but choose time sear
!omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name example_erddap_catalog --description "Example ERDDAP catalog description" --kwargs server=https://erddap.sensors.ioos.us/erddap --kwargs_search model_path=https://thredds.cencoos.org/thredds/dodsC/CENCOOS_CA_ROMS_FCST.nc min_time=2022-1-1 max_time=2022-1-2
```

Narrow your search by variable. For `intake-erddap` you can filter by the CF `standard_name` of the variable directly with:

```{code-cell} ipython3
!omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name cat1 --kwargs server=https://erddap.sensors.ioos.us/erddap standard_names="[sea_surface_temperature,sea_water_temperature,surface_temperature]"
```

You can get equivalent results in your catalog by searching with a variable nickname (a key in the vocabulary) together with a vocabulary: a dictionary of regular expressions that defines what "counts" as a particular variable. To save a custom vocabulary to a location for this command, use the `Vocab` class in `cf-pandas` ([docs](https://cf-pandas.readthedocs.io/en/latest/demo_vocab.html#save-to-file)). A premade set of vocabularies aimed at ocean modelers is also available by name; list them with the command `omsa vocabs`. Suggested uses:
* axds catalog: vocab_name standard_names
* erddap catalog, IOOS: vocab_name erddap_ioos
* erddap catalog, Coastwatch: vocab_name erddap_coastwatch
* local catalog: vocab_name general

This is more complicated than simply listing the desired standard_names as in the previous example. However, it becomes useful with other data files or model output whose variables have different names but can still be recognized by matching them against the vocabulary.
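To make the vocabulary idea concrete, here is a minimal sketch in plain Python using `re`. It assumes a vocabulary laid out as nickname, then attribute, then regular expression; the `vocab` dict and the `match_nickname` helper are hypothetical and only mirror the idea. They are not the `cf-pandas` or `omsa` implementation, nor the contents of the "erddap_ioos" file.

```python
import re

# Hypothetical vocabulary: nickname -> {attribute: regular expression}.
# The patterns are illustrative, not copied from omsa's "erddap_ioos" vocab.
vocab = {
    "temp": {
        "standard_name": "^sea_surface_temperature$|^sea_water_temperature$|^surface_temperature$"
    }
}


def match_nickname(nickname, standard_names, vocab):
    """Return the standard names that "count" as the given nickname (hypothetical helper)."""
    pattern = re.compile(vocab[nickname]["standard_name"])
    return [name for name in standard_names if pattern.search(name)]


# Made-up list of standard names found in some datasets.
available = [
    "sea_water_temperature",
    "sea_water_practical_salinity",
    "sea_surface_temperature",
    "sea_water_temperature_qc_agg",
]
print(match_nickname("temp", available, vocab))
# ['sea_water_temperature', 'sea_surface_temperature']
```

Anchoring each alternative with `^...$` is what keeps near-miss names such as `sea_water_temperature_qc_agg` out; a looser pattern like `temperature` would match that quality-control variable too.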

The example below uses the predefined vocabulary "erddap_ioos", since we are searching the IOOS ERDDAP server, and searches for variables whose standard_name matches the variable nickname "temp". The "erddap_ioos" vocabulary can be inspected as shown here; it contains exactly the same standard_names as the previous example, and its regular expressions are set up to match exactly those standard_names. This is why both approaches return the same results.

```{code-cell} ipython3
import ocean_model_skill_assessor as omsa
import cf_pandas as cfp
vocab = cfp.Vocab(omsa.VOCAB_PATH("erddap_ioos"))
vocab
```

```{code-cell} ipython3
!omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name cat3 --kwargs server=https://erddap.sensors.ioos.us/erddap category_search="[standard_name,temp]" --vocab_name erddap_ioos
```

You can further narrow your search by a text term by adding the `search_for` and `query_type` keyword inputs. This example searches for datasets that contain the variable "sea_surface_temperature" and also contain the term "Timeseries" somewhere in their metadata. To get datasets that contain either one OR the other, use `query_type=union` instead.

```{code-cell} ipython3
!omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name cat2 --kwargs server=https://erddap.sensors.ioos.us/erddap standard_names="[sea_surface_temperature]" search_for="[Timeseries]" query_type=intersection
```
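Conceptually, `query_type` decides how the per-term results are combined: `intersection` keeps datasets matching every term (AND), while `union` keeps datasets matching any term (OR). The sketch below illustrates only that set logic, using made-up dataset IDs and a hypothetical `combine` helper; it is not how `intake-erddap` actually performs the search.

```python
# Hypothetical search hits: dataset IDs returned for each search term.
hits = {
    "sea_surface_temperature": {"buoy_a", "buoy_b", "glider_c"},
    "Timeseries": {"buoy_a", "buoy_b", "shore_d"},
}


def combine(hits, query_type):
    """Combine per-term result sets as query_type describes (illustrative only)."""
    sets = list(hits.values())
    if query_type == "intersection":
        return set.intersection(*sets)  # datasets matching every term
    if query_type == "union":
        return set.union(*sets)  # datasets matching any term
    raise ValueError(f"unknown query_type: {query_type!r}")


print(sorted(combine(hits, "intersection")))  # ['buoy_a', 'buoy_b']
print(sorted(combine(hits, "union")))  # ['buoy_a', 'buoy_b', 'glider_c', 'shore_d']
```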

### Catalog for Axiom assets

Make a catalog of Axiom Data Science-stored assets using `intake-axds`.
@@ -136,25 +170,13 @@ Input model output to use to create the space search range, but choose time sear
!omsa make_catalog --project_name test1 --catalog_type axds --catalog_name example_axds_catalog --description "Example AXDS catalog description" --kwargs standard_names='[sea_water_practical_salinity,sea_water_temperature]' verbose=True --kwargs_search model_path=https://thredds.cencoos.org/thredds/dodsC/CENCOOS_CA_ROMS_FCST.nc min_time=2022-1-1 max_time=2022-1-2
```

Alternatively, filter returned datasets for variables using the variable nicknames along with a vocabulary of regular expressions for matching what "counts" as a variable. To save a custom vocabulary to a location for this command, use the `Vocab` class in `cf-pandas` ([docs](https://cf-pandas.readthedocs.io/en/latest/demo_vocab.html#save-to-file)). A premade set of vocabularies is also available to use by name; see them with command `omsa vocabs`. Suggested uses:
* axds catalog: vocab_name standard_names
* erddap catalog, IOOS: vocab_name erddap_ioos
* erddap catalog, Coastwatch: vocab_name erddap_coastwatch
* local catalog: vocab_name general

```
omsa make_catalog --project_name test1 --catalog_type axds --vocab_name standard_names --kwargs keys_to_match="[temp,salt]"
```

+++

## Run model-data comparison

Note that if any datasets have timezone information attached, it is removed before comparison, on the assumption that the model output and the data are in the same time zone.
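Removing a timezone keeps the clock reading and performs no conversion, which is why the same-time-zone assumption matters. A minimal sketch with the standard-library `datetime` module, using a made-up UTC-8 observation time and a naive model time (in pandas, `Series.dt.tz_localize(None)` likewise drops timezone information while keeping local wall time; whether `omsa` uses that call is not shown here):

```python
from datetime import datetime, timedelta, timezone

# Made-up data timestamp carrying an explicit UTC-8 offset.
obs_time = datetime(2022, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=-8)))

# Dropping tzinfo keeps the wall-clock reading (12:00) and does NOT convert
# to another zone, so it is only correct if the model uses that same zone.
naive_time = obs_time.replace(tzinfo=None)

model_time = datetime(2022, 1, 1, 12, 0)  # made-up naive model timestamp
print(naive_time == model_time)  # True
```

Python refuses ordering comparisons (`<`, `>`) between timezone-aware and naive datetimes with a `TypeError`, so both sides have to agree on being naive or aware before they can be matched up in time.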

#### Available options

omsa run --project_name test1 --catalog_names CATALOG_NAME1 CATALOG_NAME2 --vocab_names VOCAB1 VOCAB2 --key KEY --model_path PATH_TO_MODEL_OUTPUT --ndatasets NDATASETS
omsa run --project_name test1 --catalogs CATALOG_NAME1 CATALOG_NAME2 --vocab_names VOCAB1 VOCAB2 --key KEY --model_path PATH_TO_MODEL_OUTPUT --ndatasets NDATASETS

* `project_name`: Subdirectory in cache dir to store files associated together.
* `catalog_names`: Catalog name(s). Datasets will be accessed from catalog entries.
4 changes: 1 addition & 3 deletions docs/index.rst
@@ -9,11 +9,9 @@ Welcome to ocean-model-skill-assessor's documentation!
.. toctree::
   :maxdepth: 3

   demo.md
   cli.md
   create_vocabs.md
   .. Demo-AK.md
   .. Demo-CA
   .. Demo_workflows
   api
   GitHub repository <https://github.com/axiom-data-science/ocean-model-skill-assessor>

65 changes: 60 additions & 5 deletions ocean_model_skill_assessor/CLI.py
@@ -7,6 +7,24 @@
import ocean_model_skill_assessor as omsa


def is_int(s):
    """Check if string is actually int."""
    try:
        int(s)
        return True
    except (ValueError, TypeError):
        return False


def is_float(s):
    """Check if string is actually float."""
    try:
        float(s)
        return True
    except (ValueError, TypeError):
        return False


# https://sumit-ghosh.com/articles/parsing-dictionary-key-value-pairs-kwargs-argparse-python/
class ParseKwargs(argparse.Action):
    """With this, the user can input dicts on the CLI."""
@@ -15,11 +33,17 @@ def __call__(self, parser, namespace, values, option_string=None):
        """With this, the user can input dicts on the CLI."""
        setattr(namespace, self.dest, dict())
        for value in values:
            key, value = value.split("=")
            # maxsplit helps in case the right-hand side of the input has = in it, as filenames can
            key, value = value.split("=", maxsplit=1)
            # catch list case
            if value.startswith("[") and value.endswith("]"):
                # if "[" in value and "]" in value:
                value = value.strip("][").split(",")
            # change numbers to numbers but with attention to decimals and negative numbers
            if is_int(value):
                value = int(value)
            elif is_float(value):
                value = float(value)
            getattr(namespace, self.dest)[key] = value
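The updated `ParseKwargs` turns `key=value` tokens into a dict, splitting on the first `=` only and converting numeric strings. The standalone sketch below copies that logic from the diff (docstring paraphrased, commented-out line dropped) and runs it through `argparse` on a made-up `--kwargs_search` command line; the argument values, including `model_path=file.nc?x=1`, are hypothetical inputs chosen to exercise each branch.

```python
import argparse


def is_int(s):
    """Check if string is actually int."""
    try:
        int(s)
        return True
    except (ValueError, TypeError):
        return False


def is_float(s):
    """Check if string is actually float."""
    try:
        float(s)
        return True
    except (ValueError, TypeError):
        return False


class ParseKwargs(argparse.Action):
    """Collect key=value pairs from the CLI into a dict."""

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, dict())
        for value in values:
            # split only on the first "=" so values may themselves contain "="
            key, value = value.split("=", maxsplit=1)
            # "[a,b]" becomes the list ["a", "b"]
            if value.startswith("[") and value.endswith("]"):
                value = value.strip("][").split(",")
            # convert numeric strings, trying int before float
            # (a list fails both checks with TypeError and is left as-is)
            if is_int(value):
                value = int(value)
            elif is_float(value):
                value = float(value)
            getattr(namespace, self.dest)[key] = value


parser = argparse.ArgumentParser()
parser.add_argument("--kwargs_search", nargs="*", action=ParseKwargs)
args = parser.parse_args(
    [
        "--kwargs_search",
        "min_lon=-170",
        "max_lat=56.5",
        "min_time=2022-1-1",
        "standard_names=[sea_surface_temperature,sea_water_temperature]",
        "model_path=file.nc?x=1",
    ]
)
print(args.kwargs_search)
```

Two behaviors follow directly from this logic: list items stay strings, because the list as a whole fails both numeric checks, and boolean-looking values such as `verbose=True` are kept as the string `"True"`. Also, `float()` accepts strings such as `"nan"` and `"inf"`, so those become floats.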


@@ -57,6 +81,10 @@ def main():
        help="Input keyword arguments for the search specification. Dictionary-style input. More information on options can be found in `omsa.main.make_catalog` docstrings. Format for list items is e.g. standard_names='[sea_water_practical_salinity,sea_water_temperature]'.",
    )

    parser.add_argument(
        "--vocab_name", help="Vocab file name, must be in the vocab user directory."
    )

    parser.add_argument(
        "--catalog_name", help="Catalog name, with or without suffix of yaml."
    )
@@ -77,13 +105,37 @@ def main():
    parser.add_argument(
        "--key", help="Key from vocab representing the variable to compare."
    )
    parser.add_argument("--model_path", help="Path for model output.")
    parser.add_argument(
        "--model_name",
        help="Name of catalog for model output, created in a `make_catalog` command.",
    )
    parser.add_argument(
        "--ndatasets",
        type=int,
        help="Max number of datasets from input catalog(s) to use.",
    )

    parser.add_argument(
        "--kwargs_open",
        nargs="*",
        action=ParseKwargs,
        help="Input keyword arguments to be passed on to xarray open_mfdataset or pandas read_csv.",
    )

    parser.add_argument(
        "--metadata",
        nargs="*",
        action=ParseKwargs,
        help="Metadata to be passed into catalog.",
    )

    parser.add_argument(
        "--kwargs_map",
        nargs="*",
        action=ParseKwargs,
        help="Input keyword arguments to be passed on to the map plot.",
    )

    args = parser.parse_args()

    # Make a catalog.
@@ -93,9 +145,11 @@ def main():
            project_name=args.project_name,
            catalog_name=args.catalog_name,
            description=args.description,
            metadata=args.metadata,
            kwargs=args.kwargs,
            kwargs_search=args.kwargs_search,
            vocab=args.vocab_names,
            kwargs_open=args.kwargs_open,
            vocab=args.vocab_name,
            save_cat=True,
        )

@@ -111,9 +165,10 @@ def main():
    elif args.action == "run":
        omsa.main.run(
            project_name=args.project_name,
            catalog_names=args.catalog_names,
            catalogs=args.catalog_names,
            vocabs=args.vocab_names,
            key_variable=args.key,
            model_path=args.model_path,
            model_name=args.model_name,
            ndatasets=args.ndatasets,
            kwargs_map=args.kwargs_map,
        )