Merge pull request #41 from kthyng/var_updates

Var updates
axiom-data-science · Jan 9, 2023 · 7589bcb
2 parents 5b8b60f + 2d4ba36
commit 7589bcb

Showing 18 changed files with 626 additions and 1,555 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -51,7 +51,7 @@ repos:
   rev: v0.982
   hooks:
   - id: mypy
-    additional_dependencies: [types-setuptools]
+    additional_dependencies: [types-setuptools, types-PyYAML]
     exclude: docs/source/conf.py
     args: [--ignore-missing-imports]
 

diff --git a/docs/Demo-AK.ipynb b/docs/Demo-AK.ipynb
diff --git a/docs/Demo-AK.md b/docs/Demo-AK.md
diff --git a/docs/Demo-CA.ipynb b/docs/Demo-CA.ipynb
diff --git a/docs/Demo_workflows.ipynb b/docs/Demo_workflows.ipynb
diff --git a/docs/cli.md b/docs/cli.md
@@ -84,7 +84,7 @@ Make a catalog from datasets available from an ERDDAP server using `intake-erdda
 
 #### Examples
 
-Select a box and time range over which to search catalog:
+Select a spatial box and time range over which to search catalog:
 
 ```{code-cell} ipython3
 !omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name example_erddap_catalog --description "Example ERDDAP catalog description" --kwargs server=https://erddap.sensors.ioos.us/erddap --kwargs_search min_lon=-170 min_lat=53 max_lon=-165 max_lat=56 min_time=2022-1-1 max_time=2022-1-2
@@ -96,6 +96,40 @@ Input model output to use to create the space search range, but choose time sear
 !omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name example_erddap_catalog --description "Example ERDDAP catalog description" --kwargs server=https://erddap.sensors.ioos.us/erddap --kwargs_search model_path=https://thredds.cencoos.org/thredds/dodsC/CENCOOS_CA_ROMS_FCST.nc min_time=2022-1-1 max_time=2022-1-2
 ```
 
+Narrow your search by variable. For `intake-erddap` you can filter by the CF `standard_name` of the variable directly with:
+
+```{code-cell} ipython3
+!omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name cat1 --kwargs server=https://erddap.sensors.ioos.us/erddap standard_names="[sea_surface_temperature,sea_water_temperature,surface_temperature]"
+```
+
+You can return equivalent results in your catalog by searching with a variable nickname (the keys in the dictionary) along with a dictionary defining a vocabulary of regular expressions for matching what "counts" as a particular variable. To save a custom vocabulary to a location for this command, use the `Vocab` class in `cf-pandas` ([docs](https://cf-pandas.readthedocs.io/en/latest/demo_vocab.html#save-to-file)). A premade set of vocabularies aimed at use by ocean modelers is also available to use by name; see them with command `omsa vocabs`. Suggested uses:
+* axds catalog: vocab_name standard_names
+* erddap catalog, IOOS: vocab_name erddap_ioos
+* erddap catalog, Coastwatch: vocab_name erddap_coastwatch
+* local catalog: vocab_name general
+
+This is more complicated than simply defining the desired standard_names as shown in the previous example. However, it becomes useful when using other data files or model output which might have different variable names but could be recognized with variable matching through the vocabulary.
+
+The example below uses the pre-defined vocabulary "erddap_ioos" since we are using the IOOS ERDDAP server, and will search for matching variables by standard_name and matching the variable nickname "temp". The "erddap_ioos" vocabulary can be investigated as shown here and contains exactly the same standard_names as in the previous example. The regular expressions are set up to match exactly those standard_names. This is why we return the same results from either approach.
+
+```{code-cell} ipython3
+import ocean_model_skill_assessor as omsa
+import cf_pandas as cfp
+
+vocab = cfp.Vocab(omsa.VOCAB_PATH("erddap_ioos"))
+vocab
+```
+
+```{code-cell} ipython3
+!omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name cat3 --kwargs server=https://erddap.sensors.ioos.us/erddap category_search="[standard_name,temp]" --vocab_name erddap_ioos
+```
+
+You can additionally narrow your search by a text term by adding the `search_for` and `query_type` keyword inputs. This example searches for datasets containing the variable "sea_surface_temperature" and, somewhere in the dataset metadata, the term "Timeseries". If we had wanted datasets that contain one OR the other, we could use `query_type=union`.
+
+```{code-cell} ipython3
+!omsa make_catalog --project_name test1 --catalog_type erddap --catalog_name cat2 --kwargs server=https://erddap.sensors.ioos.us/erddap standard_names="[sea_surface_temperature]" search_for="[Timeseries]" query_type=intersection
+```
+
 ### Catalog for Axiom assets
 
 Make a catalog of Axiom Data Science-stored assets using `intake-axds`.
@@ -136,25 +170,13 @@ Input model output to use to create the space search range, but choose time sear
 !omsa make_catalog --project_name test1 --catalog_type axds --catalog_name example_axds_catalog --description "Example AXDS catalog description" --kwargs standard_names='[sea_water_practical_salinity,sea_water_temperature]' verbose=True --kwargs_search model_path=https://thredds.cencoos.org/thredds/dodsC/CENCOOS_CA_ROMS_FCST.nc min_time=2022-1-1 max_time=2022-1-2
 ```
 
-Alternatively, filter returned datasets for variables using the variable nicknames along with a vocabulary of regular expressions for matching what "counts" as a variable. To save a custom vocabulary to a location for this command, use the `Vocab` class in `cf-pandas` ([docs](https://cf-pandas.readthedocs.io/en/latest/demo_vocab.html#save-to-file)). A premade set of vocabularies is also available to use by name; see them with command `omsa vocabs`. Suggested uses:
-* axds catalog: vocab_name standard_names
-* erddap catalog, IOOS: vocab_name erddap_ioos
-* erddap catalog, Coastwatch: vocab_name erddap_coastwatch
-* local catalog: vocab_name general
-
-```
-omsa make_catalog --project_name test1 --catalog_type axds --vocab_name standard_names --kwargs keys_to_match="[temp,salt]"
-```
-
-+++
-
 ## Run model-data comparison
 
 Note that if any datasets have timezones attached, they are removed before comparison with the assumption that the model output and data are in the same time zone.
 
 #### Available options
 
-    omsa run --project_name test1 --catalog_names CATALOG_NAME1 CATALOG_NAME2 --vocab_names VOCAB1 VOCAB2 --key KEY --model_path PATH_TO_MODEL_OUTPUT --ndatasets NDATASETS
+    omsa run --project_name test1 --catalogs CATALOG_NAME1 CATALOG_NAME2 --vocab_names VOCAB1 VOCAB2 --key KEY --model_path PATH_TO_MODEL_OUTPUT --ndatasets NDATASETS
 
 * `project_name`: Subdirectory in cache dir to store files associated together.
 * `catalog_names`: Catalog name(s). Datasets will be accessed from catalog entries.
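
The variable matching described in the added `docs/cli.md` text (a nickname such as `temp` mapped to regular expressions over `standard_name`) can be pictured in a few lines of plain Python. This is a minimal sketch, not the `cf-pandas` implementation: the `vocab` dictionary layout, the regular expressions, the sample `variables`, and the helper `matches_key` are all hypothetical and chosen only to mirror the description.

```python
import re

# Hypothetical vocabulary: nickname -> {attribute name -> regular expression}.
# The patterns end in "$" so that only the exact standard_names match.
vocab = {
    "temp": {
        "standard_name": (
            "sea_surface_temperature$|sea_water_temperature$|surface_temperature$"
        )
    }
}


def matches_key(key, attrs, vocab):
    """Return True if any attribute of a variable matches the vocab entry for `key`."""
    for attr_name, pattern in vocab[key].items():
        value = attrs.get(attr_name)
        # re.match anchors at the start of the string; "$" anchors the end.
        if value is not None and re.match(pattern, value):
            return True
    return False


# Hypothetical variables with different names, as they might appear in datasets.
variables = {
    "sst": {"standard_name": "sea_surface_temperature"},
    "water_temp": {"standard_name": "sea_water_temperature"},
    "salt": {"standard_name": "sea_water_practical_salinity"},
}

matched = [name for name, attrs in variables.items() if matches_key("temp", attrs, vocab)]
print(matched)  # ['sst', 'water_temp']
```

Because matching is done on attributes rather than on variable names, `sst` and `water_temp` are both recognized as "temp", which is the situation the docs describe for data files or model output with differing variable names.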

diff --git a/docs/index.rst b/docs/index.rst
@@ -9,11 +9,9 @@ Welcome to ocean-model-skill-assessor's documentation!
 .. toctree::
    :maxdepth: 3
 
+   demo.md
    cli.md
    create_vocabs.md
-   .. Demo-AK.md
-   .. Demo-CA
-   .. Demo_workflows
    api
    GitHub repository <https://github.com/axiom-data-science/ocean-model-skill-assessor>
 

diff --git a/ocean_model_skill_assessor/CLI.py b/ocean_model_skill_assessor/CLI.py
@@ -7,6 +7,24 @@
 import ocean_model_skill_assessor as omsa
 
 
+def is_int(s):
+    """Check if string is actually int."""
+    try:
+        int(s)
+        return True
+    except (ValueError, TypeError):
+        return False
+
+
+def is_float(s):
+    """Check if string is actually float."""
+    try:
+        float(s)
+        return True
+    except (ValueError, TypeError):
+        return False
+
+
 # https://sumit-ghosh.com/articles/parsing-dictionary-key-value-pairs-kwargs-argparse-python/
 class ParseKwargs(argparse.Action):
     """With this, the user can input dicts on the CLI."""
@@ -15,11 +33,17 @@ def __call__(self, parser, namespace, values, option_string=None):
         """With this, the user can input dicts on the CLI."""
         setattr(namespace, self.dest, dict())
         for value in values:
-            key, value = value.split("=")
+            # maxsplit helps in case righthand side of input has = in it, like filenames can have
+            key, value = value.split("=", maxsplit=1)
             # catch list case
             if value.startswith("[") and value.endswith("]"):
                 # if "[" in value and "]" in value:
                 value = value.strip("][").split(",")
+            # change numbers to numbers but with attention to decimals and negative numbers
+            if is_int(value):
+                value = int(value)
+            elif is_float(value):
+                value = float(value)
             getattr(namespace, self.dest)[key] = value
 
 
@@ -57,6 +81,10 @@ def main():
         help="Input keyword arguments for the search specification. Dictionary-style input. More information on options can be found in `omsa.main.make_catalog` docstrings. Format for list items is e.g. standard_names='[sea_water_practical_salinity,sea_water_temperature]'.",
     )
 
+    parser.add_argument(
+        "--vocab_name", help="Vocab file name, must be in the vocab user directory."
+    )
+
     parser.add_argument(
         "--catalog_name", help="Catalog name, with or without suffix of yaml."
     )
@@ -77,13 +105,37 @@ def main():
     parser.add_argument(
         "--key", help="Key from vocab representing the variable to compare."
     )
-    parser.add_argument("--model_path", help="Path for model output.")
+    parser.add_argument(
+        "--model_name",
+        help="Name of catalog for model output, created in a `make_catalog` command.",
+    )
     parser.add_argument(
         "--ndatasets",
         type=int,
         help="Max number of datasets from input catalog(s) to use.",
     )
 
+    parser.add_argument(
+        "--kwargs_open",
+        nargs="*",
+        action=ParseKwargs,
+        help="Input keyword arguments to be passed onto xarray open_mfdataset or pandas read_csv.",
+    )
+
+    parser.add_argument(
+        "--metadata",
+        nargs="*",
+        action=ParseKwargs,
+        help="Metadata to be passed into catalog.",
+    )
+
+    parser.add_argument(
+        "--kwargs_map",
+        nargs="*",
+        action=ParseKwargs,
+        help="Input keyword arguments to be passed onto map plot.",
+    )
+
     args = parser.parse_args()
 
     # Make a catalog.
@@ -93,9 +145,11 @@ def main():
             project_name=args.project_name,
             catalog_name=args.catalog_name,
             description=args.description,
+            metadata=args.metadata,
             kwargs=args.kwargs,
             kwargs_search=args.kwargs_search,
-            vocab=args.vocab_names,
+            kwargs_open=args.kwargs_open,
+            vocab=args.vocab_name,
             save_cat=True,
         )
 
@@ -111,9 +165,10 @@ def main():
     elif args.action == "run":
         omsa.main.run(
             project_name=args.project_name,
-            catalog_names=args.catalog_names,
+            catalogs=args.catalog_names,
             vocabs=args.vocab_names,
             key_variable=args.key,
-            model_path=args.model_path,
+            model_name=args.model_name,
             ndatasets=args.ndatasets,
+            kwargs_map=args.kwargs_map,
         )
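
To see what the `ParseKwargs` changes in this diff do to CLI input, the following self-contained sketch reproduces the updated action together with the two new helpers and runs it on a sample argument list. The parser here is a stand-in with a single `--kwargs_search` option, and the input values (including the `path=file=a.nc` entry) are hypothetical; only the parsing logic is taken from the diff.

```python
import argparse


def is_int(s):
    """Check if string is actually int."""
    try:
        int(s)
        return True
    except (ValueError, TypeError):
        return False


def is_float(s):
    """Check if string is actually float."""
    try:
        float(s)
        return True
    except (ValueError, TypeError):
        return False


class ParseKwargs(argparse.Action):
    """Collect key=value pairs from the command line into a dict."""

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, dict())
        for value in values:
            # Split only on the first "=" so the right-hand side may contain "=".
            key, value = value.split("=", maxsplit=1)
            # "[a,b]" becomes a list of strings.
            if value.startswith("[") and value.endswith("]"):
                value = value.strip("][").split(",")
            # Lists raise TypeError in int()/float(), so they pass through unchanged.
            if is_int(value):
                value = int(value)
            elif is_float(value):
                value = float(value)
            getattr(namespace, self.dest)[key] = value


# Stand-in parser with one dictionary-style option.
parser = argparse.ArgumentParser()
parser.add_argument("--kwargs_search", nargs="*", action=ParseKwargs)

args = parser.parse_args(
    [
        "--kwargs_search",
        "min_lon=-170",
        "max_lat=56.5",
        "min_time=2022-1-1",
        "path=file=a.nc",  # hypothetical value containing "="
        "standard_names=[sea_water_temperature,sea_surface_temperature]",
    ]
)
print(args.kwargs_search)
```

With this input, `min_lon` becomes the integer `-170`, `max_lat` the float `56.5`, `min_time` stays the string `"2022-1-1"`, `path` keeps everything after the first `=`, and the list items remain strings. One consequence worth keeping in mind: any string that `float()` accepts is converted, so an input such as `1e3` would arrive as `1000.0` rather than as text.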