
Adaptation

Jack McNelis edited this page Jul 24, 2021 · 7 revisions


Help for future data engineers that may stumble across this repo.

Disclaimer

This repo was hastily developed, tested, and pushed to GitHub in late 2019 to meet an urgent need. Its code is difficult to follow, poorly documented, and full of hard-coded elements that won't be familiar outside the context of a specific project. For instance, the logic that selects and joins trajectories from "multi-leg" flights (a requirement that applies only to ACT-America) is crammed inside the main function of the __main__.py script. Depending on your requirements, adapting the code might be more trouble than it's worth.

This page points to and describes the useful logic and data in the code and ancillary files, so that future users don't have to dig for them themselves.

Contents

At first glance, the repo is fairly well-organized:

  • ./ornldaac_icartt_to_netcdf/ contains all the code (Python 3):
    • _utils.py: a script containing helper data and routines, which __init__.py exposes at the package level when the package is imported
    • __init__.py: the standard init script allowing the folder to be recognized/imported/run like a Python module
    • __main__.py: the module-level routine, which implements the ICARTT-to-netCDF4 conversion for ACT-America; it executes when the whole directory is run as a module, i.e., python -m ornldaac_icartt_to_netcdf [args]
  • ./references/ contains metadata in CSV and JSON format that serve as look-ups for the conversion logic, as attributes in output netCDFs, ???. For now, refer to the README.md; it gives decent descriptions of the key files in this directory and how to use them.
  • The inputs and outputs directories contain semi-standardized inputs and outputs for the routines. I'd ignore them, because the inputs and outputs you need will be specific to your use case. (They may still be useful references as you adapt the existing code to your needs.)
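The package layout above follows a common Python pattern. As a toy illustration (not the repo's actual files), the sketch below writes a throwaway package with the same three-file shape to a temporary directory. A hypothetical helper in `_utils.py` is re-exported by `__init__.py` through `from ._utils import *`, and `__main__.py` is then executed with `runpy.run_module`, the same mechanism `python -m <package>` uses. Whether the real `__init__.py` uses a star import is an assumption; `MISSING_VALUE`, `helper`, and `toy_icartt_pkg` are made-up names.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Toy stand-ins for the three files in ./ornldaac_icartt_to_netcdf/.
# MISSING_VALUE and helper() are hypothetical, for illustration only.
TOY_FILES = {
    "_utils.py": (
        "MISSING_VALUE = -9999\n"
        "\n"
        "def helper():\n"
        "    return 'helper called'\n"
    ),
    # The star import re-exports every public name from _utils at package level.
    "__init__.py": "from ._utils import *\n",
    # __main__.py reaches the helpers through the package namespace.
    "__main__.py": "from . import helper\nRESULT = helper()\n",
}


def build_and_run(pkg_name: str = "toy_icartt_pkg"):
    """Write the toy package to a temp dir, import it, then run it like `python -m`."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = Path(root) / pkg_name
        pkg_dir.mkdir()
        for filename, source in TOY_FILES.items():
            (pkg_dir / filename).write_text(source, encoding="utf-8")
        sys.path.insert(0, root)
        importlib.invalidate_caches()  # files were created after interpreter start
        try:
            package = importlib.import_module(pkg_name)
            # Same mechanism as `python -m toy_icartt_pkg`: executes toy_icartt_pkg/__main__.py
            main_globals = runpy.run_module(pkg_name, run_name="__main__")
        finally:
            sys.path.remove(root)
    return package, main_globals
```

Calling `build_and_run()` returns the imported package (whose namespace now includes `helper` and `MISSING_VALUE`) and the globals left behind by `__main__.py`.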

Code and pseudo-code

These steps describe, in order, the logic that executes when ornldaac_icartt_to_netcdf is run as described in the README.md.

The majority of the workflow is contained within the main script (__main__.py). Data/metadata from each step of the workflow are written to CSV/JSON files so that they can be modified in place and their changes incorporated in the next set of outputs generated for the corresponding set of input files.
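One way to get the "edit in place, then regenerate" behaviour described above is to treat each intermediate file as authoritative once it exists. The sketch below shows that pattern for JSON only; it is an assumption about the general approach, not necessarily how the repo decides when to overwrite, and `load_or_create` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_create(path, generate: Callable[[], dict]) -> dict:
    """Return metadata stored at `path`, generating and saving it only if absent.

    An existing file always wins, so hand edits made to it between runs are
    carried into the next set of outputs instead of being overwritten.
    """
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = generate()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data
```

To discard manual edits and start over under this scheme, delete the file; the next run regenerates it.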

The main function

  • accepts one argument: a dictionary of input data, deserialized from a YAML config passed to the script/module at runtime
  • calls two routines: 1) write_resource_files, to write reference metadata files, and 2) write_netcdfs, to write output netCDF files
    • routine 1, write_resource_files, crawls the configured directory for ICARTT files, parses their headers, and writes reference metadata to JSON files in the ??? directory (these metadata are later written to the output netCDFs as global attributes)
    • routine 2, write_netcdfs, identifies all reference metadata files and writes the output netCDF files
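Routine 1 can be approximated in a few dozen lines. The sketch below is not the repo's code: it assumes ICARTT files use the `.ict` extension and the FFI 1001 header layout, and its parameter names (`input_dir`, `resource_dir`) and JSON keys are hypothetical. It writes one JSON file of header metadata per input file, the kind of reference metadata that would later become netCDF global attributes.

```python
import json
from pathlib import Path


def parse_icartt_header(path) -> dict:
    """Pull common fields from an ICARTT header, assuming the FFI 1001 layout.

    Line 1 is "NLHEAD, FFI"; NLHEAD counts every header line, including the
    last one, which holds the column names of the data table.
    """
    path = Path(path)
    with open(path, encoding="utf-8", errors="replace") as f:
        first = f.readline()
        # [:2] ignores any extra fields that may follow NLHEAD and FFI
        nlhead, ffi = (int(x) for x in first.split(",")[:2])
        header = [first] + [f.readline() for _ in range(nlhead - 1)]
    header = [line.strip() for line in header]
    n_vars = int(header[9].split(",")[0])  # line 10: number of dependent variables
    return {
        "file": path.name,
        "nlhead": nlhead,
        "ffi": ffi,
        "pi": header[1],
        "organization": header[2],
        "data_source": header[3],
        "mission": header[4],
        # line 7: collection date then revision date, as Y, M, D, Y, M, D
        "dates": [int(x) for x in header[6].split(",")],
        "independent_variable": header[8],
        "variables": header[12:12 + n_vars],  # one "name, units" line each
        "columns": [c.strip() for c in header[-1].split(",")],
    }


def write_resource_files(input_dir, resource_dir) -> list:
    """Crawl input_dir recursively for *.ict files; write one JSON file per input."""
    resource_dir = Path(resource_dir)
    resource_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for ict in sorted(Path(input_dir).rglob("*.ict")):
        out = resource_dir / f"{ict.stem}.json"
        out.write_text(json.dumps(parse_icartt_header(ict), indent=2), encoding="utf-8")
        written.append(out)
    return written
```

Fixed line offsets are the fragile part: real files can deviate from the nominal layout, so a production version should validate each field (for example, check that line 7 has six integers) rather than trust positions blindly.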