Account for potential evolving/changing results with reproducible builds #191

Open
anthonyfok opened this issue Apr 20, 2022 · 0 comments

As discussed during our GSC Cloud Tech meeting on Wed 2022-04-20:

  • As our datasets and models evolve, slightly different charts may result over time. Such apparent inconsistencies may perplex our users.
  • Internally, we recently had a case where two supposedly identical scenario calculation runs, done 11 months apart, gave two different max PGA values, and it wasn't immediately obvious whether this was due to different OpenQuake versions (v3.11.0 vs v3.11.5) or to data/model refinement over time.

Suggested remedies include:

  • Maintain multiple major versions of the API so that end users can compare the changes over time
  • Record all relevant build information in the database and export it to Elasticsearch, pygeoapi, etc. to allow for reproducible builds
  • Proactively tell end users about changes in the Release Notes, and in particular explain any discrepancies in results between versions, so that they know what's coming and are not surprised or confused.
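
To illustrate how recorded build information could help with a case like the max PGA discrepancy above, here is a minimal sketch in Python that compares two build records and reports only the fields that differ. The record layout, the field names (`openquake`, `git`) and the commit value are placeholders made up for this example, not an existing OpenDRR schema.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_builds(old, new):
    """Return {path: (old_value, new_value)} for every field that differs."""
    old_flat, new_flat = flatten(old), flatten(new)
    return {
        path: (old_flat.get(path), new_flat.get(path))
        for path in sorted(old_flat.keys() | new_flat.keys())
        if old_flat.get(path) != new_flat.get(path)
    }


# Hypothetical records for two runs; the commit value is a placeholder.
run_a = {"openquake": "3.11.0", "git": {"earthquake-scenarios": "placeholder-commit"}}
run_b = {"openquake": "3.11.5", "git": {"earthquake-scenarios": "placeholder-commit"}}

print(diff_builds(run_a, run_b))
# {'openquake': ('3.11.0', '3.11.5')}
```

In this made-up example only the OpenQuake version differs, which would point at the engine version rather than at a data or model change.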

This is probably easiest to do, at least initially, from add_data.sh in OpenDRR/opendrr-api, because it already aggregates all the data sources. Some ideas of what to record (not sure if we can get all of these, haha!):

  • Git commit references (release tag, commit hash, git describe) of all the Git repositories (opendrr-api, model-factory, earthquake-scenarios, canada-srm2, etc.) that are used for a certain stack build.
  • Exact versions of Docker images (pygeoapi, and especially python-env where the underlying Debian OS and Python versions may change)
  • dpkg -l (installed Debian packages)
  • pip3 list
  • Versions of Docker and Docker Compose
  • Host OS and version? CPU (model and no. of cores), RAM, etc.
  • OpenQuake version (already listed in CSV file or in logs?)
  • IP address of build machine (?)
  • Stack build date/time and duration
  • (Optionally): Build user and email
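
As a rough sketch of how several of the items above might be gathered into one JSON record, the following Python script shells out to the tools mentioned and skips anything that is not installed. The function names, the record layout and the repository path passed in are assumptions for illustration; add_data.sh is a shell script, so a real implementation would more likely call the same commands directly. The OpenQuake version, RAM, IP address and build user are left out here.

```python
import json
import os
import platform
import shutil
import subprocess
from datetime import datetime, timezone


def run(cmd, cwd=None):
    """Return stripped stdout of cmd, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, timeout=60, check=True
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout.strip()


def git_info(repo_dir):
    """Commit hash and `git describe` output for one checked-out repository."""
    return {
        "commit": run(["git", "rev-parse", "HEAD"], cwd=repo_dir),
        "describe": run(
            ["git", "describe", "--tags", "--always", "--dirty"], cwd=repo_dir
        ),
    }


def collect_build_info(repo_dirs):
    """Collect a build record; repo_dirs maps a repository name to a local path."""
    return {
        "build_time_utc": datetime.now(timezone.utc).isoformat(),
        "host": {
            "os": platform.platform(),
            "machine": platform.machine(),
            "cpu_count": os.cpu_count(),
            "python": platform.python_version(),
        },
        "git": {name: git_info(path) for name, path in repo_dirs.items()},
        "docker": run(["docker", "--version"]),
        # Assumes the Compose v2 plugin; v1 would be `docker-compose --version`.
        "docker_compose": run(["docker", "compose", "version"]),
        "pip_list": run(["pip3", "list", "--format=freeze"]),
        "dpkg_list": run(["dpkg", "-l"]),
    }


if __name__ == "__main__":
    # The path "." is a placeholder for wherever the repository is checked out.
    print(json.dumps(collect_build_info({"opendrr-api": "."}), indent=2))
```

Fields whose tool is unavailable come back as null rather than aborting the build, so the record can still be stored with partial information.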

See also https://reproducible-builds.org/ and related reproducibility projects for ideas and inspiration.
