Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pipeline): check cluster time in yaml config #254

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rikvl
Copy link
Contributor

@rikvl rikvl commented Oct 14, 2023

I ran into a small issue with the cluster configuration that may go under the radar:

I tried to specify a job time of 12 hours in a configuration file in the following way (without quotation marks):

cluster:
    time:      12:00:00

PyYAML accepts this without complaints, but parses the value as a sexagimal (base 60) number, giving the following line in the job script:

#SBATCH --time=43200

Slurm then interprets this bare number as minutes, so it results in a job with a time request of 43200 minutes = 720 hours = 30 days, i.e. 60 times more time than I was expecting. In this particular case Cedar did not accept the job, because the time requested is longer than the maximum allowed (28 days). However, if the raw value is 11:12:00 or less Cedar won't complain, and the erroneous time value will lead to jobs inadvertently requesting much more time than needed and maybe having difficulties getting scheduled.

The solution is to make sure such time values are enclosed in quotation marks.

My goal in this PR was to let the pipeline give an error (it could perhaps be a warning instead) when it encounters time values consisting of just digits and colons. Because the check has to be made on the raw configuration string yaml_doc (not on the yaml_params dictionary that is already parsed by PyYAML) I used some ugly regex matching to do the check. Maybe there's a better way.

Either way I do think it's would be good to add some sort of check for such time values. An alternative would be to try to simply educate everyone to put quotes around the time value. A quick check of the example configuration files in ch_pipeline shows there's one with this mistake:
https://github.com/chime-experiment/ch_pipeline/blob/master/examples/noise_calibration.yaml

@rikvl rikvl requested review from jrs65 and ljgray October 14, 2023 22:45
@jrs65
Copy link
Contributor

jrs65 commented Oct 16, 2023

@rikvl I definitely see the problem here, but I'm not sure if this is the way we should go about solving it. As far as I can tell, this conversion was deemed to be a mistake in the YAML spec and removed from YAML 1.2. Unfortunately PyYAML only supports YAML 1.1 and so will do this conversion.

I'm a little concerned that your check is just going to be picking up on other things, as for instance it will scan the whole document for a time: tag. There are maybe a few better ways to address this.

  • Switch to a YAML 1.2 implementing parser such as ruamel, or poyo
  • Switch a deliberately stricter parser, e.g. StrictYAML.

Just switching to ruamel is probably the easiest. It will also fix the so-called Norway problem too, although that might break some of our older configs.

Anyway, I'm definitely interested to hear your thoughts.

@rikvl
Copy link
Contributor Author

rikvl commented Oct 16, 2023

Right, I should have just created an issue rather than spend my time writing a hacky solution. I wasn't so happy myself with the regex search that scans the entire document for time:

@jrs65 I like your suggestion of switching to a parser that defaults to YAML 1.2 like ruamel, but it could be a bit of an undertaking. Would we want to make that change just in caput or across the entire CHIME codebase? I think PyYAML is also used in ch_pipeline, chimedb, chimedb-dataflag, draco, and driftscan

Depending on the version of ruamel, its API differs a bit from that of PyYAML. Looking at the ruamel documentation's basic usage examples, implementing the latest version would look something like this:

from ruamel.yaml import YAML

yaml = YAML(typ="safe")
yaml.default_flow_style = False

That would hopefully let the .load and .dump methods that we use work without further changes (I didn't try yet). Here and there we also use PyYAML's .safe_load method and the SafeLoader class, which will require further changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants