Job metrics retrieval #2031

Open
LuiggiTenorioK opened this issue Dec 16, 2024 · 2 comments · May be fixed by #2036
Labels
new feature Use this label to plan and request new features

Comments

@LuiggiTenorioK
Member

LuiggiTenorioK commented Dec 16, 2024

Purpose: Give users immediate access to the user-defined job metrics they want to track, as soon as a job in the workflow finishes.

Description: As discussed in BSC-ES/autosubmit-api#75, the idea is to give users a way to retrieve metrics that are calculated during the execution of their workflow. To that end, the API should provide the user-facing interface, and Autosubmit should handle transferring the files from the remote platform to the local machine.

Document with some of the metrics: https://docs.google.com/document/d/12yWDwXsohf4G4MPeP6e3Eil4ZL-YeIN71dBcoWRliEg/edit

Requirements

  • Autosubmit should get the output file specification from the YAML files
  • Autosubmit should extract the metric according to the selector the user defines (either the raw content of a text file, or a value from a JSON file picked by a key selector)
  • Autosubmit should store the metric read from the output file in the database
  • Autosubmit should read and store the metric when the job finishes

Acceptance Criteria

  • Autosubmit correctly identifies the output files, their paths, and how to read the metric value
  • The metric stored in the database should be the one expected from the user-defined specification
  • The metric should be updated once the job finishes

Related issue: BSC-ES/autosubmit-api#75

@LuiggiTenorioK LuiggiTenorioK self-assigned this Dec 16, 2024
@LuiggiTenorioK LuiggiTenorioK added the new feature Use this label to plan and request new features label Dec 16, 2024
@LuiggiTenorioK
Member Author

@mcastril I updated the requirements and acceptance criteria based on what we discussed last Thursday. Feel free to modify them, or confirm that we can close the scope of this new feature.

@LuiggiTenorioK
Member Author

LuiggiTenorioK commented Dec 16, 2024

Following the possible design, here is an example of the user flow:

The user will have to define their metrics in the JOBS section like this:

JOBS:
  SETUP:
    METRICS:
      - NAME: model
        PATH: /remote_folder/model.txt
  SIM:
    METRICS:
      - NAME: custom_metric_1
        PATH: /remote_folder/metrics.json
        SELECTOR:
          TYPE: JSON   # Default is TEXT
          KEY: METRIC_1
      - NAME: custom_metric_2
        PATH: /remote_folder/metrics.json
        SELECTOR:
          TYPE: JSON
          KEY: MISC.METRIC_2
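
As a rough sketch of how this section could be consumed once the YAML is loaded: the `MetricSpec` type and `parse_metric_specs` function below are hypothetical names, not existing Autosubmit code, and the `JOBS` dict is just the example above written as a Python mapping. The only behaviour taken from the example is that `SELECTOR.TYPE` defaults to `TEXT`; requiring a `KEY` for JSON selectors is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class MetricSpec:
    """One entry of a job's METRICS list (hypothetical helper type)."""
    job: str
    name: str
    path: str
    selector_type: str = "TEXT"   # the example marks TEXT as the default
    key: Optional[str] = None     # only meaningful for JSON selectors


def parse_metric_specs(jobs: dict[str, Any]) -> list[MetricSpec]:
    """Flatten the JOBS mapping into a list of MetricSpec objects."""
    specs = []
    for job_name, job_conf in jobs.items():
        for entry in job_conf.get("METRICS", []):
            selector = entry.get("SELECTOR") or {}
            selector_type = str(selector.get("TYPE", "TEXT")).upper()
            key = selector.get("KEY")
            # Assumption: a JSON selector without a KEY is a config error.
            if selector_type == "JSON" and not key:
                raise ValueError(f"{job_name}.{entry['NAME']}: JSON selector needs a KEY")
            specs.append(MetricSpec(job_name, entry["NAME"], entry["PATH"], selector_type, key))
    return specs


# The JOBS section from the example, as it would look after YAML loading.
JOBS = {
    "SETUP": {"METRICS": [{"NAME": "model", "PATH": "/remote_folder/model.txt"}]},
    "SIM": {"METRICS": [
        {"NAME": "custom_metric_1", "PATH": "/remote_folder/metrics.json",
         "SELECTOR": {"TYPE": "JSON", "KEY": "METRIC_1"}},
        {"NAME": "custom_metric_2", "PATH": "/remote_folder/metrics.json",
         "SELECTOR": {"TYPE": "JSON", "KEY": "MISC.METRIC_2"}},
    ]},
}

specs = parse_metric_specs(JOBS)
for spec in specs:
    print(spec.job, spec.name, spec.selector_type, spec.key)
```

Validating the spec up front like this would let a bad selector fail at configuration time rather than after the job has already run.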

Then, once the jobs have finished, /remote_folder/ is expected to contain 2 files:

model.txt, e.g.:

ICON

metrics.json, e.g.:

{
  "METRIC_1": "HIGH",
  "MISC": {
    "METRIC_2": 640.28
  }
}
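
To illustrate how the two selector types might read these files, here is a minimal sketch. `extract_metric` is a hypothetical helper, and it assumes two things that are only implied by the example: a `TEXT` selector returns the whole file content with surrounding whitespace stripped, and a dotted `KEY` such as `MISC.METRIC_2` walks nested JSON objects level by level.

```python
import json
from typing import Any, Optional


def extract_metric(content: str, selector_type: str = "TEXT", key: Optional[str] = None) -> Any:
    """Return the metric value from a file's content (hypothetical helper)."""
    if selector_type.upper() == "TEXT":
        # Assumption: the whole file is the value, minus surrounding whitespace.
        return content.strip()
    if selector_type.upper() == "JSON":
        if not key:
            raise ValueError("a JSON selector needs a KEY")
        value = json.loads(content)
        # Assumption: dots in KEY separate levels of nested objects.
        for part in key.split("."):
            value = value[part]
        return value
    raise ValueError(f"unsupported selector type: {selector_type}")


# Contents of the two example files above.
model_txt = "ICON\n"
metrics_json = '{"METRIC_1": "HIGH", "MISC": {"METRIC_2": 640.28}}'

print(extract_metric(model_txt))                              # ICON
print(extract_metric(metrics_json, "JSON", "METRIC_1"))       # HIGH
print(extract_metric(metrics_json, "JSON", "MISC.METRIC_2"))  # 640.28
```

One open point with this approach is that a key containing a literal dot could not be addressed; that would need an escaping rule or a list-style key.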

Afterwards, the database table should look like this:

| job_name | metric_name | metric_value |
| --- | --- | --- |
| <job_name_prefix>_SETUP | model | ICON |
| <job_name_prefix>_SIM | custom_metric_1 | HIGH |
| <job_name_prefix>_SIM | custom_metric_2 | 640.28 |

At this point, it might be feasible to add a run_id column to version the metrics by run, since that information could be available at retrieval time.
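
A possible shape for the table with run_id included, sketched with Python's built-in `sqlite3` module. The table name `job_metrics`, the composite primary key, the choice to store values as text, and the `a000` job name prefix are all assumptions made for illustration, not a decided schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB for the sketch only
conn.execute(
    """
    CREATE TABLE job_metrics (
        run_id       INTEGER NOT NULL,
        job_name     TEXT    NOT NULL,
        metric_name  TEXT    NOT NULL,
        metric_value TEXT,
        PRIMARY KEY (run_id, job_name, metric_name)
    )
    """
)


def store_metric(conn, run_id, job_name, metric_name, value):
    """Insert the metric, overwriting only the row for the same run."""
    conn.execute(
        "INSERT OR REPLACE INTO job_metrics VALUES (?, ?, ?, ?)",
        (run_id, job_name, metric_name, str(value)),
    )
    conn.commit()


# "a000" is a made-up job name prefix used only in this example.
store_metric(conn, 1, "a000_SIM", "custom_metric_2", 640.28)
store_metric(conn, 2, "a000_SIM", "custom_metric_2", 655.10)
store_metric(conn, 2, "a000_SIM", "custom_metric_2", 655.12)  # re-read within run 2

rows = conn.execute(
    "SELECT run_id, metric_value FROM job_metrics ORDER BY run_id"
).fetchall()
print(rows)  # [(1, '640.28'), (2, '655.12')]
```

With run_id in the key, a new run adds a row instead of overwriting the previous value, while a re-read within the same run still updates in place.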

@LuiggiTenorioK LuiggiTenorioK linked a pull request Dec 17, 2024 that will close this issue