API interactivity #55

Open
LuiggiTenorioK opened this issue Jan 4, 2024 · 19 comments

@LuiggiTenorioK
Member

Summarizing the discussion about this year's goal, we are aiming to add more interactive endpoints to the API that will allow users to trigger actions that modify the state of the experiments. To accomplish this, there are some issues we have to solve:

Define the scope

We need to list requirements to better structure the changes we want to make. In this particular case, we can list the actions (run experiment, update description, change status, etc.) we want to include in the API. Then, we can make a formal endpoint definition in OpenAPI with the route, expected request, and response.

This will also help us link this effort to other tasks (DDBB sync, security, communication with Autosubmit, ...).

[UPDATE] Actions mapped so far (a route sketch follows the list):

  • Start Experiment POST /v4/experiments/<expid>?action=run -> start
  • Stop Experiment POST /v4/experiments/<expid>?action=stop -> stop
  • Set Status to Job PATCH /v4/experiments/<expid>/jobs/<jobid>?status=<newstatus> -> setstatus
  • Create experiment POST /v4/experiments -> expid
  • Generate the experiment POST /v4/experiments/<expid>?action=generate -> create
  • Restart the experiment POST /v4/experiments/<expid>?action=restart -> recovery
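
To make the future OpenAPI definition more concrete, here is a minimal sketch of how these routes could look in a Flask-style app. The handler bodies, the dispatch on the action query parameter, and the returned payloads are illustrative assumptions, not the current API code:

```python
# Sketch only: possible Flask-style routing for the mapped v4 actions.
# Handler bodies and payloads are placeholders, not the current API code.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical mapping from the "action" query parameter to Autosubmit commands.
ACTION_TO_COMMAND = {
    "run": "run",
    "stop": "stop",
    "generate": "create",
    "restart": "recovery",
}

@app.route("/v4/experiments", methods=["POST"])
def create_experiment():
    # Would wrap `autosubmit expid` and return the new experiment id.
    return jsonify({"expid": "a000"}), 201

@app.route("/v4/experiments/<expid>", methods=["POST"])
def experiment_action(expid):
    action = request.args.get("action")
    if action not in ACTION_TO_COMMAND:
        return jsonify({"error": f"unknown action: {action}"}), 400
    # The mapped Autosubmit command would be executed asynchronously here.
    return jsonify({"expid": expid, "command": ACTION_TO_COMMAND[action]}), 202

@app.route("/v4/experiments/<expid>/jobs/<jobid>", methods=["PATCH"])
def set_job_status(expid, jobid):
    new_status = request.args.get("status")
    # Would wrap `autosubmit setstatus` for the given job.
    return jsonify({"expid": expid, "job": jobid, "status": new_status}), 200
```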

Set some infrastructure cases

There are multiple scenarios in which the API will be installed, such as ES, Climate DT, and EDITO. It is important that we formally define those to better understand the bounds (security, network, dependencies) we are going to have for each one.

Define the action procedures

As discussed, there are some options for processing the actions we want to include in the API (a sketch of the last two options follows the list):

  • Modify the artifacts (DDBB and files) directly: This option is great for actions that update metadata, but it will require a better understanding of the data sources, as discussed in DDBB sync improvements #53, to unify definitions and responsibilities. It might be hard to ensure backward compatibility, as different versions of Autosubmit might use different data sources.
  • Call the Autosubmit commands: This can be used for actions that need to execute extensive procedures through a subprocess. It also avoids adding a direct dependency to the API, and allows running the commands even if the API and Autosubmit are on different nodes, but we will need to figure out how to ensure safe communication between them. There is some discussion about it in autosubmitreact#21.
  • Use Autosubmit as a Python package: In this case, Autosubmit can be called directly from the code and can be handled more easily. Still, it adds a hard requirement that Autosubmit be a dependency of the API. In both this and the previous option, backward compatibility is delegated to Autosubmit.
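
For reference, here is a rough sketch contrasting the last two options. The CLI variant only assumes an autosubmit executable is available on the PATH of the API environment; the package variant assumes Autosubmit is installed as a dependency of the API, and the exact import path and entry point shown are assumptions:

```python
# Sketch contrasting option 2 (CLI subprocess) and option 3 (Python package).
import subprocess

def run_via_cli(expid: str) -> subprocess.CompletedProcess:
    """Option 2: call the Autosubmit CLI. Only needs `autosubmit` on the PATH,
    so the API does not import Autosubmit and keeps no direct dependency on it."""
    return subprocess.run(
        ["autosubmit", "run", expid],
        capture_output=True,
        text=True,
        check=False,
    )

def run_via_package(expid: str) -> None:
    """Option 3: use Autosubmit as a library (hard dependency of the API).
    The import path and entry point below are assumptions for illustration."""
    from autosubmit.autosubmit import Autosubmit  # assumed import path
    Autosubmit.run_experiment(expid)
```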

@mcastril @kinow

@LuiggiTenorioK
Member Author

changed due date to June 30, 2024

@LuiggiTenorioK
Member Author

In GitLab by @mcastril on Jan 5, 2024, 19:42

Thank you for the documentation, Luiggi. Regarding the infrastructure cases and action procedures, we have to keep the portability and interoperability of Autosubmit and its API. Anyway, the ES environment, EDITO Infra, and the Climate DT one are different enough that we should provide sufficiently general specifications for a system that must be compliant with all three environments.

@LuiggiTenorioK
Member Author

In GitLab by @kinow on Jan 18, 2024, 10:23

From today's meeting:

  • Check with the Surf GUI developers what the requirements are for their model builder GUI
  • Based on those requirements, we can try to craft the minimal viable product & reqs for AS API interactivity for EDITO
  • After that we can revisit this issue and think about whether/how/when we need to improve it
  • We might also use this to check, in the docs or with the Surf GUI devs, whether there are any other reqs for the Autosubmit GUI

@LuiggiTenorioK
Member Author

In GitLab by @mcastril on Jan 18, 2024, 17:17

I agree with the plan for EDITO. In a broad way, these are the summarized requirements.

  • Core actions to trigger with the endpoints
    • run, setstatus, stop (there is an issue in Autosubmit about the implementation of a new command)
  • Other secondary actions
    • create, expid, recovery

setstatus can be triggered with a file modification: https://autosubmit.readthedocs.io/en/master/userguide/manage/index.html#how-to-change-the-job-status-without-stopping-autosubmit

For the stop and the recovery, we could use the same approach if we implement the same behavior in Autosubmit (by using files).

For run, create, or expid it's trickier, as Autosubmit is not running to look for and consume the file.

One alternative is to deploy a daemon that looks for these files and spawns an Autosubmit process, but we could end up with the same issue: the API has to start a process under the user's identity.
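
For illustration, that daemon alternative could look roughly like the sketch below (directory layout and file naming are made up); it makes explicit that the spawned process still runs under the user's identity:

```python
# Rough sketch of the daemon alternative: watch for action files and spawn
# Autosubmit. Directory layout and file naming are assumptions.
import subprocess
import time
from pathlib import Path

ACTIONS_DIR = Path.home() / "autosubmit_actions"  # hypothetical drop-box directory

def poll_actions(interval_seconds: int = 30) -> None:
    """Consume files named '<expid>.run' and spawn `autosubmit run <expid>`.
    The daemon itself runs under the user's identity, which is the drawback
    mentioned above."""
    ACTIONS_DIR.mkdir(exist_ok=True)
    while True:
        for action_file in ACTIONS_DIR.glob("*.run"):
            expid = action_file.stem
            subprocess.Popen(["autosubmit", "run", expid])
            action_file.unlink()  # mark the request as consumed
        time.sleep(interval_seconds)

if __name__ == "__main__":
    poll_actions()
```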

@LuiggiTenorioK
Member Author

In GitLab by @kinow on Jan 22, 2024, 10:37

mentioned in issue autosubmitreact#90

@LuiggiTenorioK
Member Author

Going back to this. I opened an issue (#58) with a design that could handle the run and stop operations without deploying a daemon by having a higher-level API that maps the nodes executing those processes.

But, in the design, I assume that the current API will call the Autosubmit command autosubmit run using its latest version, opening an independent process. This is something that wasn't done before, as Autosubmit wasn't necessarily installed on the same node as the API and they were connected just by the file system.

@kinow @dbeltrankyl I wanted to ask if this is a feasible strategy or if I'm missing a potential issue by calling Autosubmit CLI commands from the API environment.
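
To clarify what I mean by an independent process, here is a rough sketch of how the API could spawn autosubmit run detached from its own process group (log path and environment handling are illustrative only):

```python
# Sketch: spawn `autosubmit run` from the API as a detached, independent process.
# Log location and environment handling are illustrative only.
import os
import subprocess
from pathlib import Path

def launch_run(expid: str) -> int:
    """Start the experiment in its own session so it is not tied to the API
    worker process. Returns the PID of the spawned Autosubmit process."""
    log_path = Path.home() / f"{expid}_run.log"  # hypothetical log location
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            ["autosubmit", "run", expid],
            stdout=log_file,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the API's process group
            env=os.environ.copy(),
        )
    return process.pid
```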

@LuiggiTenorioK
Member Author

In GitLab by @kinow on Jan 23, 2024, 17:47

I am not sure if that would work well. There are potential issues with the Autosubmit version, e.g. we changed the pickle or configuration parsing, and now an experiment needs adjusting before it can be used with the latest version. Now we need to know which version of Autosubmit to use to launch it. I think we will really need a few sessions on the whiteboard to discuss possible scenarios, like: the user was deleted/left the company, the experiment was archived (maybe we still want to show it in the UI and unarchive it?), how/if it will handle restarting experiments of others, etc.

After using the whiteboard it should be clearer (at least for me) what the limitations are, and how this should work.

@LuiggiTenorioK
Member Author

In GitLab by @mcastril on Jan 31, 2024, 18:56

You are right that there are many aspects to consider.

Maybe we can separate the problem into two parts: the "interactive" endpoints and synchronizing remote environments. The second is interesting for many reasons, not only this one but also our medium-term goal of synchronizing workflows running in independent environments and setting dependencies between their tasks.

The daemon issue for me is independent of the higher-level API. That was a way to allow interaction with AS by just writing files. If the API can call Autosubmit commands then the daemon is not needed anyway, but this is independent from the synchronization IMO.

Regarding the Autosubmit version, at least we store this value in the DDBB and in the config, and Autosubmit alerts the user when they intend to run an experiment with a different version. We can port the feature to the GUI/API and then directly run the experiment with -v in case the user approves the version change.
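
If we go that way, a rough sketch of the ported check could look like the code below. The database path/schema used to read the stored version is an assumption, and -v is passed only as described above (when the user approves the version change):

```python
# Sketch of the version check before running, passing -v only if the user
# approved the version change. DB path and schema are assumptions.
import sqlite3
import subprocess
from importlib.metadata import version

def get_stored_version(db_path: str, expid: str) -> str:
    """Read the Autosubmit version recorded for the experiment (schema assumed)."""
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT autosubmit_version FROM experiment WHERE name = ?", (expid,)
        ).fetchone()
    return row[0] if row else ""

def run_experiment(expid: str, db_path: str, user_approved_update: bool) -> None:
    installed = version("autosubmit")  # version available in the API environment
    command = ["autosubmit", "run", expid]
    if get_stored_version(db_path, expid) != installed and user_approved_update:
        command.append("-v")  # accept the version change, as described above
    subprocess.run(command, check=True)
```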

@LuiggiTenorioK
Member Author

Right! There are different problems to solve. Adding the interactive endpoints is a must for sure. On the other hand, we have to find a way to handle experiments of different versions in different environments.

For the different versions issue, I think Miguel is right that we can use the -v flag to solve it.

For the different-environments issue, there are two solutions: daemon synchronization and the higher-level API. IMO the synchronization solution is way more complex, especially considering the different versions of the experiments. So, having a higher-level API will work better for EDITO, as it will only need one service from which the requests will be made (SURF).

This higher-level API solution is inspired by a project used in the most popular workflow manager in bioinformatics (https://galaxyproject.org/): a similar lower-level API called Pulsar, which solves the same problem we have.

@LuiggiTenorioK
Member Author

Our problem is more or less stated here: https://pulsar.readthedocs.io/en/latest/containers.html

(I remember stating similar issues in my Master's thesis)

@LuiggiTenorioK
Member Author

From today's meeting:

[Attached: whiteboard photo 20240201_150049]

@LuiggiTenorioK
Member Author

In GitLab by @kinow on Feb 2, 2024, 10:38

Thank you for attaching it here, @LuiggiTenorioK !

@mcastril, should one of us get in touch with EDITO/SURF to schedule a meeting to discuss this? If so, maybe it would be Quentin/Renaud and Francesco from CMCC, via an email explaining what we want to discuss and asking for the best time/day for the meeting? Thank you

@LuiggiTenorioK
Member Author

In GitLab by @mcastril on Feb 7, 2024, 12:33

Yes Bruno, thanks for volunteering. Please address Renaud, Quentin, and Francesco together, with us in copy.

@LuiggiTenorioK
Member Author

In GitLab by @kinow on Feb 13, 2024, 11:57

Meeting doodle poll sent.

Summarizing what we discussed in the meeting.

  • Are we going to have a single API instance, shared by all the EDITO-Infra users, or are we going to have one API per user? Or both?
  • If we have instances/containers for the API shared by users, which user could we use to run the API? Would it have access to S3 and to an SSH key to connect to HPC or other EDITO-Infra instances?
  • How are the API and GUI instances going to be started? By a user action, or will Kubernetes keep a minimum number of pod(s) running?
  • We need to define how/if we will use a shared file-system for Autosubmit experiments. For the demo we used S3, but that's not really an option for Autosubmit (we use NFS at ClimateDT and BSC). Maybe we could allocate a permanent volume to be bound to each Autosubmit container (with enough storage for the experiments? GB's, TB's?)
  • Do we have a list of requirements, or how SURF GUI will interact with the API? This would be useful to validate the endpoints we will have to implement.

@LuiggiTenorioK
Member Author

In GitLab by @kinow on Feb 29, 2024, 11:26

Do we have a list of requirements, or how SURF GUI will interact with the API? This would be useful to validate the endpoints we will have to implement.

Use the API to request the status of the experiment that is running. We can build a list of experiments the user is running, and show how long the experiment is taking, resources used...

When the user defines what they want to submit, at that point they submit the list of tasks and the job run starts.

Users must be able to restart from a certain point in the workflow (setstatus).

Q: Can we get the list of N last experiments?

Yes, but the deployment option could define how it works.

Possible endpoints required (a client-side usage sketch follows the list):

  • list experiments per user
    • get experiment details (if not returned in the previous call)
  • create experiments (from scratch or from a template)
  • launch an experiment
  • restart an experiment
  • stop experiment
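
To validate these against the v4 mapping at the top of this issue, here is a client-side sketch of how the Surf GUI could drive the endpoints. The base URL, authentication header, query parameters, and payloads are assumptions:

```python
# Client-side sketch of the endpoints above, as the Surf GUI could call them.
# Base URL, auth header, query parameters, and payloads are assumptions.
import requests

BASE_URL = "https://autosubmit-api.example.org"  # hypothetical deployment
HEADERS = {"Authorization": "Bearer <token>"}    # auth scheme still to be defined

def list_experiments(owner: str) -> list:
    """List the experiments of a user (filter parameter assumed)."""
    response = requests.get(f"{BASE_URL}/v4/experiments",
                            params={"owner": owner}, headers=HEADERS)
    response.raise_for_status()
    return response.json()

def create_experiment(description: str) -> str:
    """Create an experiment from scratch or from a template (payload assumed)."""
    response = requests.post(f"{BASE_URL}/v4/experiments",
                             json={"description": description}, headers=HEADERS)
    response.raise_for_status()
    return response.json()["expid"]

def launch_experiment(expid: str) -> None:
    requests.post(f"{BASE_URL}/v4/experiments/{expid}",
                  params={"action": "run"}, headers=HEADERS).raise_for_status()

def restart_from_job(expid: str, jobid: str) -> None:
    """Restart from a certain point in the workflow by resetting a job status."""
    requests.patch(f"{BASE_URL}/v4/experiments/{expid}/jobs/{jobid}",
                   params={"status": "WAITING"}, headers=HEADERS).raise_for_status()

def stop_experiment(expid: str) -> None:
    requests.post(f"{BASE_URL}/v4/experiments/{expid}",
                  params={"action": "stop"}, headers=HEADERS).raise_for_status()
```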

@LuiggiTenorioK
Member Author

In GitLab by @kinow on Feb 29, 2024, 11:52

Are we going to have a single API instance, shared by all the EDITO-Infra users, or are we going to have one API per user? Or both?

Both ways are possible, but it's on the business side to choose.

  • Surf GUI
  • Surf API (?)
  • Autosubmit API - python process
  • Autosubmit GUI - HTML httpd
  • Autosubmit - python process, reading a shared DB

We need to choose between Process and Service (in EDITO Infra).

In a Service you can launch multiple tools (GUI + AS API + Surf API + etc).

Not tested, but it should be possible to have dependencies between Services.

In Datalab we have "Projects". We could create the project "Edito ModelLab". You can create an instance with members of the project. You can also share the URL of the project with members outside the project.

If we have instances/containers for the API shared by users, which user could we use to run the API? Would it have access to S3 and to an SSH key to connect to HPC or other EDITO-Infra instances?

This needs to be tested to confirm. It should be possible to share the instance so others can manage it too.

How are the API and GUI instances going to be started? By a user action, or will Kubernetes keep a minimum number of pod(s) running?

In the project we can have services that are always available. At the moment, services older than 2 weeks are killed, but this may change in the future.

Q: Who maintains the infra (if a service goes down?)

Suggestion: use replication (pods/etc) to have more resources, configure the helm/etc to have higher availability.

Q: Surf GUI can use the API

@LuiggiTenorioK
Member Author

In GitLab by @kinow on Feb 29, 2024, 12:00

Q: Can I deploy to another service/catalogue/env?

At the moment merge requests go to production. Staging is for EDITO-Infra. EDITO-Infra team is working to give others access to the playground catalogue/env.

N.B.: BSC team to use the playground. Then later we ask it to be moved to the modellab/ai/etc catalogue.

We need to define how/if we will use a shared file-system for Autosubmit experiments. For the demo we used S3, but that's not really an option for Autosubmit (we use NFS at ClimateDT and BSC). Maybe we could allocate a permanent volume to be bound to each Autosubmit container (with enough storage for the experiments? GB's, TB's?)

At the moment this is not doable. But that should be possible under the common modellab project. So all users under that project can target that volume there. We can test that after the modellab is created.

Q: are we going to give access to external users, to use this database (on the shared docker volume)?

...

@LuiggiTenorioK
Member Author

In GitLab by @kinow on Feb 29, 2024, 12:04

Action: we need to define who are the users of the model lab too. At the moment anyone can request an account to EDITO. The Catalogue Service view is available to unauthenticated users.

@LuiggiTenorioK
Member Author

changed due date to December 31, 2024

@LuiggiTenorioK LuiggiTenorioK self-assigned this Nov 12, 2024