
Include Galaxy tool in container #35

Open
Tracked by #11233
innovate-invent opened this issue Feb 10, 2021 · 8 comments

Comments

@innovate-invent
Contributor

innovate-invent commented Feb 10, 2021

This is a proposal to expand this project to include the Galaxy tool folder in the container. This is necessary to achieve full Cloud support for Galaxy.

To keep the containers small, they can be based on the mulled containers available from BioContainers. The container specification will need to include a standardized environment variable pointing to the equivalent of the $__tool_directory__ variable in the tool wrapper XML. The Galaxy job runner can then be extended to reference this variable in the job script.
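A minimal sketch of the job-runner side of this idea, assuming the tool image exports a variable named GALAXY_TOOL_DIRECTORY (a hypothetical name; the proposal does not fix one). It is a plain string rewrite, not a Cheetah render and not Galaxy's actual job runner code: references to $__tool_directory__ in the command are turned into a shell variable reference, so the path is resolved inside the container at run time rather than on the Galaxy host.

```python
# Hypothetical name for the standardized environment variable; the proposal
# does not specify one.
TOOL_DIR_ENV = "GALAXY_TOOL_DIRECTORY"


def rewrite_tool_directory(template: str) -> str:
    """Point $__tool_directory__ references at the container's env var.

    Naive string replacement (not a Cheetah render): both the braced and the
    bare forms become a shell variable reference, so the directory is
    resolved inside the container when the job script runs.
    """
    shell_ref = "${" + TOOL_DIR_ENV + "}"
    # Replace the braced form first so its closing brace is not left behind.
    out = template.replace("${__tool_directory__}", shell_ref)
    return out.replace("$__tool_directory__", shell_ref)


def build_job_script(template: str) -> str:
    """Assemble a minimal POSIX shell job script around the rewritten command."""
    return "\n".join([
        "#!/bin/sh",
        "set -e",
        # Abort with a message if the image does not define the variable.
        f': "${{{TOOL_DIR_ENV}:?tool image must set {TOOL_DIR_ENV}}}"',
        rewrite_tool_directory(template),
        "",
    ])


if __name__ == "__main__":
    # Double quotes (not single) so the shell expands the variable.
    print(build_job_script('python "$__tool_directory__/filter.py" input.tsv'))
```

The design choice here is to defer resolution to the container: Galaxy never needs to know where the image stores the tool folder, only the variable name.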

An alternative is to generate storage containers whose only purpose is to hold the Galaxy tool folder. The folder is exposed from the storage container so that the mulled tool container can mount it. Docker in particular supports mounting one container's volumes into another directly, bypassing the host filesystem.
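One way this could look with plain Docker is the `--volumes-from` flag. The sketch below only builds the command lines (it does not run Docker), and assumes a hypothetical storage image that declares the tool folder as a `VOLUME`; when such a container is created, Docker copies the image's content at that path into the new volume. All image names, the container name, and the /galaxy-tool path are illustrative placeholders.

```python
from __future__ import annotations

import shlex

# Hypothetical image and container names, used only for illustration.
TOOL_DATA_IMAGE = "example/galaxy-tool-data:1.0"
TOOL_DATA_NAME = "galaxy-tool-data"
MULLED_IMAGE = "example/mulled-tool:1.0"


def storage_container_cmd(name: str, image: str) -> list[str]:
    # Create (but do not start) a container whose only job is to hold the
    # tool folder. Assumes the image declares that folder as a VOLUME.
    return ["docker", "create", "--name", name, image]


def tool_run_cmd(data_container: str, image: str, command: str) -> list[str]:
    # --volumes-from mounts the storage container's volumes into the tool
    # container, with no bind mount from the host filesystem.
    return ["docker", "run", "--rm", "--volumes-from", data_container,
            image, "sh", "-c", command]


if __name__ == "__main__":
    print(shlex.join(storage_container_cmd(TOOL_DATA_NAME, TOOL_DATA_IMAGE)))
    print(shlex.join(tool_run_cmd(TOOL_DATA_NAME, MULLED_IMAGE,
                                  "ls /galaxy-tool")))
```

On Kubernetes the equivalent wiring would differ (for example an init container copying into a shared volume), so treat this as a Docker-only illustration.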

@mvdbeek
Member

mvdbeek commented Feb 11, 2021

I assume you want this because you don't want to have a shared filesystem that contains tool scripts?
In that case the answer is Pulsar; I don't think we want to build specialized containers for Galaxy.

@innovate-invent
Contributor Author

As far as I can see, Pulsar still requires NFS.

@mvdbeek
Member

mvdbeek commented Feb 11, 2021

Pulsar can stage the tool dir and needs no NFS.

@innovate-invent
Copy link
Contributor Author

innovate-invent commented Feb 11, 2021

I looked into Pulsar's implementation of running in a pod, and it seems hacky at best. Where does it fetch the tool directory from?

Also, including the tool folder in a container layer allows the node to cache it.

@mvdbeek
Member

mvdbeek commented Feb 11, 2021

@innovate-invent
Contributor Author

Ah, it fetches it from the Galaxy app. That isn't optimal.

What is the downside of adding a layer to the mulled tool containers to include the tool folder?
The mulled containers basically already only exist to serve that tool.

@mvdbeek
Member

mvdbeek commented Feb 11, 2021

So you'd need to generate a container for each tool version, and these containers are used by CWL and Nextflow as well. Galaxy can scale serving these datasets pretty well; you'd hit a bottleneck with the database long before serving tool files becomes an issue.

@innovate-invent
Contributor Author

I am proposing adding a layer on top of the mulled container. CWL and Nextflow can still pull the mulled container without the Galaxy tool layer.
