Collection of tools and databases required for GTN that cannot be synchronized easily between the main servers #5391

Open
paulzierep opened this issue Oct 2, 2024 · 6 comments

Comments

@paulzierep
Collaborator

Will try to add to this list step by step. Still need to check versions of the DBs on the servers if they exist.

@paulzierep
Collaborator Author

Another issue is tools that are configured differently in TPV. E.g. quast on org cannot access the internet and therefore fails in this tutorial: https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/metagenomics-assembly/tutorial.html

@hexylena
Member

hexylena commented Oct 2, 2024

Thanks for filing this @paulzierep ! Really appreciate it. cc @bgruening @natefoo @cat-bro since some databases/etc may be needed on each.

@hexylena
Member

hexylena commented Oct 2, 2024

quast on org cannot access the internet

I'm shocked that it needs internet access. That shouldn't be necessary :/ cc @jennaj

@hexylena
Member

hexylena commented Oct 2, 2024

I'm looking through a couple of these to see if we can analyse this problem statically, but I fear we can't. E.g. for the blastn link, the tutorial mentions a database but the workflow does not! Instead it uses a connected input parameter that's empty. The same goes for Kraken in https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html, where they're empty input parameters. The test case for taxonomy_profiling_and_visualisation_with_krona-test.yml also doesn't mention the database, but maybe I've missed it? So without parsing the English-language text, there's no way to figure that out for that specific case.

For nanopore_preprocessing.ga, from the same tutorial, in theory we could. But it would involve:

  • for every workflow
    • for every tool, recursively through subworkflows
      • for every parameter
        • for every supported server (~20-30)
          • check against /api/tools/build/{tool_id}?io_details=True to see if that value is available.
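As a rough illustration of the nesting above, here is a minimal Python sketch. It makes several assumptions that are not from this thread: that a `.ga` workflow is a dict with a `steps` mapping, that subworkflow steps carry a nested `subworkflow` dict, and that `tool_state` is a JSON string. `get_options` is a hypothetical caller-supplied lookup (e.g. a wrapper around the tool build API response) returning the allowed values per parameter; the function names are made up for this example. It only looks at top-level, non-empty string parameters, so it shares the limitation described above: empty connected inputs cannot be checked.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple


def iter_tool_steps(workflow: dict) -> Iterator[dict]:
    """Yield every tool step of a workflow, descending into subworkflows."""
    for step in workflow.get("steps", {}).values():
        if step.get("type") == "subworkflow":
            # Assumed layout: the nested workflow sits under "subworkflow".
            yield from iter_tool_steps(step.get("subworkflow", {}))
        elif step.get("tool_id"):
            yield step


def find_unavailable(
    workflows: Iterable[dict],
    servers: Iterable[str],
    get_options: Callable[[str, str], Dict[str, Set[str]]],
) -> List[Tuple[str, str, str, str]]:
    """Return (server, tool_id, param, value) for values a server lacks.

    get_options(server, tool_id) is hypothetical: it should map parameter
    names to the set of values selectable on that server. Parameters it
    does not report are ignored rather than flagged.
    """
    servers = list(servers)
    missing = []
    for workflow in workflows:
        for step in iter_tool_steps(workflow):
            # Assumed: tool_state is a JSON-encoded object.
            state = json.loads(step.get("tool_state") or "{}")
            for param, value in state.items():
                # Skip empty or non-string values: nothing to compare.
                if not isinstance(value, str) or not value:
                    continue
                for server in servers:
                    allowed = get_options(server, step["tool_id"]).get(param)
                    if allowed is not None and value not in allowed:
                        missing.append((server, step["tool_id"], param, value))
    return missing
```

Even in this simplified form, the check can only be as good as `get_options`: without knowing which parameters are database selects, every string parameter with a reported option list gets compared, which is where false positives would come from.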

In a subset of cases it is technically possible, but I am very afraid of false positives/negatives there. That may mean restricting it to tools we know use databases, and even then I'm not confident, since it requires such deep parsing of Galaxy's data structures and API responses.

That's especially so since there's no flag or signal (as far as I can tell) in the API responses that a specific parameter is a "database select" parameter that might vary between servers. If that were exposed, i.e. if we had a convenient way to know which parameters are "database selects", this problem would look a lot more tractable (albeit still leaving the cases of "workflow doesn't match tutorial and doesn't pre-select a DB").

@hexylena
Member

hexylena commented Oct 2, 2024

phyloseq IT (only in EU)

This should already be tested. We test for tools used in the tutorial / workflow. If you notice any bugs here please let me know! :) See the tools key in https://training.galaxyproject.org/training-material/api/topics/microbiome/tutorials/dada-16S//tutorial.json where phyloseq is mentioned. That's the data that goes into compatibility checking.

@natefoo
Member

natefoo commented Oct 3, 2024

I installed the phyloseq IT on .org but don't have data to test it. Can someone do that, please?

humann databases are updated as per the linked issue.

For the kraken2, blastn, and pathogen detection DBs: are there any specific details about what is needed there?
