Address reproducibility issues #17

Open
skrakau opened this issue Dec 24, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@skrakau
Collaborator

skrakau commented Dec 24, 2020

No description provided.

@skrakau skrakau added the enhancement New feature or request label Dec 24, 2020
@AntoniaSchuster
Collaborator

I'm not sure what is meant here, but I discovered that when I run test_tiny multiple times, the file entrez_data/entities_proteins.entrez.tsv is not always the same. For example, I ran it three times: twice the file had 1189 lines and once it had 1154 lines. Can someone explain this?

@skrakau
Collaborator Author

skrakau commented Dec 23, 2021

Hi @AntoniaSchuster,
what you described turned out to be a bug. Great that you discovered this :) I created a PR: nf-core/metapep#3

This issue was originally about a different problem: the Entrez download itself is not reproducible. New assemblies, sequences, or proteins may be uploaded to NCBI between runs; the pipeline then downloads and processes them, so the results can differ from run to run (this is a general issue with the Entrez download). In the future, one could avoid this by saving some intermediate results.
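
To make the idea of saving intermediate results a bit more concrete, here is a rough Python sketch. It is only an illustration and not how the pipeline is implemented: `load_or_fetch` and the `fetch` callback are hypothetical names, and the actual Entrez download step would have to be plugged in as `fetch`. The downloaded text is written to disk once together with a SHA-256 checksum, and later runs reuse the stored copy instead of querying NCBI again.

```python
import hashlib
from pathlib import Path
from typing import Callable


def load_or_fetch(cache_file: Path, fetch: Callable[[], str]) -> str:
    """Return the stored intermediate result if present, else fetch and store it.

    `fetch` is a hypothetical stand-in for whatever performs the download.
    """
    checksum_file = cache_file.with_name(cache_file.name + ".sha256")

    if cache_file.exists() and checksum_file.exists():
        # Reuse the snapshot from an earlier run, but verify it first.
        text = cache_file.read_text(encoding="utf-8")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest != checksum_file.read_text(encoding="utf-8").strip():
            raise ValueError(f"checksum mismatch for {cache_file}")
        return text

    # No snapshot yet: download once and keep the result for later runs.
    text = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    checksum_file.write_text(digest + "\n", encoding="utf-8")
    return text
```

With something like this, a rerun would only hit NCBI if the stored file is deleted on purpose, and the checksum would at least flag a snapshot that was modified in between.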
