Marco Reverenna edited this page Feb 14, 2024 · 3 revisions

Welcome to the Retrieve info MGnifyAPI wiki!

Doubts

This section is dedicated to sharing doubts, questions and considerations about the project.

Questions:

  1. Which variables should we consider to filter our data?
  2. Should we consider the "pipeline version"? If so, which one? Should we keep versions from 4.1 onward and exclude 1.0, 2.0 and 3.0? Why?
  3. Should we consider the "minimum number of samples"? How many samples should a study have? Why?
  4. Should we prefer one specific "technology" over another? Why? Which technology fits better with the tools used in Albert's pipeline?

Feedback:

  1. "pipeline version", "n_samples", "technology"
  2. Pipeline versions 4.1 and 5.0 contain more information.
  3. Exclude all studies that contain only a single sample; consider studies with >= 6 samples (possibly still not enough for a good study).
  4. Ask Albert.
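A minimal sketch of how the filters from the feedback could be applied to a list of studies. It works on simplified, in-memory records: the `StudySummary` class and its field names are hypothetical and are not the MGnify API schema, and it assumes a study passes the version filter if at least one of its analyses used an allowed pipeline version.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StudySummary:
    # Hypothetical simplified record; field names are assumptions,
    # not the attribute names returned by the MGnify API.
    accession: str
    n_samples: int
    pipeline_versions: set = field(default_factory=set)
    experiment_type: str = ""


def filter_studies(studies, min_samples=6, allowed_versions=frozenset({"4.1", "5.0"})):
    """Keep studies with at least `min_samples` samples and at least one
    analysis run with an allowed pipeline version."""
    kept = []
    for study in studies:
        if study.n_samples < min_samples:
            continue  # drops single-sample (and other small) studies
        if not (study.pipeline_versions & set(allowed_versions)):
            continue  # no analysis with an accepted pipeline version
        kept.append(study)
    return kept


def count_by_experiment_type(studies):
    """Tally the kept studies per experiment type (e.g. metagenomic, assembly)."""
    return Counter(study.experiment_type for study in studies)
```

Usage with made-up placeholder studies; the second call reproduces the looser filter from Marco's considerations (5 samples, versions 4.0, 4.1 and 5.0):

```python
studies = [
    StudySummary("STUDY_A", 1, {"5.0"}, "metagenomic"),
    StudySummary("STUDY_B", 12, {"3.0"}, "assembly"),
    StudySummary("STUDY_C", 8, {"4.1", "5.0"}, "metagenomic"),
    StudySummary("STUDY_D", 5, {"4.0"}, "assembly"),
]
strict = filter_studies(studies)
loose = filter_studies(studies, min_samples=5, allowed_versions={"4.0", "4.1", "5.0"})
print(count_by_experiment_type(loose))
```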

Marco's considerations

  • Filters applied to studies: a minimum of 5 samples and the latest pipeline versions (4.0, 4.1 and 5.0).
  • With these filters the total number of studies is 17 (6 metagenomics, 1 metatranscriptomics and 10 assembly), and only pipeline versions 4.0 and 5.0 are represented.
  • Total number of ERR IDs: 491
  • Total number of ERZ IDs: 433 (ERZ IDs come only from assembly studies):
  1. Is there any way to get FASTQ files from this kind of ID? (ERZ --> ERS --> SAMEA --> FASTA)
  2. From assemblies we can only get FASTA files, which are not accepted as input by the nf-metagenomics pipeline.
  • Should we take the research center and the technology into account, to normalise across different studies as much as possible?
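A possible sketch for question 1 above: classify accessions by prefix (raw reads in FASTQ are attached to ERR run accessions, while ERZ accessions are analysis objects such as assemblies) and then map each ERZ to the runs of the sample it belongs to. This assumes the reads used for an assembly were deposited as runs of the same sample, which may not hold for every study. The `erz_to_sample` and `sample_to_runs` mappings are hypothetical inputs that would have to be filled from ENA/BioSamples metadata; the sketch does not fetch anything.

```python
import re
from collections import Counter

# Accession prefixes mentioned on this page.
ACCESSION_TYPES = {
    "ERR": "run",          # sequencing run: raw reads (FASTQ) are attached here
    "ERZ": "analysis",     # analysis object, e.g. a submitted assembly (FASTA)
    "ERS": "sample",       # ENA sample accession
    "SAMEA": "biosample",  # BioSamples accession
}
_ACCESSION_RE = re.compile(r"^(SAMEA|ERR|ERZ|ERS)\d+$")


def accession_type(acc):
    """Return the object type of an accession, based on its prefix."""
    match = _ACCESSION_RE.match(acc.strip())
    if match is None:
        raise ValueError(f"unrecognised accession: {acc!r}")
    return ACCESSION_TYPES[match.group(1)]


def runs_for_assemblies(erz_to_sample, sample_to_runs):
    """For each ERZ accession, list the ERR runs of the sample it was built from.

    Both arguments are hypothetical lookup tables (plain dicts).
    An ERZ whose sample has no known runs maps to an empty list.
    """
    return {
        erz: sorted(sample_to_runs.get(sample, []))
        for erz, sample in erz_to_sample.items()
    }
```

Usage with made-up accessions:

```python
ids = ["ERR0000001", "ERR0000002", "ERZ0000001"]
print(Counter(accession_type(a) for a in ids))
print(runs_for_assemblies({"ERZ0000001": "ERS0000001"},
                          {"ERS0000001": ["ERR0000002", "ERR0000001"]}))
```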

Sebastian's considerations