Configuration parameters
Freya Arthen edited this page Feb 20, 2024 · 4 revisions
-
fasta_path
path to genomic FASTA file
-
gff_path
path to GFF file
-
output_path
path to output directory
-
taxon_id
NCBI taxonomy ID of the query species
-
threads: X/"auto"
number of threads to be used by Bowtie2 and DIAMOND
- default: 'auto' → auto-detection of all available cores by DIAMOND (Bowtie2 uses one thread)
- X → X threads are used by both DIAMOND and Bowtie2
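
The two cases above can be sketched as a small helper. This is an illustration only, not taXaminer's code: the function name `resolve_threads` is hypothetical, and `os.cpu_count()` merely stands in for DIAMOND's own core detection.

```python
import os


def resolve_threads(threads="auto"):
    """Return illustrative per-tool thread counts for a 'threads' setting."""
    if str(threads).lower() == "auto":
        # 'auto': DIAMOND uses all available cores, Bowtie2 uses one thread.
        # os.cpu_count() is an assumption standing in for DIAMOND's detection.
        return {"diamond": os.cpu_count() or 1, "bowtie2": 1}
    count = int(threads)
    if count < 1:
        raise ValueError("threads must be a positive integer or 'auto'")
    # X: the same number of threads for both tools.
    return {"diamond": count, "bowtie2": count}
```

For example, `resolve_threads(4)` gives `{"diamond": 4, "bowtie2": 4}`, while `resolve_threads("auto")` always reports one thread for Bowtie2.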
-
force: TRUE/FALSE
overwrite existing results
- will not overwrite sequence similarity search results ('tax_assignment_path'); to overwrite those, either delete the file or use the option 'compute_tax_assignment'
-
include_coverage: TRUE/FALSE
explicitly include coverage information in the analysis or not
- default: inferred from the existence of any of the files at 'pbc_path', 'bam_path' or 'read_paths'
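
A minimal sketch of how such a default could be inferred, assuming the configuration is available as a plain dictionary. The function name and the prefix matching (so that keys such as 'bam_path_1' are also found) are assumptions for illustration, not taXaminer's actual logic.

```python
import os

# Key prefixes named in the description above; matching by prefix is an assumption.
COVERAGE_KEY_PREFIXES = ("pbc_path", "bam_path", "read_paths")


def _as_paths(value):
    """Normalise a single path or a list of paths to a list of strings."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]


def infer_include_coverage(config):
    """Use the explicit setting if present, else check whether any file exists."""
    explicit = config.get("include_coverage")
    if explicit is not None:
        return bool(explicit)
    for key, value in config.items():
        if key.startswith(COVERAGE_KEY_PREFIXES):
            if any(os.path.exists(path) for path in _as_paths(value)):
                return True
    return False
```

An explicit 'include_coverage' value always wins in this sketch; the file check only runs when the parameter is left unset.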
-
bam_path_X
path to BAM file for coverage set X
-
database_path
path to the DIAMOND-formatted NCBI NR protein database (set this up according to the instructions in Installation)
- default: 'db.dmnd' in the directory that was specified with 'taxaminer.setup -d'
-
compute_tax_assignment: TRUE/FALSE
run sequence similarity search with DIAMOND
- default: inferred from existence of file(s) at 'tax_assignment_path'
-
extract_proteins: TRUE/FALSE
automatic generation of protein FASTA file based on genomic FASTA and GFF; saved to 'proteins_path'
- default: inferred from existence of file at 'proteins_path'
-
proteins_path
path to FASTA file containing the protein sequences
- generated automatically if the file does not exist (or if 'extract_proteins' == TRUE)
- can be specified by the user; otherwise the default is used
- default: 'output_path/proteins.faa'
-
tax_assignment_path
hit file(s) of the sequence similarity search in the database
- when 'assignment_mode' == 'quick' and only one path is provided, the suffixes '_1' and '_2' are added; to state both files explicitly, give them as a comma-separated list in brackets
- can be specified by the user; otherwise the default is used
- default: 'output_path/taxonomic_hits.txt' / ['output_path/taxonomic_hits_1.txt', 'output_path/taxonomic_hits_2.txt']
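
The suffix rule can be illustrated with a short sketch. The function name is hypothetical, and placing the suffix before the file extension is inferred from the default file names listed above rather than taken from taXaminer's source.

```python
import os


def resolve_tax_assignment_paths(output_path, assignment_mode="exhaustive",
                                 tax_assignment_path=None):
    """Return the list of hit files implied by the settings (illustrative)."""
    if tax_assignment_path is None:
        tax_assignment_path = os.path.join(output_path, "taxonomic_hits.txt")
    if isinstance(tax_assignment_path, (list, tuple)):
        # Both files were stated explicitly; use them unchanged.
        return list(tax_assignment_path)
    if assignment_mode != "quick":
        return [tax_assignment_path]
    # Quick mode with a single path: add '_1' and '_2' before the extension.
    root, extension = os.path.splitext(tax_assignment_path)
    return [f"{root}_1{extension}", f"{root}_2{extension}"]
```

With `assignment_mode="quick"` and no explicit path, this yields the two default file names 'taxonomic_hits_1.txt' and 'taxonomic_hits_2.txt' inside the output directory.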
-
target_exclude: TRUE/FALSE
exclude self-hits in similarity search (query taxon is either in- or excluded)
- default: TRUE
-
exclusion_rank: <rank>
taxonomic rank at which hits are excluded in taxonomic assignment (based on the query species)
- taxa that share the same <exclusion_rank> taxon with the query species are discarded from taxonomic assignment
- default: 'species'
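
To make the effect of the rank concrete, here is a minimal sketch. It assumes each lineage is a dictionary mapping rank names to taxon identifiers; the function name, the data layout, and the placeholder taxon names are hypothetical.

```python
def discard_excluded_hits(hits, query_lineage, exclusion_rank="species"):
    """Drop hits whose taxon at exclusion_rank equals that of the query."""
    query_taxon = query_lineage.get(exclusion_rank)
    if query_taxon is None:
        # The query has no taxon at this rank, so nothing can be excluded.
        return list(hits)
    return [
        hit for hit in hits
        if hit["lineage"].get(exclusion_rank) != query_taxon
    ]


# Placeholder lineages (not real taxa) for demonstration.
query = {"species": "sp_query", "genus": "genus_A", "family": "fam_A"}
hits = [
    {"id": "h1", "lineage": {"species": "sp_query", "genus": "genus_A", "family": "fam_A"}},
    {"id": "h2", "lineage": {"species": "sp_other", "genus": "genus_A", "family": "fam_A"}},
    {"id": "h3", "lineage": {"species": "sp_far", "genus": "genus_B", "family": "fam_B"}},
]
kept_species = [hit["id"] for hit in discard_excluded_hits(hits, query, "species")]
kept_genus = [hit["id"] for hit in discard_excluded_hits(hits, query, "genus")]
```

In this toy data, excluding at 'species' keeps `h2` and `h3`, while excluding at 'genus' keeps only `h3`: the higher the rank, the more close relatives are removed.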
-
assignment_mode: "exhaustive"/"quick"
mode in which to perform the similarity search
- "exhaustive" → default mode
- "quick" → speeds up the similarity search: genes that most likely originate from the query species are identified by an initial search in a small subset of the database; the remaining genes are then searched against the whole database
- default: 'exhaustive'
-
quick_mode_search_rank
taxonomic rank at which to create the subset of the database for the initial filtering search
- can be either a taxonomic rank such as phylum or order (then resolved relative to the query species) or an NCBI taxon ID
- default: 'kingdom'
-
quick_mode_match_rank
taxonomic rank that the taxonomic assignment of a gene has to reach to be accepted in the first search, i.e. to be identified as belonging to the query species
- can be either a taxonomic rank such as phylum or order (then resolved relative to the query species) or an NCBI taxon ID
- default: 'order'
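
One possible reading of the acceptance step in quick mode is sketched below. It covers only the case where 'quick_mode_match_rank' is a rank name (not an NCBI taxon ID), and the function name and data layout (lineages as rank-to-taxon dictionaries with placeholder names) are assumptions.

```python
def split_after_first_search(assignments, query_lineage, match_rank="order"):
    """Partition genes into accepted ones and ones forwarded to the full search.

    assignments maps gene IDs to the lineage assigned in the first search,
    or to None when the gene had no hit.
    """
    target = query_lineage[match_rank]
    accepted, forwarded = [], []
    for gene_id, lineage in assignments.items():
        if lineage is not None and lineage.get(match_rank) == target:
            # Assignment falls within the query's taxon at match_rank.
            accepted.append(gene_id)
        else:
            # No hit, or hit outside that taxon: search the whole database.
            forwarded.append(gene_id)
    return accepted, forwarded
```

Genes in the second list are the ones that would go on to the search against the whole database.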
-
num_groups_plot: x/"all"
number of distinct taxonomic groups to display in the plots
- x → only x labels are displayed; taxonomic assignments are iteratively merged to higher ranks until no more than x labels remain
- "all" → every taxonomic assignment is displayed
- default: 25
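
The iterative merging can be illustrated with a deliberately simplified sketch: every lineage is a non-empty tuple ordered from the root to the most specific taxon, and all labels are raised by one level per iteration. taXaminer's actual merging strategy may differ; the function name and the placeholder lineages are hypothetical.

```python
def merge_to_n_groups(lineages, num_groups=25):
    """Return one label per lineage, merged upward until few enough remain."""
    labels = [lineage[-1] for lineage in lineages]
    if num_groups == "all":
        # Every taxonomic assignment keeps its own label.
        return labels
    depth = max((len(lineage) for lineage in lineages), default=0)
    while len(set(labels)) > int(num_groups) and depth > 1:
        # Cut all lineages to one level closer to the root and relabel.
        depth -= 1
        labels = [lineage[min(depth, len(lineage)) - 1] for lineage in lineages]
    return labels


# Placeholder lineages (root first) for demonstration.
example = [("root", "A", "A1"), ("root", "A", "A2"), ("root", "B", "B1")]
merged = merge_to_n_groups(example, 2)
```

Here `merged` is `["A", "A", "B"]`: three distinct labels exceed the limit of two, so each assignment is replaced by its parent.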
-
merging_labels: <NCBI IDs>/<rank>/<rank>-all
merging of taxonomic assignments can be manually influenced
- NCBI IDs → comma-separated list of NCBI taxon IDs; taxonomic assignments are merged at each of these IDs (please make sure the IDs are not within the same lineage)
- <rank> → a taxonomic rank; taxon to merge taxonomic assignments at will be inferred from rank for the query species
- <rank>-all → a taxonomic rank with suffix '-all'; all taxonomic assignments will be generalized to this rank
- default: None
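
The three accepted forms can be told apart with a small parser. This is a sketch under assumptions: the function name and the returned dictionary layout are hypothetical, and no check is made that the IDs lie in different lineages.

```python
def parse_merging_labels(value):
    """Classify a 'merging_labels' value as IDs, rank, rank-all or none."""
    if value is None or str(value).strip().lower() in ("", "none"):
        return {"mode": "none"}
    text = str(value).strip()
    parts = [part.strip() for part in text.split(",")]
    if all(part.isdigit() for part in parts):
        # Comma-separated list of NCBI taxon IDs.
        return {"mode": "ids", "taxon_ids": [int(part) for part in parts]}
    if text.endswith("-all"):
        # Rank with suffix '-all': generalise every assignment to this rank.
        return {"mode": "rank_all", "rank": text[: -len("-all")]}
    # Plain rank: the taxon is inferred from the query species.
    return {"mode": "rank", "rank": text}
```

For example, `parse_merging_labels("phylum-all")` returns `{"mode": "rank_all", "rank": "phylum"}`.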
-
include_pseudogenes: TRUE/FALSE
include pseudogenes in the analysis
- default: FALSE
-
input_variables
variables to be used for the PCA
- comma-separated list of variables without spaces, with the whole list enclosed in double quotes (" ")
- default: "c_name,c_num_of_genes,c_len,c_genelenm,c_genelensd,g_len,g_lendev_c,g_abspos,g_terminal,c_cov,c_covsd,g_cov,g_covsd,g_covdev_c,c_pearson_r,g_pearson_r_o,g_pearson_r_c"
- see Additional information for details on options
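
A short sketch of how a value in this format could be split and checked for stray spaces. The function name is hypothetical, and the check covers only the format, not whether each name is a valid variable.

```python
def parse_input_variables(value):
    """Split a comma-separated 'input_variables' string into variable names."""
    names = value.split(",")
    invalid = [name for name in names if name == "" or " " in name]
    if invalid:
        raise ValueError(f"empty or space-containing entries: {invalid!r}")
    return names


# Default value as listed above.
DEFAULT_INPUT_VARIABLES = (
    "c_name,c_num_of_genes,c_len,c_genelenm,c_genelensd,g_len,g_lendev_c,"
    "g_abspos,g_terminal,c_cov,c_covsd,g_cov,g_covsd,g_covdev_c,"
    "c_pearson_r,g_pearson_r_o,g_pearson_r_c"
)
default_names = parse_input_variables(DEFAULT_INPUT_VARIABLES)
```

The default string splits into 17 variable names; a value such as `"c_len, g_len"` raises a `ValueError` because of the space after the comma.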