Configuration parameters

Minimal required information

fasta_path path to genomic FASTA file
gff_path path to GFF file
output_path path to output directory
taxon_id NCBI taxonomy ID of the query species
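
A minimal sketch of how the four required entries could be checked before a run. It assumes the configuration has already been loaded into a Python dict; the helper name `missing_required` and the example paths are hypothetical and not part of taXaminer.

```python
# Hypothetical pre-flight check; not part of the tool itself.
REQUIRED_KEYS = ("fasta_path", "gff_path", "output_path", "taxon_id")


def missing_required(config):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if config.get(key) in (None, "")]


example = {
    "fasta_path": "genome.fasta",   # placeholder paths for illustration only
    "gff_path": "annotation.gff",
    "output_path": "results/",
}
print(missing_required(example))
```

An empty list means all minimal information is present.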

General options

threads: X/"auto" number of threads to be used by Bowtie2 and DIAMOND
- default: 'auto' → auto-detection of all available cores by DIAMOND (Bowtie2 uses one thread)
- X → X threads are used by DIAMOND and Bowtie2
force: True/False overwrite existing results
- will not overwrite sequence similarity search results ('tax_assignment_path'); to overwrite these, either delete the file or use the option 'compute_tax_assignment'
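
The 'threads' behaviour described above can be sketched as follows. This is an illustration only: `resolve_threads` is a hypothetical helper, and `os.cpu_count()` stands in for DIAMOND's own core auto-detection, which the tool presumably leaves to DIAMOND.

```python
import os


def resolve_threads(threads="auto"):
    """Map the 'threads' setting to per-tool thread counts (illustrative only)."""
    if threads == "auto":
        # DIAMOND uses all available cores; Bowtie2 uses one thread.
        # os.cpu_count() is an assumption standing in for DIAMOND's detection.
        return {"diamond": os.cpu_count() or 1, "bowtie2": 1}
    count = int(threads)
    if count < 1:
        raise ValueError("threads must be a positive integer or 'auto'")
    # An explicit number X applies to both tools.
    return {"diamond": count, "bowtie2": count}
```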

Coverage options

include_coverage: TRUE/FALSE explicitly include or exclude coverage information in the analysis
- default: inferred from the existence of any of the files at 'pbc_path', 'bam_path' or 'read_paths'
bam_path_X path to BAM file for coverage set X
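
A sketch of how the default for 'include_coverage' could be inferred from the files on disk. It assumes the configuration is a Python dict with already-parsed boolean values, and that the 'pbc_path' and 'read_paths' keys may carry a coverage-set suffix in the same way as 'bam_path_X' (an assumption; only 'bam_path_X' is documented here). `infer_include_coverage` is a hypothetical name.

```python
from pathlib import Path

# Key prefixes named in the 'include_coverage' default description.
COVERAGE_PREFIXES = ("pbc_path", "bam_path", "read_paths")


def infer_include_coverage(config):
    """Return the explicit setting if given, else infer it from existing files."""
    if "include_coverage" in config:
        return bool(config["include_coverage"])  # assumes a parsed boolean
    for key, value in config.items():
        if key.startswith(COVERAGE_PREFIXES) and value:
            paths = value if isinstance(value, (list, tuple)) else [value]
            if any(Path(p).is_file() for p in paths):
                return True
    return False
```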

Taxonomic assignment options

database_path path to DIAMOND-formatted NCBI NR protein database (set this up according to the instructions in Installation)
- default: 'db.dmnd' in the directory that was specified at 'taxaminer.setup -d'
compute_tax_assignment: TRUE/FALSE run sequence similarity search with DIAMOND
- default: inferred from existence of file(s) at 'tax_assignment_path'
extract_proteins: TRUE/FALSE automatic generation of protein FASTA file based on genomic FASTA and GFF; saved to 'proteins_path'
- default: inferred from existence of file at 'proteins_path'
proteins_path path to FASTA file containing the protein sequences
- generated automatically if the file does not exist (or if 'extract_proteins' == TRUE)
- can be specified by the user; otherwise the default is used
- default: 'output_path/proteins.faa'
tax_assignment_path hit file(s) of sequence similarity search in database
- when 'assignment_mode' == 'quick' and only one path is provided, the suffixes '_1' and '_2' are added; to state both files explicitly, give them as a comma-separated list in brackets
- can be specified by the user; otherwise the default is used
- default: 'output_path/taxonomic_hits.txt' / ['output_path/taxonomic_hits_1.txt', 'output_path/taxonomic_hits_2.txt']
target_exclude: TRUE/FALSE exclude self-hits in the similarity search (determines whether the query taxon is included or excluded)
- default: TRUE
exclusion_rank: <rank> taxonomic rank at which hits are excluded in taxonomic assignment (based on the query species)
- taxa which are in the same <exclusion_rank> as the query species are discarded from taxonomic assignment
- default: 'species'
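
The effect of 'target_exclude' and 'exclusion_rank' can be illustrated with toy data. In this sketch, lineages are plain dicts mapping a rank to a taxon ID; this data layout, the helper `filter_hits` and the assumption that 'exclusion_rank' only takes effect when 'target_exclude' is TRUE are all illustrative and not taken from the tool.

```python
def filter_hits(hits, query_lineage, exclusion_rank="species", target_exclude=True):
    """Drop hits that share the query's taxon at `exclusion_rank`."""
    if not target_exclude:
        return list(hits)
    query_taxon = query_lineage.get(exclusion_rank)
    if query_taxon is None:
        return list(hits)  # query has no taxon at this rank; nothing to exclude
    return [h for h in hits if h["lineage"].get(exclusion_rank) != query_taxon]


# Made-up taxon IDs for illustration.
query = {"species": 1, "genus": 10}
hits = [
    {"id": "a", "lineage": {"species": 1, "genus": 10}},  # self-hit
    {"id": "b", "lineage": {"species": 2, "genus": 10}},  # same genus
    {"id": "c", "lineage": {"species": 3, "genus": 11}},  # different genus
]
print([h["id"] for h in filter_hits(hits, query, "species")])
print([h["id"] for h in filter_hits(hits, query, "genus")])
```

Raising the rank from 'species' to 'genus' discards more hits, because everything in the query's genus is removed.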
assignment_mode: "exhaustive"/"quick" mode in which to perform similarity search
- "exhaustive" → default mode
- "quick" → speeds up the similarity search: genes that most likely originate from the query species are identified by an initial search in a small subset of the database; the remaining genes are then searched against the whole database
- default: 'exhaustive'
quick_mode_search_rank taxonomic rank at which to create the subset of the database for the initial filtering search
- either a taxonomic rank such as phylum or order (resolved relative to the query species) or an NCBI taxon ID
- default: 'kingdom'
quick_mode_match_rank taxonomic rank which taxonomic assignment of genes has to reach to be accepted in first search, i.e. be identified as belonging to the query species
- either a taxonomic rank such as phylum or order (resolved relative to the query species) or an NCBI taxon ID
- default: 'order'
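
How 'tax_assignment_path' resolves to one or two hit files depending on 'assignment_mode' can be sketched as below. `resolve_hit_paths` is a hypothetical helper; placing the '_1'/'_2' suffix before the file extension is an assumption derived from the documented default file names.

```python
from pathlib import Path


def resolve_hit_paths(output_path, assignment_mode="exhaustive", tax_assignment_path=None):
    """Return the list of hit-file paths for the given mode (illustrative only)."""
    if isinstance(tax_assignment_path, (list, tuple)):
        # Both files were stated explicitly by the user.
        return [str(p) for p in tax_assignment_path]
    if tax_assignment_path:
        base = Path(tax_assignment_path)
    else:
        base = Path(output_path) / "taxonomic_hits.txt"  # documented default
    if assignment_mode == "quick":
        # Assumption: the suffix goes before the extension, as in the defaults.
        return [str(base.with_name(f"{base.stem}_{i}{base.suffix}")) for i in (1, 2)]
    return [str(base)]
```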

Plot output options

num_groups_plot: x/"all" number of distinct taxonomic groups to display in the plots
- x → only x labels are displayed; taxonomic assignments are iteratively merged to higher ranks until no more than x labels remain
- "all" → every taxonomic assignment is displayed
- default: 25
merging_labels: <NCBI IDs>/<rank>/<rank>-all merging of taxonomic assignments can be manually influenced
- NCBI IDs → comma-separated list of NCBI taxon IDs; taxonomic assignments are merged at each of these IDs (please make sure the IDs are not within the same lineage)
- <rank> → a taxonomic rank; taxon to merge taxonomic assignments at will be inferred from rank for the query species
- <rank>-all → a taxonomic rank with suffix '-all'; all taxonomic assignments will be generalized to this rank
- default: None
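
The three accepted forms of 'merging_labels' can be told apart as in the following sketch. `parse_merging_labels` and the returned kind names are hypothetical; the snippet only classifies the value and does not perform any merging.

```python
def parse_merging_labels(value):
    """Classify a 'merging_labels' value into (kind, payload)."""
    if value is None or str(value).strip() in ("", "None"):
        return ("none", None)
    text = str(value).strip()
    parts = [p.strip() for p in text.split(",")]
    if all(p.isdigit() for p in parts):
        # Comma-separated list of NCBI taxon IDs.
        return ("taxon_ids", [int(p) for p in parts])
    if text.endswith("-all"):
        # Rank with the '-all' suffix: generalize all assignments to this rank.
        return ("rank_all", text[: -len("-all")])
    # Plain rank: merge at the query species' taxon of this rank.
    return ("rank", text)
```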

Gene info options

include_pseudogenes: TRUE/FALSE include pseudogenes in the analysis
- default: FALSE

PCA options

input_variables variables to be used for the PCA
- comma-separated list of variables without spaces; the whole list is enclosed in double quotes (" ")
- default: "c_name,c_num_of_genes,c_len,c_genelenm,c_genelensd,g_len,g_lendev_c,g_abspos,g_terminal,c_cov,c_covsd,g_cov,g_covsd,g_covdev_c,c_pearson_r,g_pearson_r_o,g_pearson_r_c"
- see Additional information for details on options
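
A sketch of parsing an 'input_variables' value into a list of variable names while enforcing the "no spaces" rule. `parse_input_variables` is a hypothetical helper; it does not check the names against the set of valid variables.

```python
def parse_input_variables(value):
    """Split a quoted, comma-separated variable list into names."""
    names = value.strip().strip('"').split(",")
    # Reject empty entries and entries with surrounding whitespace.
    bad = [n for n in names if not n or n != n.strip()]
    if bad:
        raise ValueError(f"invalid entries in input_variables: {bad!r}")
    return names


print(parse_input_variables('"c_len,g_len,g_cov"'))
```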
