reorientateCircGenomes is an R package for processing gff (general feature format) files obtained from NCBI or generated by Prokka, and for reorienting these and fna (nucleic acid fasta) files based on proteinID or base pair position. From a gff and an fna file, the package can also generate a visualization of the circular genome that includes the GC skew and marks the locations of selected genes.
# Install reorientateCircGenomes from GitHub:
install.packages("devtools")
devtools::install_github("SonjaElena/reorientateCircGenomes")
Process a gff file provided by NCBI into a data.frame.
processNCBIgff(path)
path
Path to the unprocessed gff file downloaded from NCBI.
Return:
The processed gff file is returned as a data.frame with an additional column containing the alternative end, which is used by the function circGenomePlot. The alternative end eliminates overlaps between genes.
Examples:
gff_unprocessed <- path_to_gff
gff <- processNCBIgff(gff_unprocessed)
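The idea of an alternative end that eliminates overlaps can be illustrated with a small base R sketch. This is not the package's implementation: the helper `trim_overlaps` and the column name `altend` are hypothetical, and the rule used here (stop one base before the next feature starts) is only one plausible way to remove overlaps.

```r
# Hypothetical helper, not part of reorientateCircGenomes.
# Assumption: an overlap is removed by ending each feature one base
# before the start of the following feature.
trim_overlaps <- function(features) {
  # Sort by start so that "next feature" is well defined.
  features <- features[order(features$start), ]
  # Start of the following feature; the last feature has none.
  next_start <- c(features$start[-1], Inf)
  # Keep the original end unless it reaches into the next feature.
  features$altend <- pmin(features$end, next_start - 1)
  features
}

# Toy table: the first feature (1-100) overlaps the second (90-180).
toy <- data.frame(
  start = c(1, 90, 200),
  end   = c(100, 180, 300)
)
trimmed <- trim_overlaps(toy)
print(trimmed)
```

In this toy table only the first feature is shortened. A feature nested entirely inside another would need extra handling that this sketch leaves out.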
Process a gff file provided by Prokka into a data.frame.
processProkkagff(gff_object)
gff_object
Path to the unprocessed gff file generated by Prokka.
Return:
The processed gff file is returned as a data.frame with an additional column containing the alternative end, which is used by the function circGenomePlot. The alternative end eliminates overlaps between genes.
Examples:
gff_unprocessed <- path_to_gff
gff <- processProkkagff(gff_unprocessed)
Reorientation of the start position of a gff file type based on ProteinID or base pair position. New start and end locations are added in two additional columns called 'Ostart' and 'Oend'.
reorientgff(gff_object, proteinID = NA, bplocation = bp_location, replicon = NA, Rep_size = fasta)
gff_object
A gff file processed with the function processProkkagff or processNCBIgff.
proteinID
ProteinID of the protein whose start position is used as the position at which the file should be reoriented. Defaults to NA. If the ProteinID is not found on the largest replicon, the replicon must be supplied as well.
bplocation
Start position, in base pairs, at which the file should be reoriented. It must be identical to the start position of one of the proteins.
replicon
Replicon to be reoriented. Defaults to the largest replicon.
Rep_size
Size of the replicon in base pairs. Alternatively, a genomic fasta sequence in fna format can be supplied.
Return: The reoriented gff file with three additional columns: 'Ostart' and 'Oend', containing the adjusted start and end base pair locations, and 'Oaltend', containing the reoriented end position based on the alternative end column in case the function 'processNCBIgff' was used beforehand.
Examples:
gff <- reorientgff(gff, "WP_012176686.1")
gff <- reorientgff(gff, bplocation = 1866, replicon = "CP000031.2")
gff <- reorientgff(gff, proteinID = "AAV97145.1", replicon = "CP000032.1")
# when no reorientation is required but the file should be used for the circular plot afterwards
gff <- reorientgff(gff, bplocation = 0, Rep_size = fna_path)
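The coordinate shift behind such a reorientation can be sketched with modular arithmetic on a circular replicon. The snippet below is an illustration under stated assumptions, not the code used by reorientgff: the helper `shift_coords` is hypothetical, coordinates are assumed to be 1-based, and the toy replicon size of 5000 bp is made up.

```r
# Hypothetical helper, not part of reorientateCircGenomes.
# Assumption: 1-based coordinates, with the new origin mapped to position 1.
shift_coords <- function(pos, new_origin, replicon_size) {
  # Positions upstream of the new origin wrap around to the end of the replicon.
  ((pos - new_origin) %% replicon_size) + 1
}

# Toy feature table on a made-up 5000 bp replicon.
toy <- data.frame(
  start = c(1, 1866, 4000),
  end   = c(900, 2500, 4900)
)
replicon_size <- 5000

# Reorient so that the feature starting at 1866 becomes the first feature.
toy$Ostart <- shift_coords(toy$start, 1866, replicon_size)
toy$Oend   <- shift_coords(toy$end,   1866, replicon_size)
print(toy)
```

The original 'start' and 'end' columns are kept and the shifted values are written to new columns, mirroring the 'Ostart' and 'Oend' columns described above.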
Adjust the start position of an fna file downloaded from NCBI based on a base pair location or a proteinID. Reorientation based on proteinID requires a gff file.
reorientfna(fasta_object, replicon = NA, bplocation = NA, proteinID = NA, gff = NA)
fasta_object
DNAStringSet of nucleotide sequences in fasta format.
replicon
Replicon to be reoriented. Defaults to the largest replicon.
bplocation
Location in base pairs that should be used as the new start position.
proteinID
ProteinID of the protein based on which the file should be reoriented. It must be accompanied by a processed gff file, and the protein must either be located on the largest replicon or be accompanied by the replicon to be used.
gff
A processed gff file (e.g. the output of processProkkagff or processNCBIgff). Must be supplied when proteinID is used.
Return: The reoriented DNA string set.
Examples:
fasta <- reorientfna(fasta_object = dna_list, proteinID = "AAV93333.1", gff = gff3)
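Reorienting a circular sequence amounts to rotating it so that the chosen position becomes the first base. The following base R sketch shows that rotation on a plain character string. It is an assumption-laden illustration rather than the package's code: reorientfna works on a DNAStringSet (fna files are commonly read with Biostrings::readDNAStringSet), and the helper `rotate_sequence` is hypothetical.

```r
# Hypothetical helper, not part of reorientateCircGenomes.
# Rotates a circular sequence so that position `new_start` (1-based)
# becomes the first base.
rotate_sequence <- function(sequence, new_start) {
  n <- nchar(sequence)
  if (new_start < 1 || new_start > n) {
    stop("new_start must lie within the sequence")
  }
  if (new_start == 1) {
    return(sequence)
  }
  # Move the part upstream of new_start to the end of the sequence.
  paste0(substr(sequence, new_start, n), substr(sequence, 1, new_start - 1))
}

rotated <- rotate_sequence("AAACCCGGGTTT", 7)
print(rotated)
```

The rotation keeps the sequence length and base composition unchanged and only moves the origin.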
Generates a genomic plot that indicates the locations of regulators and shows the GC skew, based on a gff file. Only one replicon should be supplied.
circGenomePlot(fasta_object, gff = gff, proteinID = proteinID, reorigff = FALSE)
fasta_object
DNAStringSet of nucleotide sequences in fasta format.
gff
Processed gff file. It has to be the output of either processProkkagff or processNCBIgff, since these functions generate the alternative end column that is used here. The file may have been reoriented afterwards with the function reorientgff.
proteinID
Vector of ProteinIDs to be indicated in the plot.
reorigff
If TRUE, uses the columns 'Ostart' and 'Oend' generated by the reorientation functions above to obtain the base pair locations. Defaults to FALSE, which uses the columns 'start' and 'end'.
Return: A list containing the circular plot and a data.frame of regulator locations. The genome plot consists of four rings. The outer ring shows the positions of the provided genes (black) and the location of the first gene (red). The third and second rings show the genes located on the plus and minus strand. The inner ring shows the GC skew, with negative and positive values color coded in light and dark gray, respectively. A sliding window of 10,000 bp is used for the GC skew.
Examples:
vect <- c("AAV93333.1", "AAV93335.1")
plot <- circGenomePlot(fasta, gff3, vect, reorigff = TRUE)
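The GC skew shown in the inner ring can be approximated with a few lines of base R. This sketch uses the conventional definition (G - C) / (G + C) per window and, for simplicity, non-overlapping windows instead of a sliding window. Whether circGenomePlot computes the skew in exactly this way is an assumption, and the helper `gc_skew` is hypothetical.

```r
# Hypothetical helper, not part of reorientateCircGenomes.
# Computes GC skew, (G - C) / (G + C), for consecutive non-overlapping windows.
gc_skew <- function(sequence, window = 10000) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  starts <- seq(1, length(bases), by = window)
  vapply(starts, function(s) {
    chunk <- bases[s:min(s + window - 1, length(bases))]
    g_count <- sum(chunk == "G")
    c_count <- sum(chunk == "C")
    # A window without G or C has no defined skew.
    if (g_count + c_count == 0) {
      return(NA_real_)
    }
    (g_count - c_count) / (g_count + c_count)
  }, numeric(1))
}

# Toy sequence with a G-rich first half and a C-rich second half.
skew <- gc_skew("GGGGCCATATCCCCGGATAT", window = 10)
print(skew)
```

Positive values indicate an excess of G over C in a window and negative values the opposite, which corresponds to the dark and light gray coding described above.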