You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
CIRCexplorer2 parse -t STAR Chimeric.out.junction > CIRCexplorer2_Diff0_T18und_parse.log
and I was successuful.
Then, when I tried annotating, I used the following command and got the following mistake:
andregabriel@brutus:~/Desktop/Alignments_André_CIRCexplorer/Diff0_T18und_circRNA$ CIRCexplorer2 annotate -r /home/andregabriel/Desktop/Alignments_André_CIRCexplorer/Homo_sapiens.GRCh38.99.gtf -g /mnt/work1/genodata/STAR-GRCh38.99/Homo_sapiens.GRCh38.dna.primary_assembly.fa -b back_spliced_junction.bed -o circularRNA_Diff0_T18und.txt > CIRCexplorer2_Diff0_T18und_annotate.log
Traceback (most recent call last):
File "/home/andregabriel/.local/bin/CIRCexplorer2", line 8, in <module>
sys.exit(main())
File "/home/andregabriel/.local/lib/python3.10/site-packages/circ2/command_parse.py", line 50, in main
annotate.annotate(docopt(annotate.__doc__, version=__version__),
File "/home/andregabriel/.local/lib/python3.10/site-packages/circ2/helper.py", line 38, in wrapper
fn(*args)
File "/home/andregabriel/.local/lib/python3.10/site-packages/circ2/annotate.py", line 37, in annotate
annotate_fusion(options['--ref'], options['--bed'], fusion_tmp,
File "/home/andregabriel/.local/lib/python3.10/site-packages/circ2/annotate.py", line 50, in annotate_fusion
genes, novel_genes, gene_info, chrom_info = parse_ref(ref_f, 1)
File "/home/andregabriel/.local/lib/python3.10/site-packages/circ2/parser.py", line 76, in parse_ref
gene_id, iso_id, chrom, strand = line.split()[:4]
ValueError: not enough values to unpack (expected 4, got 2)
While exploring the GitHub repository of the initial version of CIRCexplorer, I came across the information that for successful annotation, the GTF file needs to be of the "Gene Predictions and RefSeq Genes with Gene Names" type, essentially a GenePred file (https://genome.ucsc.edu/FAQ/FAQformat.html#format9).
According to the documentation, the expected format for the gene annotation file is as follows:
geneName: Name of the gene
isoformName: Name of the isoform
chrom: Reference sequence
strand: + or - for the strand
txStart: Transcription start position
txEnd: Transcription end position
cdsStart: Coding region start
cdsEnd: Coding region end
exonCount: Number of exons
exonStarts: Exon start positions
exonEnds: Exon end positions (https://circexplorer2.readthedocs.io/en/2.3.0/modules/annotate/).
As such, I used gtfToGenePred to convert the GRCh38/99 GTF file. At this time CIRCexplorer no longer generates errors when using the annotation command. However, it still fails to annotate the obtained junctions, despite not encountering any errors this time.
I attempted to change the approach and tested it with the hg19 genome instead, just to check if it would work. Although I haven't had much success yet, I've outlined the following steps that I used:
So:
hg19 Genome Preparation:
I used STAR to index the genome. STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /mnt/work1/genodata/hg19 --genomeFastaFiles /mnt/work1/genodata/hg19/bwa_index/hg19.bowtie.fa --sjdbGTFfile /mnt/work1/genodata/hg19/IlluminaiGenomes/UCSC_genes.gtf --sjdbOverhang 100 --limitGenomeGenerateRAM 300 --outFileNamePrefix /home/andregabriel/Desktop/Alignments_André_CIRCexplorer_hg19_Mock/D17Und_Diff0/
Alignment:
I used STAR to align one of the samples with the newly indexed genome. STAR --runThreadN 6 --genomeDir /mnt/work1/genodata/hg19/Indexed_hg19_STAR/ --readFilesIn /mnt/work2/andre_phd/circRNA_Jan2023/raw/T17Und_circRNA_1.fastq.gz /mnt/work2/andre_phd/circRNA_Jan2023/raw/T17Und_circRNA_2.fastq.gz --readFilesCommand zcat --chimSegmentMin 10
Parsing:
I utilized the CIRCexplorer2 parse command to extract relevant information from the alignment results, focusing specifically on chimeric junctions. Note that for this sample when I used GRCh38, CIRCexplorer2 parse converted 156628 fusion reads, whereas with hg19, it converted 162404 fusion reads. So, parsing continues to identify fusion reads consistently. It's interesting to see the variation, but I assume this is normal. CIRCexplorer2 parse -t STAR Chimeric.out.junction > CIRCexplorer2_Diff0_T17und_parse.log
gtfToGenePred:
I used the gtfToGenePred command to convert the GTF file into a Gene Prediction file, which I just assume that it should facilitate the annotation of fusion junctions in the next step (5). gtfToGenePred -genePredExt -geneNameAsName2 /mnt/work1/genodata/hg19/IlluminaiGenomes/UCSC_genes.gtf UCSC_geneGTFtoGENEPRED.txt
Annotation:
Finally, I attempted to use the CIRCexplorer2 annotate command to annotate fusion junctions based on the fusion reads (.bed file) obtained during parsing and the gene annotation files (specifically, the hg19 .fa and the output from gtfToGenePred).
Unfortunately, the command didn't produce any annotated fusion junctions again. I wonder if this is due to how the file obtained in the gtfToGenePred step is structured. The file is being read and there are no issues in the CIRCexplorer code, which was a problem previously when we only provided the GTF file. There seems to be an issue with obtaining annotation information. CIRCexplorer2 annotate -r /mnt/work1/genodata/hg19/Indexed_hg19_STAR/UCSC_geneGTFtoGENEPRED.txt -g /mnt/work1/genodata/hg19/bwa_index/hg19.bowtie.fa -b back_spliced_junction.bed -o circularRNA_Diff0_T17und.txt > CIRCexplorer2_Diff0_T17und_annotate.log
I appreciate your insights on how to address this matter effectively. Thank you.
Best regards,
Cheers,
The text was updated successfully, but these errors were encountered:
To obtain the desired annotation file format, which is slightly different from the direct output of "gtfToGenePred," use the following command: paste <(cut -f 12 UCSC_geneGTFtoGENEPRED.txt) <(cut -f 1-10 UCSC_geneGTFtoGENEPRED.txt) > ref_all.txt. This command combines specific columns from the "UCSC_geneGTFtoGENEPRED.txt" file to create the annotation file with the required format "Gene Predictions and RefSeq Genes with Gene Names".
Hello.. I am currently having an issue with the annotation part. I am using Linux to perform CIRCexplorer2 analysis.
After aligning everything with STAR as described by you,
for instance:
STAR --runThreadN 10 --genomeDir /mnt/work1/genodata/STAR-GRCh38.99/ --readFilesIn /mnt/work2/andre_phd/circRNA_Jan2023/raw/T18Und_circRNA_1.fastq.gz /mnt/work2/andre_phd/circRNA_Jan2023/raw/T18Und_circRNA_2.fastq.gz --readFilesCommand zcat --chimSegmentMin 10
Afterwards, I did the parsing step as follows:
CIRCexplorer2 parse -t STAR Chimeric.out.junction > CIRCexplorer2_Diff0_T18und_parse.log
and I was successuful.
Then, when I tried annotating, I used the following command and got the following mistake:
While exploring the GitHub repository of the initial version of CIRCexplorer, I came across the information that for successful annotation, the GTF file needs to be of the "Gene Predictions and RefSeq Genes with Gene Names" type, essentially a GenePred file (https://genome.ucsc.edu/FAQ/FAQformat.html#format9).
According to the documentation, the expected format for the gene annotation file is as follows:
geneName: Name of the gene
isoformName: Name of the isoform
chrom: Reference sequence
strand: + or - for the strand
txStart: Transcription start position
txEnd: Transcription end position
cdsStart: Coding region start
cdsEnd: Coding region end
exonCount: Number of exons
exonStarts: Exon start positions
exonEnds: Exon end positions (https://circexplorer2.readthedocs.io/en/2.3.0/modules/annotate/).
As such, I used gtfToGenePred to convert the GRCh38/99 GTF file. At this time CIRCexplorer no longer generates errors when using the annotation command. However, it still fails to annotate the obtained junctions, despite not encountering any errors this time.
I attempted to change the approach and tested it with the hg19 genome instead, just to check if it would work. Although I haven't had much success yet, I've outlined the following steps that I used:
So:
hg19 Genome Preparation:
I used STAR to index the genome.
STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /mnt/work1/genodata/hg19 --genomeFastaFiles /mnt/work1/genodata/hg19/bwa_index/hg19.bowtie.fa --sjdbGTFfile /mnt/work1/genodata/hg19/IlluminaiGenomes/UCSC_genes.gtf --sjdbOverhang 100 --limitGenomeGenerateRAM 300 --outFileNamePrefix /home/andregabriel/Desktop/Alignments_André_CIRCexplorer_hg19_Mock/D17Und_Diff0/
Alignment:
I used STAR to align one of the samples with the newly indexed genome.
STAR --runThreadN 6 --genomeDir /mnt/work1/genodata/hg19/Indexed_hg19_STAR/ --readFilesIn /mnt/work2/andre_phd/circRNA_Jan2023/raw/T17Und_circRNA_1.fastq.gz /mnt/work2/andre_phd/circRNA_Jan2023/raw/T17Und_circRNA_2.fastq.gz --readFilesCommand zcat --chimSegmentMin 10
Parsing:
I utilized the CIRCexplorer2 parse command to extract relevant information from the alignment results, focusing specifically on chimeric junctions. Note that for this sample when I used GRCh38, CIRCexplorer2 parse converted 156628 fusion reads, whereas with hg19, it converted 162404 fusion reads. So, parsing continues to identify fusion reads consistently. It's interesting to see the variation, but I assume this is normal.
CIRCexplorer2 parse -t STAR Chimeric.out.junction > CIRCexplorer2_Diff0_T17und_parse.log
gtfToGenePred:
I used the gtfToGenePred command to convert the GTF file into a Gene Prediction file, which I just assume that it should facilitate the annotation of fusion junctions in the next step (5).
gtfToGenePred -genePredExt -geneNameAsName2 /mnt/work1/genodata/hg19/IlluminaiGenomes/UCSC_genes.gtf UCSC_geneGTFtoGENEPRED.txt
Annotation:
Finally, I attempted to use the CIRCexplorer2 annotate command to annotate fusion junctions based on the fusion reads (.bed file) obtained during parsing and the gene annotation files (specifically, the hg19 .fa and the output from gtfToGenePred).
Unfortunately, the command didn't produce any annotated fusion junctions again. I wonder if this is due to how the file obtained in the gtfToGenePred step is structured. The file is being read and there are no issues in the CIRCexplorer code, which was a problem previously when we only provided the GTF file. There seems to be an issue with obtaining annotation information.
CIRCexplorer2 annotate -r /mnt/work1/genodata/hg19/Indexed_hg19_STAR/UCSC_geneGTFtoGENEPRED.txt -g /mnt/work1/genodata/hg19/bwa_index/hg19.bowtie.fa -b back_spliced_junction.bed -o circularRNA_Diff0_T17und.txt > CIRCexplorer2_Diff0_T17und_annotate.log
I appreciate your insights on how to address this matter effectively. Thank you.
Best regards,
Cheers,
The text was updated successfully, but these errors were encountered: