-a option #41

Open
bshim181 opened this issue Nov 25, 2024 · 12 comments

@bshim181

Hello,

I have custom StringTie-derived GTFs that I am trying to predict ORFs from.

Screenshot 2024-11-25 at 11 11 09 AM

I am also passing the primary assembly gene annotation from GENCODE with the -a option:

ribotish predict -b file1,2,3 -t file4,5,6 -a gencode.gtf -g custom.gtf -f $reference_assembly -o output.txt -v --seq --inframecount --altcodons ATG,CTG,GTG,TTG,ACG --minaalen 7

Do you have any idea why I might still encounter this error?

@zhpn1024
Owner

What about the quality figure?

@bshim181
Author

bshim181 commented Dec 3, 2024

I tried running ribotish quality with the same custom GTF, and it returned 0 counted reads.

# Align each FASTQ file to the genome index built with the custom GTF
for file in "input_dir"/*; do
    file_name=$(basename "$file" .fastq)
    STAR --runThreadN 16 \
        --genomeDir custom_gtf_index \
        --readFilesIn "${file}" \
        --outFilterMismatchNmax 2 \
        --outFilterMultimapNmax 10 \
        --outFilterMismatchNoverLmax 0.04 \
        --alignIntronMin 20 \
        --alignIntronMax 100000 \
        --quantMode TranscriptomeSAM GeneCounts \
        --alignSJDBoverhangMin 1 \
        --outSAMattributes All \
        --outSAMtype BAM SortedByCoordinate \
        --sjdbGTFfile custom_gtf \
        --outFileNamePrefix out_file
done

I generated the BAM file with STAR based on the custom GTF. Do I have to align to the transcriptome?

@zhpn1024
Owner

zhpn1024 commented Dec 4, 2024

There's no need to align to the transcriptome. Genome alignment is required.
ribotish quality needs a known protein-coding annotation as input (gencode.gtf).
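As a rough sketch of what that could look like for a Ribo-seq BAM and a TI-Seq BAM, the helper below only prints the commands (a dry run). The function name `quality_cmds` and the `ribo:`/`tis:` prefixes are hypothetical, and the `-t` flag for marking TIS-enriched data is an assumption to confirm with `ribotish quality -h`.

```shell
# Print one `ribotish quality` command per BAM (dry run; nothing is executed).
# Arguments: the known-annotation GTF, then entries of the form
# "ribo:PATH" or "tis:PATH" (a hypothetical convention for this sketch).
# Assumption: -t marks a BAM as TIS-enriched (TI-Seq) data.
quality_cmds() {
    local known_gtf=$1 entry kind bam
    shift
    for entry in "$@"; do
        kind=${entry%%:*}   # text before the first colon
        bam=${entry#*:}     # text after the first colon
        if [ "$kind" = "tis" ]; then
            printf 'ribotish quality -b %s -g %s -t\n' "$bam" "$known_gtf"
        else
            printf 'ribotish quality -b %s -g %s\n' "$bam" "$known_gtf"
        fi
    done
}
```

For example, `quality_cmds gencode.gtf ribo:chx.bam tis:ltm.bam` prints two commands, one per BAM, which can be reviewed before being run.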

@bshim181
Author

bshim181 commented Dec 4, 2024

Screenshot 2024-12-04 at 9 38 55 AM

The quality figures look good with the known protein-coding annotation as input.
I am wondering why the error still persists in the predict step.

@zhpn1024
Owner

zhpn1024 commented Dec 4, 2024

The quality step generates/updates the .para.py files, which are used in the predict step. Try running predict again.

@bshim181
Author

bshim181 commented Dec 4, 2024

Screenshot 2024-12-04 at 10 13 02 AM Screenshot 2024-12-04 at 10 14 39 AM

Based on the message at the beginning, it seems that the para files generated by ribotish quality were read in as input. It still throws this error, however.

@zhpn1024
Owner

zhpn1024 commented Dec 4, 2024

Were the para files of all TI-Seq BAMs updated? It seems that you are using multiple BAM files.
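One quick way to check this is sketched below. It assumes each para file sits next to its BAM under the default name, that is, the BAM path with the trailing `.bam` replaced by `.para.py`. The function name `check_para_files` is hypothetical.

```shell
# Report which BAM files lack a companion .para.py file.
# Assumption: the para file uses the default name, i.e. the BAM path
# with the trailing ".bam" replaced by ".para.py".
check_para_files() {
    local missing=0 bam para
    for bam in "$@"; do
        para="${bam%.bam}.para.py"
        if [ -f "$para" ]; then
            echo "ok       $para"
        else
            echo "missing  $para"
            missing=1
        fi
    done
    # Exit status 0 if every para file exists, 1 otherwise.
    return "$missing"
}
```

Calling it with every BAM passed to `-b` and `-t` lists the para files that still need to be generated by the quality step. It does not check whether an existing para file is up to date.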

@bshim181
Author

bshim181 commented Dec 4, 2024

I am currently running single BAM files just as a test. I am still getting that error.

ribotish predict -b ${out_dir}/STAR_output_TE_Onlys/MC38VehCHX_S1_R1_001_ATCGTAligned.sortedByCoord.out.bam -t ${out_dir}/STAR_output_TE_Onlys/MC38vehLTM_S3_R1_001_ATCGTAligned.sortedByCoord.out.bam -f $reference_assembly -g custom_gtf -o ${out_dir}/ribotish/Test_TE_Transcripts.txt -v --aaseq --inframecount --altcodons ATG,CTG,GTG,TTG,ACG --minaalen 7

@zhpn1024
Owner

zhpn1024 commented Dec 4, 2024

There's no -a GTF input in this command.

@bshim181
Author

bshim181 commented Dec 4, 2024

I had been modifying some command-line parameters, but it returns the same error when -a is set to the known protein-coding annotation.

-g custom_GTF -a gencode_GTF: This fails

-g gencode_GTF -a custom_GTF: This succeeds.

Does this imply that the number of reads mapping to features in the custom_GTF is too small?

Also, the run finishes successfully if I run with Ribo-seq data only and the --longest parameter. Why might this be the case?

@zhpn1024
Owner

zhpn1024 commented Dec 5, 2024

The -g GTF gives the genes for which translation is predicted, while the -a GTF gives known coding genes, used only for TIS background estimation.

@zhpn1024
Owner

zhpn1024 commented Dec 5, 2024

I found the problem. It is a bug: if you used numProc > 1, the -a GTF was not actually used. The bug is fixed in the latest commit.
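For anyone who cannot update right away, a possible workaround that follows from this diagnosis is to run predict with a single process, so that the -a GTF is used. The sketch below only prints the command (a dry run). The function name `predict_cmd` and all file names are hypothetical, and `-p` as the number-of-processes option is an assumption to confirm with `ribotish predict -h`.

```shell
# Dry run: print a single-process `ribotish predict` command.
# Assumption: -p sets the number of processes (numProc).
# -g is the GTF with the genes to predict; -a is the known coding
# annotation, used only for TIS background estimation.
predict_cmd() {
    local ribo_bam=$1 ti_bam=$2 target_gtf=$3 known_gtf=$4 genome_fa=$5 out=$6
    printf 'ribotish predict -b %s -t %s -g %s -a %s -f %s -o %s -p 1\n' \
        "$ribo_bam" "$ti_bam" "$target_gtf" "$known_gtf" "$genome_fa" "$out"
}
```

For example, `predict_cmd chx.bam ltm.bam custom.gtf gencode.gtf genome.fa out.txt` prints the command for review. Options such as `--altcodons` or `--minaalen` from the earlier commands can be appended by hand.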
