Hello, I wanted to test your compare_predictions_to_phages.py to make sure that it was working, so I used the tsv file containing the reference locations for phages in NC_002655.
I was expecting to get perfect results, since I was using the reference intervals from the Casjens 2003 paper as reported on the PHASTER website statistics page. Instead I got these results:
(base) [u1323098@notch164:scripts]$ python3 compare_predictions_to_phages.py -t /uufs/chpc.utah.edu/common/home/u1323098/sundar-group-space2/PHAGE/BENCHMARKING/Philympics_dataset/Escherichia_coli_O157-H7_EDL933.gb -r reference.tsv --fp --fn -v
Reading reference.tsv
Reading /uufs/chpc.utah.edu/common/home/u1323098/sundar-group-space2/PHAGE/BENCHMARKING/Philympics_dataset/Escherichia_coli_O157-H7_EDL933.gb again to get the phage regions
Getting from 1879335 to 1897622
Getting from 3551577 to 3565707
Getting from 2966382 to 3015014
Getting from 2668339 to 2688870
Getting from 2285976 to 2330172
Getting from 300073 to 310251
Getting from 1897625 to 1908911
Getting from 1702185 to 1725748
Getting from 310756 to 323112
Getting from 1250521 to 1295458
Getting from 1330857 to 1391923
Getting from 1678706 to 1693737
Getting from 1849488 to 1879269
Getting from 1909139 to 1930250
Getting from 892845 to 930943
Getting from 1730065 to 1756006
Getting from 1626722 to 1673485
Getting from 1655548 to 1696145
Getting from 2743223 to 2788348
Getting from 2118738 to 2165694
Getting from 3263064 to 3270404
Getting from 1521574 to 1530771
Found 789 predicted prophage features
Reading /uufs/chpc.utah.edu/common/home/u1323098/sundar-group-space2/PHAGE/BENCHMARKING/Philympics_dataset/Escherichia_coli_O157-H7_EDL933.gb
Comparing real and predicted
Found:
Test set:
Phage: 676 Not phage: 4832
Predictions:
Phage: 789 Not phage: 4709
TP: 641
FP: 158
TN: 4674
FN: 35
Accuracy: 0.965 (this is the ratio of the correctly labeled phage genes to the whole pool of genes
Precision: 0.802 (This is the ratio of correctly labeled phage genes to all predictions)
Recall: 0.948 (This is the fraction of actual phage genes we got right)
Specificity: 0.967 (This is the fraction of non phage genes we got right)
f1_score: 0.869 (this is the harmonic mean of precision and recall, and is the best measure when, as in this case, there is a big difference between the number of phage and non-phage genes)
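As a sanity check on the numbers above, the reported metrics can be recomputed from the four confusion-matrix counts. This is only a minimal sketch using the standard definitions; `confusion_metrics` is a hypothetical helper written for this post and is not taken from compare_predictions_to_phages.py, which may compute things differently.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return the standard metrics derived from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)   # correct phage calls / all phage calls
    recall = tp / (tp + fn)      # correct phage calls / all real phage genes
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        # harmonic mean of precision and recall
        "f1_score": 2 * precision * recall / (precision + recall),
    }


# Counts copied from the TP/FP/TN/FN lines of the log above.
metrics = confusion_metrics(tp=641, fp=158, tn=4674, fn=35)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, these agree with the values the script printed. One thing the counts do show: TP + FP is 799, while the log reports 789 predicted phage features. The cause of that difference is not clear from this output.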
It seems that there are some differences between the reference intervals listed in your supplementary table and the intervals listed on the PHASTER website.
Do you have a list of the sources for the annotations you are using? Thank you.
LeAnn
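One place where two reference tables can disagree is in how overlapping or nested regions are recorded, so it may be worth checking the regions from the verbose log against each other. The sketch below assumes inclusive coordinates and copies them from the "Getting from" lines above; `find_overlaps` is a hypothetical helper and not part of compare_predictions_to_phages.py.

```python
from itertools import combinations

# (start, end) pairs copied from the "Getting from ... to ..." log lines.
REGIONS = [
    (1879335, 1897622), (3551577, 3565707), (2966382, 3015014),
    (2668339, 2688870), (2285976, 2330172), (300073, 310251),
    (1897625, 1908911), (1702185, 1725748), (310756, 323112),
    (1250521, 1295458), (1330857, 1391923), (1678706, 1693737),
    (1849488, 1879269), (1909139, 1930250), (892845, 930943),
    (1730065, 1756006), (1626722, 1673485), (1655548, 1696145),
    (2743223, 2788348), (2118738, 2165694), (3263064, 3270404),
    (1521574, 1530771),
]


def find_overlaps(regions):
    """Return every pair of regions that share at least one position.

    Coordinates are treated as inclusive (an assumption).
    """
    overlaps = []
    for a, b in combinations(sorted(regions), 2):
        if a[0] <= b[1] and b[0] <= a[1]:
            overlaps.append((a, b))
    return overlaps


overlapping = find_overlaps(REGIONS)
for a, b in overlapping:
    print(f"{a[0]}-{a[1]} overlaps {b[0]}-{b[1]}")
```

With these coordinates, 1655548-1696145 overlaps 1626722-1673485 and fully contains 1678706-1693737. Whether that has anything to do with the differences I am seeing is only a guess.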