Do some benchmarks #5

davmlaw · 2022-01-19T23:13:25Z

Generate 100/1k/10k random HGVS from ClinVar
Install SeqRepo locally
Benchmark how long resolution takes with the JSON and REST clients, with and without caching
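A rough sketch of the sampling step, assuming the ClinVar HGVS strings have already been loaded into a list. `sample_hgvs` and the placeholder pool are hypothetical, not part of cdot, and the strings are made up rather than real ClinVar entries:

```python
import random


def sample_hgvs(hgvs_strings, n, seed=None):
    """Return up to n distinct entries chosen at random from hgvs_strings."""
    rng = random.Random(seed)  # a seed makes the sample reproducible
    pool = list(hgvs_strings)
    return rng.sample(pool, min(n, len(pool)))


# Placeholder pool standing in for HGVS strings extracted from ClinVar.
pool = [f"NM_{i:06d}.1:c.1A>G" for i in range(1, 20001)]

# One sample per benchmark size (100 / 1k / 10k).
samples = {n: sample_hgvs(pool, n, seed=42) for n in (100, 1000, 10000)}
```

Fixing the seed means the same test file can be regenerated later, so runs stay comparable.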

davmlaw · 2022-02-01T04:11:08Z

We should write a proper script to generate the test data from ClinVar.

We should also add an option to use only unique transcripts. Otherwise repeated transcripts are served from the local cache, which makes the comparison a bit unfair.
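A possible way to do the unique-transcript filtering, keeping the first HGVS seen for each transcript accession. The helper names are hypothetical, the example strings are illustrative only, and the sketch assumes the accession is everything before the first `:` with an optional `(GENE)` suffix:

```python
def transcript_accession(hgvs):
    """Return the transcript accession from an HGVS string.

    Assumes the form 'ACCESSION:change' or 'ACCESSION(GENE):change'.
    """
    prefix = hgvs.split(":", 1)[0]
    return prefix.split("(", 1)[0].strip()


def unique_by_transcript(hgvs_strings):
    """Keep only the first HGVS string for each transcript accession."""
    seen = set()
    unique = []
    for hgvs in hgvs_strings:
        accession = transcript_accession(hgvs)
        if accession not in seen:
            seen.add(accession)
            unique.append(hgvs)
    return unique


# Illustrative strings, not taken from the benchmark data.
example = [
    "NM_000059.3(BRCA2):c.1A>G",
    "NM_000059.3:c.5C>T",
    "NM_007294.3:c.2T>C",
]
deduped = unique_by_transcript(example)
```

With one HGVS per transcript, every lookup has to fetch transcript data rather than hit the cache.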

UTA is missing ~80% of the transcripts. This doesn't affect the results, as we only count the ones that resolve, but it makes a run take about 5x as long to produce a usable number of results. Perhaps we should restrict the test set to transcripts UTA has, to make its benchmarks quicker.
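Counting only the transcripts that resolve could look something like this. This is a sketch, not the actual `benchmark_hgvs.py`: `resolve` is a hypothetical callable standing in for either client, and it is assumed to raise `LookupError` when transcript data is missing:

```python
import time


def benchmark(resolve, hgvs_strings):
    """Time resolve() per HGVS string.

    Returns (timings, no_data): seconds for each successful call, and the
    number of strings skipped because resolve() raised LookupError.
    """
    timings = []
    no_data = 0
    for hgvs in hgvs_strings:
        start = time.perf_counter()
        try:
            resolve(hgvs)
        except LookupError:
            no_data += 1  # missing transcript: counted, but not timed
            continue
        timings.append(time.perf_counter() - start)
    return timings, no_data


def fake_resolve(hgvs):
    """Stand-in resolver that only knows one made-up transcript."""
    if not hgvs.startswith("NM_000001.1"):
        raise LookupError(hgvs)
    return hgvs


timings, no_data = benchmark(
    fake_resolve,
    ["NM_000001.1:c.1A>G", "NM_000002.1:c.1A>G", "NM_000001.1:c.2C>T"],
)
```

Failed lookups still cost wall-clock time, which is why a run with many missing transcripts takes so much longer for the same number of usable timings.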

davmlaw · 2022-02-01T04:34:35Z

Here are benchmarks on the existing HGVS (with duplicate transcripts). I think the median is the fairest measure, as the mean includes local cache hits.
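To illustrate why the median is less sensitive to cache hits, here is a toy example. The timings are made up (two near-instant cache hits and three network lookups), not measured values:

```python
import statistics

# Made-up per-HGVS timings in seconds: two cache hits, three network lookups.
timings = [0.001, 0.001, 1.84, 1.85, 1.86]

mean = statistics.mean(timings)      # pulled down by the cache hits
median = statistics.median(timings)  # stays at a typical network lookup

print(f"mean={mean:.3f}s median={median:.3f}s")
```

As long as fewer than half of the calls are cache hits, the median reflects an uncached lookup.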

This is not a totally fair benchmark as cdot.cc is in Australia (1000 miles away), while UTA is in the USA

On 500 random ClinVar HGVS entries:

cdot REST - median of 0.1s/HGVS (resolved 100% - 500/500)
UTA - median of 1.84s/HGVS (resolved 17% - missing data on 415/500 transcripts)

Initial results: cdot REST is about 17x faster and resolves about 5.9x as many transcripts (500 vs 85)
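For reference, the two ratios can be recomputed from the unrounded medians and resolved counts of the 500-record runs in this comment. The variable names are just for illustration:

```python
# Medians (seconds per HGVS) and resolved counts from the 500-record runs.
cdot_median = 0.108600
uta_median = 1.842527
cdot_resolved = 500
uta_resolved = 85

speedup = uta_median / cdot_median             # how many times faster cdot REST is
resolved_ratio = cdot_resolved / uta_resolved  # how many times as many transcripts resolved

print(f"speedup: {speedup:.1f}x, resolved: {resolved_ratio:.1f}x")
```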

dlawrence@dlawrence-XPS-15-9560:~/localwork/cdot$ ./tests/benchmark_hgvs.py tests/test_data/clinvar_hgvs_100.tsv --rest
Using 100 test records
Total: 100, correct: 100, incorrect: 0, no data: 0
0
count 100.000000
mean 0.114272
std 0.061827
min 0.001432
25% 0.107378
50% 0.111659
75% 0.124938
max 0.410409

dlawrence@dlawrence-XPS-15-9560:~/localwork/cdot$ ./tests/benchmark_hgvs.py tests/test_data/clinvar_hgvs_100.tsv --uta
Using 100 test records
Total: 100, correct: 22, incorrect: 0, no data: 78
0
count 22.000000
mean 1.593698
std 0.781262
min 0.000808
25% 1.842216
50% 1.848696
75% 1.950901
max 2.468861

dlawrence@dlawrence-XPS-15-9560:~/localwork/cdot$ ./tests/benchmark_hgvs.py tests/test_data/clinvar_hgvs_500.tsv --rest
Using 500 test records
Total: 500, correct: 500, incorrect: 0, no data: 0
0
count 500.000000
mean 0.098256
std 0.097638
min 0.000865
25% 0.097436
50% 0.108600
75% 0.115424
max 1.316386

dlawrence@dlawrence-XPS-15-9560:~/localwork/cdot$ ./tests/benchmark_hgvs.py tests/test_data/clinvar_hgvs_500.tsv --uta
Using 500 test records
Total: 500, correct: 85, incorrect: 0, no data: 415
0
count 85.000000
mean 1.286009
std 0.944009
min 0.000821
25% 0.002182
50% 1.842527
75% 1.893244
max 2.766618

Faster work internet has approx same ratio: 1.547102÷0.086330 = 17.9

dlawrence@dlawrence-Precision-5820-Tower:~/localwork/cdot$ python3 tests/benchmark_hgvs.py --rest tests/test_data/clinvar_hgvs_500.tsv
Using 500 test records
Total: 500, correct: 500, incorrect: 0, no data: 0
0
count 500.000000
mean 0.069974
std 0.047126
min 0.001092
25% 0.082477
50% 0.086330
75% 0.088926
max 0.712429

dlawrence@dlawrence-Precision-5820-Tower:~/localwork/cdot$ python3 tests/benchmark_hgvs.py --uta tests/test_data/clinvar_hgvs_500.tsv
Using 500 test records
Total: 500, correct: 85, incorrect: 0, no data: 415
0
count 85.000000
mean 1.040413
std 0.752840
min 0.001311
25% 0.004117
50% 1.547102
75% 1.566680
max 2.042583

davmlaw · 2024-07-29T08:07:08Z

https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-016-0396-7#Sec14 has a "truth set"

davmlaw added a commit that referenced this issue Jan 28, 2022

#5 - benchmarks - some random HGVSs from ClinVar

c406327
