Do some benchmarks #5

Open
davmlaw opened this issue Jan 19, 2022 · 3 comments

Comments


davmlaw commented Jan 19, 2022

  • Generate 100/1k/10k random HGVS from ClinVar
  • Install SeqRepo locally
  • Benchmark how long resolution takes with the JSON and REST clients, with and without caching
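The sampling step in the list above could be sketched roughly as follows. This is only an illustration: `sample_hgvs` is a hypothetical helper, not part of cdot, and the HGVS strings in the pool are made-up placeholders rather than real ClinVar records.

```python
import random


def sample_hgvs(hgvs_strings, n, seed=0):
    """Return up to n distinct HGVS strings chosen at random.

    A fixed seed keeps the sample reproducible between benchmark runs.
    """
    unique = sorted(set(hgvs_strings))  # dedupe; sort so the seed is meaningful
    rng = random.Random(seed)
    return rng.sample(unique, min(n, len(unique)))


# Placeholder pool for illustration only (not real ClinVar data)
pool = [f"NM_{i:06d}.1:c.{i}A>G" for i in range(1, 2001)]
samples = {n: sample_hgvs(pool, n) for n in (100, 1000)}
```

Sampling from a de-duplicated, sorted pool means the same seed always yields the same test set, which makes timings comparable across machines.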

davmlaw commented Feb 1, 2022

We should write a proper script to generate the test HGVS from ClinVar.

We should also add an option to use only unique transcripts. Otherwise repeated transcripts are served from the local cache, which is a bit unfair.
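A minimal sketch of what a unique-transcripts filter might look like. `unique_transcripts` is a hypothetical name, and the sketch assumes the transcript accession is everything before the first `:` (or before an optional `(GENE)` suffix) in each HGVS string.

```python
def unique_transcripts(hgvs_strings):
    """Keep only the first HGVS string seen for each transcript accession."""
    seen = set()
    kept = []
    for hgvs in hgvs_strings:
        # "NM_000059.3(BRCA2):c.1A>G" -> "NM_000059.3"
        accession = hgvs.split(":", 1)[0].split("(", 1)[0]
        if accession not in seen:
            seen.add(accession)
            kept.append(hgvs)
    return kept
```

With one record per transcript, every lookup has to fetch transcript data at least once, so cache hits no longer flatter the timings.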

UTA is missing roughly 80% of the transcripts. This doesn't affect the results, as we only count the ones that resolve, but it makes the run take about 5x as long to produce results we can use. Perhaps we should restrict the test set to transcripts UTA has, to make its benchmarks quicker.


davmlaw commented Feb 1, 2022

Here are benchmarks on the existing HGVS (with duplicate transcripts). I think the median is the fairest measure, as the mean includes local cache hits.
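To illustrate why the median is less sensitive to cache hits than the mean, here is a small sketch with made-up timings (these numbers are invented for illustration and are not from the benchmark runs):

```python
import statistics

# Made-up timings in seconds: 3 local cache hits and 7 network round trips
timings = [0.001] * 3 + [1.8] * 7

mean_s = statistics.mean(timings)      # pulled down by the near-zero cache hits
median_s = statistics.median(timings)  # stays at the typical uncached lookup time
print(f"mean={mean_s:.3f}s median={median_s:.3f}s")
```

As long as fewer than half the lookups are cache hits, the median reports the uncached lookup time, while the mean shifts with every cached record.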

This is not a totally fair benchmark, as cdot.cc is in Australia (1000 miles away), while UTA is in the USA.

On 500 random ClinVar HGVS entries:

cdot REST - median of 0.1s/HGVS (resolved 100% - 500/500)
UTA - median of 1.84s/HGVS (resolved 17% - missing data on 415/500 transcripts)

Initial results: cdot REST is about 17x faster (by median) and resolves about 5.9x as many transcripts (500 vs 85).
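For reference, the two headline ratios can be recomputed from the 500-record runs in this comment (the medians are the 50% rows, the counts are the "correct" totals). The variable names are just for illustration:

```python
# Values copied from the 500-record benchmark output in this comment
cdot_median_s = 0.108600  # cdot REST, 50% row
uta_median_s = 1.842527   # UTA, 50% row
cdot_resolved = 500       # cdot REST "correct" count
uta_resolved = 85         # UTA "correct" count

speedup = uta_median_s / cdot_median_s
resolved_ratio = cdot_resolved / uta_resolved
print(f"speedup: {speedup:.1f}x, resolved: {resolved_ratio:.1f}x")
```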


dlawrence@dlawrence-XPS-15-9560:~/localwork/cdot$ ./tests/benchmark_hgvs.py tests/test_data/clinvar_hgvs_100.tsv --rest
Using 100 test records
Total: 100, correct: 100, incorrect: 0, no data: 0
0
count 100.000000
mean 0.114272
std 0.061827
min 0.001432
25% 0.107378
50% 0.111659
75% 0.124938
max 0.410409

dlawrence@dlawrence-XPS-15-9560:~/localwork/cdot$ ./tests/benchmark_hgvs.py tests/test_data/clinvar_hgvs_100.tsv --uta
Using 100 test records
Total: 100, correct: 22, incorrect: 0, no data: 78
0
count 22.000000
mean 1.593698
std 0.781262
min 0.000808
25% 1.842216
50% 1.848696
75% 1.950901
max 2.468861

dlawrence@dlawrence-XPS-15-9560:~/localwork/cdot$ ./tests/benchmark_hgvs.py tests/test_data/clinvar_hgvs_500.tsv --rest
Using 500 test records
Total: 500, correct: 500, incorrect: 0, no data: 0
0
count 500.000000
mean 0.098256
std 0.097638
min 0.000865
25% 0.097436
50% 0.108600
75% 0.115424
max 1.316386

dlawrence@dlawrence-XPS-15-9560:~/localwork/cdot$ ./tests/benchmark_hgvs.py tests/test_data/clinvar_hgvs_500.tsv --uta
Using 500 test records
Total: 500, correct: 85, incorrect: 0, no data: 415
0
count 85.000000
mean 1.286009
std 0.944009
min 0.000821
25% 0.002182
50% 1.842527
75% 1.893244
max 2.766618


A faster work internet connection gives approximately the same ratio of medians: 1.547102 ÷ 0.086330 ≈ 17.9

dlawrence@dlawrence-Precision-5820-Tower:~/localwork/cdot$ python3 tests/benchmark_hgvs.py --rest tests/test_data/clinvar_hgvs_500.tsv
Using 500 test records
Total: 500, correct: 500, incorrect: 0, no data: 0
0
count 500.000000
mean 0.069974
std 0.047126
min 0.001092
25% 0.082477
50% 0.086330
75% 0.088926
max 0.712429

dlawrence@dlawrence-Precision-5820-Tower:~/localwork/cdot$ python3 tests/benchmark_hgvs.py --uta tests/test_data/clinvar_hgvs_500.tsv
Using 500 test records
Total: 500, correct: 85, incorrect: 0, no data: 415
0
count 85.000000
mean 1.040413
std 0.752840
min 0.001311
25% 0.004117
50% 1.547102
75% 1.566680
max 2.042583


davmlaw commented Jul 29, 2024
