Bump mmsplice version #54
Comments
Hi Jerome, Thank you for your request. CADD was trained with MMSplice 1.0.1, and its scores are generated with that specific version. We therefore don't want to upgrade MMSplice: predictions might differ, so the scores would differ and would no longer agree with the precomputed whole-genome files. We also see that scores are sometimes far off when a feature takes different values across versions or environments. We can keep your request open and update MMSplice for the next CADD release. Best, Max |
@visze Thanks for the reply; I totally agree. I cancelled the PR I had opened for this once I realized that any small change can affect the scores overall. It would be great, though, if the new version could pick up the most recent MMSplice. Legacy dependencies such as concise and cython=0.29 (current Cython is 3.x) are likely to hurt code efficiency. The improvements in the new model would probably also benefit the CADD score (I can't gauge this on my end). I am also having trouble installing the scripts locally because of dependency conflicts (Snakemake with mamba fails at multiple points). |
@visze Also, micromamba is able to solve the environment better. |
@visze and @makirc The core models used in MMSplice don't appear to have changed between v1.0.1 and v2.4.0 (check here). What has changed is all the infrastructure around them. I am still having trouble installing the mmsplice environment on my cluster: I get a TypeError involving the metaclass, which is very cryptic to debug given that everything is outdated.
|
The outliers above are variants that the new version of MMSplice no longer treats as splice variants, hence the NaN in their columns. This essentially shows that the model's PHRED scale (which is relative) still holds for the updated MMSplice wherever both versions concordantly identify a variant as a splice variant. Re-calibrating the genome-wide predictions to drop variants that are no longer splice variants should actually be a good thing? 1% of the variants had np.linalg.norm(PHRED_mmsplice_2.4.0 - PHRED_mmsplice_1.0.1) > 1, i.e. an absolute PHRED difference > 1 between the two versions, and 0.5% (57 variants) had an absolute PHRED difference > 2. Stats below:
I focused on the 24 variants where the difference is > 5 and found that some of them are not even splice variants. (This may be due to a hardware change? The variation is unlikely to come from the MMSplice version, since both versions report NaN in the MMSp_ columns.) These variants, their MMSplice scores under the two versions, and their deviations are shown below.
Most of the variants are novel, and all of them have a higher PHRED with mmsplice==1.0.1. Variant "17_6758911_AGAGATGGGGTCTGAGAGTTGGGGGACGAGGGTCCAGTCCTCCCTGCAGGT_A" has the highest deviation, 13.2. It is a splice_donor variant according to VEP, but I am not sure why MMSplice 2.4.0 didn't score it. Another example is "7_103083064_AGG_A", a splice variant with a deviation > 10 that MMSplice 2.4.0 didn't score either. I am not sure why; it could be due to the version changes. But among the scored variants the deviation in PHRED is insignificant, which indicates that upgrading to MMSplice 2.4.0 doesn't significantly affect the CADD PHRED. |
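For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the per-variant check described in this comment. It assumes you already have paired PHRED scores for the same variants from the two runs; the function name, the thresholds argument, and the toy values are all made up for illustration and are not CADD output. For a single pair of scores, `np.linalg.norm` of the difference is just the absolute value, so plain `abs` is used here. Skipping pairs with a missing score is a defensive assumption, not something taken from the CADD pipeline.

```python
import math


def phred_deviation_summary(old_scores, new_scores, thresholds=(1.0, 2.0, 5.0)):
    """Count variants whose |PHRED_new - PHRED_old| exceeds each threshold.

    Hypothetical helper: both inputs are equal-length lists of floats,
    ordered so that position i refers to the same variant in both runs.
    """
    if len(old_scores) != len(new_scores):
        raise ValueError("score lists must have the same length")

    # Keep only variants that have a usable score under both versions.
    diffs = [
        abs(new - old)
        for old, new in zip(old_scores, new_scores)
        if not (math.isnan(old) or math.isnan(new))
    ]

    total = len(diffs)
    summary = {"n_compared": total}
    for threshold in thresholds:
        count = sum(d > threshold for d in diffs)
        # Store (number of variants, fraction of compared variants).
        summary[threshold] = (count, count / total if total else 0.0)
    return summary


# Toy values for illustration only.
old = [10.0, 22.5, 3.1, 15.0, float("nan")]
new = [10.2, 20.0, 3.1, 8.9, 12.0]
summary = phred_deviation_summary(old, new)
print(summary)
```

The fractions are relative to the variants that could be compared in both runs, so it is worth reporting `n_compared` next to them.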
I released a new CADD-scripts version, v1.7.1. Maybe you can try that one. It is now recommended to use apptainer/singularity: all environments are packed within a container, so no conda builds are needed (the container is 17 GB). You also now need Snakemake 8. I also updated the environments, so if you use mamba/conda instead, I hope you will not face the issues you had above. |
Hi,
I am trying to install CADD-scripts in my local environment, and the legacy dependency of mmsplice 1.0.1 on concise is causing installation problems. Since mmsplice 2.x, the concise dependency has been integrated into the core API, and most predictions are 1:1 with the legacy API. Would it be possible to bump mmsplice to the most recent version?
I am trying to set up a local installation of CADD v1.7.
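When debugging an install like this, it can help to confirm which mmsplice version the environment actually resolved. A minimal sketch, assuming the package is installed under the distribution name `mmsplice` and that 1.0.1 is the version to compare against; the helper names are hypothetical and only the standard library is used:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_pin(dist_name, pinned):
    """True only if the package is installed at exactly the pinned version."""
    return installed_version(dist_name) == pinned


# Run this with the interpreter of the environment you are checking.
print("mmsplice installed:", installed_version("mmsplice"))
print("matches 1.0.1:", matches_pin("mmsplice", "1.0.1"))
```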