Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VEP annotations other than impact? #146

Open
karynne7 opened this issue Sep 28, 2022 · 13 comments
Open

VEP annotations other than impact? #146

karynne7 opened this issue Sep 28, 2022 · 13 comments

Comments

@karynne7
Copy link

Is there a way to pull other annotations from the CSQ field other than the built in INFO.impactful? I see that some of the flags need to be integers or flags, but I have a loftee annotations with "HC" or "LC" I'd like to filter on. (Yes, it could be turned into a flag, but there are other annotations less binary too)

I was thinking about bcftools view -i 'INFO/CSQ[70]=="HC"' but that didn't work, and I'm not sure it would capture the whole annotation if there were multiple transcripts. Happy to write my own work around, but figured I'd double check with you first! Thanks!

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|GENE_PHENO|NEAREST|SIFT|PolyPhen|DOMAINS|miRNA|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|LoF|LoF_filter|LoF_flags|LoF_info|existing_InFrame_oORFs|existing_OutOfFrame_oORFs|existing_uORFs|five_prime_UTR_variant_annotation|five_prime_UTR_variant_consequence|REVEL|SpliceRegion|AF_gnomadall|ALT|AUG_eventAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakvar_frameAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakvar_stop_locAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakvar_kozakAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakref_frameAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakref_stop_locAUG_event|var_frame|var_stop_loc|var_kozak|ref_frame|ref_stop_loc|ref_kozakref_kozak|CADD_PHRED|CLNSIG|CLNSIGINCL|CLNVC|CLNVI|Delta_G4|Delta_dsRNA|FREQ_exomes|FREQ_genomes|FREQ_gnomadall|G4|GENEINFO|Literature_source|MC|NCboost_chr_rank_perc|PMID|REF|ReMM_probability|STOP_eventSTOP_event|var_preceeding_start_loc|ref_preceeding_start_locvar_preceeding_start_locSTOP_event|var_preceeding_start_loc|ref_preceeding_start_locref_preceeding_start_loc|TE|TE_log2fold|canonical|chr|distance|dsRNA|gene_name|pos|ref|strand|transcript|transcript_ver|type|var">
1	138852	.	C	T	.	.	AC=1;AN=700;HPO_CT=1;GENE=LOC729737;MRNA=NR_039983.2;FXN=non-coding-exon;HGVS_CDNA=n.1395G>A;HGVS_PROT=.;ESP_AF=0;GNOMAD_AF=0.000106689;G1K_AF=0;CADD=27.2;nhet_aff=1;nhet_unaff=0;nhomalt_aff=0;nhomalt_unaff=0;nhet_female_aff=0;nhet_female_unaff=0;nhomalt_female_aff=0;nhomalt_female_unaff=0;nhet_male_aff=1;nhet_male_unaff=0;nhomalt_male_aff=0;nhomalt_male_unaff=0;maxAAF=0.00173611;bravo_AF=0.000615815;CSQ=T|stop_gained|HIGH|AL627309.1|ENSG00000237683|Transcript|ENST00000423372|protein_coding|1/2||ENST00000423372.3:c.458G>A|ENSP00000473460.1:p.Trp153Ter|528/2661|458/780|153/259|W/*|tGg/tAg|rs540391832||-1||SNV|Clone_based_ensembl_gene||YES||||ENSP00000473460||R4GN28&B7Z7W4|UPI0002C88512||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||HC|||GERP_DIST:0&BP_DIST:1223&PERCENTILE:0.587179487179487179&DIST_FROM_LAST_EXON:374&50_BP_RULE:PASS&PHYLOCSF_TOO_SHORT|||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|CICP27|ENSG00000233750|Transcript|ENST00000442987|processed_pseudogene||||||||||rs540391832|4016|1||SNV|HGNC|48835|YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|RP11-34P13.13|ENSG00000241860|Transcript|ENST00000484859|antisense||||||||||rs540391832|2622|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|RP11-34P13.13|ENSG00000241860|Transcript|ENST00000490997|antisense||||||||||rs540391832|3956|-1||SNV|Clone_based_vega_gene|||||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|downstream_gene_variant|MODIFIER|RP11-34P13.14|ENSG00000239906|Transcript|ENST00000493797|antisense||||||||||rs540391832|938|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|upstream_gene_variant|MODIFIER|RP11-34P13.15|ENSG00000268903|Transcript|ENST00000494149|processed_pseudogene||||||||||rs540391832|2957|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|upstream_gene_variant|MODIFIER|RP11-34P13.16|ENSG00000269981|Transcript|ENST00000595919|processed_pseudogene||||||||||rs540391832|887|-1||SNV|Clone_based_vega_gene||YES|||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000249765|CTCF_binding_site||||||||||rs540391832||||SNV||||||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000918299|TF_binding_site||||||||||rs540391832||||SNV||||||||||||AL627309.1||||||0.0008|0.003|0|0|0|0|||0.0001265|0.002325|0|0|0|0|0|0|9.197e-05|0.003|AFR|||||||||||||||||||||||||||||||||||||||||||||||||||||||;SpliceAI=T|AL627309.1|0.00|0.00|0.00|0.00|-2|-14|-2|-5
@brentp
Copy link
Owner

brentp commented Sep 28, 2022

For now, I think it's better to do this type of thing with vembrane from @tedil as it can do this directly already.

For slivar, it's not yet implemented. This is more than trivial in part because CSQ is actually a nested array--an array for each transcript. So you'd need something like (this is not implemented, just brainstorming):

INFO.CSQ[0].LoF == "HC" || INFO.CSQ[1].LoF == "HC"

there's not information about what's a string and what's not, so for numeric, you'd still need to do:

parseFloat(INFO.CSQ[0].gnomAD_AMR_AF) < 0.01 ...

but,

@karynne7
Copy link
Author

Thank you, Brent! I'll look into it!

@jxchong
Copy link
Contributor

jxchong commented Jul 14, 2023

Hey @brentp just checking in if you've ever implemented CSQ filtration in slivar or will in the near future. We're at a point where we need to move away from GEMINI but it's been a bit of a headache to figure out how to port over all our annotation filters to a slivar-based pipeline.

@brentp
Copy link
Owner

brentp commented Jul 23, 2023

Hi Jessica. I'm looking into this now. I have started a branch here to explore: https://github.com/brentp/slivar/tree/csq
I should have progress in the next 3 weeks to know at least if it's feasible, if not have it completed.

brentp added a commit that referenced this issue Jul 24, 2023
@brentp
Copy link
Owner

brentp commented Jul 24, 2023

Hi, can you try out this file: https://raw.githubusercontent.com/brentp/slivar/83591adf2d154c45ecf5e64c87e4ba1635e874b6/js/csq.js
?

then you can do commands like:

./slivar expr \
    --js js/csq.js \
    -v tests/test.vcf \
    --pass-only \
    --info "CSQs(INFO.CSQ, VCF.CSQ, ['PolyPhen', 'SIFT']).some(function(csq) { return csq.SIFT > 10 })" \
   -o high-sift.vcf

so the expression is: CSQs(INFO.CSQ, VCF.CSQ, ['PolyPhen', 'SIFT']).some(function(csq) { return csq.SIFT > 10 })

note that CSQs is a function that returns a javascript array of CSQ objects. With keys always capitalized. The first and 2nd arg to that function will always be the same (unless your field is e.g. ANN from snpEff or BCSQ from bcftools), the final argument is a list of fields in the CSQ string that will be treated as numeric, so should include and allele frequency fields.

Because this gives an array, you can use methods like .some() which accepts a closure for each CSQ object.
Here's an example getting only missense variants from BCSQ:

'BCSQ' in INFO && CSQs(INFO.BCSQ, VCF.BCSQ, []).some(function(csq) { return csq.CONSEQUENCE == 'missense' })

Note also that duktape javascript engine used by slivar does not support "fat arrow" functions like .some(csq => csq.CONSEQUENCE == "missense").

Please let me know anything that's missing from this. I'm sure there's more we can do to make it ergonomic.
If you want to use js/slivar-functions.js and js/csq.js, for now, you'll have to concatenate them yourself.

@jxchong
Copy link
Contributor

jxchong commented Jul 24, 2023

Thanks. We're definitely slowly struggling our way through the new syntax. :)

Can you provide an example of how we would, e.g. filter for variants that both pass INFO.genic AND have a SIFT <0.05?

@brentp
Copy link
Owner

brentp commented Jul 25, 2023

Can you provide an example of how we would, e.g. filter for variants that both pass INFO.genic AND have a SIFT <0.05?

--info "INFO.genic && CSQs(INFO.CSQ, VCF.CSQ, ['SIFT']).some(function(csq) { return csq.SIFT < 0.05 })"

so for CSQ, it will always be:

CSQs(INFO.CSQ, VCF.CSQ, array_of_numeric_fields).some(function(csq) { return [expression here that uses *csq*] })

so you'll change:

  • array_of_numeric_fields to contain any field you want to be converted to an integer.
  • [expression that uses csq] to something like csq.CONSEQUENCE == "missense" && csq.SIFT < 0.05"

The some is like python's any it just checks if the expression is true for any transcript.

@brentp
Copy link
Owner

brentp commented Jul 25, 2023

I documented this in more detail here: https://github.com/brentp/slivar/wiki/CSQ

@wwgordon
Copy link

Hi @brentp ,

Thanks so much for all your help and clarification. I'm trying to sort out some of the nuances of how these queries work and am hoping you could clear something up for me. As an example:

--info "(INFO.highest_impact_order < ImpactOrder.synonymous) && variant.FILTER == 'PASS'" 

This will work for me, I presume because "synonymous_variant" is a Consequence present in my VCF and Slivar is smart enough to match "synonymous" to "synonymous_variant."

However, this will not work for me:

--info "(INFO.highest_impact_order < ImpactOrder.5_prime_UTR) && variant.FILTER == 'PASS'" \

...producing the following error:

Error: unhandled exception: SyntaxError: invalid number literal (line 1)

This is surprising to me, because "5_prime_UTR_variant" is present as a Consequence in my VCF and I would expect the matching to work similarly as the "synonymous" example.

Finally, this also does not work for me:

--info "(INFO.highest_impact_order < ImpactOrder.TFBS_ablation) && variant.FILTER == 'PASS'"

Producing many of the following errors:

[slivar] error evaluating info expression (this can happen if a field is missing):
error from duktape: unknown attribute:TFBS_ablation for expression:(INFO.highest_impact_order < ImpactOrder.TFBS_ablation) && variant.FILTER == 'PASS'

I presume because there are no "TFBS" related Consequences in my VCF. I include this example because I would expect the query to look for all of the Consequence terms above "TFBS_ablation" in the default-order.txt file, many of which are present. Is it expected behavior that the "threshold Consequence" must be present in the VCF? This is a secondary question, I am primarily interested in understanding how Slivar handles the "_variant" suffix.

Thanks again!

@brentp
Copy link
Owner

brentp commented Jul 26, 2023

Hi,
for this one:

--info "(INFO.highest_impact_order < ImpactOrder.5_prime_UTR) && variant.FILTER == 'PASS'" 

This is because ImpactOrder is a javascript object and a property can't start with a number. We can see the same problem in the javascript console for: obj = {'b': 1, '5_prime': 22} where obj.b is fine, but obj.5_prime gives an error. You can get around this with obj['5_prime'], or in your case:

"(INFO.highest_impact_order < ImpactOrder['5_prime_utr']) ... "

Note that lower-case "utr"

for TFBS_ablation try tfbs_ablation instead. All impacts will be lower-cased. I didn't document this anywhere. I'll do so now!

@wwgordon
Copy link

wwgordon commented Jul 26, 2023

I seem to be having trouble with the named indices. "(INFO.highest_impact_order < ImpactOrder['5_prime_utr']) ... " produces the Error: unhandled exception: SyntaxError: expected identifier (line 1) error. Notably, I get the same error when trying to use "synonymous" as a threshold, which was working before with the original notation:

./slivar expr \
	--vcf $vcf --ped $ped \
	--trio "${moi}:kid.alts == 2 && kid.GQ >= 20 \
	            && mom.alts == 1 && mom.GQ >= 20 \
	            && dad.alts == 1 && dad.GQ >= 20" \
	--info "variant.FILTER == 'PASS' \
	     	     && INFO.highest_impact_order < ImpactOrder.synonymous" \
	--pass-only > $intermediate 2> ./slivar_stderr.txt

... works, but

./slivar expr \
	--vcf $vcf --ped $ped \
	--trio "${moi}:kid.alts == 2 && kid.GQ >= 20 \
	            && mom.alts == 1 && mom.GQ >= 20 \
	            && dad.alts == 1 && dad.GQ >= 20" \
	--info "variant.FILTER == 'PASS' \
	     	     && INFO.highest_impact_order < ImpactOrder.['synonymous']" \
	--pass-only > $intermediate 2> ./slivar_stderr.txt

...does not work, producing the same error as above.

The lower case ImpactOrder.tfbs_ablation works beautifully, thanks!

@brentp
Copy link
Owner

brentp commented Jul 26, 2023

instead of ImpactOrder.['synonymous'], you want ImpactOrder['synonymous']

@wwgordon
Copy link

Aha, perfect. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants