Background:
In the FILTER column, multiple filters should be separated by semicolons. The widely used, but not actively maintained, VarScan2 genomic variant caller uses commas instead. Moreover, VarScan2 does not add ##FILTER metadata for most of its filters. Picard FixVcfHeader can be used to fix missing FILTER metadata. A "fixed" metadata row will look like:
##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">
Error:
PyVCF fails with:
```
Traceback (most recent call last):
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 236, in <module>
    main()
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 232, in main
    run(parser.parse_args())
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 166, in run
    df_1 = vcf_to_dataframe(args.vcf_1)
  File "/mnt/hdd/dnanexus/scripts_local/compare_vcfs.py", line 74, in vcf_to_dataframe
    vcf_reader = vcf.Reader(open(vcf_file, "r"))
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 300, in __init__
    self._parse_metainfo()
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 326, in _parse_metainfo
    key, val = parser.read_filter(line)
  File "/home/myourshaw/.venv/dnanexus/lib/python3.10/site-packages/vcf/parser.py", line 142, in read_filter
    raise SyntaxError(
SyntaxError: One of the FILTER lines is malformed: ##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing description: this FILTER line was added by Picard's FixVCFHeader">
```
Issue:
It might be more robust for PyVCF to treat a filter with commas as just one big filter name, as does Picard FixVcfHeader.
Instead of raising an exception, accept metadata with a filter ID inside double quotes and containing commas, e.g., ID="RefAvgRL,VarAvgRL".
Similarly, in the data, treat a FILTER value like RefAvgRL,VarAvgRL as a single entity. I think this solution is consistent with the VCF 4.2 spec for a filter name: String, no whitespace or semicolons permitted.
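To illustrate the data-side behavior I have in mind, here is a minimal sketch that splits a FILTER column value on semicolons only, so a comma-containing value stays one name. The function name `split_filter_field` is hypothetical and is not part of PyVCF; the handling of `.` and `PASS` follows my reading of the VCF spec.

```python
def split_filter_field(value):
    """Split a VCF FILTER column value into filter names.

    Hypothetical helper for illustration only (not PyVCF's API).
    Only ';' is treated as a separator, so commas stay inside a name.
    """
    if value == ".":
        return None   # filters were not applied to this record
    if value == "PASS":
        return []     # record passed all filters
    return value.split(";")


if __name__ == "__main__":
    print(split_filter_field("RefAvgRL,VarAvgRL"))
    print(split_filter_field("q10;RefAvgRL,VarAvgRL"))
```

With this approach the VarScan2 value `RefAvgRL,VarAvgRL` would come back as a single filter name, which lines up with the single ##FILTER ID that Picard writes.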
Possible pull request:
This hack (changing `[^,]+` to `.+`) worked to get me through an urgent analysis, but it may not be the best solution. At parser.py line 142:
`self.filter_pattern = re.compile(r'''\#\#FILTER=< ID=(?P<id>.+),\s* Description="(?P<desc>[^"]*)" >''', re.VERBOSE)`
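A somewhat narrower alternative might be to accept either a double-quoted ID (which may contain commas) or an unquoted ID without commas. The sketch below is standalone and only assumes the general shape of the ##FILTER line shown above; `FILTER_PATTERN` and `read_filter_line` are hypothetical names, not PyVCF's actual code.

```python
import re

# Hypothetical pattern: a quoted ID may contain commas, an unquoted ID may not.
FILTER_PATTERN = re.compile(r'''\#\#FILTER=<
    ID=(?:"(?P<quoted_id>[^"]+)"|(?P<id>[^,]+)),\s*
    Description="(?P<desc>[^"]*)"
    >''', re.VERBOSE)


def read_filter_line(line):
    """Return (filter_id, description) from a ##FILTER header line."""
    match = FILTER_PATTERN.match(line)
    if match is None:
        raise SyntaxError(
            "One of the FILTER lines is malformed: {0}".format(line))
    # Whichever alternative matched holds the ID; the other group is None.
    filter_id = match.group("quoted_id") or match.group("id")
    return filter_id, match.group("desc")


if __name__ == "__main__":
    picard_line = ('##FILTER=<ID="RefAvgRL,VarAvgRL",Description="Missing '
                   "description: this FILTER line was added by Picard's "
                   'FixVCFHeader">')
    print(read_filter_line(picard_line))
    print(read_filter_line('##FILTER=<ID=q10,Description="Quality below 10">'))
```

One difference from the `.+` hack: here the surrounding double quotes are stripped from the captured ID, so it would compare equal to the unquoted `RefAvgRL,VarAvgRL` value in the FILTER column. Whether that is the behavior the maintainers want is an open question.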