ValueError: vcf is not a valid file or directory. Please provide a valid file or directory. #71
Comments
Hi @arvkevi, I obtained the prediction.csv file and plotted it. The problem was probably due to a malformed file; I regenerated the VCF file with some additional VEP parameters.
ezancestry uses snps to read VCFs in process.py. Are the two samples related? Do they have the exact same set of AISNPs?
I noticed that; I get the same results when using snps directly. The samples are not related; they belong to two different people. I'll soon analyze WGS data from two other samples and will test the script on those as well.
Hi @arvkevi, I have the same problem with the other two samples. The following is the head of the VCF with the SNPs that I give as input. Is that correct for ezancestry?
Hey @RosaDeSa, one other thing that could be contributing to this is having too many missing AISNPs in the VCF. When you call predict, it should log a message indicating how many AISNPs were present in your VCF for a sample. It looks like this (from cell 23 of this notebook):
2021-09-20 06:25:34.289 | INFO | ezancestry.process:_input_to_dataframe:276 - Sample has a valid genotype for 44 out of a possible 55 (80.0%)
Do you know how many AISNPs were in your input samples?
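As a rough pre-check before running predict, the rsIDs in a VCF's ID column can be compared against the AISNP list. The sketch below is only an illustration and is not ezancestry's own logic: the helper name `count_aisnps_in_vcf`, the toy VCF, and the rsIDs are all made up, and it only checks that an rsID is present, not that the genotype is valid.

```python
import io


def count_aisnps_in_vcf(vcf_lines, aisnp_rsids):
    """Return (found, total): how many wanted rsIDs appear in the VCF ID column.

    Hypothetical helper for illustration; it does not validate genotypes.
    """
    wanted = set(aisnp_rsids)
    seen = set()
    for line in vcf_lines:
        # Skip meta lines ("##"), the column header ("#CHROM"), and blanks.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        # The ID column (third field) may hold several IDs separated by ';'.
        for rsid in fields[2].split(";"):
            if rsid in wanted:
                seen.add(rsid)
    return len(seen), len(wanted)


# Toy VCF content; the second record has no rsID annotation (".").
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n"
    "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT\t0/1\n"
    "2\t2000\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\n"
    "3\t3000\trs3;rs33\tG\tA\t.\tPASS\t.\tGT\t0/0\n"
)
found, total = count_aisnps_in_vcf(example_vcf, ["rs1", "rs2", "rs3"])
print(f"{found} of {total} AISNPs have an rsID in the VCF")
```

A count near zero here would suggest the VCF is missing rsID annotations or does not cover the AISNP sites at all.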
Yes, you're right! I have 0 out of a possible 55 using the Kidd set and 1 of 127 using the Seldin set.
Hmm, the merge is on both rsid AND position. Unfortunately, this requires a VCF annotated with rsids, and the positions must match the hg19 positions from the .aisnps files. You could try commenting out "chr" and "position_hg19" in this line, but I haven't looked at the hg19->hg38 liftover in about a year. So if you do this, you should check whether any alleles changed. I'll have to think about how ezancestry could support hg38. The easiest would probably be a --hg38 flag that uses new versions of the aisnps files. But I won't have time to get to this work for a little while.
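To illustrate why a join on rsid plus position can drop every SNP when the sample uses a different genome build, here is a minimal sketch in plain Python. It is not the code from process.py: the rows, coordinates, and genotypes are invented, and only the column names "chr" and "position_hg19" come from the comment above.

```python
# Hypothetical reference rows shaped like an .aisnps file (coordinates are made up).
aisnps = [
    {"rsid": "rs1", "chr": "1", "position_hg19": 1000},
    {"rsid": "rs2", "chr": "2", "position_hg19": 2000},
]

# Hypothetical sample rows whose coordinates come from a different build.
sample = [
    {"rsid": "rs1", "chr": "1", "position": 1050, "genotype": "AG"},
    {"rsid": "rs2", "chr": "2", "position": 2075, "genotype": "CC"},
]

# Strict join: rsid, chromosome, and position must all agree.
strict_keys = {(r["rsid"], r["chr"], r["position_hg19"]) for r in aisnps}
strict = [s for s in sample if (s["rsid"], s["chr"], s["position"]) in strict_keys]

# Relaxed join: match on rsid only, ignoring coordinates.
rsid_keys = {r["rsid"] for r in aisnps}
relaxed = [s for s in sample if s["rsid"] in rsid_keys]

print(f"strict: {len(strict)} of {len(aisnps)}, relaxed: {len(relaxed)} of {len(aisnps)}")
```

The relaxed join recovers the rows, but it says nothing about whether the reference and alternate alleles are still the same between builds, which is why the alleles should be checked before trusting the result.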
Hi Kevin, I'm trying this script, but I'm running into this error during prediction (the VCF file was annotated with VEP):
DEBUG | ezancestry.process:process_user_input:214 - list index out of range
Traceback (most recent call last):
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/ezancestry/process.py", line 217, in process_user_input
snpsdf = pd.read_csv(
File "/usr/local/lib/python3.9/dist-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/readers.py", line 678, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/readers.py", line 581, in _read
return parser.read(nrows)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/readers.py", line 1253, in read
index, columns, col_dict = self._engine.read(nrows)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/python_parser.py", line 270, in read
alldata = self._rows_to_cols(content)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/python_parser.py", line 1013, in _rows_to_cols
self._alert_malformed(msg, row_num + 1)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/python_parser.py", line 739, in _alert_malformed
raise ParserError(msg)
pandas.errors.ParserError: Expected 3 fields in line 7, saw 4
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tigem/r.desantis/.local/bin/ezancestry", line 8, in <module>
sys.exit(app())
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
return get_command(self)(*args, **kwargs)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/typer/main.py", line 532, in wrapper
return callback(**use_params) # type: ignore
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/ezancestry/commands.py", line 286, in predict
snpsdf = process_user_input(input_data, aisnps_directory, aisnps_set)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/ezancestry/process.py", line 232, in process_user_input
raise ValueError(
ValueError: a1.VEP.ann.vcf is not a valid file or directory. Please provide a valid file or directory.
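The ParserError in the traceback above ("Expected 3 fields in line 7, saw 4") means pandas met a row with more fields than it expected. One way to locate such rows before running the tool is to count fields per line. The sketch below is a generic diagnostic under stated assumptions: the helper name `field_count_outliers` is hypothetical, and the separator is a parameter because the thread does not show which delimiter process.py passes to read_csv.

```python
def field_count_outliers(lines, sep="\t"):
    """Return (expected, outliers) for an iterable of text lines.

    expected is the field count of the first line; outliers is a list of
    (line_number, n_fields) for every later line with a different count.
    Hypothetical helper for illustration only.
    """
    expected = None
    outliers = []
    for number, line in enumerate(lines, start=1):
        n_fields = len(line.rstrip("\n").split(sep))
        if expected is None:
            expected = n_fields
        elif n_fields != expected:
            outliers.append((number, n_fields))
    return expected, outliers


# Toy input: the third row has one field too many.
rows = ["a b c", "d e f", "g h i j"]
expected, outliers = field_count_outliers(rows, sep=" ")
print(expected, outliers)
```

On a real file, the same function can be called on an open file handle, for example `field_count_outliers(open(path), sep="\t")`, with the separator adjusted to match the input.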