can't get a working pair of vcf2db - gemini #68

Open
naumenko-sa opened this issue Feb 23, 2022 · 2 comments

naumenko-sa commented Feb 23, 2022

Hi @brentp !

Thanks so much for supporting half of the ecosystem just by yourself!

In an old project I suddenly can't get the vcf2db.py / gemini pair working with the compressed gts fields.

I create the db as usual with:

vcf2db.py cohort.vcf.gz cohort.ped cohort.db

The cohort is large, ~1000 samples.

Then I query it with:

gemini \
query \
--header \
-q "select chrom, start, end, ref, alt, variant_id, gene, aaf, gts.REAL_SAMPLE_ID from variants limit 100" cohort.db

and I'm getting:

chrom	start	end	ref	alt	variant_id	gene	aaf	gts.REAL_SAMPLE_ID
Traceback (most recent call last):
  File "/bcbio/bin/gemini", line 7, in <module>
    gemini_main.main()
  File "/bcbio/anaconda/envs/python2/lib/python2.7/site-packages/gemini/gemini_main.py", line 1249, in main
    args.func(parser, args)
  File "/bcbio/anaconda/envs/python2/lib/python2.7/site-packages/gemini/gemini_main.py", line 439, in query_fn
    gemini_query.query(parser, args)
  File "/bcbio/anaconda/envs/python2/lib/python2.7/site-packages/gemini/gemini_query.py", line 169, in query
    run_query(args)
  File "/bcbio/anaconda/envs/python2/lib/python2.7/site-packages/gemini/gemini_query.py", line 141, in run_query
    for row in gq:
  File "/bcbio/anaconda/envs/python2/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 771, in next
    val = row[source][idx]
  File "/bcbio/anaconda/envs/python2/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 449, in __getitem__
    self.cache[key] = self.unpack(self.row[key])
  File "/bcbio/anaconda/envs/python2/lib/python2.7/site-packages/gemini/compression.py", line 94, in snappy_unpack_blob
    dt = lookup[blob[0]]
KeyError: 'U'

Both vcf2db.py and gemini were installed via bioconda in the bcbio python2 environment (this worked fine until recently):
https://github.com/chapmanb/cloudbiolinux/blob/master/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml#L350

I tried to install a fresh vcf2db.py in a python3 environment with:

git clone git@github.com:quinlan-lab/vcf2db.git
# with python3.10 I had a conflict with cyvcf2 with a downgrade list, so I used python3.9
conda create -n vcf2db_python3 python=3.9
conda install -n vcf2db_python3 -c conda-forge gcc snappy
conda install -n vcf2db_python3 -c conda-forge python-snappy
conda install -n vcf2db_python3 -c bioconda cyvcf2 peddy
conda activate vcf2db_python3
which pip
cd vcf2db
pip install -r requirements.txt 

This installed htslib 1.13, python-snappy 0.6.0, and snappy 1.1.8.

When I create a database with python vcf2db.py from the vcf2db_python3 env and try to read it with gemini, I get the same error.

I also tried to make the python-snappy versions match between the python2 gemini environment and vcf2db_python3 with:

conda install -n vcf2db_python3 python-snappy=0.5.4 -c conda-forge

I still get the same snappy_unpack_blob error.
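
Since the traceback fails at dt = lookup[blob[0]], one way to narrow this down would be to look at the first byte of the stored blobs directly, without going through gemini. The following is only a sketch: the helper name blob_prefixes is hypothetical, and the table and column names (variants, gts) are assumptions taken from the query above, not checked against the actual schema.

```python
import sqlite3
from collections import Counter


def blob_prefixes(conn, table="variants", column="gts", limit=100):
    """Count the leading byte of each non-empty value in `column`.

    Hypothetical helper; `variants` and `gts` are assumed names.
    """
    # Guard the identifiers, since they are interpolated into the SQL text.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("unexpected table or column name")
    sql = "SELECT {} FROM {} LIMIT ?".format(column, table)
    counts = Counter()
    for (blob,) in conn.execute(sql, (limit,)):
        if not blob:
            continue  # NULL or empty value: nothing to inspect
        if isinstance(blob, str):
            # Value came back as TEXT rather than BLOB
            data = blob.encode("latin-1", "replace")
        else:
            data = bytes(blob)
        counts[data[:1]] += 1
    return counts


# Example use (assumes cohort.db exists in the working directory):
#   conn = sqlite3.connect("cohort.db")
#   print(blob_prefixes(conn))
```

If every row starts with the same unexpected byte (for example b"U"), that would suggest the problem is in how the database was written rather than in the snappy version used to read it.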

Let me know if you have any ideas; I am ready to put in the effort to resolve this.

Sergey

@naumenko-sa

This was solved for me by downgrading cyvcf2 in the python2 environment:

conda install -n python2 cyvcf2=0.10.0

Now I have

# Name                    Version                   Build  Channel
cyvcf2                    0.10.0           py27h355e19c_0    bioconda

instead of the original:

# Name                    Version                   Build  Channel
cyvcf2                    0.30.11          py27h3ce6e29_0    bioconda
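
The thread does not say why the cyvcf2 version matters. One possible reading of the traceback, not confirmed here, is that the genotype arrays were written as numpy unicode arrays (dtype char 'U') rather than byte-string arrays (dtype char 'S'), and the reader's type table has no entry for 'U'. The toy below only illustrates that mechanism: pack, unpack, and LOOKUP are hypothetical stand-ins, zlib replaces snappy so the example is self-contained, and none of it is gemini's or vcf2db's actual code.

```python
import zlib

import numpy as np

SEP = b"\0"

# Hypothetical type table: only bool and numeric dtype chars are registered.
LOOKUP = {
    np.dtype(t).char: np.dtype(t)
    for t in (np.bool_, np.int8, np.int16, np.int32, np.float32, np.float64)
}


def pack(arr):
    """Prefix the compressed payload with the array's one-letter dtype char."""
    c = arr.dtype.char
    if c == "S":
        # Byte strings are joined with a separator before compression.
        return b"S" + zlib.compress(SEP.join(arr.tolist()))
    return c.encode("ascii") + zlib.compress(arr.tobytes())


def unpack(blob):
    """Use the first byte to decide how to decode the rest."""
    prefix = blob[:1].decode("ascii")
    if prefix == "S":
        return np.array(zlib.decompress(blob[1:]).split(SEP))
    dt = LOOKUP[prefix]  # raises KeyError for an unregistered prefix
    return np.frombuffer(zlib.decompress(blob[1:]), dtype=dt)


byte_gts = np.array([b"A/A", b"A/G"])  # dtype char 'S'
text_gts = np.array(["A/A", "A/G"])    # dtype char 'U'

print(unpack(pack(byte_gts)))
try:
    unpack(pack(text_gts))
except KeyError as err:
    print("KeyError:", err)
```

If this is what happened, it would be consistent with the error surviving a change of python-snappy version and disappearing after the cyvcf2 downgrade, but that remains a guess.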

brentp commented Feb 24, 2022

Hi, thanks for figuring this out (I was at a loss for what to do).
