String treatments (alleles) implemented for GWAS #197

Closed
joshua-slaughter wants to merge 5 commits

joshua-slaughter
Contributor

String treatments are now available.

  1. Modified functions in src/inputs_from_config.jl to cross-reference SnpArrays.snpdata.snp_info and map the UInt8 genotype coding to String.
  2. The allele naming does not follow any convention for deciding which allele is A1 or A2. However, having the string should be enough, given that the user will run post-hoc analyses that tell them which allele is the minor allele in the population they are interested in.
  3. Added helper functions to testutils.jl so that tests can perform their checks without snp_info being saved to Arrow, which would take up unnecessary space in a non-test run.
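
For illustration only, the UInt8-to-String mapping described in item 1 can be sketched in a few lines. This is a language-neutral Python sketch, not the code in src/inputs_from_config.jl. It assumes the PLINK BED style 2-bit coding (0x00 = A1/A1, 0x01 = missing, 0x02 = A1/A2, 0x03 = A2/A2) and a hypothetical helper name `genotype_to_string`; the `allele1` and `allele2` arguments stand in for the per-SNP allele columns of snp_info.

```python
from typing import Optional


def genotype_to_string(code: int, allele1: str, allele2: str) -> Optional[str]:
    """Map a 2-bit genotype code to an allele string such as "AG".

    Assumed coding (an assumption of this sketch, not taken from the PR):
    0x00 = A1/A1, 0x01 = missing, 0x02 = A1/A2, 0x03 = A2/A2.
    """
    if code == 0x00:
        return allele1 + allele1
    if code == 0x01:
        return None  # missing genotype
    if code == 0x02:
        return allele1 + allele2
    if code == 0x03:
        return allele2 + allele2
    raise ValueError(f"unexpected genotype code: {code!r}")


# Example: one SNP with alleles A (A1) and G (A2), four individuals.
codes = [0x00, 0x02, 0x03, 0x01]
labels = [genotype_to_string(c, "A", "G") for c in codes]
```

Note that each string label carries no information about which allele is minor, which is the limitation item 2 refers to.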

@olivierlabayle
Member

Thanks, Josh, for taking the lead on this. Have you checked how much disk and/or memory overhead the string representation generates? Could you provide these figures here for a few chromosomes?

I will have a look at the code a little later.

@joshua-slaughter
Contributor Author

Will do!

@joshua-slaughter
Contributor Author

Just got around to testing this: the overhead is around 17-fold, along with a significant increase in run time. I can post the graphics once the submitted jobs finish running. For now, I will just aim to ensure that 0x00 is minor-minor, 0x01 is major-minor, and so on, using the SnpArrays.jl minorallele() function to check. This version should not increase overhead.
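
As a rough illustration of the integer alternative, the sketch below recodes one SNP column so that each value counts copies of the minor allele. It is a standalone Python sketch, not the SnpArrays.jl API or the code in this PR. It assumes the input already holds A2 allele counts (0, 1, 2, with `None` for missing), and the function name `minor_allele_counts` is hypothetical.

```python
from typing import List, Optional


def minor_allele_counts(a2_counts: List[Optional[int]]) -> List[Optional[int]]:
    """Recode A2 counts so that each value counts the minor allele.

    If A2 is the more frequent allele among observed genotypes, every
    count is flipped (g -> 2 - g); missing values (None) are preserved.
    """
    observed = [g for g in a2_counts if g is not None]
    if not observed:
        return list(a2_counts)  # nothing observed, nothing to decide
    a2_freq = sum(observed) / (2 * len(observed))
    if a2_freq <= 0.5:
        return list(a2_counts)  # A2 is already the minor allele
    return [None if g is None else 2 - g for g in a2_counts]


# A2 frequency here is 5/8, so A2 is the major allele and counts are flipped.
recoded = minor_allele_counts([2, 2, 1, None, 0])
```

Because the result is still one small integer per genotype, this kind of recoding keeps the storage footprint of the original integer coding.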

@olivierlabayle
Member

Thank you, Josh, that helps a lot. Yes, 0/1/2 counting minor alleles is definitely the way to go for now.

Note: In the longer run, I think we should build a TMLE process that can read BGEN or BED files directly, probably as a package similar to TMLECLI.

@olivierlabayle
Member

@joshua-slaughter should we close this then?

Successfully merging this pull request may close these issues.

Genotype changes in GWAS are ambiguous