n samples in sample_group and datafile are different error with CRAN install, not reproduced after installation from github #6

Open
joshuamschmidt opened this issue Aug 1, 2022 · 1 comment

joshuamschmidt (Contributor) commented Aug 1, 2022

With the CRAN install, smart_pca reports that the number of samples in sample_group differs from the number in the data file, even though the two match:

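# Remove any existing install, then reinstall smartsnp from CRAN via pacman
devtools::unload('smartsnp')
remove.packages('smartsnp')
pacman::p_load('smartsnp', 'data.table', 'ggplot2', 'ggrepel', install = TRUE, update = FALSE)
# Group labels, one entry per sample
group_id <- fam$phase
length(group_id)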

[1] 9411

SP <- which(fam$phase != "kg")
gfile <- 'hidden.traw'
sm.pca <- smart_pca(snp_data = gfile, sample_group = group_id, missing_value = 9, missing_impute = "mean",
                    scaling = "drift", program_svd = "RSpectra", pc_axes = 2, 
                    sample_project = SP, pc_project = 1:2)

Imported 126351 SNP by 9411 sample genotype matrix
Time elapsed: 0h 0m 44s
Filtering data...
Error in smart_pca(snp_data = gfile, sample_group = group_id, missing_value = 9, :
length(sample_group) should be equal to number of samples in dataset: computation aborted

The error message is uninformative because it does not report the number of samples expected from the sample_group variable. I added code to do this (see the separate pull request #5), but after installing that branch the error went away. The same happens with the main branch: after installing it from GitHub, the error does not occur:

# Remove the CRAN install and reinstall from the GitHub main branch
devtools::unload('smartsnp')
remove.packages('smartsnp')
devtools::install_github("ChristianHuber/smartsnp")
library('smartsnp')
sm.pca <- smart_pca(snp_data = gfile, sample_group = group_id, missing_value = 9, missing_impute = "mean",
                    scaling = "drift", program_svd = "RSpectra", pc_axes = 2, 
                    sample_project = SP, pc_project = 1:2)

Imported 126350 SNP by 9411 sample genotype matrix
Time elapsed: 0h 0m 8s
Filtering data...
126350 SNPs included in PCA computation
2504 samples included in PCA computation
6907 samples projected after PCA computation
Completed data filtering
Time elapsed: 0h 0m 14s
Scanning for invariant SNPs...
Scan complete: no invariant SNPs found
Time elapsed: 0h 0m 28s
Checking for missing values...
Scan completed: no missing values found
Time elapsed: 0h 0m 29s
Scaling values by SNP...
Centering and scaling by drift dispersion...
Completed scaling using drift
Time elapsed: 0h 0m 30s
Computing singular value decomposition using RSpectra...
Completed singular value decomposition using RSpectra
Time elapsed: 0h 0m 51s
Extracting eigenvalues and eigenvectors...
Eigenvalues and eigenvectors extracted
Time elapsed: 0h 0m 51s
Projecting ancient samples onto PCA space
PCA space = PC1PC2
Completed ancient sample projection
6907 ancient samples projected
Time elapsed: 0h 2m 5s
Tabulating PCA outputs...
Completed tabulations of PCA outputs...
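
On the error message itself, a check that names both counts makes a mismatch much easier to debug. The following is a minimal sketch of that idea only; `check_sample_group` and its arguments are hypothetical names for illustration and are not taken from smartsnp or from the pull request.

```r
# Hypothetical helper, not part of smartsnp: compare the length of
# sample_group with the number of samples and report both on failure.
check_sample_group <- function(sample_group, n_samples) {
  if (length(sample_group) != n_samples) {
    stop(sprintf(
      "length(sample_group) is %d but the dataset has %d samples: computation aborted",
      length(sample_group), as.integer(n_samples)
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Example with made-up sizes: the message names both numbers.
msg <- tryCatch(
  check_sample_group(sample_group = rep("kg", 9411), n_samples = 9410),
  error = function(e) conditionMessage(e)
)
print(msg)
```

With both numbers in the message, it is immediately visible whether the count derived from the file or the length of the grouping vector is the unexpected one.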

I note that the number of imported SNPs differs by one between the two installs (126351 with CRAN, 126350 with GitHub). Has the function that reads traw files changed?

session_info()

setting  value
version  R version 4.2.1 (2022-06-23 ucrt)
os       Windows 10 x64 (build 18363)
system   x86_64, mingw32
ui       RStudio
language (EN)
collate  English_Australia.utf8
ctype    English_Australia.utf8
tz       Australia/Adelaide
date     2022-08-01
rstudio  2022.02.3+492 Prairie Trillium (desktop)
pandoc   NA
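
The session info above does not list package versions, so it may help to record which smartsnp build was loaded in each run. A small sketch, assuming the usual DESCRIPTION fields: `Repository` is normally set for CRAN installs, and the `Remote*` fields are written by remotes/devtools for GitHub installs. `describe_install` is a hypothetical helper name.

```r
# Hypothetical helper: report the installed version of a package and,
# where recorded, where it was installed from. Missing fields become NA.
describe_install <- function(pkg) {
  # system.file() returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) return(NULL)
  desc <- utils::packageDescription(pkg)
  fields <- c("Version", "Repository", "RemoteType", "RemoteRef", "RemoteSha")
  vapply(fields, function(f) {
    v <- desc[[f]]
    if (is.null(v)) NA_character_ else as.character(v)
  }, character(1))
}

# Prints NULL if smartsnp is not installed in the current library.
print(describe_install("smartsnp"))
```

Running this once after the CRAN install and once after the GitHub install would show whether the two runs actually used different code.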

joshuamschmidt changed the title from "error with CRAN install not reproduced after installation from github" to "n samples in sample_group and datafile are different error with CRAN install, not reproduced after installation from github" on Aug 1, 2022
ChristianHuber (Owner) commented

Thanks for fixing the error message!
No, we didn't change the function for reading traw files. I'm not sure why there is one SNP fewer.
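
One way to narrow down the one-SNP difference is to count the rows of the file independently of smartsnp. The sketch below assumes the standard PLINK .traw layout (tab-delimited, one header line, six leading annotation columns, then one column per sample); `traw_dims` is a hypothetical helper, and the tiny file is made up for illustration.

```r
# Hypothetical helper: count variants and samples in a .traw file by
# streaming it in chunks, so large files are never held fully in memory.
traw_dims <- function(path, n_annot = 6L, chunk = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # The first line is the header: annotation columns, then sample IDs
  header <- strsplit(readLines(con, n = 1L), "\t", fixed = TRUE)[[1]]
  n_snps <- 0L
  repeat {
    block <- readLines(con, n = chunk)
    if (length(block) == 0L) break
    n_snps <- n_snps + sum(nzchar(block))  # skip blank lines
  }
  c(n_snps = n_snps, n_samples = length(header) - n_annot)
}

# Made-up example file with 2 SNPs and 3 samples.
tmp <- tempfile(fileext = ".traw")
writeLines(c(
  "CHR\tSNP\t(C)M\tPOS\tCOUNTED\tALT\tf1_s1\tf2_s2\tf3_s3",
  "1\trs1\t0\t100\tA\tG\t0\t1\t2",
  "1\trs2\t0\t200\tC\tT\t2\tNA\t0"
), tmp)
dims <- traw_dims(tmp)
print(dims)
```

Comparing `n_snps` for the real file against the 126351 and 126350 reported by the two installs would show which count matches the file.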
