Error in python #14

Open
taotao-mars opened this issue Nov 28, 2021 · 5 comments

Comments

@taotao-mars

Hi,

I got an error when using DISCOVER. Could you help me with that? Thanks!

[Screenshot: 2021-11-27, 11:54:10 PM]

The format of my binary data is:
[Screenshot: 2021-11-27, 11:53:49 PM]

@scanisius
Member

Unfortunately, the error message alone is not specific enough to pinpoint the source of this error. So I may need some more information from you.

There is one detail that catches my eye: it looks like the gene names are a regular column in your data frame as opposed to the index, which is what DISCOVER expects. If you read these data with read_table (or read_csv), could you try passing the argument index_col=0 to read_table?

Please try that first. If the above does not fix the problem, I will ask you for some more detailed information.
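To illustrate the suggestion above, here is a minimal sketch of what index_col=0 changes when reading a mutation matrix with pandas. The file contents, gene names, and sample names are made up for illustration; only read_table and index_col come from the comment above.

```python
import io

import pandas as pd

# Hypothetical tab-separated mutation matrix: genes in rows, samples in columns.
raw = "gene\tS1\tS2\tS3\nTP53\t1\t0\t1\nKRAS\t0\t1\t0\n"

# Without index_col, the gene names end up as an ordinary data column.
as_column = pd.read_table(io.StringIO(raw))

# With index_col=0, the gene names become the row index,
# so the frame itself holds only the 0/1 mutation values.
as_index = pd.read_table(io.StringIO(raw), index_col=0)

print(as_column.columns.tolist())
print(as_index.index.tolist())
print(as_index.shape)
```

In the second frame every remaining column is numeric, which is the layout the comment above describes as expected by DISCOVER.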

@taotao-mars
Author

Hi,

Thanks for your reply. Yes, I added index_col=0 to my read_csv call.

[Screenshot: 2021-11-29, 11:33:11 AM]

I compared the example data frame with my data frame. They look similar, and the same error appeared.
[Screenshot: 2021-11-29, 11:34:34 AM]

@scanisius
Member

In this example it seems that all elements of subset are False, which means that pairwise_discover_test receives an empty mutation matrix. That would indeed give the error you are seeing.

With the line df11 = df11.iloc[:5, :5] you have overwritten (probably unintentionally) your full mutation matrix with a small sub-matrix that does not contain any mutations anymore.
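A quick sanity check along these lines can catch the problem before the test is run. This is only a sketch with made-up data: the helper has_mutations is hypothetical, and the way subset is computed here (genes mutated in at least one sample) is an assumption, since the actual definition is only visible in the screenshot.

```python
import pandas as pd

# Hypothetical binary mutation matrix (genes x samples); values are made up.
full = pd.DataFrame(
    [[0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]],
    index=["GENE_A", "GENE_B", "GENE_C"],
    columns=["S1", "S2", "S3", "S4"],
)

# Keep the slice under its own name so the full matrix is not overwritten.
small = full.iloc[:2, :2]


def has_mutations(matrix):
    """Return True if the matrix contains at least one nonzero entry."""
    return bool(matrix.to_numpy().any())


# Assumed row filter in the style of `subset`: genes mutated at least once.
subset = small.sum(axis=1) >= 1

print(has_mutations(small), int(subset.sum()))
print(has_mutations(full))
```

If has_mutations returns False, or subset selects no rows, the matrix passed on to pairwise_discover_test would be empty, which matches the situation described above.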

@taotao-mars
Author

Thanks for pointing that out. My data set is huge, so I wanted to take a small part of it for testing. The problem was solved when I increased the amount of data.

Also, are there any parameters I should adjust for large data sets? My job has been running for over 13 hours. Thanks.

@scanisius
Member

Good to hear your problem has been solved. In the next update of the DISCOVER package I will add an explicit check for empty mutation matrices so that at least the error message will be more informative.

As for your second question, I assume that the long runtime you report is for the pairwise_discover_test function, not for the call to DiscoverMatrix. Is that correct? If so, you can speed up the process substantially by passing the argument fdr_method="BH" to pairwise_discover_test.

This speedup does come at a price though. With the above option you are asking DISCOVER to perform multiple testing correction with the standard Benjamini-Hochberg procedure, as opposed to the default, which uses a discrete version of the Benjamini-Hochberg procedure. The advantage of the discrete version is that it tends to give lower Q values, but the disadvantage is that it takes much more time. In contrast, the standard version (enabled with the fdr_method="BH" argument) is faster, but with the disadvantage of a somewhat reduced sensitivity (i.e. higher Q values). That trade-off is yours to make.
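For readers unfamiliar with the standard procedure, here is a minimal, generic sketch of Benjamini-Hochberg Q-value computation in plain Python. It is not DISCOVER's implementation and it does not cover the discrete variant; the function name and the example P values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted Q values, in the input order."""
    m = len(pvalues)
    # Indices of the P values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, so each Q value is the minimum
    # of p * m / rank over this rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The cost is one sort plus a single pass over the P values, which is consistent with the standard version being the faster option. In DISCOVER itself, nothing more is needed than passing fdr_method="BH" to pairwise_discover_test, as described above.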
