Uploading Large files #197

Open
tomasflyo opened this issue Jan 13, 2019 · 6 comments
Comments

@tomasflyo

Dear Alper,

Sorry for opening a new thread. I didn't want to interfere with any issues Aviv might discover in his ongoing debugging attempts.

  1. I removed the "-" dashes, but it made no difference.
    As far as I can tell, the problem was that the order of the sample IDs in the Metadata File did not exactly match the order in which the same IDs appear in the Count Data File. After I rearranged them, I was able to continue to filter the data.

  2. I still can't get the entire file to upload. I can successfully upload a Count Data File that reaches row number 15,905 along with the Metadata file. But anything beyond those 15,905 rows gives back the same error message:

Warning: Error in as.data.frame.default: cannot coerce class ‘"try-error"’ to a data.frame
  77: stop
  76: as.data.frame.default
  73: observeEventHandler
   2: runApp
   1: startDEBrowser

Upon inspection, I found nothing remarkable about line 15906.

  3. After successfully uploading my trimmed Count file (15,905 rows) and its respective Metadata file,
    I filter the data, go to DE Analysis and add a new comparison according to the Metadata file.
    If I choose DESeq, I get the following error message:

Warning: Error in DESeqDataSet: counts matrix should be numeric, currently it has mode: logical

If I choose EdgeR:

Warning: Error in rep: invalid 'times' argument

Any advice?

Thanks again for all the help,
Tom
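
The sample-order mismatch described in item 1 can be checked before uploading. The following is a minimal Python sketch, not part of DEBrowser: the function names and the `sample` metadata column are hypothetical, and it assumes the count header and the metadata rows have already been read into lists.

```python
from itertools import zip_longest


def find_order_mismatches(count_samples, metadata_samples):
    """Return (position, count_name, metadata_name) for every position
    where the two sample lists disagree (None marks a missing entry)."""
    mismatches = []
    pairs = zip_longest(count_samples, metadata_samples, fillvalue=None)
    for position, (count_name, meta_name) in enumerate(pairs):
        if count_name != meta_name:
            mismatches.append((position, count_name, meta_name))
    return mismatches


def reorder_metadata(count_samples, metadata_rows, key="sample"):
    """Return the metadata rows in the order of the count-file columns.
    `key` is a hypothetical name for the sample-ID column.
    Raises KeyError if a count-file sample is missing from the metadata."""
    by_sample = {row[key]: row for row in metadata_rows}
    return [by_sample[name] for name in count_samples]


# Hypothetical example data
count_samples = ["s1", "s2", "s3"]
metadata_rows = [
    {"sample": "s2", "group": "control"},
    {"sample": "s1", "group": "control"},
    {"sample": "s3", "group": "treated"},
]
print(find_order_mismatches(count_samples, [r["sample"] for r in metadata_rows]))
print(reorder_metadata(count_samples, metadata_rows))
```

An empty list from `find_order_mismatches` means the two files list the samples in the same order.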

@nephantes
Member

Just check line 15906. There may be an unexpected character in that line; check the line-ending characters, etc. You can cut out the lines between 15900 and 15910 and try to upload just that portion to understand what might be wrong. You can send me that portion and I can check too. I see there are lots of genes with 0 counts. You can always pre-filter them before you upload the data to DEBrowser. This might also help reduce the number of lines, if there is a memory issue on your computer.
It looks like you have some non-numerical values somewhere in your data. They might not be visible.
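
One way to look for such invisible or non-numerical values is to scan every count cell with a strict pattern. This is a rough Python sketch under stated assumptions: the file is tab-delimited and unquoted, the first line is a header, the first column holds gene names, and counts are plain non-negative numbers. The function name is hypothetical.

```python
import re

# Digits with an optional decimal part; anything else is reported.
NUMERIC = re.compile(r"\d+(\.\d+)?")


def find_non_numeric(lines, delimiter="\t"):
    """Return (line_number, column_number, repr(value)) for each count cell
    that is not a plain number. Line and column numbers are 1-based."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if line_no == 1:
            continue  # skip the header line
        # Strip only "\n" so a stray "\r" stays visible in the last cell.
        cells = line.rstrip("\n").split(delimiter)
        for col_no, value in enumerate(cells[1:], start=2):
            if not NUMERIC.fullmatch(value):
                problems.append((line_no, col_no, repr(value)))
    return problems


# Hypothetical example data
example = [
    "gene\ts1\ts2\n",
    "A\t1\t2\n",
    "B\tNA\t3\n",
    "C\t4\t5\r\n",
]
for problem in find_non_numeric(example):
    print(problem)
```

Using `repr` makes hidden characters such as `\r` show up in the report. Note that a file saved with Windows line endings would have its last column flagged on every line; open the file with `newline=""` to keep those characters visible when reading from disk.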

@tomasflyo
Author

> I see there are lots of genes with 0 counts. You can always pre-filter them before you upload the data to DEBrowser. This might also help reduce the number of lines, if there is a memory issue on your computer.

Yes, that's exactly what I have been up to.
I have reduced the dataframe to ~20K rows, but between lines 13K and 18K there seem to be a number of problematic lines.

Count_Data_File_low_mean_no_dotsdashes_13001_20000.zip

From looking at this dataset in Python, I can't see anything wrong with it. But each time I upload it into DEBrowser, it crashes.

@nephantes
Member

These are the duplicated genes in this data table.
Gene names have to be unique in a count table. Otherwise, the table cannot be loaded.

5502 LIMS3 16 117 13
3372 NPIPA7 62 68 92
1896 PNRC2 147 498 68
6462 SCARNA17 14 34 37
2055 SNORA11 5 13 1
5053 SNORA31 4 6 3
5077 SNORA31 40 18 23
6230 SNORA40 58 38 17
5045 SNORA66 21 39 58
2098 SNORD19 6 18 10
4053 snoU13 8 14 14
4055 snoU13 66 41 72
4056 snoU13 10 2 9
4058 snoU13 5 9 33
4060 snoU13 18 9 7
4062 snoU13 69 38 58
4066 snoU13 16 10 6
4067 snoU13 33 33 57
6296 U3 23 12 387
6900 U3 13 19 10
765 Y_RNA 29 52 51
768 Y_RNA 24 14 17
770 Y_RNA 11 25 13
773 Y_RNA 17 5 12
774 Y_RNA 38 20 47
782 Y_RNA 16 36 22
788 Y_RNA 6 7 14
1193 Y_RNA 6 11 20
1204 Y_RNA 16 2 9
1214 Y_RNA 1 11 6
1216 Y_RNA 48 0 0
1221 Y_RNA 9 14 15
1411 Y_RNA 34 1 1
4065 Y_RNA 61 76 107
5041 Y_RNA 22 0 14
5048 Y_RNA 55 373 229
5069 Y_RNA 66 77 162
5073 Y_RNA 7 16 21
5074 Y_RNA 25 17 25
3830 ZNF26 2 21 15
1969 ZNF84 22 22 20
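
Duplicates like the ones listed above can be found before uploading. Below is a small Python sketch, assuming the gene names have already been read from the first column of the count table into a list. The function names and the `_1`, `_2` suffix scheme are illustrative choices, not something DEBrowser prescribes; merging or dropping the duplicated rows may be more appropriate for a given analysis.

```python
from collections import defaultdict


def find_duplicate_genes(gene_names):
    """Map each gene name that occurs more than once to the 1-based
    row positions where it appears."""
    positions = defaultdict(list)
    for row_no, name in enumerate(gene_names, start=1):
        positions[name].append(row_no)
    return {name: rows for name, rows in positions.items() if len(rows) > 1}


def make_unique(gene_names):
    """Append _1, _2, ... to repeated names; the first occurrence is kept.
    Caveat: this does not check whether a suffixed name already exists."""
    seen = {}
    unique = []
    for name in gene_names:
        count = seen.get(name, 0)
        seen[name] = count + 1
        unique.append(name if count == 0 else f"{name}_{count}")
    return unique


# Hypothetical example using names from the list above
genes = ["Y_RNA", "U3", "Y_RNA", "PNRC2", "U3", "Y_RNA"]
print(find_duplicate_genes(genes))
print(make_unique(genes))
```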

@tomasflyo
Author

Awesome. Thank you!

One last thing: when I finally reach the DE analysis part, I have the option to choose my metadata file, but when I select it and continue, DEBrowser crashes.

select meta
select meta2

new_meta.txt

Thanks,
Tom

@nephantes
Member

nephantes commented Jan 14, 2019

R automatically adds an X in front of column names that start with a number, so they no longer match the sample names in your meta_data file. I added 's' in front of the sample names in both files (meta_data and count table), and it worked properly.

screenshot 2019-01-13 21 16 46
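
The renaming described above can be scripted so that both files receive exactly the same change. This is a minimal Python sketch, assuming the sample names have been read into lists. The function name is hypothetical, and it only prefixes names that begin with a digit, which is the case where R would otherwise prepend an X.

```python
def prefix_numeric_names(names, prefix="s"):
    """Prepend `prefix` to every name that starts with a digit, so R does
    not need to alter the name when it reads the table."""
    return [prefix + name if name[:1].isdigit() else name for name in names]


# Hypothetical sample names; apply the same function to both files.
count_columns = ["1A", "2B", "ctrl"]
metadata_samples = ["1A", "2B", "ctrl"]

new_columns = prefix_numeric_names(count_columns)
new_samples = prefix_numeric_names(metadata_samples)
print(new_columns)
print(new_columns == new_samples)
```

Applying one function to both the count-table header and the metadata sample column keeps the two sets of names identical after the change.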

@tomasflyo
Author

Dear Alper,
Looks like everything is working great,
Thank you so much for all the help!
