test data upload with different files #38

Open
esmason opened this issue Oct 24, 2016 · 3 comments

Comments

@esmason
Contributor

esmason commented Oct 24, 2016

Right now I'm just testing by re-uploading the same files. It seems to work, but we should test with different data, or at least with truncated .tsv files, to make sure it's working.
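One way to exercise the truncated-.tsv case is to generate every truncated copy of a small fixture and check that the loader either parses it or rejects it with a clear error, never crashing some other way. This is a minimal sketch, assuming a genes x cells TSV with cell IDs in the first row and gene names in the first column; `load_expression_tsv` and `truncated_variants` are hypothetical names, not the project's actual upload code.

```python
import csv
import io


def load_expression_tsv(text):
    """Hypothetical loader: row 1 holds cell IDs, column 1 holds gene names.

    Raises ValueError on a ragged or non-numeric matrix instead of
    silently accepting it.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if len(rows) < 2:
        raise ValueError("need a header row and at least one gene row")
    n_fields = len(rows[0])
    genes, values = [], []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != n_fields:
            raise ValueError(f"line {line_no}: expected {n_fields} fields, got {len(row)}")
        genes.append(row[0])
        values.append([float(x) for x in row[1:]])  # float("") raises ValueError
    return rows[0][1:], genes, values


def truncated_variants(text):
    """Yield copies of `text` cut after each complete line and at each character."""
    lines = text.splitlines(keepends=True)
    for keep in range(1, len(lines)):
        yield "".join(lines[:keep])  # whole trailing gene rows dropped
    for cut in range(1, len(text)):
        yield text[:cut]  # file cut off mid-line, e.g. an interrupted upload


SAMPLE = "gene\tcell1\tcell2\nGapdh\t5\t0\nActb\t3\t1\n"

outcomes = {"loaded": 0, "rejected": 0}
for variant in truncated_variants(SAMPLE):
    try:
        load_expression_tsv(variant)
        outcomes["loaded"] += 1
    except ValueError:
        outcomes["rejected"] += 1
print(outcomes)
```

Note that a file cut exactly at a row boundary still loads as a smaller, valid matrix; catching that case would need an expected gene count from somewhere else, such as a separate gene list.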

@oganm
Contributor

oganm commented Oct 25, 2016

We should probably decide what the exact input will be before implementing this. For instance, users starting from scratch won't have a t-SNE embedding, and many users might not bother with sparse matrices and could have different data structures. So data upload should probably sit at the very front of the pipeline but be implemented after everything else is done. Until then we can experiment with other datasets.

To start from scratch, I have several single-cell datasets I'm working on that could be useful: GSE67835, GSE67835 and GSE71585.

Here's the code to download and process them if you want to play around, though you'll probably want to remove the devtools::use_data lines:
https://gist.github.com/oganm/c9a0fea369d6e6f9eed71371730062ac

This also gives a decent idea of what single-cell data sharing looks like right now: character-delimited expression matrices are fairly common.
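Because the delimiter varies between submissions (tabs, commas, and so on), an intake step could guess it rather than require one. Below is a minimal sketch, assuming a genes x cells matrix with gene names in the first column; the candidate delimiters and the function names `guess_delimiter` / `read_delimited_matrix` are illustrative assumptions. It also tolerates a header that is one field short, which is what R's `write.table` produces by default when it writes row names (R's `read.table` treats that shape as "first column is row names"). Python's standard `csv.Sniffer` is an alternative to the hand-rolled guess.

```python
import csv
import io


def guess_delimiter(text, candidates=("\t", ",", ";", " ")):
    """Return the first candidate that splits every data row into the same
    number (> 1) of fields. The header is skipped because it may be one
    field short."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    body = lines[1:] or lines
    for delim in candidates:
        counts = {len(ln.split(delim)) for ln in body}
        if len(counts) == 1 and counts.pop() > 1:
            return delim
    raise ValueError("no candidate delimiter splits the rows consistently")


def read_delimited_matrix(text):
    """Parse a genes x cells matrix with gene names in the first column.

    Returns (cell_ids, gene_names, values)."""
    delim = guess_delimiter(text)
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delim) if r]
    if len(rows) < 2:
        raise ValueError("need a header row and at least one gene row")
    header, body = rows[0], rows[1:]
    width = len(body[0])
    if any(len(r) != width for r in body):
        raise ValueError("data rows have different lengths")
    if len(header) == width - 1:
        cell_ids = header          # R write.table style: no corner cell
    elif len(header) == width:
        cell_ids = header[1:]      # corner cell such as "gene" present
    else:
        raise ValueError("header length does not match the data rows")
    genes = [r[0] for r in body]
    values = [[float(x) for x in r[1:]] for r in body]
    return cell_ids, genes, values


# R-style CSV (quoted names, no corner cell) and a TSV with a corner cell
r_csv = '"c1","c2"\n"Gapdh",5,0\n"Actb",3,1\n'
tsv = "gene\tc1\tc2\nGapdh\t5\t0\nActb\t3\t1\n"
print(read_delimited_matrix(r_csv))
print(read_delimited_matrix(tsv))
```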

@esmason
Contributor Author

esmason commented Oct 25, 2016

Is there a standardized data format (or several) for scRNA-seq yet? We will need to give users at least some formatting requirements.

@oganm
Contributor

oganm commented Oct 25, 2016

The most standard thing you can find downstream of assembly is an expression matrix, in every shape and form possible. I think it'll be enough for us to take in the matrix, sparse or not; if it isn't sparse, sparsify it on intake. Gene name and cell ID (barcode) files might be problematic, but in reality we don't need the cell ID file: order is a good enough identifier. I believe sparse matrix files do not carry row or column names, so gene names will have to be supplied as a separate file or embedded somehow (e.g. as a first row).
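The "sparsify on intake" step with gene names kept separately and cells identified only by position could look like the sketch below. It uses the compressed sparse column (CSC) layout, the same three-array idea behind scipy's `csc_matrix` (`data`, `indices`, `indptr`) and R Matrix's `dgCMatrix` (`x`, `i`, `p` slots). The pure-Python functions `sparsify_csc` and `dense_column` are hypothetical and only illustrate the layout; a genes x cells orientation is assumed, and a real pipeline would more likely use one of those libraries directly.

```python
def sparsify_csc(dense_rows):
    """Convert a dense genes x cells matrix (list of rows) to CSC arrays.

    Column j's nonzero values are data[indptr[j]:indptr[j + 1]], and their
    gene (row) positions are the matching slice of `indices`.
    """
    n_genes = len(dense_rows)
    n_cells = len(dense_rows[0]) if n_genes else 0
    data, indices, indptr = [], [], [0]
    for j in range(n_cells):
        for i in range(n_genes):
            value = dense_rows[i][j]
            if value != 0:          # only nonzeros are stored
                data.append(value)
                indices.append(i)
        indptr.append(len(data))    # end of column j
    return {"data": data, "indices": indices, "indptr": indptr,
            "shape": (n_genes, n_cells)}


def dense_column(csc, j):
    """Rebuild cell j's expression vector; j is the cell's position, not a barcode."""
    n_genes = csc["shape"][0]
    column = [0.0] * n_genes
    for k in range(csc["indptr"][j], csc["indptr"][j + 1]):
        column[csc["indices"][k]] = csc["data"][k]
    return column


# Gene names travel separately (e.g. a one-name-per-line file); cells are
# identified only by column order.
gene_names = ["Gapdh", "Actb", "Snap25"]
dense = [[5, 0],
         [0, 0],
         [1, 2]]
csc = sparsify_csc(dense)
print(csc)
print(dict(zip(gene_names, dense_column(csc, 1))))
```

Since the sparse arrays carry no names, the only link between `gene_names` and the matrix is that `len(gene_names)` matches the row count and the order is preserved, which is exactly the kind of check an upload step would need to enforce.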
