ROSMAP #4
Comments
Notes:
|
Question: Is this metabolomics folder supposed to be empty? Can it be deleted?
Question: Imaging metadata? Answer: Don't add.
Updates:
rnaSeq
|
Removed tasks related to metabolomics. This assay will be moved out of the main ROSMAP study and will not count toward 'clean' completion. |
rnaSeq
Fixed duplicates in rnaSeq metadata. The problem specimen, above, has all values in its column for now and is in the first row of the file. Uploaded to cleaning folder here.
ChIPseq Problem specimen: 11464261
Updates:
Multispecimen files (will most likely not be using specimenIDs since these would be RIDs when the 'clean' metadata is uploaded):
|
TMT proteomics
scrnaSeq
snpArray
WGS
methylationArray
miRNAcounts
rnaArray
|
label free proteomics
|
General notes:
|
Mapping IDs to projids
|
biospecimen: This is a mess... There are more specimens in the ROSMAP biospecimen file than there are specimens in all of the assay metadata files (even without having all assay specimens), meaning there are either too many duplicates in the biospecimen file OR there are missing specimens in the assay metadata files.
ROSMAP_biospecimen_metadata_combined has all assay specimens. The number of specimens matches the total number of specimens in these assay files, with one exception for the single 'control' in proteomics. |
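A rough sketch of how the specimen comparison above could be scripted. It assumes each metadata file has been read into a list of dicts (for example with `csv.DictReader`) and has a `specimenID` column; the function name and the toy rows are made up for illustration and are not the actual ROSMAP files.

```python
from collections import Counter


def compare_specimens(biospecimen_rows, assay_rows_by_file):
    """Compare specimenIDs in biospecimen against all assay metadata files."""
    bio_counts = Counter(row["specimenID"] for row in biospecimen_rows)
    assay_ids = set()
    for rows in assay_rows_by_file.values():
        assay_ids.update(row["specimenID"] for row in rows)
    bio_ids = set(bio_counts)
    return {
        # specimenIDs listed more than once in biospecimen
        "duplicated_in_biospecimen": sorted(
            s for s, n in bio_counts.items() if n > 1
        ),
        # specimens with no row in any assay metadata file
        "only_in_biospecimen": sorted(bio_ids - assay_ids),
        # assay specimens missing from biospecimen (e.g. a 'control')
        "only_in_assay": sorted(assay_ids - bio_ids),
    }


# Toy rows standing in for the real files (made-up IDs).
biospecimen = [
    {"specimenID": "S1"},
    {"specimenID": "S2"},
    {"specimenID": "S2"},
    {"specimenID": "S3"},
]
assays = {
    "rnaSeq": [{"specimenID": "S1"}],
    "proteomics": [{"specimenID": "S2"}, {"specimenID": "control"}],
}
report = compare_specimens(biospecimen, assays)
print(report)
```

Note that a specimen used for more than one assay may legitimately appear more than once in biospecimen, so counting on `(specimenID, assay)` pairs instead may be the more useful check.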
|
Stuff I did yesterday, but didn't click the 'comment' button on:
Some of this needs to be fixed. Namely, I confused the microglia scrnaSeq with the bulk microglia rnaSeq. This data needs to be moved to the rnaSeq metadata and the biospecimen assay column updated with the correct term.
Other update done today:
|
Question: I have 'excludeReason' in biospecimen. Should I also add the boolean 'exclude'?
Question: This is related to Jake's concern. I was thinking this would be a big problem, but it is somewhat less so. The idea is that it could be hard to get the exact metadata set desired. For example, we have multiple sets of rnaSeq assays. We can join the metadata files by filtering biospecimen to just rnaSeq assay rows. However, that gives a bigger dataset than what was used in just one of those subsets (microglia, for example). According to Jake, bioinformatics professionals may not be great at joins or cleaning. It would potentially also help with reproducibility/transparency to be able to filter to the exact subset of data. My question is how much work do we want to do for the data users?
Question: Mette mentioned that there appears to be a duplication issue with dlpfc scRNAseq. I'm not following. These seem unique to me.
Question: For the FACS sorted bulk cell rnaSeq, I added _1 and _2 to the specimenID. The reasoning is in a comment above. This needs to be approved or improved before I change the annotations on these files.
Question: How should I be reading the WGS sample swap file? Path forward? Same question regarding "duplicate" file.
Question: There was a question asked about the one specimen with multiple values in the rnaSeq metadata. This was mentioned in a previous comment above, but this will need to be cleared up with whoever is responsible for that data. It is unknown if that sample was run 3 times or if it was accidentally entered 3 times with different values.
Question: Do we even want to mess with multispecimen files at all? Many that I have seen use the projid, which can be found via metadata. The 'annoyance' with these is leading 0s, which is something Jake also mentioned. But overall, there seems to be hesitation with changing ROSMAP data at all, so should we consider multispecimen files 'clean'?
Question: Can I delete this empty folder? Was there supposed to have been data here?
Question: What's with this Staging folder in Proteomics (SRM)?
Question: May I update wiki links for the portal as I finish updating wikis (same question for Mayo) or does there need to be an approval process? The updates are only formatting and merging wikis that should be together.
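For the join question above, a minimal sketch of what a data user would have to do: filter biospecimen to one assay, optionally narrow to an explicit list of specimenIDs for a subset (microglia, for example), then join to the assay metadata on `specimenID`. The column names `specimenID` and `assay` follow the notes above; the function name, the `tissue`/`batch` values, and the IDs are placeholders, not the real metadata.

```python
def join_assay_metadata(biospecimen_rows, assay_rows, assay, subset_ids=None):
    """Inner-join biospecimen and assay metadata on specimenID for one assay.

    subset_ids, if given, restricts the result to one sub-study's specimens.
    """
    assay_by_id = {row["specimenID"]: row for row in assay_rows}
    joined = []
    for bio in biospecimen_rows:
        if bio["assay"] != assay:
            continue
        sid = bio["specimenID"]
        if subset_ids is not None and sid not in subset_ids:
            continue
        if sid in assay_by_id:
            joined.append({**bio, **assay_by_id[sid]})
    return joined


# Toy rows (made-up values).
biospecimen = [
    {"specimenID": "A_1", "assay": "rnaSeq", "tissue": "dlpfc"},
    {"specimenID": "B_1", "assay": "rnaSeq", "tissue": "dlpfc"},
    {"specimenID": "C_1", "assay": "ChIPSeq", "tissue": "dlpfc"},
]
rnaseq = [
    {"specimenID": "A_1", "batch": "1"},
    {"specimenID": "B_1", "batch": "2"},
]

all_rnaseq = join_assay_metadata(biospecimen, rnaseq, "rnaSeq")
one_subset = join_assay_metadata(biospecimen, rnaseq, "rnaSeq", subset_ids={"B_1"})
print(len(all_rnaseq), len(one_subset))
```

The `subset_ids` argument is the part users cannot reconstruct on their own unless the metadata carries something that identifies each sub-study, which is the "how much work do we do for them" trade-off.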
|
|
Annotations
|
While the checkboxes in the main issue are items that should be completed once we get metadata confirmation, I am adding general reminders here.
Notes:
|
Had a meeting with Mette, Abby, and Yan. Yan said there was probably no one there who could review the metadata and verify that it was good. He mentioned that our metadata was probably better than what they could provide anyway. With this information, we are going to release the new metadata files. There is one outstanding issue in the rnaSeq metadata where one specimen has 3 batches. We still need to determine which batch it should be in.
Released:
Updated naming on TMT quantitation.
Covariates to deprecate (?):
|
@Aryllen when you get a minute can you give me edit privileges on this repo? That way I can edit your original comment to check off boxes and such. Thanks! |
Annotations
|
I've been checking through the updated biospecimen and assay metadata to make sure I have all the info I will need to update annotations. There are a few studies that look good and are ready to go, and a few that I have questions on. Questions and issues for each set of data are outlined here in this doc. In the meantime I'll start annotating the RNAseq and scRNAseq files. |
@Aryllen There are ~190 specimens in the updated biospecimen metadata file that are missing individual IDs but do NOT have an exclude = true tag and are NOT pooled samples... I read through all your previous notes but couldn't find anything about this many missing individualIDs. Here's the breakdown by assay: Are these just missing? Do I need to try to find these individualIDs somewhere? |
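One way a by-assay breakdown like the one described above might be produced, as a sketch only. It assumes biospecimen rows are dicts with `individualID`, `exclude`, and `assay` columns; the `isPooled` column is hypothetical (the notes don't say how pooled samples are flagged), and the rows are made up.

```python
from collections import Counter


def missing_individual_ids_by_assay(rows):
    """Count specimens per assay with no individualID, not excluded, not pooled."""
    counts = Counter()
    for row in rows:
        has_id = bool((row.get("individualID") or "").strip())
        excluded = str(row.get("exclude", "")).strip().lower() == "true"
        # 'isPooled' is an assumed column name; swap in the real pooled flag.
        pooled = str(row.get("isPooled", "")).strip().lower() == "true"
        if not has_id and not excluded and not pooled:
            counts[row["assay"]] += 1
    return dict(counts)


# Toy rows (made-up values).
rows = [
    {"specimenID": "S1", "individualID": "", "assay": "rnaSeq", "exclude": ""},
    {"specimenID": "S2", "individualID": "R1", "assay": "rnaSeq"},
    {"specimenID": "S3", "individualID": "", "assay": "ChIPSeq", "exclude": "TRUE"},
    {"specimenID": "S4", "individualID": None, "assay": "TMT", "isPooled": "true"},
    {"specimenID": "S5", "individualID": " ", "assay": "rnaSeq"},
]
breakdown = missing_individual_ids_by_assay(rows)
print(breakdown)
```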
Notes on bulk RNAseq annotations:
To do:
|
@avanlinden, I think I mistyped a comment in my notes way up above there, which probably contributed to missing this. Sorry! I believe the solution is to check out this deprecated file. This is most likely a leading 0 problem on projid. The other deprecated covariate files are in that same area ([deprecated ROSMAP](https://www.synapse.org/#!Synapse:syn20682034)). |
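If it is a leading 0 problem, something along these lines may help when joining on projid. This is a sketch under assumptions: the padded width of 8 is a guess that should be checked against the actual files, and the function names, column names other than `projid`, and values are made up.

```python
def normalize_projid(value, width=8):
    """Return projid as a zero-padded string (width is an assumption)."""
    return str(value).strip().zfill(width)


def join_on_projid(rows, covariate_rows, width=8):
    """Left-join rows to covariates after normalizing projid on both sides."""
    covariates = {
        normalize_projid(r["projid"], width): r for r in covariate_rows
    }
    joined = []
    for row in rows:
        key = normalize_projid(row["projid"], width)
        match = covariates.get(key, {})  # empty if no covariate row found
        joined.append({**match, **row, "projid": key})
    return joined


# Toy rows (made-up values): one side lost its leading zeros.
biospecimen = [{"projid": "123456", "specimenID": "S1"}]
covariates = [{"projid": "00123456", "individualID": "R0000001"}]
joined = join_on_projid(biospecimen, covariates)
print(joined)
```

Reading projid as a string in the first place (rather than letting the reader infer an integer) avoids losing the zeros at all.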
@Aryllen Oh yep, those are them. Thank you! I will get them joined up just for completeness' sake and upload a new version. |
Bulk RNAseq annotations are as complete as I can get them:
Remaining issues:
Moving on to another assay. |
rnaArray annotations are as complete as currently possible:
|
confocal imaging annotation updates are done:
|
scrnaSeq annotations are done, with one remaining question about diagnosis:
Remaining issues:
|
Study folder: syn3219045
We can expand these checks to be more specific, or mark them off/remove them if they are not relevant.
Folder Structure
Metadata (within file)
Checks for each metadata file:
Metadata (across files)
Annotations
Multispecimen Files
Check that specimenIDs in files match IDs in metadata.
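A possible shape for this check, assuming a multispecimen file is a delimited matrix whose header row holds one ID per sample column after a leading feature column. That layout, the function name, and the toy data are assumptions; files keyed by projid would need the same comparison against projids instead.

```python
import csv
import io


def check_header_ids(file_obj, metadata_ids, first_id_column=1, delimiter="\t"):
    """Compare sample IDs in a file's header row with IDs from metadata."""
    header = next(csv.reader(file_obj, delimiter=delimiter))
    file_ids = set(header[first_id_column:])
    metadata_ids = set(metadata_ids)
    return {
        "in_file_not_metadata": sorted(file_ids - metadata_ids),
        "in_metadata_not_file": sorted(metadata_ids - file_ids),
    }


# Toy tab-delimited counts file (made-up IDs and values).
counts_file = io.StringIO("gene\tS1\tS2\tS9\ngeneA\t5\t0\t3\n")
result = check_header_ids(counts_file, ["S1", "S2", "S3"])
print(result)
```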
Wikis
Clinical data
Access (Human)
Portal