Banner proteomics metadata discrepancies #14

Open
4 tasks
avanlinden opened this issue Oct 26, 2021 · 5 comments

@avanlinden

avanlinden commented Oct 26, 2021

@jgockley62 identified inconsistencies in CERAD scores for individuals from the Banner cohort between the original Banner LFQ proteomics traits file and the new Consensus project TMT proteomics traits file for the same samples.

The original Banner study needs updated metadata files that meet our current metadata standards. The original Banner case IDs (individualIDs) were corrupted and lost from the existing traits file, but they can be recovered from the Consensus project traits file. The CERAD discrepancies are due to a change in how CERAD was evaluated, and a note should be added to both study descriptions.

  • create standardized metadata files for original Banner LFQ data
  • re-map Banner caseIDs using consensus file + Tom Beach's data sheet forwarded from Eric Dammer
  • update CERAD information
  • document changes to CERAD metadata in Banner study
@avanlinden

Jake's original email to Eric Dammer:

Hey Eric,

I was digging around the Banner LFQ/TMT samples and I ran into a bit of a conundrum.
The CERAD scores for individuals are quite different between the former LFQ samples versus the new TMT samples.

The LFQ meta-data synID we have is syn9740295
And I'm using the new TMT from the consensus paper located here: syn25006658

I matched the samples from individuals with TMT and LFQ and noticed some discrepancies:

LFQ - syn9740295
table(comp_trial$CERAD)
 -1   0   1   2   3
 36   4  37   5  78

TMT - syn25006658
table(temp_trial$CERAD)
0 1 2 3
23 15 25 97

Comparing the two also shows some discordance beyond a simple adjustment (rows: LFQ, columns: TMT):
table(comp_trial$CERAD, temp_trial$CERAD)

       0   1   2   3
 -1   22  13   0   1
  0    1   1   2   0
  1    0   1  17  19
  2    0   0   5   0
  3    0   0   1  77

Not too sure where the discordance comes from but I thought I'd try and track it down. I cc'd Mette and Abby on our DCC team as they have more info on the LFQ data side.

Best,
Jake
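
As a sanity check on the tables in Jake's email, the cross-tabulation can be re-entered by hand and its margins compared against the two single-variable tables. The snippets in the email are R; the sketch below is Python using only the standard library, and the names `CROSS_TAB`, `margins`, and `concordance` are made up for illustration. The counts are copied from the email.

```python
# Cross-tabulation copied from the email: rows are LFQ CERAD, columns are TMT CERAD.
TMT_LEVELS = [0, 1, 2, 3]
CROSS_TAB = {
    -1: [22, 13, 0, 1],
    0: [1, 1, 2, 0],
    1: [0, 1, 17, 19],
    2: [0, 0, 5, 0],
    3: [0, 0, 1, 77],
}


def margins(cross_tab, col_levels):
    """Return (row_totals, col_totals) as dicts keyed by CERAD score."""
    row_totals = {lfq: sum(counts) for lfq, counts in cross_tab.items()}
    col_totals = {
        tmt: sum(counts[i] for counts in cross_tab.values())
        for i, tmt in enumerate(col_levels)
    }
    return row_totals, col_totals


def concordance(cross_tab, col_levels):
    """Count samples whose LFQ and TMT scores are identical (the diagonal)."""
    return sum(
        counts[col_levels.index(lfq)]
        for lfq, counts in cross_tab.items()
        if lfq in col_levels  # -1 has no TMT counterpart, so it never matches
    )


row_totals, col_totals = margins(CROSS_TAB, TMT_LEVELS)
total = sum(row_totals.values())
same = concordance(CROSS_TAB, TMT_LEVELS)

print("LFQ totals:", row_totals)
print("TMT totals:", col_totals)
print(f"{same} of {total} matched samples have identical scores")
```

The row totals reproduce the LFQ table and the column totals reproduce the TMT table, so the three tables in the email are internally consistent. The off-diagonal counts are the discordant samples.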

@avanlinden

avanlinden commented Oct 26, 2021

Eric's reply:

This overlooked CERAD discrepancy is troubling and should be addressed without compromising reproducibility of the published LFQ consensus analyses. See explanation in the email just forwarded to you and Mette (cc: Jim and Erik). I recommend keeping the Mirra 1991-based score and adding the updated plaque density-based CERAD, which is independent of cognition, to the traits for the 201 LFQ Banner cases.

I cannot see any Banner case IDs in the LFQ traits on the SynID you provided, but do see the 201 cases with their original batch_runNumber file ID. The Banner IDs, which were 2 numbers separated by a dash, were likely corrupted into date-formatted cells by Excel and then discarded; they had to be remapped to the file IDs so that the CERAD differences for the same Banner IDs are clear. Please rely on the censored traits for the same 201 case samples in the Nature Neurosci TMT Banner traits attached here, based on Tom Beach's February 2019 update of CERAD from the prior Mirra 1991 criteria-based scores. The green tab has the map of Banner ID to LFQ fileID to TMT batch.channel, along with both CERAD score versions (Mirra 1991 and Beach 2019).

Sincerely,

Eric

The files Eric attached contain some potential PHI, so I uploaded them to the Staging folder of the original Banner study here: https://www.synapse.org/#!Synapse:syn26403225.
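
To avoid repeating the ID corruption Eric describes, an ID column can be screened before a traits file is uploaded. This is a minimal sketch, assuming an intact Banner ID looks like two integers joined by a dash, as described in the email. The date patterns are illustrative guesses at what a spreadsheet might leave behind, not an exhaustive list, and `classify_id` is a hypothetical helper. The example IDs are made up.

```python
import re

# Assumed shape of an intact Banner case ID: two integers joined by a dash.
BANNER_ID = re.compile(r"^\d+-\d+$")

# Shapes a spreadsheet may leave behind after coercing such an ID to a date.
# These patterns are illustrative assumptions, not an exhaustive list.
DATE_LIKE = [
    re.compile(r"^\d{1,2}-[A-Za-z]{3}(-\d{2,4})?$"),  # e.g. 4-Jun, 4-Jun-21
    re.compile(r"^[A-Za-z]{3}-\d{2,4}$"),             # e.g. Jun-04
    re.compile(r"^\d{1,4}/\d{1,2}/\d{1,4}$"),         # e.g. 6/4/2021
    re.compile(r"^\d{4}-\d{2}-\d{2}"),                # e.g. 2021-06-04
]


def classify_id(value):
    """Label a raw ID cell as 'ok', 'date_like', or 'unknown'."""
    text = str(value).strip()
    if BANNER_ID.match(text):
        return "ok"
    if any(pattern.match(text) for pattern in DATE_LIKE):
        return "date_like"
    return "unknown"


# Made-up example values, not real Banner IDs.
for raw in ["04-21", "4-Jun", "2021-06-04", "b4_134_04"]:
    print(raw, "->", classify_id(raw))
```

Reading the ID column as text rather than letting the spreadsheet infer a type avoids the problem at the source; a check like this only flags cells that have already been damaged.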

@avanlinden

avanlinden commented Oct 26, 2021

Jake identified three missing sample IDs from Eric's attached files that are not in the original LFQ metadata: b4_134_04, b4_007_23, and b3_041_03.

Eric responded:

It looks like 9 case samples per TMT batch × 22 Banner TMT batches = 198, which is 3 cases short of the 201 originally purchased, received, and run for LFQ proteomics dating back to 2014.

Tom Beach's sheet in response to Erik's questions in the forwarded email should have the 3 corrected/updated CERAD scores, however.

@avanlinden

Further information from Eric on the CERAD score changes:

Jake,

I confirm the discrepancy in CERAD 0-3 (previously 0, A, B, or C and corresponding literal key) for a number of the same 201 case samples from Banner Sun Health between the LFQ and the TMT traits for prefrontal cortex proteomics. I think the explanation you need dates back to the below February 2019 email from Tom Beach at Banner in response to our request to guarantee accuracy of the scale, and adaptive renumbering he performed at that time, and that we later used for the TMT, but did not correct/update in the LFQ traits. See below.

In a direct reply to your RFI, I will attach the full trait comparison with Banner IDs mapped to both LFQ and TMT batch runNumber/channel and corresponding CERAD. The discrepancies should make sense given the below logic.

Sorry we did not go back and amend the traits for the LFQ at the time.

May I suggest Mette, and the clinician scientists (Jim and Erik, cc:) confer on how best to address the LFQ traits? For reproducibility, the 1991 scoring used for correlations with the LFQ data should probably be retained, but displayed alongside the updated CERAD scores consistent with quantitative plaque density.

Sincerely,

Eric

The files he attached (PDF explaining CERAD scores and mapping file) are in the Staging folder: https://www.synapse.org/#!Synapse:syn26403225.
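
One way to follow Eric's suggestion is to join the LFQ traits to the mapping file on the LFQ file ID and carry both CERAD versions side by side. The sketch below is illustrative only: the column names (`lfq_file_id`, `banner_id`, `CERAD`, `cerad_2019`, `CERAD_Mirra1991`, `CERAD_Beach2019`) and the function `add_updated_cerad` are assumptions, not the actual headers in the attached files, and the rows are made-up toy data.

```python
def add_updated_cerad(traits_rows, mapping_rows):
    """Attach the Banner ID and the updated CERAD score to each LFQ traits row.

    All column names are hypothetical. Rows without a mapping entry are
    returned separately so they can be followed up by hand.
    """
    by_file_id = {row["lfq_file_id"]: row for row in mapping_rows}
    merged, unmatched = [], []
    for row in traits_rows:
        match = by_file_id.get(row["lfq_file_id"])
        if match is None:
            unmatched.append(row["lfq_file_id"])
            continue
        merged.append({
            "individualID": match["banner_id"],
            "lfq_file_id": row["lfq_file_id"],
            # Keep the original score for reproducibility of the LFQ analyses.
            "CERAD_Mirra1991": row["CERAD"],
            # Add the updated score next to it instead of overwriting.
            "CERAD_Beach2019": match["cerad_2019"],
        })
    return merged, unmatched


# Toy data: IDs and scores are invented for the example.
traits = [
    {"lfq_file_id": "b1_001_01", "CERAD": "1"},
    {"lfq_file_id": "b1_002_02", "CERAD": "3"},
    {"lfq_file_id": "b1_003_03", "CERAD": "0"},
]
mapping = [
    {"lfq_file_id": "b1_001_01", "banner_id": "00-01", "cerad_2019": "2"},
    {"lfq_file_id": "b1_002_02", "banner_id": "00-02", "cerad_2019": "3"},
]

merged, unmatched = add_updated_cerad(traits, mapping)
print(merged)
print("no mapping entry for:", unmatched)
```

Keeping the two scores in separate columns preserves the values used in the published LFQ correlations while making the updated scores available, and the `unmatched` list surfaces cases like the three sample IDs noted above.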

@avanlinden

I saved the forwarded 2019 email thread from Erik Johnson and Thomas Beach explaining the CERAD changes as a PDF and uploaded it here (it is too long to paste into this issue): https://www.synapse.org/#!Synapse:syn26403241.

@avanlinden avanlinden added the curation issue related to curation or cleaning of AD portal data label Nov 10, 2021