Scirpy side initiation #1

vd-dragan21 · 2024-11-25T12:08:21Z

Understanding ir.datasets.vdjdb() and Scirpy's Data Needs

Go through scirpy documentation
Go through Jupyter Notebook from Francesca
Have a look at GitHub issues:
Hackathon idea for collaboration (issue from '23): Hackathon: add generic_ir_from_biocypher() function to ingest TCR data scverse/scirpy#404 (comment)
Scirpy needs more databases (issue from '21-'24): Add more epitope databases to datasets scverse/scirpy#308
Read the preprint on TCR specificity methods: https://www.biorxiv.org/content/10.1101/2024.10.26.620398v1
Meeting with Gregor
Document requirements from scirpy side


vd-dragan21 · 2024-11-26T09:19:43Z

Summary of the call with Gregor:

Many databases are of interest, and ideally all of them would be merged, but the most important are VDJdb and IEDB:
AIT, huARdb, McPAS-TCR, PIRD, Adaptive Biotechnologies, SARS-CoV-2 resources, VDJdb, and IEDB: https://github.com/scverse/scirpy/issues/308
The "real data curation effort" mentioned in the following issue comment concerns a database bug and can be ignored for now: https://github.com/scverse/scirpy/issues/404#issuecomment-1888578330
Current approach: download the database and save it in the cache
Wanted approach: access a non-static database (a knowledge graph?) built from several merged databases
Desired output: a Python dictionary/JSON file; Gregor will convert it to AnnData himself
There is a bunch of metadata, but the most important fields for now are species and genes, for filtering: https://docs.airr-community.org/en/stable/datarep/rearrangements.html#fields
The preprint recommended by Gregor (https://www.biorxiv.org/content/10.1101/2024.10.26.620398v1) is not very useful: the authors merged 3 databases statically into a table. We are interested in these databases, but not in static copies:
"The dataset used in this analysis is based on a combination of three publically available databases:
Immune Epitope DataBase (IEDB)[16], VDJ database (VDJdb)[17] and the manually curated
catalogue of pathology associated T-cell receptor sequences (McPAS-TCR)[18], with the data being
collected as of March 2023."
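A minimal sketch of what the desired dictionary/JSON output could look like when records from two databases are merged. Everything here is an assumption for illustration: the function name `merge_records`, the `sources` key, and the choice of deduplication key are hypothetical, and the records are toy placeholders, not real database entries. The keys `junction_aa`, `v_call`, and `j_call` follow the AIRR rearrangement field names; `species` and `epitope` are extra keys assumed for this sketch.

```python
import json

# Fields that identify "the same" receptor/epitope entry across databases.
# This key is an assumption for the sketch, not an agreed-upon definition.
KEY_FIELDS = ("junction_aa", "v_call", "j_call", "species", "epitope")

# Toy placeholder records (illustrative only, not taken from VDJdb or IEDB).
vdjdb_records = [
    {"junction_aa": "CASSIRSSYEQYF", "v_call": "TRBV19", "j_call": "TRBJ2-7",
     "species": "Homo sapiens", "epitope": "GILGFVFTL"},
    {"junction_aa": "CASSLGQAYEQYF", "v_call": "TRBV13-2", "j_call": "TRBJ2-7",
     "species": "Mus musculus", "epitope": "SSYRRPVGI"},
]
iedb_records = [
    {"junction_aa": "CASSIRSSYEQYF", "v_call": "TRBV19", "j_call": "TRBJ2-7",
     "species": "Homo sapiens", "epitope": "GILGFVFTL"},
    {"junction_aa": "CASSPGTGGYEQYF", "v_call": "TRBV7-9", "j_call": "TRBJ2-7",
     "species": "Homo sapiens", "epitope": "NLVPMVATV"},
]


def merge_records(sources):
    """Merge records from several databases, collapsing exact duplicates.

    `sources` maps a database name to a list of record dicts. Each merged
    record gets a `sources` list naming every database it was found in.
    """
    merged = {}
    for db_name, records in sources.items():
        for rec in records:
            key = tuple(rec.get(field) for field in KEY_FIELDS)
            entry = merged.setdefault(key, {**rec, "sources": []})
            if db_name not in entry["sources"]:
                entry["sources"].append(db_name)
    return list(merged.values())


merged = merge_records({"VDJdb": vdjdb_records, "IEDB": iedb_records})
as_json = json.dumps(merged, indent=2)  # JSON string that could be handed over
```

Keeping the output as plain dictionaries leaves the conversion to AnnData on the scirpy side, as discussed in the call.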

Maybe have a look at the code for merging/scoring later: https://github.com/i3-unit/TCR_Unsupervised_Benchmark

Gregor is not actively working on the project but works on scirpy in his free time, so there won't be any regular updates or meetings with him. He is always happy to help with specific questions.

Summary of the project goal:
A public BioCypher API that one can query (realistic goal: 2 databases, VDJdb and IEDB, and 2 metadata fields, species and genes)
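As a rough illustration of the kind of query this goal implies, the sketch below filters a list of record dictionaries by species and gene. The function `query_records` and its parameters are hypothetical and are not part of BioCypher or scirpy; the records are toy placeholders. `v_call` and `j_call` follow the AIRR rearrangement field names, while `species` is an extra key assumed here.

```python
# Toy placeholder records (illustrative only, not real database entries).
records = [
    {"junction_aa": "CASSIRSSYEQYF", "v_call": "TRBV19", "j_call": "TRBJ2-7",
     "species": "Homo sapiens"},
    {"junction_aa": "CASSLGQAYEQYF", "v_call": "TRBV13-2", "j_call": "TRBJ2-7",
     "species": "Mus musculus"},
    {"junction_aa": "CASSPGTGGYEQYF", "v_call": "TRBV7-9", "j_call": "TRBJ2-1",
     "species": "Homo sapiens"},
]


def query_records(records, species=None, genes=None):
    """Return the records matching the given species and/or gene names.

    A record matches `genes` if its V or J gene call is in the given set.
    Passing None for a filter disables it.
    """
    wanted_genes = set(genes) if genes else None
    result = []
    for rec in records:
        if species is not None and rec.get("species") != species:
            continue
        if wanted_genes is not None:
            rec_genes = {rec.get("v_call"), rec.get("j_call")}
            if not wanted_genes & rec_genes:
                continue
        result.append(rec)
    return result


human = query_records(records, species="Homo sapiens")
human_trbv19 = query_records(records, species="Homo sapiens", genes=["TRBV19"])
```

A real API would presumably run such filters against the knowledge graph rather than an in-memory list; the sketch only shows the intended inputs and outputs.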

vd-dragan21 added the Project Proposal label Nov 25, 2024

vd-dragan21 self-assigned this Nov 25, 2024

vd-dragan21 closed this as completed Jan 9, 2025
