Scirpy side initiation #1

Closed
6 tasks done
vd-dragan21 opened this issue Nov 25, 2024 · 1 comment
vd-dragan21 commented Nov 25, 2024

Understanding ir.datasets.vdjdb() and Scirpy's Data Needs

vd-dragan21 commented Nov 26, 2024

Summary of the call with Gregor:

  1. There are many databases of interest that would ideally all be merged, but the most important are VDJdb and IEDB. The full list:
    AIT, huARdb, McPAS-TCR, PIRD, Adaptive Biotechnologies, SARS-CoV-2 resources, VDJdb, and IEDB: https://github.com/scverse/scirpy/issues/308

  2. The "real data curation effort" mentioned in the following issue refers to a database bug and can be ignored for now: https://github.com/scverse/scirpy/issues/404#issuecomment-1888578330

  3. Current approach: download the database and save it in the cache.

  4. Desired approach: access a non-static database (a knowledge graph, KG?) built from several merged databases.

  5. Desired output: a Python dictionary or JSON file; Gregor will convert it to AnnData himself.

  6. There is a bunch of metadata, but the most important fields for now are species and genes, which are needed for filtering: https://docs.airr-community.org/en/stable/datarep/rearrangements.html#fields

  7. The preprint recommended by Gregor (https://www.biorxiv.org/content/10.1101/2024.10.26.620398v1) is not very useful for us: the authors merged three databases statically into a single table. We are interested in the same databases, but not in static snapshots of them:
    "The dataset used in this analysis is based on a combination of three publically available databases:
    Immune Epitope DataBase (IEDB)[16], VDJ database (VDJdb)[17] and the manually curated
    catalogue of pathology associated T-cell receptor sequences (McPAS-TCR)[18], with the data being
    collected as of March 2023."

Maybe have a look at their code for merging/scoring later: https://github.com/i3-unit/TCR_Unsupervised_Benchmark

  8. Gregor is not actively working on the project; he works on scirpy in his free time. There won't be any regular updates or meetings with him, but he is always happy to help with specific questions.
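
A rough sketch of what the merged output from points 1 and 5 could look like, assuming each database export can be reduced to rows with a few shared fields. The field names `junction_aa`, `v_call` and `j_call` follow the AIRR Rearrangement schema linked above; the `species` and `sources` keys, the `merge_sources` helper and all rows are made-up placeholders for illustration, not real database entries or an agreed format.

```python
import json

# Made-up placeholder rows standing in for two database exports.
SOURCE_A = [
    {"junction_aa": "CASSAAAAF", "v_call": "TRBV9", "j_call": "TRBJ2-1", "species": "human"},
    {"junction_aa": "CASSGGGGF", "v_call": "TRBV19", "j_call": "TRBJ1-2", "species": "mouse"},
]
SOURCE_B = [
    {"junction_aa": "CASSAAAAF", "v_call": "TRBV9", "j_call": "TRBJ2-1", "species": "human"},
    {"junction_aa": "CASSTTTTF", "v_call": "TRBV7-9", "j_call": "TRBJ2-7", "species": "human"},
]

# Fields that identify a receptor chain in this toy example (an assumption).
KEY_FIELDS = ("junction_aa", "v_call", "j_call", "species")


def merge_sources(sources):
    """Merge rows from several named sources and record where each row was seen."""
    merged = {}
    for name, rows in sources.items():
        for row in rows:
            key = tuple(row[field] for field in KEY_FIELDS)
            entry = merged.setdefault(
                key, {**{field: row[field] for field in KEY_FIELDS}, "sources": []}
            )
            if name not in entry["sources"]:
                entry["sources"].append(name)
    return list(merged.values())


merged = merge_sources({"VDJdb": SOURCE_A, "IEDB": SOURCE_B})
payload = json.dumps(merged, indent=2)  # plain JSON that can be handed over as-is
```

Keeping a `sources` list per record, instead of dropping duplicates silently, preserves which database contributed each entry.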

Overall summary of the project goal:
A public BioCypher API that one can query (realistic goal: 2 databases, VDJdb and IEDB, and 2 metadata fields, species and genes).
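
To make the goal concrete, here is a minimal sketch of the kind of query the API would need to answer, filtering by species and genes. This is plain Python over an in-memory list, not BioCypher code; the `query` function, the record layout and the toy rows are all assumptions made for illustration.

```python
# Made-up placeholder records, not real database entries.
RECORDS = [
    {"junction_aa": "CASSAAAAF", "v_call": "TRBV9", "j_call": "TRBJ2-1", "species": "human"},
    {"junction_aa": "CASSGGGGF", "v_call": "TRBV19", "j_call": "TRBJ1-2", "species": "mouse"},
    {"junction_aa": "CASSTTTTF", "v_call": "TRBV7-9", "j_call": "TRBJ2-7", "species": "human"},
]


def query(records, species=None, genes=None):
    """Return records of the given species that use at least one of the given genes.

    A filter set to None is not applied.
    """
    wanted = set(genes) if genes else None
    hits = []
    for rec in records:
        if species is not None and rec["species"] != species:
            continue
        if wanted is not None and not wanted & {rec["v_call"], rec["j_call"]}:
            continue
        hits.append(rec)
    return hits


human_trbv9 = query(RECORDS, species="human", genes=["TRBV9"])
all_human = query(RECORDS, species="human")
```

The result is already a list of dictionaries, which matches the desired output format from the call.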
