Reproducing CellO Evaluation on Test Datasets #1

Open
Qamar-Alissa opened this issue May 19, 2022 · 0 comments

Comments

@Qamar-Alissa

Hello,

I am currently writing my bachelor thesis, whose goal is to avoid repeatedly retraining CellO, the cell type classification tool you developed, by using imputation methods. I am having trouble reproducing the results of the paper with the evaluation code you provided for testing CellO on the Zheng_PBMC, lung cancer, and non-droplet datasets (i.e., obtaining the F1-scores and average precision for these datasets).

I downloaded the full datasets from https://zenodo.org/record/4289064#.YoOS-FTP2Uk and cloned the evaluation repository from https://github.com/deweylab/cell-type-classification-paper.git. However, I could not work out how the code in that repository relates to the dataset files. The Snakefiles require or reference several input files. I expected to find these among the dataset files, but they are either missing or named differently. For example, the dataset contains bulk_labels.json, but the code only refers to labels.json. The code also refers to experiment_to_study.json and untampered_bulk_primary_cells_with_data, neither of which I could identify.
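The comparison described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the helper names (`referenced_json_names`, `available_file_names`, `missing_inputs`) and the toy directory layout are hypothetical and are not part of your repository or the Zenodo archive. It collects the `.json` file names mentioned in Snakefiles and Python scripts and reports those that do not appear anywhere in the data directory.

```python
import re
import tempfile
from pathlib import Path

# Matches bare file names ending in ".json" (directory parts are ignored).
JSON_NAME = re.compile(r"[\w.-]+\.json\b")


def referenced_json_names(repo_dir):
    """Collect .json file names mentioned in Snakefiles and .py scripts."""
    names = set()
    for path in Path(repo_dir).rglob("*"):
        is_code = path.name.startswith("Snakefile") or path.suffix == ".py"
        if path.is_file() and is_code:
            names.update(JSON_NAME.findall(path.read_text(errors="ignore")))
    return names


def available_file_names(data_dir):
    """Collect the names of all files found under the data directory."""
    return {p.name for p in Path(data_dir).rglob("*") if p.is_file()}


def missing_inputs(repo_dir, data_dir):
    """Names referenced in the code but absent from the data directory."""
    return sorted(referenced_json_names(repo_dir) - available_file_names(data_dir))


# Toy layout that mimics the situation described in this issue.
with tempfile.TemporaryDirectory() as tmp:
    repo, data = Path(tmp, "repo"), Path(tmp, "data")
    repo.mkdir()
    data.mkdir()
    (repo / "Snakefile").write_text(
        'input: "labels.json", "experiment_to_study.json"\n'
    )
    (data / "bulk_labels.json").write_text("{}")
    demo_missing = missing_inputs(repo, data)

print(demo_missing)
```

On the real checkout, the two arguments would be the cloned repository and the unpacked Zenodo download. The match is by exact file name only, so a renamed file such as bulk_labels.json versus labels.json is still reported as missing.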

I was also unsure which Python scripts in the cell-type-classification-paper.git repository to run first. I assumed train_model.py should come first, but as mentioned, it requires the labels.json and experiment_to_study.json files, which do not exist in the provided dataset files.

Thus I would be grateful if you could help me with the following issues:

  1. Which scripts should be run, and in what order, to reproduce the evaluation on the three datasets mentioned above?

  2. In one of the Python scripts in the cell-type-classification-paper.git repository, I found a dictionary that maps the cell labels used by the Zheng_PBMC dataset to Cell Ontology labels. However, I could not find similar mapping dictionaries for the lung cancer and non-droplet datasets. Do these exist? If so, I would be very grateful to receive the cell type mappings for the remaining two datasets.

  3. I tried to run CellO on Zheng_PBMC.h5 from a Jupyter notebook, but it did not work because CellO expected an AnnData object (h5ad file). When I ran it from the command line, it worked. Do you have any advice on how to use files in formats other than h5ad from a Jupyter notebook?

I would be very thankful for a quick response.
