file of predictors (or notebook that generates such a file) #12

Open
jbloom opened this issue Feb 16, 2024 · 2 comments

@jbloom
Collaborator

jbloom commented Feb 16, 2024

@marlinfiggins, this relates to our discussion of generating predictors for the model.

I propose writing a notebook that collates the various predictors and outputs two files.

The first will be a mutation-effect file, and will have the following columns:

  • study
  • phenotype
  • parental amino acid
  • site
  • mutant amino acid
  • effect
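
As a rough sketch of what the mutation-effect file could look like on disk, here is one way to write and read it as CSV with Python's standard library. The column spellings (`parental_aa`, `mutant_aa`, etc.), the helper names, and the example row are all hypothetical placeholders, not names or values from any actual study or from the eventual notebook.

```python
import csv
import io

# Hypothetical column names: the proposal lists the columns but not their exact spelling.
MUTATION_EFFECT_COLUMNS = [
    "study", "phenotype", "parental_aa", "site", "mutant_aa", "effect",
]


def write_mutation_effects(rows, handle):
    """Write an iterable of row dicts as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=MUTATION_EFFECT_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def read_mutation_effects(handle):
    """Return {(study, phenotype): {(parental_aa, site, mutant_aa): effect}}."""
    effects = {}
    for row in csv.DictReader(handle):
        key = (row["study"], row["phenotype"])
        mutation = (row["parental_aa"], int(row["site"]), row["mutant_aa"])
        effects.setdefault(key, {})[mutation] = float(row["effect"])
    return effects


# Placeholder row for illustration only; not real measurements.
example_rows = [
    {"study": "example_study", "phenotype": "example_phenotype",
     "parental_aa": "K", "site": 10, "mutant_aa": "N", "effect": 0.5},
]

buffer = io.StringIO()  # stands in for a real file handle
write_mutation_effects(example_rows, buffer)
buffer.seek(0)
effects = read_mutation_effects(buffer)
```

Keying the parsed effects by `(study, phenotype)` keeps predictors from different studies separate, so each one can be applied to clades on its own.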

The second will be a per-Pango-clade file, and will have the following columns:

  • study
  • phenotype
  • Pango clade
  • predicted phenotype
  • mutations lacking data (whether any mutations in the Pango clade lack data)

Does this sound good? Is there any other information that would be needed?

For the per-Pango-clade file, I would propose that you can pass a parameter that specifies the reference clade, and predicted phenotypes are calculated with respect to that reference clade.
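
To make the reference-clade idea concrete, here is a minimal sketch of one way the calculation could work. It assumes an additive model (the predicted phenotype is the sum of the effects of the mutations separating the clade from the reference clade), which is an assumption for illustration and not something settled in this thread. The function names, the site-to-residue dictionaries, and the numbers are all hypothetical.

```python
from typing import Dict, List, Tuple

# (parental amino acid, site, mutant amino acid)
Mutation = Tuple[str, int, str]


def mutations_relative_to(
    reference: Dict[int, str], clade: Dict[int, str]
) -> List[Mutation]:
    """List substitutions that change the reference clade's residues into the clade's.

    Both arguments map site -> amino acid; only sites present in both are compared.
    """
    shared_sites = sorted(reference.keys() & clade.keys())
    return [
        (reference[site], site, clade[site])
        for site in shared_sites
        if reference[site] != clade[site]
    ]


def predict_phenotype(
    effects: Dict[Mutation, float],
    reference: Dict[int, str],
    clade: Dict[int, str],
) -> Tuple[float, bool]:
    """Return (predicted phenotype, whether any mutation lacks data).

    ASSUMPTION: effects are additive; mutations without data contribute 0.
    """
    mutations = mutations_relative_to(reference, clade)
    missing = [m for m in mutations if m not in effects]
    predicted = sum(effects[m] for m in mutations if m in effects)
    return predicted, bool(missing)


# Placeholder values for illustration only; not real measurements or real clades.
example_effects = {("K", 10, "N"): 0.5, ("E", 20, "Q"): -0.25}
reference_clade = {10: "K", 20: "E", 30: "L"}
other_clade = {10: "N", 20: "Q", 30: "F"}

predicted, lacks_data = predict_phenotype(example_effects, reference_clade, other_clade)
```

Here the third substitution has no entry in `example_effects`, so it is skipped in the sum and the "mutations lacking data" flag is set. Changing the reference clade only means passing a different `reference` dictionary.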

@marlinfiggins

This seems to be exactly what we need. In the future, I think it would be useful to have a separate script for creating the per-Pango-clade file within this workflow as well. I imagine that we would want to update it as new clades are added, but I can do this step later based on your notebook.

@jbloom
Collaborator Author

jbloom commented Feb 26, 2024

@marlinfiggins, I've just created a GitHub repo that should do this: https://github.com/jbloomlab/SARS2-spike-predictor-phenos

Perhaps you could add it as a git submodule in your pipeline?

Look it over and raise issues if anything is unclear or it isn't doing the correct thing. If it works for your purpose, feel free to close this issue. I will also post about this in the Slack channel.
