@marlinfiggins, this relates to our discussion of generating predictors for the model.
I propose writing a notebook that collates the various predictors and outputs two files.
The first will be a mutation-effect file, and will have the following columns:
study
phenotype
parental amino acid
site
mutant amino acid
effect
The second will be a per-Pango-clade file, and will have the following columns:
study
phenotype
Pango clade
predicted phenotype
mutations lacking data (are there mutations in the Pango clade that lack data)
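To make the two proposed layouts concrete, here is a minimal sketch of how they might be written as CSV. The snake_case column names, the `write_table` helper, and the example row are all hypothetical placeholders (the issue only lists the columns in prose), so treat this as one possible shape rather than the notebook's actual output.

```python
import csv
import io

# Hypothetical snake_case names for the columns listed in the issue.
MUTATION_EFFECT_COLUMNS = [
    "study", "phenotype", "parental_aa", "site", "mutant_aa", "effect",
]
CLADE_COLUMNS = [
    "study", "phenotype", "pango_clade", "predicted_phenotype",
    "mutations_lacking_data",
]


def write_table(rows, columns):
    """Serialize a list of dict rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up placeholder row, for illustration only (not real data).
example_effects = [
    {
        "study": "study_A",
        "phenotype": "phenotype_X",
        "parental_aa": "A",
        "site": 10,
        "mutant_aa": "T",
        "effect": -0.5,
    },
]
mutation_effect_csv = write_table(example_effects, MUTATION_EFFECT_COLUMNS)
clade_csv = write_table([], CLADE_COLUMNS)  # header only
```

Fixing the column order in one place keeps the two files consistent if further predictors are collated later.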
Does this sound good? Is there any other information that would be needed?
For the per-Pango-clade file, I would propose that you can pass a parameter that specifies the reference clade, and predicted phenotypes are calculated with respect to that reference clade.
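As a rough sketch of what "with respect to that reference clade" could mean, the example below assumes an additive model: a clade's score is the sum of the effects of its mutations, and the predicted phenotype is that score minus the reference clade's score. The additive assumption, the function names, the mutation-string format, and the clade names are all hypothetical and not taken from the proposal; mutations with no measured effect are skipped in the sum and reported separately.

```python
def clade_score(mutations, effects):
    """Sum the known effects for a clade's mutations (assumes additivity).

    Returns (score, missing), where `missing` lists mutations with no data.
    """
    missing = sorted(m for m in mutations if m not in effects)
    score = sum(effects[m] for m in mutations if m in effects)
    return score, missing


def predict_relative(clade, reference, clade_mutations, effects):
    """Predicted phenotype of `clade` relative to the `reference` clade."""
    score, missing = clade_score(clade_mutations[clade], effects)
    ref_score, _ = clade_score(clade_mutations[reference], effects)
    return score - ref_score, missing


# Made-up placeholder values, for illustration only (not real data).
effects = {"A10T": -0.5, "G20D": 1.0}
clade_mutations = {
    "clade_ref": {"A10T"},
    "clade_1": {"A10T", "G20D", "K30N"},
}

predicted, lacking = predict_relative(
    "clade_1", "clade_ref", clade_mutations, effects
)
```

Under this assumption the reference clade always scores zero against itself, and the `lacking` list maps directly onto the "mutations lacking data" column.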
This seems to be exactly what we need. In the future, I think it would be useful to have a separate script for creating the per-Pango-clade file within this workflow as well. I imagine that we would need to update it as new clades are added, but I can do this step later based on your notebook.
My suggestion would be that you make this a git submodule in your pipeline, perhaps?
Look it over and raise issues if you can't understand it or it isn't doing the correct thing. If it works for your purpose, feel free to close this issue. I am also going to post about this in the Slack channel.