Updated wastewater docs #56
base: main
This is looking really good. Things for us to do, besides these comments:
- remove the build files from the repo and get our github action working
- merge in my coverage-handling code (I'll update figE to match)
- move the fig notebooks into the docs
- publish it as version 2.0.0
@@ -89,11 +97,13 @@ def get_descendants(node):
"""Get the set of all descendants of some node."""
return set(node['children']) | set.union(*[get_descendants(c) for c in node['children']]) if len(node['children']) > 0 else set([])

def gather_groups(clusters, prevalences, count_scores = tuple([0.1, 4, 4, 4, 0.1] + [0] * 256)):
def (clusters, prevalences, count_scores = tuple([0.1, 4, 4, 4, 0.1] + [0] * 256)):
I think the function name got accidentally deleted here
docs/source/index.rst
Outdated
In contrast, wastewater samples have been highly useful for tracking regional infection dynamics while providing less biased abundance estimates than clinical testing. Data collected by tracking viral genomic sequences in wastewater has also improved community prevalence estimates and detects emerging variants earlier on.

The Andersen Lab has developed improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. The resulting data is now deployed by Python-outbreak-info. In short, SARS-Cov-2 analysis can be done using both clinical and wastewater tools, yet data from the wastewater analysis tools may be more accurate in some situations.
This looks great. The only thing that I think needs a little more explanation is that with clinical genomics each sample is one sequence, so we can see whether two mutations frequently occur together, while with wastewater each sample is a mix of sequences, so we don't know exactly which mutations go with which variants. Basically, just remind people that they might need some clinical data to answer co-occurrence questions.
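To make the co-occurrence point concrete, here is a minimal sketch using made-up toy data (the mutation names, sample sets, and helper functions are all hypothetical and not part of the package). It shows two different sets of genomes that collapse to identical per-mutation frequencies, which is roughly all a mixed wastewater sample can tell us, while only the per-genome clinical view reveals which mutations travel together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each clinical sample is ONE genome,
# so the mutations listed in a sample are known to be linked.
linked_samples = [
    {"S:N501Y", "S:E484K"},
    {"S:N501Y", "S:E484K"},
    {"S:L452R"},
    {"S:L452R"},
]

# A different population in which N501Y and E484K never share a genome.
unlinked_samples = [
    {"S:N501Y", "S:L452R"},
    {"S:E484K", "S:L452R"},
    {"S:N501Y"},
    {"S:E484K"},
]


def cooccurrence_counts(samples):
    """Count how many single-genome samples carry each pair of mutations."""
    counts = Counter()
    for mutations in samples:
        for pair in combinations(sorted(mutations), 2):
            counts[pair] += 1
    return counts


def mutation_frequencies(samples):
    """Collapse genomes into per-mutation frequencies, discarding linkage,
    as happens when sequences are pooled in one mixed sample."""
    totals = Counter(m for mutations in samples for m in mutations)
    return {m: n / len(samples) for m, n in totals.items()}


pair = ("S:E484K", "S:N501Y")
print(mutation_frequencies(linked_samples) == mutation_frequencies(unlinked_samples))
print(cooccurrence_counts(linked_samples)[pair], cooccurrence_counts(unlinked_samples)[pair])
```

Both populations give every mutation a frequency of 0.5, yet the pair is linked in one and never co-occurs in the other, so the pooled frequencies alone cannot answer the co-occurrence question.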
from outbreak_data import authenticate_user
authenticate_user.authenticate_new_user()
from outbreak_data.authenticate_user import authenticate_new_user
authenticate_new_user()

and then you should be able access all of the functionality of the package. Most of the rest of the tools are available within the ``outbreak_data`` component of the package. For example:
It would be good to add a note here that authentication is not required for wastewater data.
This project is under active development.
Table of Contents:
Here, in the sidebar, and on the title of each page, it would be good to make clear what submodule each function is in.
docs/source/auth_setup.rst
Outdated
@@ -1,4 +1,4 @@
authenticate_new_user()
authenticate_new_user
----------------------------------------------------
Some text about this being only needed for clinical, that access to a web browser is needed, and that the token is saved locally between runs would be useful.
tests/figAB.ipynb
Outdated
"source": [
"ww_prevalences = outbreak_tools.datebin_and_agg(ww_lineages, weights=outbreak_tools.get_ww_weights(ww_lineages), startdate=startdate, enddate=enddate, freq='7D', rolling=[1,4,1], log=False)\n",
"ww_prevalences_daily_unsmoothed = outbreak_tools.datebin_and_agg(ww_lineages, weights=outbreak_tools.get_ww_weights(ww_lineages), startdate=startdate, enddate=enddate, freq='D', rolling=1, log=False)\n",
"ww_prevalences_daily, ww_prevalences_daily_varis = outbreak_tools.datebin_and_agg(ww_lineages, weights=outbreak_tools.get_ww_weights(ww_lineages), startdate=startdate, enddate=enddate, freq='D', rolling=smooth, log=False, variance=True)"
We don't actually use the variances for these plots, so we can set variance=False and delete the "_varis" parts of this line and the others that mention them, to simplify.
tests/figC.ipynb
Outdated
"ww_prev_data = ww_prevalences.mul(viral_load_weekly, axis=0).sum()\n",
"clinical_prev_data = clinical_prevalences.mul(viral_load_weekly, axis=0).sum()\n",
"\n",
"ww_clusters = outbreak_clustering.cluster_lineages(ww_prev_data, tree, lineage_key=lineage_key, n=10, alpha=0.25)\n",
Similarly to figAB, let's simplify by getting rid of the viral load info from this notebook and just clustering on ww_prevalences.sum(). We can use one set of clusters for both ww and clinical
tests/figC.ipynb
Outdated
"metadata": {},
"outputs": [
{
"data": {
Let's add a scatter plot of the daily unsmoothed data on top of this
tests/figD.ipynb
Outdated
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we'll need to fetch and aggregate the viral load sample data to get our prevalence data. "
We can drop the viral load code from this notebook too
tests/figE.ipynb
Outdated
"id": "5e2ed603-3042-49ca-865e-a823287bdeb8",
"metadata": {},
"source": [
"Now we go ahead and query for wastewater data using our defined specifications. After this, we'll need to organize our retrieved sample data by date and site_id within our specified region to get the viral load smaple data."
Small typo here. Also good to explain that we're filtering out viral load information from sites with few samples, and then normalizing each site's viral load signals to have a variance of 1.
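For the notebook text, the filtering and normalization step could be illustrated along these lines. This is only a sketch under stated assumptions: the function name `normalize_viral_loads`, the `min_samples` threshold, the record layout, and the use of population (rather than sample) variance are all hypothetical choices for illustration, not the package's actual implementation.

```python
from collections import defaultdict
from statistics import pstdev


def normalize_viral_loads(records, min_samples=3):
    """Drop sites with fewer than `min_samples` samples, then divide each
    remaining site's viral loads by that site's population standard
    deviation so the site's signal has variance 1.

    `records` is an iterable of (site_id, date, viral_load) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_site = defaultdict(list)
    for site_id, date, load in records:
        by_site[site_id].append((date, load))

    normalized = []
    for site_id, samples in by_site.items():
        if len(samples) < min_samples:
            continue  # too few samples to estimate this site's scale
        scale = pstdev([load for _, load in samples])
        if scale == 0:
            continue  # a constant signal cannot be rescaled to variance 1
        normalized.extend((site_id, date, load / scale) for date, load in samples)
    return normalized


# Toy data: site "A" has three samples, site "B" only one.
records = [
    ("A", "2023-01-01", 10.0),
    ("A", "2023-01-02", 20.0),
    ("A", "2023-01-03", 30.0),
    ("B", "2023-01-01", 5.0),
]
result = normalize_viral_loads(records)
```

Dividing by the standard deviation without centering keeps the loads positive while still giving each site unit variance, so sites with very different absolute concentrations contribute on a comparable scale.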
…tbreak-info into new_docs Unfinished docs
Merging in Sarah's work