Updated wastewater docs #56

mindoftea · 2024-07-11T14:39:25Z

Merging in Sarah's work

mindoftea

This is looking really good. Things for us to do, besides these comments:

remove the build files from the repo and get our GitHub Action working
merge in my coverage-handling code (I'll update figE to match)
move the fig notebooks into the docs
publish it as version 2.0.0

mindoftea · 2024-07-11T14:42:04Z

src/outbreak_tools/outbreak_clustering.py

@@ -89,11 +97,13 @@ def get_descendants(node):
    """Get the set of all descendants of some node."""
    return set(node['children']) | set.union(*[get_descendants(c) for c in node['children']]) if len(node['children']) > 0 else set([])

-def gather_groups(clusters, prevalences, count_scores = tuple([0.1, 4, 4, 4, 0.1] + [0] * 256)):
+def (clusters, prevalences, count_scores = tuple([0.1, 4, 4, 4, 0.1] + [0] * 256)):


I think the function name got accidentally deleted here; this line should still read `def gather_groups(...)`.
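A slip like this can be caught before review by checking that the module still parses. Below is a minimal sketch, not part of the package: the `parses` helper and the `pass` bodies are placeholders added for illustration, and the two signatures are copied from the diff above.

```python
import ast

# The removed and added signature lines from the diff, each given a
# placeholder body so that they form a complete function definition.
BROKEN = (
    "def (clusters, prevalences, "
    "count_scores = tuple([0.1, 4, 4, 4, 0.1] + [0] * 256)):\n"
    "    pass\n"
)
FIXED = (
    "def gather_groups(clusters, prevalences, "
    "count_scores = tuple([0.1, 4, 4, 4, 0.1] + [0] * 256)):\n"
    "    pass\n"
)


def parses(source: str) -> bool:
    """Return True if the source text is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


broken_ok = parses(BROKEN)  # the nameless def is rejected by the parser
fixed_ok = parses(FIXED)    # restoring the name makes the definition valid
print(broken_ok, fixed_ok)
```

Running the same kind of check over the real file (for example with `python -m py_compile`) in the GitHub Action would flag the problem automatically.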

mindoftea · 2024-07-11T15:11:19Z

docs/source/index.rst

+
+In contrast, wastewater samples have been highly useful for tracking regional infection dynamics while providing less biased abundance estimates than clinical testing. Data collected by tracking viral genomic sequences in wastewater has also improved community prevalence estimates and detects emerging variants earlier on. 
+
+The Andersen Lab has developed improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. The resulting data is now deployed by Python-outbreak-info. In short, SARS-Cov-2 analysis can be done using both clinical and wastewater tools, yet data from the wastewater analysis tools may be more accurate in some situations.


This looks great. The only thing that I think needs a little more explanation is what a "sample" means in each case. With clinical genomics, each sample is one sequence, so we can see whether two mutations frequently occur together, etc. With wastewater, each sample is a mix of sequences, so we don't know exactly which mutations go with which variants. Basically, just remind people that they might need some clinical data to answer co-occurrence questions.
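To make the point concrete, here is a small sketch with made-up data (the populations, mutation names, and helper functions are all hypothetical, not from the package). Two sets of clinical sequences give identical per-mutation frequencies, which is roughly what a pooled wastewater sample reports, yet they have opposite co-occurrence patterns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each clinical sample is ONE sequence, i.e. one set of mutations.
population_a = [{"m1", "m2"}, {"m1", "m2"}, set(), set()]  # m1 and m2 always together
population_b = [{"m1"}, {"m1"}, {"m2"}, {"m2"}]            # m1 and m2 never together


def mutation_frequencies(sequences):
    """Per-mutation frequency, as a pooled (wastewater-like) sample would show it."""
    counts = Counter(m for seq in sequences for m in seq)
    return {m: counts[m] / len(sequences) for m in sorted(counts)}


def cooccurrence_counts(sequences):
    """Number of sequences carrying each pair of mutations (needs per-sequence data)."""
    counts = Counter()
    for seq in sequences:
        counts.update(combinations(sorted(seq), 2))
    return dict(counts)


freq_a = mutation_frequencies(population_a)
freq_b = mutation_frequencies(population_b)
pairs_a = cooccurrence_counts(population_a)
pairs_b = cooccurrence_counts(population_b)

print(freq_a == freq_b)  # the pooled view cannot tell the two populations apart
print(pairs_a, pairs_b)  # the per-sequence view can
```

The frequencies alone cannot distinguish the two populations, which is why co-occurrence questions may still need clinical data.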

mindoftea · 2024-07-11T15:12:21Z

docs/source/index.rst

-   from outbreak_data import authenticate_user
-   authenticate_user.authenticate_new_user()
+    from outbreak_data.authenticate_user import authenticate_new_user
+    authenticate_new_user()

 and then you should be able access all of the functionality of the package. Most of the rest of the tools are available within the ``outbreak_data`` component of the package. For example: 


It would be good to add a note here that authentication is not required for wastewater data.

mindoftea · 2024-07-11T15:13:20Z

docs/source/index.rst


-   This project is under active development.
+Table of Contents:


Here, in the sidebar, and in the title of each page, it would be good to make clear which submodule each function is in.

mindoftea · 2024-07-11T15:17:35Z

docs/source/auth_setup.rst

@@ -1,4 +1,4 @@
-authenticate_new_user()
+authenticate_new_user
 ----------------------------------------------------



Some text about this being only needed for clinical, that access to a web browser is needed, and that the token is saved locally between runs would be useful.

mindoftea · 2024-07-11T16:19:16Z

tests/figAB.ipynb

+   "source": [
+    "ww_prevalences = outbreak_tools.datebin_and_agg(ww_lineages, weights=outbreak_tools.get_ww_weights(ww_lineages), startdate=startdate, enddate=enddate, freq='7D', rolling=[1,4,1], log=False)\n",
+    "ww_prevalences_daily_unsmoothed = outbreak_tools.datebin_and_agg(ww_lineages, weights=outbreak_tools.get_ww_weights(ww_lineages), startdate=startdate, enddate=enddate, freq='D', rolling=1, log=False)\n",
+    "ww_prevalences_daily, ww_prevalences_daily_varis = outbreak_tools.datebin_and_agg(ww_lineages, weights=outbreak_tools.get_ww_weights(ww_lineages), startdate=startdate, enddate=enddate, freq='D', rolling=smooth, log=False, variance=True)"


We don't actually use the variances for these plots, so we can set variance=False and delete the "_varis" parts of this line, and of the others that mention them, to simplify.

mindoftea · 2024-07-11T17:25:00Z

tests/figC.ipynb

+    "ww_prev_data = ww_prevalences.mul(viral_load_weekly, axis=0).sum()\n",
+    "clinical_prev_data = clinical_prevalences.mul(viral_load_weekly, axis=0).sum()\n",
+    "\n",
+    "ww_clusters = outbreak_clustering.cluster_lineages(ww_prev_data, tree, lineage_key=lineage_key, n=10, alpha=0.25)\n",


As with figAB, let's simplify by getting rid of the viral load info from this notebook and just clustering on ww_prevalences.sum(). We can use one set of clusters for both ww and clinical.

mindoftea · 2024-07-11T17:25:51Z

tests/figC.ipynb

+   "metadata": {},
+   "outputs": [
+    {
+     "data": {


Let's add a scatter plot of the daily unsmoothed data on top of this

mindoftea · 2024-07-11T17:27:19Z

tests/figD.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next we'll need to fetch and aggregate the viral load sample data to get our prevalence data. "


We can drop the viral load code from this notebook too

mindoftea · 2024-07-11T17:32:14Z

tests/figE.ipynb

+   "id": "5e2ed603-3042-49ca-865e-a823287bdeb8",
+   "metadata": {},
+   "source": [
+    "Now we go ahead and query for wastewater data using our defined specifications. After this, we'll need to organize our retrieved sample data by date and site_id within our specified region to get the viral load smaple data."


Small typo here: "smaple" should be "sample". It would also be good to explain that we're filtering out viral load information from sites with few samples, and then normalizing each site's viral load signal to have a variance of 1.
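For the explanation in the notebook, the two steps could be illustrated roughly as follows. This is a sketch under stated assumptions, not the notebook's code: the records, the `MIN_SAMPLES` cutoff, and the `normalize_viral_loads` helper are hypothetical, and it assumes population variance and scaling without mean-centering.

```python
from collections import defaultdict
from statistics import pvariance

MIN_SAMPLES = 3  # hypothetical threshold; the notebook's actual cutoff may differ

# Hypothetical (site_id, viral_load) records standing in for the queried samples.
records = [
    ("site_a", 2.0), ("site_a", 4.0), ("site_a", 6.0), ("site_a", 8.0),
    ("site_b", 10.0), ("site_b", 30.0), ("site_b", 50.0),
    ("site_c", 5.0),  # only one sample, so this site is dropped
]


def normalize_viral_loads(records, min_samples=MIN_SAMPLES):
    """Drop sparsely sampled sites, then scale each site's loads to variance 1."""
    by_site = defaultdict(list)
    for site_id, load in records:
        by_site[site_id].append(load)

    normalized = {}
    for site_id, loads in by_site.items():
        if len(loads) < min_samples:
            continue  # filter out sites with few samples
        variance = pvariance(loads)  # assumption: population variance
        if variance == 0:
            continue  # a constant signal cannot be rescaled to variance 1
        scale = variance ** 0.5
        normalized[site_id] = [load / scale for load in loads]
    return normalized


normalized = normalize_viral_loads(records)
for site_id, loads in normalized.items():
    print(site_id, round(pvariance(loads), 6))
```

Scaling each site separately keeps one high-load site from dominating the combined signal, which is presumably the motivation worth stating in the markdown cell.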

mindoftea and others added 30 commits April 2, 2024 09:36

updating to new indices; mutations; tests

0dd3c0c

all_lineage_prevalences server arg fix

11d1989

updating for new indices

3e0230e

adding demix_success filter

71f878e

local agg helpers; consistency fixes

341a368

ww visualization support; lineage clustering

22922d7

clustering reusability

73c9470

maps etc

aff01ed

vis refinements

2ef0c97

improve combined vis

8b63f17

small refactors and fixes

8c587c0

major refactor, bugfixes

f31bfa6

small import fixes

ea5e84c

compressed tree support

385842b

docstrings and bugfixes

f97daf8

ww and clinial example graphs

70f6a9c

local merge

17911d1

ww and clinical graphs

9dbb9ea

interface consistency; more bugfixes and docstrings

fc6fe2f

pyodide build

2271759

clustering modularity; more pyodide

d1ac8db

stacked stacked plots

56af094

small ux improvements

52c51e1

index changes

e291031

aggregation features

a4be2d2

aggregation refactor

669efe1

improves aggregation date handling

1362976

typo

70a77df

updated docs

475d997

new directory

c25f28b

srandall02 added 8 commits June 4, 2024 22:34

fixed typos

fed48d5

doc fixes

8b20016

doc typo fixes

972d41c

changed dir

840c5a4

outbreak_tools

48e3a1f

more pages, figures

7848dfd

rm DS files

c501f16

example workflows

b927cd0

mindoftea commented Jul 11, 2024

mindoftea assigned srandall02 Jul 11, 2024

srandall02 and others added 5 commits July 19, 2024 00:06

Added new example notebooks

2e9081a

doc revisions

a609372

docs revisions

81531da

doc page fixes

2e623c4

lots of little fixes

8cbe0f7

mindoftea changed the base branch from wastewater_sprint_2 to main July 31, 2024 20:47

srandall02 and others added 14 commits August 22, 2024 13:51

declutter and minor edits

a2d75fd

Merge branch 'new_docs' of https://github.com/outbreak-info/python-ou…

a40bd99

…tbreak-info into new_docs

dereference crumbs queries

af4df62

crumbs helper

3baf7a4

query ww by ID list

abfe0aa

more helper functions etc

7dd50d4

small fixes and notebook improvements

64f3787

fancy clustering

0a45e6e

unfinished pages

c746939

unfinished docs

59f8eb4

Merge branch 'new_docs' of https://github.com/outbreak-info/python-ou…

b5d0191

…tbreak-info into new_docs Unfinished docs

cleaning up and merging Sarah's work

8f03f8e

cleaning up

d68d407

one more tidying commit for git

43ad155
