Skip to content

Commit

Permalink
aggregate titers across sera (#8)
Browse files Browse the repository at this point in the history
* upgrade to `neutcurve 0.10.1

* all serum titers adjusted to pass QC

* mostly complete `aggregate_titers`

* complete `aggregate_titers`

* document aggregate titers

* lint and format

* save all results in test Github Actions artifact

* update config to pass tests and plot QC failure in `serum_titers` in a clearer-to-look-at order
  • Loading branch information
jbloom authored Dec 14, 2023
1 parent 7098442 commit 2475a95
Show file tree
Hide file tree
Showing 12 changed files with 466 additions and 25 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -64,5 +64,5 @@ jobs:
if: failure()
with:
name: log-files
path: test_example/results/logs
path: test_example/results
retention-days: 7
16 changes: 13 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,9 @@ The titers are computed by fitting Hill-curve style neutralization curves using
The titers are summarized by the neutralization titer 50% (NT50), which is the serum dilution factor at which the serum neutralizations half of the viral infectivity.
So a NT50 of 200 means that at a 1:200 dilution of the serum, half the viral infectivity is neutralized.
Note that the NT50 is the reciprocal of the IC50 (concentration of serum at which 50% of viral infectivity is neutralized).
Typically there will be multiple replicates, and the reported titer is the median among replicates.

Note also there are several quality control steps with thresholds and exclusions specified in the configuration YAML (see below), and the pipeline will only run to completion once all quality-control is passed.

## Using this pipeline
This pipeline is designed to be included as a modular portion of a larger [snakemake](https://snakemake.readthedocs.io/) analysis.
Expand Down Expand Up @@ -367,9 +370,10 @@ Under each virus, you can set `ignore_qc: true` if you simply want to ignore any

If you want to exclude specific replicates, instead under the virus key set `replicates_to_drop` to be a name of the replicate for that serum-virus as named in `serum_titers`.

## Output of the pipeline
## Results of running the pipeline
The results of running the pipeline are put in the `./results/` subdirectory of your main repo.
We recommend using the `.gitignore` file in [./test_example/.gitignore] in your main repo to only track key results in your GitHub repo.
The key results file if the pipeline runs to completion in `./results/aggregated_titers/titers.csv`.
The set of full created outputs are as follows (note only some will be tracked depending on your `.gitignore`):

- Outputs related to barcode counting:
Expand All @@ -387,7 +391,7 @@ The set of full created outputs are as follows (note only some will be tracked d
- Outputs related to fitting the neutralization curves for each plate:
- `./results/plates/{plate}/curvefits.csv`: the neutralization curve fits to each serum on each plate, including the NT50s. You should track this in repo.
- `./results/plates/{plate}/curvefits.pdf`: PDF rendering the neutralization curves for the plate. You do not need to track this in the repo as a HTML version of a notebook containing the plot is tracked in `./docs/`.
- `./results/plates/{plate}/curvefits.pickle`: pickle files with the `neutcurve.CurveFits` object for the plate. You do not need to track this in the repo as both the plots and numerical data are rendered elsewhere.
- `./results/plates/{plate}/curvefits.pickle`: pickle file with the `neutcurve.CurveFits` object for the plate. You do not need to track this in the repo as both the plots and numerical data are rendered elsewhere.
- `./results/plates/{plate}/curvefits_{plate}.ipynb`: Jupyter notebook that does the curve fitting. You do not need to track this in the repo as a HTML version of the notebook is tracked in `./docs/`.
- `./results/plates/{plate}/curvefits_{plate}.html`: HTML rendering of Jupyter notebook that does the curve fitting. You do not need to track this in the repo as it will be rendered in `./docs/` when the pipeline runs successfully.

Expand All @@ -396,10 +400,16 @@ The set of full created outputs are as follows (note only some will be tracked d
- `./results/sera/{serum}/titers_median.csv`: titer for each virus against the serum, reported as the median across replicates. You should track this file in the repo.
- `./results/sera/{serum}/titers_per_replicate.csv`: titers for each replicate of each virus against the serum. You should track this file in the repo.
- `./results/sera/{serum}/curves.pdf`: PDF rendering of the neutralization curves for the serum. You do not need to track this in the repo as a HTML version of a notebook containing the plots is tracked in `./docs/`.
- `./results/sera/{serum}/curvefits.pickle`: pickle files with the `neutcurve.CurveFits` object for this serum, after applying QC filters. You do not need to track this in the repo as both the plots and numerical data are rendered elsewhere.
- `./results/sera/{serum}/curvefits.pickle`: pickle file with the `neutcurve.CurveFits` object for this serum, after applying QC filters. You do not need to track this in the repo as both the plots and numerical data are rendered elsewhere.
- `./results/sera/{serum}/serum_titers_{serum}.ipynb`: Jupyter notebook that aggregates titers for a serum across all plates. You do not need to track this in the repo as a HTML version of the notebook is tracked in `./docs/`.
- `./results/sera/{serum}/serum_titers_{serum}.html`: HTML rendering of the Jupyter notebook that aggregates titers for a serum across all plates. You do not need to track this in the repo as it will be rendered in `./docs/` when the pipeline runs successfully.

- Results related to aggregated titers across all sera after applying all quality control:
- `./results/aggregated_titers/titers.csv`: titers for all sera / virus (median of replicates). You should track this file as it has the final processed results.
- `./results/aggregated_titers/curvefits.pickle`: pickle file with the `neutcurve.CurveFits` object holding all final curves. You do not need to track this in the repo, but if you have further code that makes specific plots you may want to use this.
- `./results/aggregated_titers/titers.html`: interactive plot of titers for all sera. You do not need to track this in the repo as it is rendered in `./docs/` when the pipeline runs successfully.
- `./results/aggregated_titers/aggregate_titers.ipynb`: Jupyter notebook that aggregates all the titers. You do not need to track this in the repo.

- `./logs/`: logs from `snakemake` rules, you may want to look at these if there are rule failures. They do not need to be tracked in the repo.

## Running pipeline to identify QC failures and fixing them
Expand Down
2 changes: 1 addition & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,4 +21,4 @@ dependencies:
- ruff
- pip:
- dms_variants==1.4.3
- neutcurve==0.10.0
- neutcurve==0.10.1
232 changes: 232 additions & 0 deletions notebooks/aggregate_titers.py.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,232 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "abe395dc-8a57-4ee8-b22e-8a1acacf1619",
"metadata": {},
"source": [
"# Aggregate titers across all sera\n",
"Aggregate the titers across all sera, failing if there are QC failures for any individual sera."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f404cc02-6d9a-4a3a-892e-fd73f1e00b52",
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"\n",
"import altair as alt\n",
"\n",
"import neutcurve\n",
"\n",
"import pandas as pd\n",
"\n",
"_ = alt.data_transformers.disable_max_rows()"
]
},
{
"cell_type": "markdown",
"id": "be4112bc-7d79-4b38-8e76-ec81588f40e9",
"metadata": {},
"source": [
"Get variables from `snakemake`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83425f55-9905-4cc0-b42c-440926e9a81e",
"metadata": {},
"outputs": [],
"source": [
"qc_failures_file = snakemake.input.qc_serum_titer_failures\n",
"input_pickles = snakemake.input.pickles\n",
"input_titers = snakemake.input.titers\n",
"output_pickle = snakemake.output.pickle\n",
"output_titers = snakemake.output.titers\n",
"titers_chart_html = snakemake.output.titers_chart\n",
"viral_strain_plot_order = snakemake.params.viral_strain_plot_order"
]
},
{
"cell_type": "markdown",
"id": "d3668ae9-ad6d-436e-8e7b-9a0cd3b30548",
"metadata": {},
"source": [
"Check for quality control failures for any individual sera:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1fbc6445-bbd0-4d7e-b50e-a861e03c06b6",
"metadata": {},
"outputs": [],
"source": [
"with open(qc_failures_file) as f:\n",
" qc_failures = f.readlines()\n",
"if not all(line.strip().endswith(\"serum passed all QC\") for line in qc_failures):\n",
" raise ValueError(\n",
" f\"QC failures for some serum titers. See {qc_failures_file}:\\n\\n\"\n",
" + \"\".join(qc_failures)\n",
" )\n",
"else:\n",
" print(\"All serum titers pass the QC filters.\")"
]
},
{
"cell_type": "markdown",
"id": "7a89d9f5-a789-454b-9ec6-3186a5d55c7b",
"metadata": {},
"source": [
"Get the merged titers and merged `CurveFits` object:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dda8dec6-8981-4f10-9fe2-afc35066929e",
"metadata": {},
"outputs": [],
"source": [
"assert len(input_titers) == len(input_pickles)\n",
"\n",
"titers = pd.concat([pd.read_csv(f) for f in input_titers], ignore_index=True)\n",
"assert len(titers) == len(titers.groupby([\"serum\", \"virus\"]))\n",
"print(f\"Writing aggregated titers to {output_titers}\")\n",
"titers.to_csv(output_titers, index=False, float_format=\"%.4g\")\n",
"\n",
"fits_list = []\n",
"for fname in input_pickles:\n",
" with open(fname, \"rb\") as f:\n",
" fits_list.append(pickle.load(f))\n",
"curvefits = neutcurve.CurveFits.combineCurveFits(fits_list)\n",
"print(f\"Pickling aggregated `CurveFits` to {output_pickle}\")\n",
"with open(output_pickle, \"wb\") as f:\n",
" pickle.dump(curvefits, f)"
]
},
{
"cell_type": "markdown",
"id": "e194786d-c227-403f-bbf9-d8ab3b19b502",
"metadata": {},
"source": [
"Plot all the titers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb2c2a97-e54a-4875-bc75-d57975017565",
"metadata": {},
"outputs": [],
"source": [
"viruses = [v for v in viral_strain_plot_order if v in curvefits.allviruses]\n",
"\n",
"sera = curvefits.sera\n",
"\n",
"virus_selection = alt.selection_point(fields=[\"virus\"], on=\"mouseover\", empty=False)\n",
"\n",
"serum_selection = alt.selection_point(\n",
" fields=[\"serum\"],\n",
" bind=\"legend\",\n",
" toggle=\"true\",\n",
")\n",
"\n",
"ncols = 8\n",
"\n",
"titers_chart = (\n",
" alt.Chart(titers)\n",
" .add_params(virus_selection, serum_selection)\n",
" .transform_filter(serum_selection)\n",
" .encode(\n",
" alt.X(\n",
" \"nt50\",\n",
" title=\"neutralization titer\",\n",
" scale=alt.Scale(nice=False, padding=4, type=\"log\"),\n",
" axis=alt.Axis(labelOverlap=True),\n",
" ),\n",
" alt.Y(\"virus\", sort=viruses),\n",
" alt.Facet(\n",
" \"serum\",\n",
" header=alt.Header(\n",
" title=None, labelFontSize=11, labelFontStyle=\"bold\", labelPadding=0\n",
" ),\n",
" spacing=3,\n",
" columns=ncols,\n",
" ),\n",
" alt.StrokeWidth(\n",
" \"serum:N\",\n",
" scale=alt.Scale(domain=sera, range=[1] * len(sera)),\n",
" legend=alt.Legend(\n",
" orient=\"bottom\",\n",
" columns=ncols,\n",
" symbolFillColor=\"black\",\n",
" title=\"serum (click to select)\",\n",
" ),\n",
" ),\n",
" color=alt.condition(virus_selection, alt.value(\"red\"), alt.value(\"black\")),\n",
" tooltip=[\n",
" \"serum\",\n",
" \"virus\",\n",
" alt.Tooltip(\"nt50\", title=\"NT50\", format=\".3g\"),\n",
" \"n_replicates\",\n",
" ],\n",
" )\n",
" .mark_line(point=True)\n",
" .configure_axis(grid=False)\n",
" .configure_point(size=45)\n",
" .properties(\n",
" height=alt.Step(11),\n",
" width=100,\n",
" title=alt.TitleParams(\n",
" \"Interactive chart of serum neutralization titers\",\n",
" subtitle=\"Mouseover points for details, click serum legend at bottom to select sera to show\",\n",
" fontSize=15,\n",
" dx=100,\n",
" dy=-5,\n",
" ),\n",
" autosize=alt.AutoSizeParams(resize=True),\n",
" )\n",
")\n",
"\n",
"print(f\"Saving chart to {titers_chart_html}\")\n",
"titers_chart.save(titers_chart_html)\n",
"\n",
"titers_chart"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27ff41ab-c967-42a9-adb1-5856810c7f34",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
29 changes: 19 additions & 10 deletions notebooks/serum_titers.py.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -239,12 +239,12 @@
"per_rep_chart = (\n",
" alt.Chart(per_rep_titers)\n",
" .encode(\n",
" alt.X(\"virus\", sort=viruses),\n",
" alt.Y(\n",
" alt.X(\n",
" \"nt50\",\n",
" title=\"neutralization titer\",\n",
" scale=alt.Scale(nice=False, padding=5, type=\"log\"),\n",
" ),\n",
" alt.Y(\"virus\", sort=viruses),\n",
" alt.Shape(\"nt50_bound\", title=\"is titer bound?\"),\n",
" strokeWidth=alt.condition(virus_selection, alt.value(2), alt.value(0)),\n",
" tooltip=[\n",
Expand All @@ -266,12 +266,12 @@
"median_chart = (\n",
" alt.Chart(median_titers)\n",
" .encode(\n",
" alt.X(\"virus\", sort=viruses),\n",
" alt.Y(\n",
" alt.X(\n",
" \"nt50\",\n",
" title=\"neutralization titer\",\n",
" scale=alt.Scale(nice=False, padding=5, type=\"log\"),\n",
" ),\n",
" alt.Y(\"virus\", sort=viruses),\n",
" alt.Shape(\"nt50_bound\", title=\"is titer bound?\"),\n",
" strokeWidth=alt.condition(virus_selection, alt.value(2), alt.value(0)),\n",
" tooltip=[\n",
Expand All @@ -294,8 +294,8 @@
" (per_rep_chart + median_chart)\n",
" .add_params(virus_selection)\n",
" .properties(\n",
" width=alt.Step(14),\n",
" height=200,\n",
" height=alt.Step(14),\n",
" width=250,\n",
" title=f\"{serum} median (orange) and per-replicate (blue) titers\",\n",
" )\n",
" .configure_axis(grid=False)\n",
Expand All @@ -321,9 +321,14 @@
"source": [
"qc_failures = []\n",
"\n",
"insufficient_replicates = median_titers[\n",
" median_titers[\"n_replicates\"] < qc_thresholds[\"min_replicates\"]\n",
"].query(\"virus not in @viruses_ignore_qc\")\n",
"insufficient_replicates = (\n",
" median_titers.sort_values(\"virus\", key=lambda s: s.map(lambda v: viruses.index(v)))\n",
" .reset_index(drop=True)[\n",
" median_titers[\"n_replicates\"] < qc_thresholds[\"min_replicates\"]\n",
" ]\n",
" .query(\"virus not in @viruses_ignore_qc\")\n",
" .drop(columns=\"serum\")\n",
")\n",
"if len(insufficient_replicates):\n",
" print(f\"The following viruses fail {qc_thresholds['min_replicates']=}\")\n",
" display(insufficient_replicates)\n",
Expand All @@ -343,7 +348,11 @@
" | (x[\"fold_change_from_median\"] < 1 / max_fold_change_from_median)\n",
" ),\n",
" )\n",
" .query(\"excess_fold_change and (virus not in @viruses_ignore_qc)\")\n",
" .sort_values(\"virus\", key=lambda s: s.map(lambda v: viruses.index(v)))\n",
" .reset_index(drop=True)\n",
" .query(\"excess_fold_change and (virus not in @viruses_ignore_qc)\")[\n",
" [\"virus\", \"replicate\", \"median_nt50\", \"nt50\", \"fold_change_from_median\"]\n",
" ]\n",
")\n",
"if len(excess_fold_change_from_median):\n",
" print(f\"\\nThese replicates fail {qc_thresholds['max_fold_change_from_median']=}\")\n",
Expand Down
Loading

0 comments on commit 2475a95

Please sign in to comment.