Improve documentation #11

Merged
merged 8 commits into from
Dec 15, 2024
7 changes: 6 additions & 1 deletion CHANGELOG.md
Expand Up @@ -8,9 +8,14 @@ and this project adheres to [Semantic Versioning][].
[keep a changelog]: https://keepachangelog.com/en/1.0.0/
[semantic versioning]: https://semver.org/spec/v2.0.0.html

## v1.0.0

- Update tutorials and docstrings of `.cond()` and `.contrast()` ([#11](https://github.com/scverse/formulaic-contrasts/pull/11))
- No other changes, but the API is considered stable now.

## v0.2.0

- Rename `FormulaicContrasts.design` to `FormulaicContrasts.design_matrix`
- Rename `FormulaicContrasts.design` to `FormulaicContrasts.design_matrix`

## v0.1.0

Expand Down
154 changes: 139 additions & 15 deletions docs/contrasts.ipynb
Expand Up @@ -251,13 +251,135 @@
"For instance, we could \n",
"investigate differences between responders and non-responders, independent of treatment by fitting the model \n",
"`~ response + treatment` and then comparing the category `\"responder\"` in the column `response` with the category `\"non_responder\"`.\n",
"This can be achieved using the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` method. "
"\n",
"Given the data frame from above and the model `~ response + treatment`, the design matrix contains the following distinct\n",
"entries, encoding the different combinations of response and drug. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Intercept</th>\n",
" <th>response[T.responder]</th>\n",
" <th>treatment[T.drugB]</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>70</th>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Intercept response[T.responder] treatment[T.drugB]\n",
"0 1.0 0 0\n",
"10 1.0 1 0\n",
"40 1.0 0 1\n",
"70 1.0 1 1"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from formulaic import model_matrix\n",
"\n",
"model_matrix(\"~ response + treatment\", df).drop_duplicates()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `response[T.responder]` column encodes `\"responder\"` as 1 and `\"non_responder\"` as 0. The \n",
"intercept is always 1 and the other column is irrelevant for our desired comparison. The entries of a contrast vector \n",
"always correspond to the columns of the design matrix. We therefore need a contrast vector\n",
"that compares `(1, 1, 0)` vs. `(1, 0, 0)`:\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 0])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"contrast = np.array((1, 1, 0)) - np.array((1, 0, 0))\n",
"contrast"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using formulaic-contrasts' {func}`~formulaic_contrasts.FormulaicContrasts.cond` method, we can build the same\n",
"contrast vector by specifying the categories of interest:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
Expand All @@ -268,7 +390,7 @@
"Name: 0, dtype: float64"
]
},
"execution_count": 3,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -278,24 +400,21 @@
"\n",
"mod = FormulaicContrasts(df, \"~ response + treatment\")\n",
"\n",
"contrast = mod.contrast(\n",
" column=\"response\",\n",
" baseline=\"non_responder\",\n",
" group_to_compare=\"responder\",\n",
")\n",
"contrast = mod.cond(response=\"responder\") - mod.cond(response=\"non_responder\")\n",
"contrast"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is equivalent to the following {func}`~formulaic_contrasts.FormulaicContrasts.cond` call:"
"For this very common case of comparing two categories of the same variable, {func}`~formulaic_contrasts.FormulaicContrasts.contrast` \n",
"provides a convenient shortcut for building the same contrast:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"outputs": [
{
Expand All @@ -307,13 +426,18 @@
"Name: 0, dtype: float64"
]
},
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mod.cond(response=\"responder\") - mod.cond(response=\"non_responder\")"
"contrast = mod.contrast(\n",
" column=\"response\",\n",
" baseline=\"non_responder\",\n",
" group_to_compare=\"responder\",\n",
")\n",
"contrast"
]
},
{
Expand All @@ -328,7 +452,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"outputs": [
{
Expand All @@ -341,7 +465,7 @@
"Name: 0, dtype: float64"
]
},
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -368,7 +492,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"outputs": [
{
Expand All @@ -381,7 +505,7 @@
"Name: 0, dtype: float64"
]
},
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
Expand Down
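
Note on the hunk above: the idea that a contrast is the difference between two design-matrix rows can be reproduced without any library. The following is a minimal hand-rolled sketch of treatment (dummy) coding for `~ response + treatment`. It is not formulaic or formulaic-contrasts code; `design_row`, `subtract`, and the choice of `non_responder` and `drugA` as reference levels for unspecified variables are assumptions made for illustration, taken from the design matrix shown in the notebook output.

```python
# Hand-rolled sketch of treatment (dummy) coding for `~ response + treatment`.
# `design_row` and `subtract` are hypothetical helpers, not library functions.

COLUMNS = ("Intercept", "response[T.responder]", "treatment[T.drugB]")


def design_row(response="non_responder", treatment="drugA"):
    """Return the design-matrix row for one combination of categories.

    Unspecified variables fall back to their reference level
    (an assumption of this sketch).
    """
    return (
        1,  # the intercept column is always 1
        1 if response == "responder" else 0,
        1 if treatment == "drugB" else 0,
    )


def subtract(a, b):
    """Element-wise difference of two equally long vectors."""
    return tuple(x - y for x, y in zip(a, b))


# responder vs. non_responder, with treatment left at its reference level
contrast = subtract(
    design_row(response="responder"),
    design_row(response="non_responder"),
)
print(dict(zip(COLUMNS, contrast)))

# In this additive model the treatment column cancels, so the same
# contrast results when both rows use drugB instead.
contrast_drug_b = subtract(
    design_row(response="responder", treatment="drugB"),
    design_row(response="non_responder", treatment="drugB"),
)
```

Both differences come out as `(0, 1, 0)`, matching the `array([0, 1, 0])` shown in the notebook output.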
18 changes: 9 additions & 9 deletions docs/contributing.md
Expand Up @@ -155,11 +155,11 @@ This will automatically create a git tag and trigger a Github workflow that crea
Please write documentation for new or changed features and use-cases.
This project uses [sphinx][] with the following features:

- The [myst][] extension allows writing documentation in Markdown/Markedly Structured Text
- [Numpy-style docstrings][numpydoc] (through the [napoleon][numpydoc-napoleon] extension).
- Jupyter notebooks as tutorials through [myst-nb][] (See [Tutorials with myst-nb](#tutorials-with-myst-nb-and-jupyter-notebooks))
- [sphinx-autodoc-typehints][], to automatically reference annotated input and output types
- Citations (like {cite:p}`Virshup_2023`) can be included with [sphinxcontrib-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/)
- The [myst][] extension allows writing documentation in Markdown/Markedly Structured Text
- [Numpy-style docstrings][numpydoc] (through the [napoleon][numpydoc-napoleon] extension).
- Jupyter notebooks as tutorials through [myst-nb][] (See [Tutorials with myst-nb](#tutorials-with-myst-nb-and-jupyter-notebooks))
- [sphinx-autodoc-typehints][], to automatically reference annotated input and output types
- Citations (like {cite:p}`Virshup_2023`) can be included with [sphinxcontrib-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/)

See scanpy’s {doc}`scanpy:dev/documentation` for more information on how to write your own.

Expand All @@ -183,10 +183,10 @@ please check out [this feature request][issue-render-notebooks] in the `cookiecu

#### Hints

- If you refer to objects from other packages, please add an entry to `intersphinx_mapping` in `docs/conf.py`.
Only if you do so can sphinx automatically create a link to the external documentation.
- If building the documentation fails because of a missing link that is outside your control,
you can add an entry to the `nitpick_ignore` list in `docs/conf.py`
- If you refer to objects from other packages, please add an entry to `intersphinx_mapping` in `docs/conf.py`.
Only if you do so can sphinx automatically create a link to the external documentation.
- If building the documentation fails because of a missing link that is outside your control,
you can add an entry to the `nitpick_ignore` list in `docs/conf.py`

(docs-building)=

Expand Down
24 changes: 14 additions & 10 deletions docs/model_usage.ipynb
Expand Up @@ -15,7 +15,7 @@
"for use with `formulaic-contrasts`. The aim is to build a model that takes a pandas DataFrame and a formulaic formula as input,\n",
"fits the model to a continuous variable from the DataFrame, and performs a statistical test for a given contrast. \n",
"\n",
"This can be achived with the following class definition. The constructor, the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` and {func}`~formulaic_contrasts.FormulaicContrasts.cond` methods are inherited from the {class}`~formulaic_contrasts.FormulaicContrasts`\n",
"This can be achieved with the following class definition. The constructor, the {func}`~formulaic_contrasts.FormulaicContrasts.contrast` and {func}`~formulaic_contrasts.FormulaicContrasts.cond` methods are inherited from the {class}`~formulaic_contrasts.FormulaicContrasts`\n",
"base class:"
]
},
Expand All @@ -28,9 +28,13 @@
"import formulaic_contrasts\n",
"import numpy as np\n",
"import statsmodels.api as sm\n",
"import pandas as pd\n",
"\n",
"\n",
"class StatsmodelsOLS(formulaic_contrasts.FormulaicContrasts):\n",
" def __init__(self, data: pd.DataFrame, design: str):\n",
" super().__init__(data, design)\n",
"\n",
" def fit(self, variable: str):\n",
" self.mod = sm.OLS(self.data[variable], self.design_matrix)\n",
" self.mod = self.mod.fit()\n",
Expand Down Expand Up @@ -198,7 +202,7 @@
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"c0 -1.6492 0.935 -1.764 0.082 -3.512 0.213\n",
"c0 1.9563 0.775 2.525 0.014 0.413 3.499\n",
"=============================================================================="
]
},
Expand All @@ -208,7 +212,7 @@
}
],
"source": [
"model = StatsmodelsOLS(df, \"~ treatment * response\")\n",
"model = StatsmodelsOLS(df, \"~ treatment + response\")\n",
"model.fit(\"biomarker\")\n",
"model.t_test(\n",
" model.contrast(\"response\", baseline=\"non_responder\", group_to_compare=\"responder\")\n",
Expand Down Expand Up @@ -273,7 +277,7 @@
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"c0 -1.6492 0.935 -1.764 0.082 -3.512 0.213\n",
"c0 1.9563 0.775 2.525 0.014 0.413 3.499\n",
"=============================================================================="
]
},
Expand All @@ -283,7 +287,7 @@
}
],
"source": [
"model = StatsmodelsOLS(df, \"~ treatment * response\")\n",
"model = StatsmodelsOLS(df, \"~ treatment + response\")\n",
"model.fit(\"biomarker\")\n",
"model.t_test(\n",
" model.contrast(\"response\", baseline=\"non_responder\", group_to_compare=\"responder\")\n",
Expand Down Expand Up @@ -338,7 +342,7 @@
"outputs": [],
"source": [
"design_mat = materializer_class(df, record_factor_metadata=True).get_model_matrix(\n",
" \"~ treatment * response\"\n",
" \"~ treatment + response\"\n",
")"
]
},
Expand Down Expand Up @@ -371,15 +375,15 @@
" drop_field='non_responder',\n",
" column_names=('non_responder',\n",
" 'responder'),\n",
" colname_format='{name}[T.{field}]')],\n",
" colname_format='{name}[{field}]')],\n",
" 'treatment': [FactorMetadata(name='treatment',\n",
" reduced_rank=True,\n",
" custom_encoder=False,\n",
" categories=('drugA', 'drugB'),\n",
" kind=<Kind.CATEGORICAL: 'categorical'>,\n",
" drop_field='drugA',\n",
" column_names=('drugA', 'drugB'),\n",
" colname_format='{name}[T.{field}]')]})\n"
" colname_format='{name}[{field}]')]})\n"
]
}
],
Expand Down Expand Up @@ -571,10 +575,10 @@
"defaultdict(set,\n",
" {'np.log': {'np.log(biomarker)'},\n",
" 'biomarker': {'np.log(biomarker)'},\n",
" 'C': {'C(response)',\n",
" \"C(treatment, contr.treatment(base='drugB'))\"},\n",
" 'treatment': {\"C(treatment, contr.treatment(base='drugB'))\"},\n",
" 'contr.treatment': {\"C(treatment, contr.treatment(base='drugB'))\"},\n",
" 'C': {'C(response)',\n",
" \"C(treatment, contr.treatment(base='drugB'))\"},\n",
" 'response': {'C(response)'}})"
]
},
Expand Down
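
Note on the formula change in this file: the diff switches the tutorial model from `~ treatment * response` to `~ treatment + response` without stating a reason. The sketch below only illustrates how the two designs differ for a responder vs. non-responder comparison. It is a hand-rolled illustration in plain Python, not formulaic-contrasts code; the helper names and the assumed column order (intercept, treatment, response, interaction) are assumptions of this sketch.

```python
# Hand-rolled design rows for an additive and an interaction model.
# `row_additive`, `row_interaction`, and `diff` are hypothetical helpers.


def _dummies(treatment, response):
    """Treatment-code both factors with drugA / non_responder as reference."""
    t = 1 if treatment == "drugB" else 0
    r = 1 if response == "responder" else 0
    return t, r


def row_additive(treatment, response):
    """Row for `~ treatment + response`: (Intercept, treatment, response)."""
    t, r = _dummies(treatment, response)
    return (1, t, r)


def row_interaction(treatment, response):
    """Row for `~ treatment * response`: additive columns plus t:r."""
    t, r = _dummies(treatment, response)
    return (1, t, r, t * r)


def diff(a, b):
    """Element-wise difference of two equally long vectors."""
    return tuple(x - y for x, y in zip(a, b))


results = {}
for treatment in ("drugA", "drugB"):
    results[treatment] = (
        diff(
            row_additive(treatment, "responder"),
            row_additive(treatment, "non_responder"),
        ),
        diff(
            row_interaction(treatment, "responder"),
            row_interaction(treatment, "non_responder"),
        ),
    )
    print(treatment, *results[treatment])
```

In the additive model the contrast is `(0, 0, 1)` for either drug. In the interaction model it is `(0, 0, 1, 0)` under drugA but `(0, 0, 1, 1)` under drugB, so the comparison depends on which treatment level is held fixed.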