Skip to content

Commit

Permalink
Add a tutorial for Cox models (#39)
Browse files Browse the repository at this point in the history
* Add an initial version of new user guide for the coxph model.

* fix firste typos

* Rework a bit

* Move the new getting started page further up

Co-authored-by: Alex Arslan <[email protected]>
  • Loading branch information
lauswullum and ararslan authored Jul 9, 2022
1 parent 34b815f commit 43886ce
Show file tree
Hide file tree
Showing 2 changed files with 101 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ makedocs(
authors = "Alex Arslan",
pages = [
"Home" => "index.md",
"Getting Started" => "getting_started.md",
"Event Times" => "events.md",
"Kaplan-Meier" => "km.md",
"Nelson-Aalen" => "na.md",
Expand Down
100 changes: 100 additions & 0 deletions docs/src/getting_started.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
# Getting Started

## Cox proportional hazards regression model

This tutorial shows how to fit a Cox model and set up the [`EventTime`](@ref)
values for right censored events.

### Dataset

In this example we'll use a table containing criminal
[recidivism](https://en.wikipedia.org/wiki/Recidivism) data from Rossi et al. (1980).
The data pertain to 432 convicts who were released from Maryland state prisons
in the 1970s and who were followed for one year after release.
The released convicts were randomly assigned to receive or not receive financial aid
with equal probability.
The outcome of interest is the time from release to rearrest.

The dataset is available as a CSV file in this package's `test/data/` directory.
To load the data as a `DataFrame`, we'll use the [CSV](https://github.com/JuliaData/CSV.jl)
and [DataFrames](https://github.com/JuliaData/DataFrames.jl) packages.

```julia-repl
julia> using Survival, StatsModels, CSV, DataFrames
julia> rossi = CSV.read(joinpath(pkgdir(Survival), "test", "data", "rossi.csv")), DataFrame);
```

### Fitting the model

We now construct the event times used as the model response.
The `EventTime` constructor accepts a time value and an indicator of whether the value
was observed (`true`) or right-censored (`false`).
The times in this data frame are in the column `week` and the arrest status in `arrest`.
An `arrest` value of 1 indicates an observed event (arrest) and 0 indicates censoring.

```julia-repl
julia> rossi.event = EventTime.(rossi.week, rossi.arrest .== 1);
julia> first(rossi, 10)
5×10 DataFrame
Row │ arrest week fin age race wexp mar paro prio event
│ Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 EventTim…
─────┼───────────────────────────────────────────────────────────────────────────
1 │ 1 20 0 27 1 0 0 1 3 20
2 │ 1 17 0 18 1 0 0 1 8 17
3 │ 1 25 0 19 0 1 0 1 13 25
4 │ 0 52 1 23 1 1 1 1 1 52+
5 │ 0 52 0 19 0 1 0 1 3 52+
```

To fit the Cox model, we can use `fit(CoxModel, ...)` or the shorthand `coxph(...)`.

```julia-repl
julia> model = coxph(@formula(event ~ fin + age + race + wexp + mar + paro + prio), rossi)
StatsModels.TableRegressionModel{CoxModel{Float64}, Matrix{Float64}}
event ~ fin + age + race + wexp + mar + paro + prio
Coefficients:
────────────────────────────────────────────────
Estimate Std.Error z value Pr(>|z|)
────────────────────────────────────────────────
fin -0.379416 0.191379 -1.98253 0.0474
age -0.0574299 0.0219988 -2.61059 0.0090
race 0.31392 0.307995 1.01924 0.3081
wexp -0.14981 0.212226 -0.705898 0.4803
mar -0.433724 0.38187 -1.13579 0.2560
paro -0.0848615 0.195756 -0.433505 0.6646
prio 0.091521 0.0286469 3.1948 0.0014
────────────────────────────────────────────────
```

### Accessing values

Many of the common functions for accessing model parameters used in packages such as
[GLM](https://github.com/JuliaStats/GLM.jl) are extended for use with `CoxModel`s.

For example, the model coefficient estimates can be extracted with `coef`:

```julia-repl
julia> coef(model)
7-element Vector{Float64}:
-0.3794158823466362
-0.057429889713653676
0.313920393830735
-0.14980964863737226
-0.43372380447995285
-0.08486148372086805
0.09152099594619753
```

Similarly, the standard errors of the estimates are accessible with `stderror(model)`
and the full variance-covariance matrix with `vcov(model)`.

Other available functions include:
- `loglikelihood`, the log likelihood of the fitted model
- `nullloglikelihood`, the log likelihood of the null model
- `dof`, degrees of freedom
- `nobs`, the number of observations used to fit the model
- `coeftable`, a table of coefficient names, estimates, standard errors, z-values, and p-values

0 comments on commit 43886ce

Please sign in to comment.