---
output: github_document
bibliography: paper/paper.bib
csl: apa.csl
---
# see: Model Visualisation Toolbox for 'easystats' and 'ggplot2' <img src='man/figures/logo.png' align="right" height="139" />
```{r, echo=FALSE, warning=FALSE, message=FALSE}
library(poorman)
library(see)
knitr::opts_chunk$set(
  dpi = 150,
  collapse = TRUE,
  out.width = "100%",
  fig.path = "man/figures/",
  warning = FALSE,
  message = FALSE
)
```
[![DOI](https://joss.theoj.org/papers/10.21105/joss.03393/status.svg)](https://doi.org/10.21105/joss.03393)
[![downloads](https://cranlogs.r-pkg.org/badges/see)](https://cran.r-project.org/package=see)
[![total](https://cranlogs.r-pkg.org/badges/grand-total/see)](https://cranlogs.r-pkg.org/)
***"Damned are those who believe without seeing"***
*easystats* is a collection of packages that operate in synergy to provide a consistent and intuitive syntax when working with statistical models in the R programming language [@base2021]. Most *easystats* packages return comprehensive numeric summaries of model parameters and performance. The *see* package complements these numeric summaries with a host of functions and tools to produce a range of publication-ready visualizations for model parameters, predictions, and performance diagnostics. As a core pillar of *easystats*, the *see* package helps users to utilize visualization for more informative, communicable, and well-rounded scientific reporting.
# Statement of Need
The grammar of graphics [@wilkinson2012grammar], largely due to its implementation in the *ggplot2* package [@Wickham2016], has become the dominant approach to visualization in R. Building a model visualization with *ggplot2* is somewhat disconnected from the model fitting and evaluation process. Generally, this process entails:
1. Fitting a model.
2. Extracting desired results from the model (e.g., model parameters and intervals, model predictions, diagnostic statistics) and arranging them into a data frame.
3. Passing the results data frame to `ggplot()` and specifying the graphical parameters. For example:
```{r}
library(ggplot2)
# step-1
model <- lm(mpg ~ factor(cyl) * wt, data = mtcars)
# step-2
results <- fortify(model)
# step-3
ggplot(results) +
  geom_point(aes(x = wt, y = mpg, color = `factor(cyl)`)) +
  geom_line(aes(x = wt, y = .fitted, color = `factor(cyl)`))
```
A number of packages have been developed to extend *ggplot2* and assist with model visualization (for a sampling of these packages, visit [ggplot2-gallery](https://exts.ggplot2.tidyverse.org/gallery/)). Some of these packages provide functions for additional geoms, annotations, or common visualization types without linking them to a specific statistical analysis or fundamentally changing the *ggplot2* workflow (e.g., *ggrepel*, *ggalluvial*, *ggridges*, *ggdist*, *ggpubr*, etc.). Other *ggplot2* extensions provide functions to generate publication-ready visualizations for specific types of models (e.g., *metaviz*, *tidymv*, *sjPlot*, *survminer*). For example, the *ggstatsplot* package [@Patil2021] offers visualizations for statistical analysis of one-way factorial designs, and the *plotmm* package [@Waggoner2020] supports specific types of mixture model objects.
The aim of the *see* package is to produce visualizations for a wide variety of models and statistical analyses in a way that is tightly linked with the model fitting process and requires minimal interruption of users' workflow. *see* accomplishes this aim by providing a single `plot()` method for objects created by the other *easystats* packages, such as *parameters* tables, *modelbased* predictions, *performance* diagnostic tests, *correlation* matrices, and so on. The *easystats* packages compute numeric results for a wide range of statistical models, and the *see* package acts as a visual support to the entire *easystats* ecosystem. As such, visualizations corresponding to all stages of statistical analysis, from model fitting to diagnostics to reporting, can be easily created using *see*. *see* plots are compatible with other *ggplot2* functions for further customization (e.g., `labs()` for a plot title). In addition, *see* provides several aesthetic utilities to embellish both *easystats* plots and other *ggplot2* plots. The result is a package that minimizes the barrier to producing high-quality statistical visualizations in R.
The central goal of *easystats* is to make the task of doing statistics in R as easy as possible. This goal is realized through intuitive and consistent syntax, consistent and transparent argument names, comprehensive documentation, informative warnings and error messages, and smart functions with sensible default parameter values. The *see* package follows this philosophy by using a single access point---the generic `plot()` method---for visualization of all manner of statistical results supported by *easystats*.
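The idea of a single `plot()` access point rests on R's S3 method dispatch: `plot()` inspects the class of its argument and calls the matching `plot.<class>()` method. The following is a minimal base-R sketch of that mechanism, not the actual *see* implementation. The class name `toy_result`, the constructor `new_toy_result()`, and the method body are hypothetical and serve only as an illustration.

```r
# Hypothetical constructor: wraps named estimates in a custom S3 class.
new_toy_result <- function(estimates) {
  structure(list(estimates = estimates), class = "toy_result")
}

# plot() method for the hypothetical class. R calls it automatically
# whenever plot() receives an object of class "toy_result".
plot.toy_result <- function(x, ...) {
  graphics::dotchart(x$estimates, main = "Toy estimates", ...)
  invisible(x)
}

res <- new_toy_result(c(a = 0.2, b = -0.5, c = 1.1))

# The user only ever calls plot(); dispatch selects plot.toy_result().
plot(res)
```

Because the method is chosen from the class of the object, users do not need to remember a separate plotting function for each kind of result.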
# Installation
[![CRAN](https://www.r-pkg.org/badges/version/see)](https://cran.r-project.org/package=see) [![see status badge](https://easystats.r-universe.dev/badges/see)](https://easystats.r-universe.dev) [![R-CMD-check](https://github.com/easystats/see/workflows/R-CMD-check/badge.svg?branch=main)](https://github.com/easystats/see/actions) [![codecov](https://codecov.io/gh/easystats/see/branch/main/graph/badge.svg)](https://app.codecov.io/gh/easystats/see)
The *see* package is available on CRAN, while its latest development version is available on R-universe (from _rOpenSci_).
Type | Source | Command
---|---|---
Release | CRAN | `install.packages("see")`
Development | r-universe | `install.packages("see", repos = "https://easystats.r-universe.dev")`
Development | GitHub | `remotes::install_github("easystats/see")`
Once you have installed the package, you can load it using:
```{r}
library("see")
```
> **Tip**
>
> Instead of `library(see)`, use `library(easystats)`.
> This will make all features of the easystats-ecosystem available.
>
> To stay updated, use `easystats::install_latest()`.
# Plotting functions for 'easystats' packages
Below we present one or two plotting methods for each *easystats* package, but many other methods are available. Interested readers are encouraged to explore the range of examples on the package [website](https://easystats.github.io/see/articles/).
## [parameters](https://github.com/easystats/parameters)
The *parameters* package converts summaries of regression model objects into data frames [@Lüdecke2020parameters]. The *see* package can take this transformed object and, for example, create a dot-and-whisker plot for the extracted regression estimates simply by passing the `parameters` class object to `plot()`.
```{r parameters1}
library(parameters)
library(see)
model <- lm(wt ~ am * cyl, data = mtcars)
plot(parameters(model))
```
Because *see* returns objects of class `ggplot`, *ggplot2* functions can be added as layers to the plot, just as with any other *ggplot2* visualization. For example, we might add a title using `labs()` from *ggplot2*.
```{r parameters2}
library(parameters)
library(see)
model <- lm(wt ~ am * cyl, data = mtcars)
plot(parameters(model)) +
  ggplot2::labs(title = "A Dot-and-Whisker Plot")
```
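Since the returned object is an ordinary `ggplot`, it can also be stored, inspected, and written to disk like any other *ggplot2* plot. The sketch below assumes the *parameters*, *see*, and *ggplot2* packages are installed. The object names `p` and `out_file` and the chosen output size are illustrative only.

```r
library(parameters)
library(see)
library(ggplot2)

model <- lm(wt ~ am * cyl, data = mtcars)

# Store the plot instead of printing it immediately.
p <- plot(parameters(model))

# The stored object carries the "ggplot" class.
inherits(p, "ggplot")

# Add further ggplot2 layers, then save the result to a temporary file.
p_final <- p +
  labs(caption = "Model: lm(wt ~ am * cyl)") +
  theme_lucid()

out_file <- tempfile(fileext = ".png")
ggsave(out_file, plot = p_final, width = 6, height = 4)
```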
Plotting functions for the **parameters** package are demonstrated [in this vignette](https://easystats.github.io/see/articles/parameters.html).
## [bayestestR](https://github.com/easystats/bayestestR)
Similarly, for Bayesian regression model objects, which are handled by the *bayestestR* package [@Makowski2019], the *see* package provides special plotting methods relevant for Bayesian models (e.g., Highest Density Interval, or *HDI*). Users can fit the model and pass the model results, extracted via *bayestestR*, to `plot()`.
```{r bayestestR}
library(bayestestR)
library(rstanarm)
library(see)
set.seed(123)
model <- stan_glm(wt ~ mpg, data = mtcars, refresh = 0)
result <- hdi(model, ci = c(0.5, 0.75, 0.89, 0.95))
plot(result)
```
Plotting functions for the **bayestestR** package are demonstrated [in this vignette](https://easystats.github.io/see/articles/bayestestR.html).
## [performance](https://github.com/easystats/performance)
The *performance* package is primarily concerned with checking regression model assumptions [@Lüdecke2020performance]. The *see* package offers tools to visualize these assumption checks, such as the normality of residuals. Users simply pass the fitted model object to the relevant *performance* function (`check_normality()` in the example below). Then, this result can be passed to `plot()` to produce a *ggplot2* visualization of the check on normality of the residuals.
```{r performance}
library(performance)
library(see)
model <- lm(wt ~ mpg, data = mtcars)
check <- check_normality(model)
plot(check, type = "qq")
```
Plotting functions for the **performance** package are demonstrated [in this vignette](https://easystats.github.io/see/articles/performance.html).
## [effectsize](https://github.com/easystats/effectsize)
The *effectsize* package computes a variety of effect size metrics for fitted models to assess the practical importance of observed effects [@Ben-Shachar2020]. In conjunction with *see*, users are able to visualize the magnitude and uncertainty of effect sizes by passing the model object to the relevant *effectsize* function (`omega_squared()` in the following example), and then to `plot()`.
```{r effectsize}
library(effectsize)
library(see)
model <- aov(wt ~ am * cyl, data = mtcars)
plot(omega_squared(model))
```
Plotting functions for the **effectsize** package are demonstrated [in this vignette](https://easystats.github.io/see/articles/effectsize.html).
## [modelbased](https://github.com/easystats/modelbased)
The *modelbased* package computes model-based estimates and predictions from fitted models [@Makowski2020modelbased]. *see* provides methods to quickly visualize these model predictions. For the following example to work, you need to have installed the *emmeans* package first.
```{r modelbased1}
library(modelbased)
library(see)
data(mtcars)
mtcars$gear <- as.factor(mtcars$gear)
model <- lm(mpg ~ wt * gear, data = mtcars)
predicted <- estimate_expectation(model, data = "grid")
plot(predicted)
```
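Because the example above depends on an additional package, it can help to check for missing dependencies before running it. The following base-R sketch uses `requireNamespace()`. The helper name `missing_packages()` is hypothetical and not part of *easystats*.

```r
# Hypothetical helper: return the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("modelbased", "emmeans"))

# Report what is missing instead of failing midway through the example.
if (length(to_install) > 0) {
  message("Install first: ", paste(to_install, collapse = ", "))
}
```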
One can also visualize *marginal means* (i.e., the mean at each factor level averaged over other predictors) using `estimate_means()`, whose output is then passed to `plot()`.
```{r modelbased2, error=TRUE}
means <- estimate_means(model)
plot(means)
```
Plotting functions for the **modelbased** package are demonstrated [in this vignette](https://easystats.github.io/see/articles/modelbased.html).
## [correlation](https://github.com/easystats/correlation)
The *correlation* package provides a unified syntax and human-readable code to carry out many types of correlation analysis [@Makowski2020]. A user can run `summary(correlation(data))` to construct a correlation matrix for the variables in a data frame. With *see*, this matrix can be passed to `plot()` to visualize the correlations as a matrix plot.
```{r correlation, error=FALSE}
library(correlation)
library(see)
results <- summary(correlation(iris))
plot(results, show_data = "points")
```
Plotting functions for the **correlation** package are demonstrated [in this vignette](https://easystats.github.io/see/articles/correlation.html).
# Themes
### Modern
```{r}
library(ggplot2)
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point2() +
  theme_modern()
```
### Lucid
```{r}
library(ggplot2)
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point2()
p + theme_lucid()
```
### Blackboard
```{r}
p + theme_blackboard()
```
### Abyss
```{r}
p + theme_abyss()
```
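To use one of these themes for every plot in a session rather than adding it to each plot, *ggplot2*'s `theme_set()` can be used. This is a minimal sketch, assuming *ggplot2* and *see* are installed. The variable name `old_theme` is illustrative.

```r
library(ggplot2)
library(see)

# Make theme_modern() the default for all subsequent plots.
# theme_set() invisibly returns the previously active theme.
old_theme <- theme_set(theme_modern())

# No explicit theme call is needed here; the default is picked up.
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point2()

# Restore the previous default when done.
theme_set(old_theme)
```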
# Palettes
This is just one example of the available palettes. See [this vignette](https://easystats.github.io/see/articles/seecolorscales.html) for a detailed overview of palettes and color scales.
### Material design
```{r}
p1 <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  theme_modern(axis.text.angle = 45) +
  scale_fill_material_d()
p2 <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_violin() +
  theme_modern(axis.text.angle = 45) +
  scale_fill_material_d(palette = "ice")
p3 <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Sepal.Length)) +
  geom_point2() +
  theme_modern() +
  scale_color_material_c(palette = "rainbow")
```
## Multiple plots
The `plots()` function allows us to plot the figures side by side.
```{r}
plots(p1, p2, p3, n_columns = 2)
```
The `plots()` function can also be used to add **tags** (*i.e.*, labels for subfigures).
```{r}
plots(p1, p2, p3,
  n_columns = 2,
  tags = paste("Fig.", 1:3)
)
```
# Geoms
## Better looking points
`geom_point2()` and `geom_jitter2()` draw points without borders or contours.
```{r, fig.width=9.5, fig.height=5.8}
normal <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) +
  geom_point(size = 8, alpha = 0.3) +
  theme_modern()
new <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) +
  geom_point2(size = 8, alpha = 0.3) +
  theme_modern()
plots(normal, new, n_columns = 2)
```
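A similar borderless look can be approximated with plain *ggplot2* by choosing a point shape without an outline and setting the stroke width to zero. The sketch below rests on the assumption that `shape = 16` and `stroke = 0` are close to what `geom_point2()` uses by default; check the *see* documentation before relying on exact equivalence. The object name `borderless` is illustrative.

```r
library(ggplot2)

# Assumption: shape 16 (a filled circle without an outline) combined with
# stroke = 0 approximates the borderless points drawn by geom_point2().
borderless <- ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) +
  geom_point(size = 8, alpha = 0.3, shape = 16, stroke = 0)

borderless
```

Using `geom_point2()` avoids having to repeat these arguments in every layer.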
## Half-violin Half-dot plot
Create a half-violin half-dot plot, useful for visualising the distribution and the sample size at the same time.
```{r}
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_violindot(fill_dots = "black") +
  theme_modern() +
  scale_fill_material_d()
```
## Radar chart (Spider plot)
```{r}
library(poorman)
library(datawizard)
# prepare the data in tidy format
data <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), mean)) %>%
  reshape_longer(c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))
data %>%
  ggplot(aes(
    x = name,
    y = value,
    color = Species,
    group = Species,
    fill = Species
  )) +
  geom_polygon(linewidth = 1, alpha = 0.1) +
  coord_radar() +
  theme_radar()
```
# Contributing and Support
To file an issue or contribute to the package in another way, please follow [this guide](https://github.com/easystats/see/blob/master/.github/CONTRIBUTING.md). For questions about functionality, you may either contact us by email or file an issue.
# Code of Conduct
Please note that this project is released with a
[Contributor Code of Conduct](https://easystats.github.io/see/CODE_OF_CONDUCT.html). By participating in this project you agree to abide by its terms.
# References