---
output:
  md_document:
    variant: markdown_github
---
```{r, echo = FALSE, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "# ",
  fig.path = "figures/README-"
)
library(ggplot2)
library(lindia)
data(Cars93)
```
# lindia
lindia is an extension to [**ggplot2**](http://ggplot2.org) that streamlines the plotting of linear model diagnostics. All functions in `lindia` take a fitted model object (from `lm()` or `glm()`) and return diagnostic plots as `ggplot` objects. The following code fits a simple linear model to the `Cars93` dataset, then calls `lindia::gg_diagnose()` to display all diagnostic plots for the model at once.
```{r, warning = F, message = F, fig.width = 12, fig.height = 10}
# create linear model
cars_lm <- lm(Price ~ Passengers + Length + RPM, data = Cars93)
# visualize diagnostic plots with a call to lindia
gg_diagnose(cars_lm)
```
lindia improves on the diagnostic plots provided in base R. In base R, users can create a series of diagnostic plots with the following lines:
```{r, warning = F, message = F, fig.width = 12, fig.height = 10}
par(mfrow = c(2,2))
plot(cars_lm)
```
However, the output of `plot()` lacks flexibility and comprehensiveness. For example, residual vs. predictor plots are not shown, and users cannot easily manipulate graph type and aesthetics. In that sense, `lindia` extends the features offered in base R.
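To make the gap concrete, the chunk below is a minimal sketch of building residual vs. predictor plots by hand with `ggplot2` alone, which is roughly the kind of work `gg_resX()` is meant to save. It is not lindia's implementation: the `res_long` data frame and its column names are made up for illustration, and `Cars93` is assumed to come from the `MASS` package.

```{r, warning = F, message = F}
library(ggplot2)
data(Cars93, package = "MASS")  # assumption: Cars93 is taken from MASS

cars_lm <- lm(Price ~ Passengers + Length + RPM, data = Cars93)

# model.frame() returns exactly the rows and columns used in the fit
mf <- model.frame(cars_lm)
predictors <- c("Passengers", "Length", "RPM")

# long format: one row per (observation, predictor) pair
res_long <- data.frame(
  predictor = rep(predictors, each = nrow(mf)),
  value     = unlist(mf[predictors], use.names = FALSE),
  residual  = rep(resid(cars_lm), times = length(predictors))
)

# one residual panel per predictor, each with its own x scale
ggplot(res_long, aes(x = value, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~ predictor, scales = "free_x")
```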
# Installation
+ From GitHub: `devtools::install_github("yeukyul/lindia")`
# Overview
The following functions are implemented in `lindia`:
+ [`gg_reshist()`](#gg_reshist): Histogram of residuals
+ [`gg_resfitted()`](#gg_resfitted): Plot of residuals against fitted values
+ [`gg_resX()`](#gg_resX): Residual plots against each predictor, laid out in a grid
+ [`gg_qqplot()`](#gg_qqplot): Normal quantile-quantile plot (QQ plot) with a qqline overlaid on top
+ [`gg_boxcox()`](#gg_boxcox): Box-Cox graph with the optimal transformation labeled on the graph
+ [`gg_scalelocation()`](#gg_scalelocation): Scale-location plot (also called spread-location plot)
+ [`gg_resleverage()`](#gg_resleverage): Residuals vs. leverage plot
+ [`gg_cooksd()`](#gg_cooksd): Cook's distance plot with potential outliers labeled on top
+ [`gg_diagnose()`](#gg_diagnose): All diagnostic plots, laid out in a grid

`gg_resX()` and `gg_diagnose()` produce multiple plots per call. By default, they return one aggregate plot with all diagnostic plots arranged in a grid. Users who need more control over graphical elements, or who want to include or exclude certain plots, can set the `plot.all` parameter to `FALSE`. The function then returns a list of all plots, which users can manipulate.
For example:
```{r, warning = F, message = F}
plots <- gg_diagnose(cars_lm, plot.all = FALSE)
names(plots)
exclude_plots <- plots[-c(1, 3)] # exclude certain diagnostics plots
include_plots <- plots[c(1, 3)] # include certain diagnostics plots
```
In addition, `lindia` provides a `plot_all()` function that takes a list of plots and outputs them as a formatted grid using `gridExtra::grid.arrange()`.
```{r, warning = F, message = F, fig.width = 12, fig.height = 8}
plot_all(exclude_plots)
```
The styling of any plot returned by a lindia plotting function can be overridden by adding `ggplot2` theme elements, such as a call to `ggplot2::theme()` or a complete theme (except for `gg_resX()` and `gg_diagnose()`, which return an arranged grid or a list of plots rather than a single ggplot object).
```{r, warning = F, message = F}
gg_resfitted(cars_lm) + theme_bw()
```
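For the list returned with `plot.all = FALSE`, a theme can be added to each element before the plots are arranged. The chunk below is a sketch under the assumption that the list elements are ordinary ggplot objects; the `inherits()` guard leaves anything else untouched. `themed_plots` is an illustrative name, and `Cars93` is assumed to come from the `MASS` package.

```{r, warning = F, message = F, fig.width = 12, fig.height = 8}
library(ggplot2)
library(lindia)
data(Cars93, package = "MASS")  # assumption: Cars93 is taken from MASS

cars_lm <- lm(Price ~ Passengers + Length + RPM, data = Cars93)
plots <- gg_diagnose(cars_lm, plot.all = FALSE)

# add the same theme to every element that is a ggplot object
themed_plots <- lapply(plots, function(p) {
  if (inherits(p, "ggplot")) p + theme_minimal() else p
})

# arrange the restyled plots in a grid
plot_all(themed_plots)
```

Because `lapply()` keeps the list names, `themed_plots` can still be subset the same way as `plots`.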
# Functions in Lindia
The following gives a brief demonstration of how to use the functions provided in `lindia`.
# gg_reshist
Plots the distribution of residuals as a histogram. The number of bins is picked by default.
```{r, warning = F, message = F}
gg_reshist(cars_lm)
```
The number of bins can also be specified with the `bins` argument:
```{r, warning = F, message = F}
gg_reshist(cars_lm, bins = 20)
```
# gg_resX
Plots the residuals against each predictor and returns all plots as one plot arranged by the `gridExtra` package. A continuous variable is plotted as a scatterplot; a categorical variable is plotted as a boxplot.
```{r, warning = F, message = F}
# add a categorical predictor (Origin) so that a boxplot is produced
cars_lm_2 <- lm(Price ~ Passengers + Length + RPM + Origin, data = Cars93)
gg_resX(cars_lm_2)
```
`lindia` can also handle linear models with interaction terms.
```{r, warning = F, message = F}
cars_lm_inter <- lm(Price ~ Passengers * Length, data = Cars93)
gg_resX(cars_lm_inter)
```
# gg_resfitted
Plots residuals against fitted values.
```{r, warning = F, message = F}
gg_resfitted(cars_lm)
```
# gg_qqplot
Plots a normal quantile-quantile plot of the residuals of a linear model, with a qqline overlaid on top.
```{r, warning = F, message = F}
gg_qqplot(cars_lm)
```
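As background, the points and the reference line of a normal QQ plot can be computed with base R alone. The chunk below is a sketch that assumes raw residuals and the quartile-based line used by `stats::qqline()`; whether `gg_qqplot()` uses raw or standardized residuals is not stated here, so treat this as an illustration of the technique rather than of lindia's internals.

```{r, warning = F, message = F}
data(Cars93, package = "MASS")  # assumption: Cars93 is taken from MASS
cars_lm <- lm(Price ~ Passengers + Length + RPM, data = Cars93)

# theoretical (x) and sample (y) quantiles, computed without drawing a plot
qq <- qqnorm(resid(cars_lm), plot.it = FALSE)

# reference line through the first and third quartiles, as stats::qqline() does
probs <- c(0.25, 0.75)
sample_q <- quantile(qq$y, probs, names = FALSE)
theory_q <- qnorm(probs)
slope <- diff(sample_q) / diff(theory_q)
intercept <- sample_q[1] - slope * theory_q[1]

head(data.frame(theoretical = qq$x, sample = qq$y))
c(intercept = intercept, slope = slope)
```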
# gg_boxcox
Plots the Box-Cox graph of a given lm object, with labels showing the optimal power for transforming the response variable under a Box-Cox transformation. The labels can be hidden by setting `showlambda` to `FALSE`.
```{r, warning = F, message = F}
gg_boxcox(cars_lm)
```
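The optimal power can also be read off numerically. The chunk below is a sketch that uses `MASS::boxcox()` directly and picks the grid value with the highest profile log-likelihood; it assumes this is comparable to what the plot labels, which is not confirmed here. `cars_lm_bc` and `lambda_hat` are illustrative names.

```{r, warning = F, message = F}
data(Cars93, package = "MASS")  # assumption: Cars93 is taken from MASS

# y = TRUE stores the response in the fit, so boxcox() does not need to refit
cars_lm_bc <- lm(Price ~ Passengers + Length + RPM, data = Cars93, y = TRUE)

# profile log-likelihood over a grid of lambda values, without plotting
bc <- MASS::boxcox(cars_lm_bc, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)

# grid value of lambda with the highest log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat
```

A value near 0 suggests a log transformation of the response, and a value near 1 suggests that no transformation is needed.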
# gg_scalelocation
Plots the scale-location graph of a linear model.
```{r, warning = F, message = F}
gg_scalelocation(cars_lm)
```
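A scale-location plot conventionally shows the square root of the absolute standardized residuals against the fitted values. The chunk below sketches that computation in base R, following the convention of `plot.lm()`; it is assumed, not confirmed, that `gg_scalelocation()` plots the same quantities. The `scale_loc` data frame is illustrative.

```{r, warning = F, message = F}
data(Cars93, package = "MASS")  # assumption: Cars93 is taken from MASS
cars_lm <- lm(Price ~ Passengers + Length + RPM, data = Cars93)

# x axis: fitted values; y axis: sqrt(|standardized residuals|)
scale_loc <- data.frame(
  fitted = fitted(cars_lm),
  sqrt_abs_std_resid = sqrt(abs(rstandard(cars_lm)))
)

head(scale_loc)
```

A roughly flat trend in this plot is consistent with constant error variance; an upward or downward trend hints at heteroscedasticity.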
# gg_resleverage
Plots residuals against the leverage of data points to help detect outliers using Cook's distance.
```{r, warning = F, message = F}
gg_resleverage(cars_lm)
```
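The quantities behind a residual vs. leverage plot are available in base R. The chunk below is a sketch that computes leverage with `hatvalues()` and standardized residuals with `rstandard()`, then lists high-leverage points using the common `2 * p / n` rule of thumb. That cutoff is an assumption for illustration and is not taken from lindia.

```{r, warning = F, message = F}
data(Cars93, package = "MASS")  # assumption: Cars93 is taken from MASS
cars_lm <- lm(Price ~ Passengers + Length + RPM, data = Cars93)

lev <- data.frame(
  leverage  = hatvalues(cars_lm),  # diagonal of the hat matrix
  std_resid = rstandard(cars_lm)   # internally studentized residuals
)

p <- length(coef(cars_lm))  # number of estimated coefficients
n <- nrow(lev)              # number of observations

# rule-of-thumb cutoff (assumption): leverage above 2 * p / n
lev[lev$leverage > 2 * p / n, ]
```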
# gg_cooksd
Plots Cook's distance for each observation to help identify outliers. Observation numbers are plotted next to data points that are considered anomalies.
```{r, warning = F, message = F}
gg_cooksd(cars_lm)
```
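Cook's distance itself comes from base R's `cooks.distance()`. The chunk below is a sketch that flags observations above `4 / n`, a common rule of thumb used here as an assumption (the cutoff `gg_cooksd()` applies is not stated in this document), and recomputes the distances from standardized residuals and leverage to show where they come from.

```{r, warning = F, message = F}
data(Cars93, package = "MASS")  # assumption: Cars93 is taken from MASS
cars_lm <- lm(Price ~ Passengers + Length + RPM, data = Cars93)

cd <- cooks.distance(cars_lm)

# rule-of-thumb cutoff (assumption): 4 / n
threshold <- 4 / length(cd)
flagged <- which(cd > threshold)
flagged

# the same distances from standardized residuals (r) and leverage (h):
# D_i = r_i^2 * h_i / (p * (1 - h_i))
h <- hatvalues(cars_lm)
r <- rstandard(cars_lm)
p <- length(coef(cars_lm))
cd_manual <- r^2 * h / (p * (1 - h))
all.equal(unname(cd), unname(cd_manual))
```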
# gg_diagnose
An aggregate plot of all diagnostic plots, laid out on one panel as demonstrated at the beginning of this README. Users can set the `theme` parameter to a specific theme to apply it to all plots in the panel.
```{r, warning = F, message = F, fig.width = 12, fig.height = 10}
gg_diagnose(cars_lm_2, theme = theme_bw())
```
If the `plot.all` parameter is set to `FALSE`, `gg_diagnose()` returns a list of ggplot objects that users can manipulate. `lindia` also provides the `scale.factor` parameter, which scales all point sizes and line widths at once. The following is an extreme example that sets `scale.factor` to 0.3.
```{r, warning = F, message = F, fig.width = 12, fig.height = 10}
gg_diagnose(cars_lm_2, theme = theme_bw(), scale.factor = 0.3)
```
# Package Dependency
Lindia is built on top of the following packages:
+ `ggplot2`: ver 2.1.0
+ `gridExtra`: ver 2.1.1