```{r, include=FALSE}
#load the packages used throughout this step
library(knitr)
library(mlr)
library(tidyverse)
set.seed(1234567)
#read in the held-out testing data and the saved models from the earlier steps
testing_data <- read.csv("testing_data.csv")
KNN_model_RQ_1 <- readRDS("KNN_model_RQ_1.rds")
decision_tree_model_RQ_1 <- readRDS("decision_tree_model_RQ_1.rds")
SVM_model_RQ_1 <- readRDS("SVM_model_RQ_1.rds")
KNN_model_RQ_2 <- readRDS("KNN_model_RQ_2.rds")
decision_tree_model_RQ_2 <- readRDS("decision_tree_model_RQ_2.rds")
SVM_model_RQ_2 <- readRDS("SVM_model_RQ_2.rds")
KNN_model_RQ_3 <- readRDS("KNN_model_RQ_3.rds")
random_forest_model_RQ_3 <- readRDS("random_forest_model_RQ_3.rds")
```
# Step 5: Compare Model Outcomes
So far, we know how the models performed on the training data, but we haven't fed them any of our held-out testing data. To compare the models against each other, we need to look at how each one does on new *unseen* data. We will do this by looking at prediction accuracy and (poorly named!) [confusion matrices](https://blog.roboflow.com/what-is-a-confusion-matrix/). A confusion matrix is far from confusing: it is simply a 2 by 2 table showing the counts of true positives, false positives, false negatives, and true negatives produced by a specific model:
```{r, echo=FALSE}
confusion_matrix <- data.frame(
  Predicted_value = c("Model Predicted 1", "Model Predicted 0"),
  Actual_value_was_1 = c("True Positive", "False Negative"),
  Actual_value_was_0 = c("False Positive", "True Negative"))
kable(confusion_matrix,
      col.names = c("", "Actual value was 1", "Actual value was 0"))
```
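The overall prediction accuracy falls straight out of this table: it is simply the share of predictions on the diagonal (true positives plus true negatives) divided by the total number of predictions. Here is a quick worked example with made-up counts (purely illustrative, not from our data):

```{r}
#hypothetical counts, only to show how accuracy is read off a confusion matrix
true_positives <- 40
true_negatives <- 45
false_positives <- 10
false_negatives <- 5
#accuracy = correct predictions / all predictions
(true_positives + true_negatives) /
  (true_positives + true_negatives + false_positives + false_negatives)
#returns 0.85, i.e. 85% of predictions were correct
```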
Now that we know how to read a confusion matrix, let's make the testing data frames we need to feed into the models.
```{r}
#Research Question 1
#keep only the two columns needed for this question from the pre-processed testing
#dataset we made in step 1: training score and the ostensive binary outcome variable
testing_data_RQ_1 <- testing_data |>
  select(training_score, ostensive_binary) |>
  mutate(training_score = as.numeric(training_score),
         ostensive_binary = as.factor(ostensive_binary))

#Research Question 2
#keep only the two columns needed for this question: training score and the
#nonostensive binary outcome variable
testing_data_RQ_2 <- testing_data |>
  select(training_score, nonostensive_binary) |>
  mutate(training_score = as.numeric(training_score),
         nonostensive_binary = as.factor(nonostensive_binary))

#Research Question 3
#drop the two binary outcome variables used above, but keep every other predictor
#plus the nonos_best outcome for this question
testing_data_RQ_3_factor <- testing_data |>
  select(c(sex:miscellaneous_score, nonos_best)) |>
  mutate(across(c(sex, desexed, purebred, gaze_follow, nonos_best), as.factor))
```
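Before feeding these data frames to the models, it is worth a quick sanity check that every column has the type the learners expect (numeric predictors, factor outcomes). A minimal sketch using `glimpse()` from the tidyverse:

```{r}
#confirm the column types match what the models were trained on
glimpse(testing_data_RQ_1)
glimpse(testing_data_RQ_2)
glimpse(testing_data_RQ_3_factor)
```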
Now that we have our testing data for each of the three research questions, we can use the `mlr` functions that help us look at model outcomes: `predict()`, `performance()`, and `calculateConfusionMatrix()`. `predict()` takes a fitted model and new data and returns the predicted classification for each observation in the testing data. You then feed the prediction object into `performance()` to get the prediction accuracy (by default `mlr` reports the mean misclassification error, which is 1 minus the accuracy) and into `calculateConfusionMatrix()` to get the confusion matrix. Cool, right? Let's do it!
```{r}
#predicting new values and getting performance metrics for all 3 models run with RQ 1
#KNN model
knn_predictions_RQ1 <- predict(KNN_model_RQ_1, newdata = testing_data_RQ_1)
performance(knn_predictions_RQ1)
calculateConfusionMatrix(knn_predictions_RQ1)
```
The KNN model classifies 75% of the cases correctly, which sounds okay. However, when we look further into the confusion matrix, we can see that the algorithm classified every observation as 0 (i.e., it predicted that every dog performed below chance at the ostensive task, which we know isn't correct). So this algorithm is probably not a good one to use for understanding the true relationship between the predictor and the outcome.
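If you would rather have a single number that exposes this problem than eyeball the confusion matrix, `performance()` also accepts a list of measures. Here is a minimal sketch using the `acc` (accuracy) and `ber` (balanced error rate) measure objects that ship with `mlr`: a model that lumps every case into one class can keep a decent accuracy while its balanced error rate sits near 0.5.

```{r}
#accuracy alone hides the imbalance; the balanced error rate averages the per-class
#error rates, so an "always predict 0" model scores close to 0.5
performance(knn_predictions_RQ1, measures = list(acc, ber))
```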
```{r}
#Decision Tree
decision_tree_predictions_RQ1 <- predict(decision_tree_model_RQ_1, newdata = testing_data_RQ_1)
performance(decision_tree_predictions_RQ1)
calculateConfusionMatrix(decision_tree_predictions_RQ1)
```
The decision tree model did just as poorly as the simpler KNN model: it also predicted *every* observation would be 0. Again, not very usable.
```{r}
#SVM
SVM_predictions_RQ_1 <- predict(SVM_model_RQ_1, newdata = testing_data_RQ_1)
performance(SVM_predictions_RQ_1)
calculateConfusionMatrix(SVM_predictions_RQ_1)
```
The SVM algorithm produced the same results as our decision tree on all but one data point, even though it took 10 times as long to train. Again, not very usable.

Now let's do the same thing for research question 2.
```{r}
#predicting new values and getting performance metrics for all 3 models run with RQ 2
#KNN model
knn_predictions_RQ2 <- predict(KNN_model_RQ_2, newdata = testing_data_RQ_2)
performance(knn_predictions_RQ2)
calculateConfusionMatrix(knn_predictions_RQ2)
```
This KNN model had the same issue: it predicted everything would be 0. (These models really don't think much of our dogs' intelligence!)
```{r}
#Decision Tree
decision_tree_predictions_RQ2 <- predict(decision_tree_model_RQ_2, newdata = testing_data_RQ_2)
performance(decision_tree_predictions_RQ2)
calculateConfusionMatrix(decision_tree_predictions_RQ2)
```
This decision tree did just as badly as the KNN.
```{r}
#SVM
SVM_predictions_RQ2 <- predict(SVM_model_RQ_2, newdata = testing_data_RQ_2)
performance(SVM_predictions_RQ2)
calculateConfusionMatrix(SVM_predictions_RQ2)
```
This SVM ran into the same issue as all of our other models so far.
Now let's do the same thing for research question 3!
```{r}
#predicting new values and getting performance metrics for both models run with RQ 3
#KNN model
knn_predictions_RQ3 <- predict(KNN_model_RQ_3, newdata = testing_data_RQ_3_factor)
performance(knn_predictions_RQ3)
#Dig deeper into predictions with a confusion matrix
calculateConfusionMatrix(knn_predictions_RQ3)
```
Again, the algorithm predicted that everything would be 0 (i.e., that all dogs would perform better at the ostensive task than at the nonostensive one). This is also not a very helpful algorithm to use in the future.
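A quick way to confirm that a model has collapsed onto a single class is to tabulate its predicted labels, for example with `mlr`'s `getPredictionResponse()` accessor (a small sketch, not part of the original workflow):

```{r}
#count how often each class is predicted; a single non-zero cell means the model
#assigns every observation to the same class
table(getPredictionResponse(knn_predictions_RQ3))
```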
```{r}
#Random Forest
randomforest_predictions_RQ3 <- predict(random_forest_model_RQ_3, newdata = testing_data_RQ_3_factor)
#Measure how well the model did at predictions
performance(randomforest_predictions_RQ3)
#Dig deeper into predictions with a confusion matrix
calculateConfusionMatrix(randomforest_predictions_RQ3)
```
The random forest algorithm produced exactly the same results as the KNN.
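To wrap up the comparison, we can pull the accuracy of every model into one small table. This is a minimal sketch that assumes the prediction objects created above are still in memory and uses `mlr`'s `acc` measure:

```{r}
#gather the accuracy of all eight models for a side-by-side comparison
model_comparison <- tibble(
  research_question = c(1, 1, 1, 2, 2, 2, 3, 3),
  model = c("KNN", "Decision tree", "SVM",
            "KNN", "Decision tree", "SVM",
            "KNN", "Random forest"),
  accuracy = c(performance(knn_predictions_RQ1, measures = acc),
               performance(decision_tree_predictions_RQ1, measures = acc),
               performance(SVM_predictions_RQ_1, measures = acc),
               performance(knn_predictions_RQ2, measures = acc),
               performance(decision_tree_predictions_RQ2, measures = acc),
               performance(SVM_predictions_RQ2, measures = acc),
               performance(knn_predictions_RQ3, measures = acc),
               performance(randomforest_predictions_RQ3, measures = acc)))
kable(model_comparison, digits = 2)
```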