
Evaluation Metrics


Common metrics

These are the same metrics used since experiment 5.

Precision

Of all the reviews identified as fake, what percentage are actually fake? If we classify every review as fake, our precision will be low. If we classify every review as genuine, there are no predicted fakes and so we won't have any precision either.

In our case it might be more important to have a high recall, since we don't want to miss any fake reviews. Otherwise, if we want to be as accurate as possible, we can balance recall and precision.
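As a minimal sketch (the labels and predictions below are purely illustrative, not from our pipeline), precision can be computed with scikit-learn, taking 1 to mean a fake review and 0 a genuine one:

```python
from sklearn.metrics import precision_score

# 1 = fake review, 0 = genuine review (illustrative labels only)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# Of the 5 reviews flagged as fake, 3 really were fake
print(precision_score(y_true, y_pred))  # 0.6
```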

Recall

Of all the fake reviews, what percentage were identified as fake? Unlike accuracy, this is not distorted by the imbalanced classification problem. We aim to maximise it as an indication of how well we are really identifying our fake reviews.

We cannot focus solely on recall, because we could identify all reviews as fake and achieve 100% recall. Precision must be included in the consideration.
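Continuing the illustrative example above, the sketch below shows recall on the same labels, and how labelling everything as fake gives perfect recall but weak precision:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# Of the 4 truly fake reviews, 3 were caught
print(recall_score(y_true, y_pred))        # 0.75

# Labelling every review as fake gives 100% recall but poor precision
all_fake = [1] * len(y_true)
print(recall_score(y_true, all_fake))      # 1.0
print(precision_score(y_true, all_fake))   # 4 fake out of 8 predicted = 0.5
```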

F1 score

This is the harmonic mean of precision and recall. Because of this it punishes extreme combinations, such as a near-perfect recall paired with a precision close to 0.0.

This also acts as a single number metric representing precision and recall.
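A small sketch of the harmonic mean behaviour, using the same illustrative labels as above:

```python
from sklearn.metrics import f1_score

def f1(precision, recall):
    # Harmonic mean: 2PR / (P + R); collapses towards 0 if either value is extreme
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(1.0, 0.02))   # ~0.039: perfect precision cannot hide near-zero recall
print(f1(0.6, 0.75))   # ~0.667: balanced values give a balanced score

# The same value straight from the labels used in the examples above
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))  # ~0.667
```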

AUC

This gives us a measure of discrimination: how well we separate the two classes. It does not depend on a single 'Yes' or 'No' threshold, which can make it more informative than accuracy.

At different classification thresholds, how well do we score fake reviews as 'more fake' than genuine reviews? We plot the true positive rate against the false positive rate to get the ROC curve. Varying the threshold traces out this curve: lowering the threshold classifies more reviews as fake, increasing the true positive rate, but it also classifies fewer reviews as genuine, decreasing the true negative rate and therefore increasing the false positive rate.

An AUC of 0.8 means there is an 80% chance that the model ranks a randomly chosen fake review above a randomly chosen genuine one.
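A minimal sketch of this pairwise-ranking interpretation, with illustrative classifier scores rather than real model output:

```python
from sklearn.metrics import roc_auc_score

y_true   = [1, 0, 1, 1, 0, 0, 1, 0]
# Classifier scores: higher means "more fake" (illustrative values)
y_scores = [0.9, 0.4, 0.7, 0.3, 0.2, 0.6, 0.8, 0.1]

# Fraction of (fake, genuine) pairs where the fake review gets the higher score
print(roc_auc_score(y_true, y_scores))  # 0.875
```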

Classification Accuracy

This is the percentage of correct classifications. The higher the better; however, it is not appropriate in all cases.

This metric suffers from the imbalanced classification problem: when one class dominates, the classifier can predict it most or all of the time and still achieve high accuracy. When we calculate accuracy we should do so with balanced classes.
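A short sketch of the imbalance issue, again with made-up labels:

```python
from sklearn.metrics import accuracy_score

# Imbalanced data: only 2 of 10 reviews are fake
y_true      = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
all_genuine = [0] * 10

# Always predicting "genuine" still scores 80% accuracy while catching no fakes
print(accuracy_score(y_true, all_genuine))  # 0.8
```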

Cost of misclassification

We should be clear about what the problem is and which misclassifications are most important to prevent, for example missing a fake review versus wrongly flagging a genuine one.
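One way to reflect this, assuming a scikit-learn style classifier (the choice of classifier and the weights below are purely illustrative, not the settings used in our experiments), is to weight the classes so that a missed fake review costs more than a wrongly flagged genuine one:

```python
from sklearn.linear_model import LogisticRegression

# Illustrative weights: treat a missed fake review (class 1) as five times
# as costly as a wrongly flagged genuine review (class 0) during training
clf = LogisticRegression(class_weight={0: 1, 1: 5})
# Fitting this on training data would trade some precision for higher recall
```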
