
Logistic Regression


Logistic regression is a generalized linear model (GLM), meaning it is related to linear regression. For classification, logistic regression is better suited than linear regression because a linear regression prediction is unbounded, whereas a logistic regression prediction is bounded between 0 and 1.

In linear regression we find a linear relationship between the independent variables and the dependent variable, i.e. a relationship of the form 𝑦=𝑚𝑥+𝑐. If we tried to use this line to calculate the probability that a sample belongs to a category, it would allow probabilities above 1 and below 0, so we cannot treat it as modelling a probability.
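
To make this concrete, here is a minimal sketch (assuming NumPy and a made-up toy dataset) of fitting a straight line to 0/1 class labels by least squares; the predictions drift below 0 and above 1, so they cannot be read as probabilities.

```python
import numpy as np

# Toy binary data: small x values belong to class 0, large x values to class 1.
x = np.array([1.0, 2.0, 3.0, 8.0, 9.0, 10.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Fit y = m*x + c by ordinary least squares.
m, c = np.polyfit(x, y, 1)

# Predictions for x values outside the training range escape [0, 1],
# so the line cannot be interpreted as a probability.
print(m * 0.0 + c)   # below 0
print(m * 20.0 + c)  # above 1
```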

In logistic regression we do not have a linear relationship. Instead we fit an S-shaped curve that approaches 1 as the characteristics of one class are met and approaches 0 as the characteristics of the other class are met.
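
The bounded curve that logistic regression uses is the logistic (sigmoid) function, which maps any real-valued input into the interval (0, 1). A minimal sketch, assuming NumPy:

```python
import numpy as np

def logistic(z):
    # The logistic (sigmoid) function squeezes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and 0 maps to exactly 0.5.
print(logistic(-10.0), logistic(0.0), logistic(10.0))
```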

Linear regression uses least squares to find this relationship. The gradient of the line is the sum, over all samples, of the product of each x value's difference from the mean of x and the corresponding y value's difference from the mean of y, divided by the sum of the squared differences of the x values from the mean of x. Because the fitted line passes through the point of the means, the y-intercept can then be derived by substituting the means into 𝑦=𝑚𝑥+𝑐.

m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²,  c = ȳ − m·x̄

where ā denotes the mean of all the 𝑎 values

In this case the differences of the x values from their mean are squared so that negative differences don't cancel out the positive ones.
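
As a quick illustration, here is a small sketch of these formulas, assuming NumPy and made-up numbers:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Gradient: sum of products of the differences from the means,
# divided by the sum of squared differences of x from its mean.
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# The fitted line passes through the point of the means,
# so the y-intercept follows from y = m*x + c at the means.
c = y_bar - m * x_bar

print(m, c)  # roughly 1.96 and 0.14
```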

Logistic regression cannot use least squares in this way, because that would again produce a line that goes above 1 and below 0. Instead we use a technique called Maximum Likelihood Estimation (MLE).

In MLE we first change our probability prediction to a log(odds) prediction. To do this we use the formula log(p / (1 − p)), so that p = 0.5 maps to 0 (the centre value), p = 1 maps to +infinity, and p = 0 maps to −infinity.
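
A small sketch of this transformation, assuming NumPy; exact infinities only appear in the limit, so the example uses probabilities close to 0 and 1:

```python
import numpy as np

def log_odds(p):
    # Map a probability to its log(odds) value.
    return np.log(p / (1.0 - p))

print(log_odds(0.5))    # 0.0, the centre value
print(log_odds(0.999))  # large and positive, tending to +infinity as p -> 1
print(log_odds(0.001))  # large and negative, tending to -infinity as p -> 0
```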

The non-linear curve from before now becomes a straight line, and we can work with it in the same way we work with linear regression. However, our coefficients (used to convert our features to a target value) are now in terms of log(odds).

Also, because the transformed sample points sit at +infinity and −infinity, we can't use least squares to find a best-fitting line. Using MLE, we project each sample point onto our candidate line, giving us a candidate log(odds) value. We can transform this value back to a probability and use how well our probabilities match the real classes to score our candidate line.

Our score can be calculated as the product of the predicted probabilities of the samples that belong to class 1 and of one minus the predicted probabilities of the samples that belong to class 0. The algorithm can try multiple lines to find the one that scores best.
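
Below is a minimal sketch of this scoring step, assuming NumPy and a made-up one-feature dataset: each candidate line is applied in log(odds) space, the values are mapped back to probabilities with the logistic function, and the product described above decides which candidate wins.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(x, y, m, c):
    # Project each sample onto the candidate line to get its log(odds) ...
    log_odds = m * x + c
    # ... then transform the log(odds) back to a probability.
    p = logistic(log_odds)
    # Product of p for the class-1 samples and (1 - p) for the class-0 samples.
    return np.prod(np.where(y == 1, p, 1.0 - p))

# Toy data: small x values are class 0, large x values are class 1.
x = np.array([1.0, 2.0, 3.0, 8.0, 9.0, 10.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Try a few candidate (gradient, intercept) pairs and keep the best scorer.
candidates = [(0.5, -3.0), (1.0, -5.5), (2.0, -11.0)]
best = max(candidates, key=lambda mc: likelihood(x, y, *mc))
print(best, likelihood(x, y, *best))
```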
