Logistic Regression Statistics

    1. Overview

    1.1 Likelihood function

    The likelihood function estimates how likely it is that the model at hand describes the real underlying relationship between the variables. The bigger the value of the likelihood function, the better the model fits the observed data.

    Maximum likelihood estimation (MLE) tries to maximize the likelihood function.

    The optimizer goes through different coefficient values until it finds the model for which the likelihood is highest; when it can no longer improve it, it simply stops the optimization. That is also how a typical machine learning process goes.
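
    A minimal sketch of this in statsmodels, assuming a hypothetical admittance.csv with the 'SAT' and 'Admitted' columns used later in these notes; fit() runs MLE and stops once the log-likelihood no longer improves:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical file name; 'Admitted' is assumed to be coded Yes/No.
    data = pd.read_csv('admittance.csv')
    data['Admitted'] = data['Admitted'].map({'Yes': 1, 'No': 0})

    y = data['Admitted']
    x1 = data['SAT']
    x = sm.add_constant(x1)          # prepends the column of 1s (the constant)

    results = sm.Logit(y, x).fit()   # MLE: iterates until the log-likelihood stops improving
    print(results.llf)               # the maximized log-likelihood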

    1.2 Log-Likelihood function

    It is much more convenient to take the log-likelihood when performing MLE, and because of this convenience the log-likelihood is the more popular metric. By the way, the value of the log-likelihood is almost always negative, and the bigger it is (the closer to zero), the better.
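
    For a binary outcome $y_{i}\in\{0,1\}$ with predicted probability $p_{i}$, the log-likelihood being maximized is:

    $$\log L=\sum_{i=1}^{n}\left[y_{i}\log(p_{i})+(1-y_{i})\log(1-p_{i})\right]$$

    Each term is the logarithm of a probability and hence non-positive, which is why the log-likelihood of a logistic regression cannot exceed zero.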

    1.3 Log Likelihood-Null (LL-Null)

    LL-null is the log-likelihood of a model which has no independent variables:

    $$y=\beta_{0}$$

    Put another way, it is the model whose sole "independent variable" is an array of 1s; this array is the constant we add with the add_constant method.

    If we create a logistic regression based on it, it will have a log-likelihood equal to the LL-null of the previous model. You may want to compare the log-likelihood of your model with the LL-null to see if your model has any explanatory power, i.e., whether it is significant.

    There was the F-test for linear regression, and there is an analogue for logistic regression: the log-likelihood ratio (LLR) test. It is based on the log-likelihood of the model and the LL-null, and it measures whether our model is statistically different from the LL-null, a.k.a. a useless model. We have its p-value, and that is all we need; as we can see, it is very low, around 0.000.
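
    In statsmodels these quantities are exposed directly on the fitted results; a sketch, reusing the results object fitted above:

    print(results.llf)          # log-likelihood of our model
    print(results.llnull)       # LL-null: log-likelihood of the constant-only model
    print(results.llr_pvalue)   # p-value of the LLR test (here around 0.000)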

    1.4 Pseudo $R^{2}$

    There are several pseudo $R^{2}$ measures; some terms you may have heard are AIC, BIC, and McFadden's $R^{2}$ (the one statsmodels reports). A good pseudo $R^{2}$ is somewhere between 0.2 and 0.4. Moreover, this measure is mostly useful for comparing variations of the same model; different models will have completely different, incomparable pseudo $R^{2}$ values.
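
    McFadden's version can be computed directly from the two log-likelihoods above (in statsmodels it is results.prsquared):

    $$R_{McFadden}^{2}=1-\frac{\log L_{model}}{\log L_{null}}$$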

    1.5 Equations

    When the SAT score increases by 1, the odds of admittance increase by 4.2%.

    $$\Delta odds=e^{b_{k}}$$

    The change in the odds equals the exponential of the coefficient.
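
    For the SAT example above, the 4.2% figure corresponds to a coefficient of about 0.042:

    $$\Delta odds=e^{0.042}\approx 1.042$$

    That is, each additional SAT point multiplies the odds of admittance by about 1.042, a 4.2% increase.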

    1.6 Binary Predictors 

    1.6.1 Gender and Admitted

    y = data['Admitted']
    x1 = data['Gender']
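
    For Logit, Gender has to be numeric; the coefficients below imply the coding Female = 1, Male = 0, which is an assumption about the dataset. A sketch of the fit:

    # Assumed coding: the 7.99 odds ratio below implies Female = 1, Male = 0.
    data['Gender'] = data['Gender'].map({'Female': 1, 'Male': 0})
    x1 = data['Gender']                      # re-take the column after mapping

    x = sm.add_constant(x1)
    results_gender = sm.Logit(y, x).fit()
    print(results_gender.summary())          # expect const ≈ -0.64, Gender ≈ 2.08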

    The model is significant and the gender variable is significant too. The model is

    $$log(odds)=-0.64+2.08\times Gender$$

    If we take the exponential of both sides, we find that the odds of a female being admitted are 7.99 times those of a male. That is the interpretation of a binary predictor's coefficient.

    1.6.2 Admitted and SAT and Gender

    y = data['Admitted']
    x1 = data[['SAT','Gender']]
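
    The same fitting sketch, now with both predictors:

    x = sm.add_constant(x1)                  # x1 now holds both SAT and Gender
    results_both = sm.Logit(y, x).fit()
    print(results_both.llf)                  # much higher than the single-predictor models
    print(results_both.summary())            # Gender ≈ 1.94, still significant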

    What we get is a regression with a much higher log-likelihood, meaning it is a better one, and that makes sense: SAT was an outstanding predictor. We can see that the Gender variable is still significant, but we no longer have that 0.000 p-value.

     

    The new coefficient of Gender is 1.94, and the exponential of 1.94 is around 7 ($e^{1.94}\approx 6.96$). The interpretation is the following: given the same SAT score, the odds of a female being admitted are about 7 times those of a male.

    It seems that at this particular university, or for this particular degree, it is much easier for females to enter.

    1.7 Accuracy

    $$Accuracy=\frac{159}{168}=0.946=94.6\%$$
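
    Here 159 is the number of correctly classified observations out of 168 in total. A sketch of how to get these counts from the fitted model; pred_table classifies at a 0.5 threshold:

    cm = results_both.pred_table()           # 2x2 confusion matrix on the training data
    accuracy = np.trace(cm) / cm.sum()       # correct predictions / all observations
    print(cm)
    print(accuracy)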

    1.8 Underfitting and Overfitting

    1.8.1 Underfitting

    An underfitted model provides an answer but does not capture the underlying logic of the data, so it has no strong predictive power; underfitted models are clumsy and have low accuracy. You will quickly realize that either there are no relationships to be found, or you need a different model.

    1.8.2 Overfitting

    Overfitting refers to models that are so good at modeling the training data that they fit, or at least come very near, each observation. The problem is that the random noise is captured inside an overfitted model.

    Fix: split the initial dataset into two parts, a training set and a test set.
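
    A common sketch of this split, here using scikit-learn's train_test_split (any way of holding out data works):

    from sklearn.model_selection import train_test_split

    # An 80/20 split is a common choice; random_state makes it reproducible.
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

    results_train = sm.Logit(y_train, x_train).fit()   # fit on the training set only
    # ...then measure accuracy on x_test / y_test to check for overfitting.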
