# What is a good McFadden R2?

A McFadden’s pseudo R-squared value between 0.2 and 0.4 indicates excellent fit.

Falk and Miller (1992) recommended that R2 values should be equal to or greater than 0.10 in order for the variance explained of a particular endogenous construct to be deemed adequate.

R-squared is the percentage of the dependent variable variation that a linear model explains. A value of 0\% represents a model that does not explain any of the variation in the response variable around its mean; in that case, the mean of the dependent variable predicts the dependent variable as well as the regression model does.
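The definition above can be illustrated with a minimal sketch, assuming the common formulation of R-squared as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample data are hypothetical, chosen only for illustration.

```python
def r_squared(y, y_hat):
    """Return 1 - SS_res / SS_tot for observed values y and predictions y_hat."""
    mean_y = sum(y) / len(y)
    # Total variation of the response around its mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # Variation left unexplained by the predictions.
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    return 1.0 - ss_res / ss_tot


# Hypothetical data: predicting the mean everywhere explains none of the variation.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_only = [sum(y) / len(y)] * len(y)
r2_mean_only = r_squared(y, mean_only)
r2_perfect = r_squared(y, y)
```

Here `r2_mean_only` corresponds to the 0\% case, because the predictions are no better than the mean, while `r2_perfect` corresponds to a model that reproduces every observation.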

With PROC LOGISTIC, you can get the deviance, the Pearson chi-square, or the Hosmer-Lemeshow test. These are formal tests of the null hypothesis that the fitted model is correct, and their output is a p-value, again a number between 0 and 1, with higher values indicating a better fit.

In other fields, the standards for a good R-Squared reading can be much higher, such as 0.9 or above. In finance, an R-Squared above 0.7 would generally be seen as showing a high level of correlation, whereas a measure below 0.4 would show a low correlation.

Pseudo-R2 measures based on the log-likelihood (LL) draw comparisons between the LL of the estimated model and the LL of the null model. The null model contains no parameters other than the intercept. Pseudo-R2s can then be interpreted as a measure of improvement over the null model in terms of LL and thus give an indication of goodness of fit.

A pseudo R-squared only has meaning when compared to another pseudo R-squared of the same type, on the same data, predicting the same outcome. In this situation, the higher pseudo R-squared indicates which model better predicts the outcome.

McFadden’s R squared measure is defined as

$$R^2_{\text{McFadden}} = 1 - \frac{\ln(L_c)}{\ln(L_{\text{null}})}$$

where $L_c$ denotes the (maximized) likelihood value from the current fitted model, and $L_{\text{null}}$ denotes the corresponding value for the null model, that is, the model with only an intercept and no covariates.
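As a rough illustration, McFadden’s measure is one minus the ratio of the fitted model’s log-likelihood to the null model’s log-likelihood. The sketch below assumes a binary outcome coded 0/1 and predicted probabilities strictly between 0 and 1. It uses the fact that an intercept-only logistic model predicts the sample proportion of events for every observation. The function names and the data are hypothetical.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted event probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def mcfadden_r2(y, p_fitted):
    """McFadden's pseudo R-squared: 1 - LL(fitted model) / LL(null model)."""
    ll_model = bernoulli_log_likelihood(y, p_fitted)
    # Null model: intercept only, so every prediction is the overall event rate.
    p_null = sum(y) / len(y)
    ll_null = bernoulli_log_likelihood(y, [p_null] * len(y))
    return 1.0 - ll_model / ll_null


# Hypothetical outcomes and fitted probabilities, for illustration only.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p_fitted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
pseudo_r2 = mcfadden_r2(y, p_fitted)
```

A model whose predictions are no better than the overall event rate gives a value of 0, and the value approaches 1 as the fitted probabilities approach the observed outcomes.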

When analyzing data with a logistic regression, an equivalent statistic to R-squared does not exist. The model estimates from a logistic regression are maximum likelihood estimates arrived at through an iterative process.

The Hosmer-Lemeshow goodness-of-fit statistic is computed as the Pearson chi-square from the contingency table of observed frequencies and expected frequencies. Similar to a test of association of a two-way table, a good fit as measured by Hosmer and Lemeshow’s test will yield a large p-value.
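The computation described above can be sketched as follows. This is a simplified illustration, not the PROC LOGISTIC implementation: it assumes observations are sorted by predicted probability and split into equally sized groups, that every predicted probability lies strictly between 0 and 1, and that the statistic is referred to a chi-square distribution with `groups - 2` degrees of freedom. The helper `chi2_sf_even_df` is a hypothetical convenience that only handles even degrees of freedom, where the chi-square tail probability has a closed form.

```python
import math


def hosmer_lemeshow_statistic(y, p, groups=10):
    """Pearson chi-square over groups of observations ordered by predicted probability."""
    n = len(y)
    # Stable sort on the predicted probability only.
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])
    stat = 0.0
    for i in range(groups):
        chunk = pairs[i * n // groups:(i + 1) * n // groups]
        size = len(chunk)
        expected_events = sum(pi for pi, _ in chunk)
        observed_events = sum(yi for _, yi in chunk)
        expected_non_events = size - expected_events
        observed_non_events = size - observed_events
        # Contribution of the event and non-event cells of this group.
        stat += (observed_events - expected_events) ** 2 / expected_events
        stat += (observed_non_events - expected_non_events) ** 2 / expected_non_events
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports positive even degrees of freedom only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical data; far too small for a real test, used only to show the calls.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
hl_stat = hosmer_lemeshow_statistic(y, p, groups=4)
hl_p_value = chi2_sf_even_df(hl_stat, df=4 - 2)
```

A large `hl_p_value` means the observed and expected frequencies agree closely enough that the test finds no evidence of poor fit.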


McFadden’s pseudo R-squared builds on how logistic regression models are fitted: by the method of maximum likelihood, i.e. the parameter estimates are those values that maximize the likelihood of the data that have been observed.

The question is often asked: “what’s a good value for R-squared?” or “how big does R-squared need to be for the regression model to be valid?” Sometimes the claim is even made: “a model is not useful unless its R-squared is at least x”, where x may be some fraction greater than 50\%.

A rule of thumb that I found to be quite helpful is that a McFadden’s pseudo R-squared ranging from 0.2 to 0.4 indicates very good model fit. As such, the model mentioned above with a McFadden’s pseudo R-squared of 0.192 is likely not a terrible model, at least by this metric, but it isn’t particularly strong either.