R-square and F value

R-squared is a statistical measure of how closely the data fit the regression line. It is also known as the ‘coefficient of determination’, or the coefficient of multiple determination in the case of multiple regression. R-squared is defined as the percentage (%) of response variable variation that is explained by the linear model.

R-squared = (Explained variation) / (Total variation)

R-squared always lies between 0% and 100%:

  • 0% indicates that the model explains none of the variability of the response data around its mean.
  • 100% indicates that the model explains all of the variability of the response data around its mean.
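
To make the formula concrete, here is a minimal sketch in Python (NumPy) that computes R-squared from hypothetical observations and predictions; the data values are made up purely for illustration:

```python
import numpy as np

# Hypothetical observed responses and model predictions
y = np.array([3.1, 4.0, 5.2, 6.1, 6.8])
y_hat = np.array([3.0, 4.1, 5.0, 6.2, 7.0])

ss_res = np.sum((y - y_hat) ** 2)      # residual (unexplained) variation
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation about the mean
r_squared = 1 - ss_res / ss_tot        # explained variation / total variation
print(f"R-squared: {r_squared:.3f}")   # near 1: tight fit; near 0: no better than the mean
```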

In general, the higher the R-squared, the better the model fits the data. However, R-squared cannot determine whether the coefficient estimates and predictions are biased, which is why you must also evaluate the residual plots. A good model can have a low R-squared value, and a model that does not fit the data can have a high R-squared value. Some fields inherently expect low R-squared values. For example, fields that try to predict human behavior, such as psychology, typically have R-squared values below 50%, because humans are harder to predict than physical processes. Even if the R-squared value is low, statistically significant predictors still allow you to draw conclusions about how changes in the predictor values are associated with changes in the response value.
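
As a quick illustration of checking residual plots, the following sketch fits a straight line to synthetic data and plots the residuals against the fitted values; a patternless scatter around zero is what a well-specified model should show. The data and model here are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)

# Fit a simple line by least squares and compute residuals
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

# Residuals vs fitted values: a random cloud around zero suggests the
# linear model is adequate, regardless of the R-squared value
plt.scatter(fitted, residuals)
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```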

The F-test in regression compares the fits of different linear models. Whereas t-tests can assess only one regression coefficient at a time, the F-test can assess multiple coefficients simultaneously.

The F-test of overall significance compares a model with no predictors to the model you specify. A regression model that contains no predictors is known as an intercept-only model.

The hypotheses for the F-test of overall significance are as follows:

  • Null hypothesis: the fit of the intercept-only model and your model are equal.
  • Alternative hypothesis: the fit of the intercept-only model is significantly worse than that of your model.

If the P value for the F-test of overall significance is less than your significance level, you can reject the null hypothesis and conclude that your model fits the data better than the intercept-only model.
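
Here is a sketch of this decision rule using statsmodels (assuming it is installed) on synthetic data; the 0.05 significance level and the data-generating coefficients are illustrative choices, not prescriptions:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # two hypothetical predictors
y = 3 + 2 * X[:, 0] - X[:, 1] + rng.normal(size=100)

# Model with predictors plus an intercept term
model = sm.OLS(y, sm.add_constant(X)).fit()
# In the intercept-only model, every fitted value would simply be y.mean()

alpha = 0.05
print(f"F-statistic: {model.fvalue:.2f}, p-value: {model.f_pvalue:.4g}")
if model.f_pvalue < alpha:
    print("Reject the null: the model fits better than the intercept-only model")
else:
    print("Fail to reject the null: no evidence of improvement over the mean")
```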

In the intercept-only model, all of the fitted values equal the mean of the response variable. Therefore, if the P value of the overall F-test is significant, your regression model predicts the response variable better than the mean of the response.

While R-squared provides an estimate of the strength of the relationship between the model and the response variable, it does not provide a formal hypothesis test for this relationship. The overall F-test determines whether this relationship is statistically significant. If the P value for the overall F-test is less than your significance level, you can conclude that the R-squared value is significantly different from zero.
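
The connection is also algebraic: for a model with an intercept and k predictors fitted to n observations, the overall F-statistic can be written as F = (R-squared / k) / ((1 - R-squared) / (n - k - 1)), so testing F against the intercept-only model is equivalent to testing whether R-squared differs from zero. A small sketch with synthetic data checks this identity against the F-statistic that statsmodels reports:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 50, 3                                     # hypothetical sample size and predictor count
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()

r2 = model.rsquared
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))  # F expressed through R-squared
print(f"R-squared: {r2:.3f}")
print(f"F from R-squared: {f_from_r2:.2f}  (statsmodels: {model.fvalue:.2f})")
```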