# Goodness of fit

The **goodness of fit** of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-squared test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.

## Fit of distributions

In assessing whether a given distribution is suited to a data set, the following tests and their underlying measures of fit can be used:

- Kolmogorov–Smirnov test
- Cramér–von Mises criterion
- Anderson–Darling test
- Shapiro–Wilk test
- Chi-squared test
- Akaike information criterion
- Hosmer–Lemeshow test
- Kuiper's test
- Kernelized Stein discrepancy[1]
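As an illustration of the first measure in the list, the sketch below computes the one-sample Kolmogorov–Smirnov statistic, the largest vertical gap between the empirical CDF of a sample and a hypothesized CDF. It is a minimal sketch, assuming a fully specified standard normal null distribution; the helper names `normal_cdf` and `ks_statistic` and the sample values are hypothetical. It returns only the statistic; turning it into a p-value requires the Kolmogorov distribution (SciPy's `scipy.stats.kstest` does both).

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|.

    The empirical CDF F_n is a step function, so the supremum is reached
    just before or at one of the sorted sample points.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Just before x the empirical CDF equals i/n; at x it jumps to (i+1)/n.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # made-up illustrative data
print(ks_statistic(sample, normal_cdf))
```

A large D relative to the critical value for the sample size indicates a poor fit.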

## Regression analysis

In regression analysis, the following topics relate to goodness of fit:

- Coefficient of determination (the R-squared measure of goodness of fit)
- Lack-of-fit sum of squares
- Reduced chi-squared
- Regression validation
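Two of these measures reduce to short formulas: $R^2 = 1 - SS_\text{res}/SS_\text{tot}$, and the reduced chi-squared, which divides the error-weighted sum of squared residuals by the degrees of freedom $n - m$ ($n$ observations, $m$ fitted parameters). The sketch below is a minimal illustration, assuming the fitted values and per-point measurement errors are already available; the function names and data are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in observed)                # total sum of squares
    return 1.0 - ss_res / ss_tot

def reduced_chi_squared(observed, predicted, sigma, n_params):
    """Chi-squared per degree of freedom: chi^2 / (n - n_params)."""
    chi2 = sum(((y - f) / s) ** 2 for y, f, s in zip(observed, predicted, sigma))
    return chi2 / (len(observed) - n_params)

# Hypothetical straight-line fit (2 parameters) to four points with unit errors.
y = [1.0, 2.1, 2.9, 4.2]
fit = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, fit), reduced_chi_squared(y, fit, [1.0] * 4, 2))
```

A reduced chi-squared near 1 suggests the residuals are consistent with the stated errors; values well above 1 suggest a poor fit or underestimated errors.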

## Categorical data

The following are examples that arise in the context of categorical data.

### Pearson's chi-squared test

Pearson's chi-squared test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where:

- $O_i$ = an observed frequency (i.e. count) for bin $i$
- $E_i$ = an expected (theoretical) frequency for bin $i$, asserted by the null hypothesis.
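The statistic is a direct sum over bins. The following is a minimal sketch, assuming the observed and expected counts are already binned in the same order; the function name and the die-roll counts are hypothetical.

```python
def pearson_chi_squared(observed, expected):
    """Pearson's statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for a six-sided die rolled 60 times (10 expected per face).
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
print(pearson_chi_squared(observed, expected))
```

Each term is scaled by its own expectation, so a deviation of 2 counts weighs more in a bin expecting 10 than in one expecting 1000.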

The expected frequency is calculated by:

$$E_i = \bigl(F(Y_u) - F(Y_l)\bigr)\, N$$

where:

- $F$ = the cumulative distribution function for the distribution being tested,
- $Y_u$ = the upper limit for class $i$,
- $Y_l$ = the lower limit for class $i$, and
- $N$ = the sample size.
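The formula only needs the CDF evaluated at each class boundary. The sketch below assumes, for illustration, a standard normal distribution as the distribution being tested and uses hypothetical helper names and class limits; any other CDF can be passed in its place.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution; math.erf maps +/-inf to +/-1."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expected_frequencies(edges, n, cdf):
    """E_i = (F(Y_u) - F(Y_l)) * N for consecutive class limits in `edges`."""
    return [(cdf(upper) - cdf(lower)) * n for lower, upper in zip(edges, edges[1:])]

# Four classes covering the whole real line, N = 200 observations.
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
print(expected_frequencies(edges, 200, normal_cdf))
```

Using open-ended outer classes makes the expected frequencies sum to $N$, matching the total of the observed counts.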

The resulting value can be compared with a chi-squared distribution to determine the goodness of fit. The chi-squared distribution has (*k* − *c*) degrees of freedom, where *k* is the number of non-empty cells and *c* is one plus the number of estimated parameters of the distribution (including location, scale and shape parameters); the extra one accounts for the constraint that the expected frequencies sum to *N*. For example, for a 3-parameter Weibull distribution, *c* = 4.

#### Example: equal frequencies of men and women

For example, to test the hypothesis that a random sample of 100 people has been drawn from a population in which men and women are equal in frequency, the observed number of men and women would be compared to the theoretical frequencies of 50 men and 50 women. If there were 44 men in the sample and 56 women, then

$$\chi^2 = \frac{(44 - 50)^2}{50} + \frac{(56 - 50)^2}{50} = 0.72 + 0.72 = 1.44$$

If the null hypothesis is true (i.e., men and women are chosen with equal probability in the sample), the test statistic approximately follows a chi-squared distribution with one degree of freedom. Though one might expect two degrees of freedom (one each for the men and women), the total number of men and women is constrained to 100, so there is only one degree of freedom (2 − 1). Equivalently, once the male count is known the female count is determined, and vice versa.

Consulting the chi-squared distribution with 1 degree of freedom shows that, if men and women are equally numerous in the population, the probability of observing this difference (or a more extreme one) is approximately 0.23. This probability is higher than conventional significance levels (0.001 to 0.05), so normally we would not reject the null hypothesis that the number of men in the population is the same as the number of women (i.e. we would consider our sample within the range of what we would expect for a 50/50 male/female ratio).
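The p-value of about 0.23 can be reproduced without statistical tables. With one degree of freedom the statistic is the square of a standard normal variable $Z$, so $P(\chi^2 \ge x) = P(|Z| \ge \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$. The sketch below relies on that one-degree-of-freedom identity only (the helper name is hypothetical); other degrees of freedom need the regularized incomplete gamma function instead.

```python
import math

def chi_squared_sf_1dof(x):
    """Upper-tail probability P(X >= x) for chi-squared with 1 degree of freedom.

    With one degree of freedom X = Z**2 for a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

observed = [44, 56]  # men, women
expected = [50, 50]  # 50/50 split under the null hypothesis
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi_squared_sf_1dof(statistic)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.2f}")  # chi-squared = 1.44, p = 0.23
```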

### Binomial case

A binomial experiment is a sequence of independent trials, each of which can result in one of two outcomes, success or failure. There are *n* trials, each with probability of success denoted by *p*. More generally, when each trial falls into one of *k* cells, let $N_i$ be the observed count in cell *i* and $p_i$ the probability of cell *i* under the null hypothesis (the binomial case is *k* = 2, with $p_1 = p$ and $p_2 = 1 - p$). Provided that $np_i \gg 1$ for every *i* (where *i* = 1, 2, ..., *k*), then

$$\chi^2 = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

has approximately a chi-squared distribution with *k* − 1 degrees of freedom. The *k* − 1 degrees of freedom are a consequence of the restriction $\sum_{i=1}^{k} N_i = n$: there are *k* observed cell counts, but once any *k* − 1 of them are known, the remaining one is uniquely determined. In other words, only *k* − 1 cell counts are free to vary, hence *k* − 1 degrees of freedom.
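For *k* = 2 the restriction forces $N_2 - n(1-p) = -(N_1 - np)$, so both terms share the same squared deviation and the statistic collapses to $(N_1 - np)^2 / \bigl(np(1-p)\bigr)$, the square of the usual binomial z-score. This is why the two-cell case has a single degree of freedom. The sketch below checks the identity numerically on the men/women counts; the function names are hypothetical.

```python
def multinomial_chi_squared(counts, probs):
    """Pearson statistic sum((N_i - n*p_i)^2 / (n*p_i)) with n = sum(counts)."""
    n = sum(counts)  # the restriction: the cell counts add up to n
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def binomial_z_squared(successes, n, p):
    """Square of the binomial z-score (N_1 - n*p) / sqrt(n*p*(1 - p))."""
    return (successes - n * p) ** 2 / (n * p * (1 - p))

# Two cells: 44 "successes" and 56 "failures" out of n = 100 with p = 0.5.
chi2 = multinomial_chi_squared([44, 56], [0.5, 0.5])
z2 = binomial_z_squared(44, 100, 0.5)
print(chi2, z2)  # both approximately 1.44
```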

## Other measures of fit

The likelihood ratio test statistic is a measure of the goodness of fit of a model, judged by whether an expanded form of the model provides a substantially improved fit.
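For categorical data, one instance of this is the G statistic, $G = 2\sum_i O_i \ln(O_i / E_i)$, which is the likelihood ratio comparing the hypothesized cell probabilities with the expanded (saturated) model in which every cell probability is estimated freely. The sketch below assumes the observed and expected counts have the same total; the function name is hypothetical. Like Pearson's statistic, $G$ is approximately chi-squared distributed, and for the men/women counts it gives a value close to Pearson's 1.44.

```python
import math

def g_statistic(observed, expected):
    """Likelihood-ratio (G) statistic 2 * sum(O_i * ln(O_i / E_i))."""
    g = 0.0
    for o, e in zip(observed, expected):
        if o > 0:  # cells with O_i = 0 contribute nothing (0 * ln 0 taken as 0)
            g += o * math.log(o / e)
    return 2.0 * g

print(g_statistic([44, 56], [50, 50]))
```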

## See also

- Deviance (statistics) (related to GLM)
- Overfitting

## References

1. Liu, Qiang; Lee, Jason; Jordan, Michael (20 June 2016). "A Kernelized Stein Discrepancy for Goodness-of-fit Tests". *Proceedings of the 33rd International Conference on Machine Learning*. New York, New York, USA: Proceedings of Machine Learning Research. pp. 276–284.