V33 Hypothesis Testing

Welcome to part nine of our video series in support of hypothesis testing. In this video, we are going to discuss the goodness of fit test. I'm Renee Clark from the Swanson School of Engineering at the University of Pittsburgh.

Okay, so, thus far, we've tested hypotheses about population parameters, right? A quantity such as the population mean, mu, or perhaps a population proportion, or the population variance, or the difference in two means, or the ratio of two population variances, or maybe the difference in two population proportions. So, we've tested hypotheses about parameter values. Next, we are going to test hypotheses not about parameters, but about data distributions. Specifically, this type of hypothesis test asks a question such as, “Does the distribution of my observed, or sample, data follow a certain theoretical, or hypothesized, distribution?”

So, for example, does my observed data follow, say, a normal distribution, or does it follow a uniform distribution, or any other distribution, such as an exponential? This type of hypothesis test is known as a goodness of fit test between two distributions. Those two distributions are my observed, or sample, distribution, meaning the sample of data that I have, and some theoretical distribution that I'd like to compare my sample data to.

Let's look at the example of tossing a six-sided die. Now, with a die, the outcomes have a uniform distribution, assuming each side of the die is equally likely to occur. In fact, these outcomes have a discrete uniform distribution, because of the six distinct, or separate, sides of the die. So, let's say we wanted to investigate this issue. We would begin with the null hypothesis that our outcomes have a discrete uniform distribution, meaning that each face of the die is equally likely to occur. So, the probability function for a fair die looks like this: f(x) = 1/6 for each of the six distinct outcomes, or six sides of the die, x = 1, 2, ..., 6. So, in this probability function, there are six equal probabilities, each worth 1/6. Now, let's say we toss a die 120 times, and that die is in fact fair, or rather, we hypothesize that die to be fair, meaning that the outcomes have a discrete uniform distribution. Then we expect, and this is a key word, each face of the die to appear 20 times, as shown in the table, which we obtain by taking 120 divided by 6, or equivalently 120 times 1/6. So, that's what we expect.
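The expected-frequency arithmetic for the die can be sketched in a few lines of Python. This is only an illustration of the calculation described here; the function name `expected_counts` and its parameters are hypothetical and not part of the lecture.

```python
from fractions import Fraction


def expected_counts(n_tosses, n_faces=6):
    """Expected frequency of each face under the null hypothesis of a fair die.

    Hypothetical helper for illustration: under a discrete uniform
    distribution, f(x) = 1/n_faces for every face x.
    """
    p = Fraction(1, n_faces)  # equal probability for each face, 1/6 for a die
    return {face: n_tosses * p for face in range(1, n_faces + 1)}


# 120 tosses of a six-sided die, as in the lecture example
expected = expected_counts(120)
for face, count in expected.items():
    print(f"face {face}: expected {count}")
```

Using `Fraction` keeps 1/6 exact, so the six expected counts add back to exactly 120 tosses.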

Okay, now, what we actually observe when rolling that die may not exactly match what we had expected. So, a goodness of fit test is going to explore the differences between what you expect and what you actually observe. For example, for outcome one, or face one, we did in fact observe 20, and we expected 20. But for face three, we actually observed 17, although we expected 20. And in other cases, face six, for example, we observed more than 20. Note that each of these columns, observed and expected, adds to 120. So, a goodness of fit test, which I have abbreviated right there, is going to explore whether the differences between your expected and your observed frequencies are due to chance or, alternatively, whether they're due to the distribution not being discrete uniform, in other words, to the die not being fair.
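As a rough sketch of the comparison described here, the snippet below lines up observed and expected frequencies and prints their differences. The observed counts are partly hypothetical: the transcript gives only face one (20), face three (17), and the fact that face six came up more than 20 times. The remaining values are made-up placeholders chosen so the column adds to 120, not the values from the table on the slide.

```python
# Observed counts for 120 tosses. ASSUMPTION: only face 1 (20), face 3 (17),
# and "face 6 is above 20" come from the lecture; the counts for faces 2, 4, 5
# and the exact count for face 6 are illustrative placeholders.
observed = {1: 20, 2: 22, 3: 17, 4: 18, 5: 19, 6: 24}

n_tosses = sum(observed.values())  # both columns must add to 120

# Under the fair-die (discrete uniform) hypothesis, each face is expected
# n_tosses / 6 times.
expected = {face: n_tosses / len(observed) for face in observed}

# Differences that a goodness of fit test would go on to examine
differences = {face: observed[face] - expected[face] for face in observed}

for face in observed:
    print(
        f"face {face}: observed={observed[face]:3d} "
        f"expected={expected[face]:5.1f} diff={differences[face]:+5.1f}"
    )
```

Because both columns add to the same total, the differences always sum to zero; the question the test addresses is whether their individual sizes are larger than chance alone would explain.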

We wish to thank the National Science Foundation under Grant 233582 for supporting our work. Thank you for watching.