V31 Hypothesis Testing

Welcome to part seven in support of our video series on hypothesis testing. In this video, we will discuss use of the Z distribution for performing inference on the difference in two population proportions. I'm Renee Clark from the Swanson School of Engineering at the University of Pittsburgh.

So, when might you need or want to perform inference for the difference in two proportions from independent populations? Okay, so, for example, you might need to explore sedans versus full-size trucks in terms of each of their reliabilities. Okay, let's say that p1 is the proportion of all sedans that break down on the road before accruing, say, 50,000 miles. P2 would be the proportion of all full-size trucks that break down on the road before accruing 50,000 miles. You could use inference on the difference in two proportions to determine whether these population proportions differ.

Okay, I'm going to jog your memory a little bit. As you'll recall, this is how we calculated the confidence interval for p1 minus p2, that is, the difference in the population proportions from two independent populations. Okay, so, we used the difference in the p hats as the point estimate. We used the Z distribution, or the normal approximation to the binomial, and, under the radical, we used the individual p1 hat and p2 hat. Okay, now, our typical null hypothesis when we're performing inference with two population proportions is that p1 equals p2, okay, and, of course, another way to write that is that p1 minus p2 is zero. So, in other words, there is no hypothesized difference in the two population proportions.
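For reference, the interval described here can be written out in standard notation. This is a reconstruction from the spoken description, not a copy of the slide; the symbols n1 and n2 (the two sample sizes) and z with subscript alpha over 2 (the Z critical value for the chosen confidence level) are the usual textbook names and are assumed here.

```latex
\[
(\hat{p}_1 - \hat{p}_2) \;\pm\; z_{\alpha/2}
\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
\]
```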

Okay, so, there are two things I'm going to call to your attention. First, the Z random variable, or the Z test statistic, that we're going to use for inference on two proportions. Okay, now, notice its similarity to the confidence interval, and that's why I first showed you the confidence interval. Okay, so, as our point estimate here, again, you'll see the difference in the two p hats. Okay, now, because we are hypothesizing no difference in the two population proportions, in order to do a proof by contradiction, that hypothesized difference gets inserted into the test statistic, and, of course, that term vanishes because it is zero. Okay, now, under the radical in the denominator, you'll notice that it has quite a similar look to what's under the radical for the confidence interval. But what you'll notice in the denominator, here, is that we have a new quantity simply called p hat: p hat times 1 minus p hat, and then times the quantity 1/N1 + 1/N2.
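Written out, the test statistic described here would look roughly as follows. This is a reconstruction from the spoken description rather than the slide itself. The zero in the numerator is the hypothesized difference under the null hypothesis, and p hat without a subscript is the single pooled quantity the lecture names.

```latex
\[
Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}
         {\sqrt{\hat{p}\,(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
\]
```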

Okay, so let's discuss where this p hat comes from, okay? It goes back to, again, where we are hypothesizing no difference in the proportions from the two independent populations. If we're hypothesizing no difference, then p1 equals p2, and we can simply replace both by one variable called p. Okay, so, in order to estimate this one p, which we will call p hat, what we can simply do is pool. Pool: you've seen that term before with sample variances. We're going to pool the information from the two samples out of the populations. In the numerator, we are going to pool the counts, or the number of successes. Okay, these two are of course binomial random variables, right? Y1 is the number of successes in the first sample, y2 the number of successes in the second sample. These are binomial random variables, and then, in the denominator, we're simply going to pool, or bring together, the number of trials in each case. Okay, but, again, because we're hypothesizing that those proportions are equal, it makes sense to pool them in order to get one estimate, p hat, just like we did when we were assuming or had reason to believe that our population variances were equal. Remember we used the pooled estimate of the sample variance? Same thing here.
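The pooling step can be sketched in a few lines of Python. The function name and the counts below are made up for illustration; they are not data from the lecture. Here n1 and n2 stand for the numbers of trials, which the lecture calls N1 and N2.

```python
def pooled_proportion(y1: int, n1: int, y2: int, n2: int) -> float:
    """Pooled estimate p_hat = (y1 + y2) / (n1 + n2).

    y1, y2: number of successes in each sample
    n1, n2: number of trials in each sample
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    return (y1 + y2) / (n1 + n2)


# Hypothetical counts, invented for illustration only:
# 30 of 200 sedans and 45 of 250 trucks broke down before 50,000 miles.
p_hat = pooled_proportion(30, 200, 45, 250)
print(round(p_hat, 4))  # 0.1667
```

Note that the pooled estimate is not the simple average of the two p hats unless the two samples are the same size; it weights each sample by its number of trials.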

Okay, so, then, that's what appears here in the denominator of the test statistic: p hat. That's your pooled estimate. P hat times 1 minus p hat, times the quantity 1/N1 + 1/N2, square root thereof, and that's what's in the denominator for the Z random variable. Okay, now, in order to use the Z distribution to do the inference, we have to meet the following two key assumptions, which should look familiar to you. We say p hat times the quantity N1 + N2 must be greater than or equal to 5, and, again, this is the pooled estimate that we just discussed, right, equal to y1 + y2 over N1 + N2, and remember p1 hat was y1 over N1, and p2 hat was y2 over N2.
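As a rough sketch, the test statistic and the large-sample checks might be coded like this. The second check, on 1 minus p hat, is the companion condition that the lecture states right after this one. The function name, the counts, the two-sided alternative, and the 0.05 significance level are all assumptions made for illustration, not part of the lecture.

```python
import math
from statistics import NormalDist


def two_proportion_z(y1: int, n1: int, y2: int, n2: int) -> float:
    """Z statistic for H0: p1 - p2 = 0, using the pooled estimate."""
    p1_hat = y1 / n1
    p2_hat = y2 / n2
    p_hat = (y1 + y2) / (n1 + n2)  # pooled estimate

    # Conditions for using the Z distribution.
    if p_hat * (n1 + n2) < 5 or (1 - p_hat) * (n1 + n2) < 5:
        raise ValueError("normal approximation conditions are not met")

    # Denominator: sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2)).
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    # The hypothesized difference under H0 is zero.
    return (p1_hat - p2_hat - 0) / se


# Hypothetical counts, invented for illustration only.
z = two_proportion_z(30, 200, 45, 250)
print(round(z, 4))  # -0.8485

# Assumed: two-sided alternative at significance level 0.05.
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
print(abs(z) > z_crit)  # False, so we fail to reject H0
```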

Okay, so that's one assumption. Two: what should also look familiar to you is that 1 minus p hat, times the quantity N1 + N2, must be greater than or equal to 5. Very similar in flavor to what you've seen before when working with proportions.

Okay, and finally, let's talk about the relationship of this to the confidence interval. Okay, again, our typical null hypothesis is that p1 equals p2, and another way to write that is p1 minus p2 is zero. So, in other words, no difference in those two proportions. Okay, so, let's say I calculate the following confidence interval representing the difference in two population proportions: -0.6 to 0.2. Okay, so, in this particular confidence interval, starred in blue here, zero is contained in that interval, right? So, zero is plausible for p1 minus p2. So, we would not reject zero as a plausible value for p1 minus p2. Okay, so, we would, in this case, fail to reject the null hypothesis.

Okay, however, let's say that this was the confidence interval that was calculated for the difference in the proportions: 0.3 to 0.4. Okay, so, in that case, for the confidence interval in red, zero is not contained in that interval, right? Zero falls below 0.3. Okay, so, in this case, zero is not plausible for the difference in the two proportions, and we would reject zero as plausible for p1 minus p2. So, we would end up rejecting the null hypothesis in this case.
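The interval-based decision can be sketched in Python as well. The two intervals passed to `decision` are the ones quoted in the lecture. The function names, the counts, and the 95% confidence level are assumptions for illustration. Note that the interval uses the individual p1 hat and p2 hat under the radical, not the pooled estimate.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(y1, n1, y2, n2, conf_level=0.95):
    """Z confidence interval for p1 - p2 (unpooled standard error)."""
    p1_hat, p2_hat = y1 / n1, y2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


def decision(lower: float, upper: float) -> str:
    """Fail to reject H0: p1 - p2 = 0 when zero lies inside the interval."""
    return "fail to reject H0" if lower <= 0 <= upper else "reject H0"


# The two intervals from the lecture.
print(decision(-0.6, 0.2))  # fail to reject H0
print(decision(0.3, 0.4))   # reject H0

# Hypothetical counts, invented for illustration only.
lower, upper = diff_proportion_ci(30, 200, 45, 250)
print(decision(lower, upper))  # fail to reject H0
```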

We wish to thank the National Science Foundation under Grant 2335802 for supporting our work. Thank you for watching.