|
V31
Hypothesis Testing
Welcome to part seven in support
of our video series on hypothesis testing. In this video, we will discuss the use of the Z distribution for performing inference on the
difference in two population proportions. I'm Renee Clark from the Swanson
School of Engineering at the University of Pittsburgh. So, when might you need or want
to perform inference for the difference in two proportions from
independent populations? Okay, so, for example, you might need to explore
sedans versus full-size trucks in terms of their reliabilities.
Okay, let's say that p1 is the proportion of all sedans that break down on
the road before accruing, say, 50,000 miles. P2 would be the proportion of
all full-size trucks that break down on the road before accruing 50,000 miles.
You could use inference on the difference in two proportions to determine
whether these population proportions differ. Okay, I'm going to jog your
memory a little bit. As you'll recall, this is how we calculated the
confidence interval for p1 minus p2, okay, the difference in the population proportions from
two independent populations. Okay, so, we used the difference in the p hats
as the point estimate. We used the Z distribution, or the normal approximation
to the binomial, and, under the radical, we used the
individual p1 hat and p2 hat separately. Okay, now, our typical null hypothesis
when we're performing inference with two population proportions is that p1 equals p2, okay, and, of course, another way to write that
is that p1 minus p2 is zero. So, in other words, there is no hypothesized
difference in the two population proportions. Okay, so, there are two things I'm going to
call your attention to. First, the Z random variable, or the Z test statistic,
that we're going to use for inference on two proportions. Okay, now, notice
its similarity to the confidence interval, and that's why I first showed you
the confidence interval. Okay, so, as our point estimate here, again, you'll
see the difference in the two p hats. Okay, now, because we are hypothesizing
no difference in the two population proportions, in order
to do a proof by contradiction, that hypothesized difference gets
inserted into the test statistic, and, of course, that term vanishes to zero.
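
For reference, the confidence interval recalled above and the numerator of the test statistic can be written out as below. This is a sketch in standard textbook notation rather than a copy of the slide; the symbols n_1 and n_2 (the two sample sizes) and z_{alpha/2} (the critical value of the Z distribution) are assumed here, since the video does not name them at this point.

```latex
% Confidence interval for p_1 - p_2 (unpooled: each sample keeps its own p hat)
\[
(\hat{p}_1 - \hat{p}_2) \;\pm\; z_{\alpha/2}
\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
\]

% Numerator of the Z test statistic under H_0: p_1 - p_2 = 0
\[
(\hat{p}_1 - \hat{p}_2) - 0 \;=\; \hat{p}_1 - \hat{p}_2
\]
```
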
Okay, now, under the radical in the denominator, you'll notice that it has a
similar look to what's under the radical for the confidence interval, okay,
quite a similar look. But, what you'll
notice in the denominator, here, is that we have a new quantity simply called
p hat: p hat times 1 minus p hat, and then times 1/N1 + 1/N2. Okay, so let's discuss where
this p hat comes from, okay? It goes back to, again, where we are
hypothesizing no difference in the proportions from the two independent
populations. If we're hypothesizing no difference, then P1 equals P2,
and we can simply replace both by one variable called P. Okay, so, in order to estimate this one P, with an estimate we will call P hat,
what we can simply do is we can pool. Pool: you've seen that term before with
sample variances. We're going to pool the information from the two
samples out of the populations. In the numerator, we are going to
pool the counts, or the number of successes. Okay, these two are, of
course, binomial random variables, right? Y1 is the number of successes in
the first sample, Y2 the number of successes in the second sample. These are
binomial random variables, and then, in the denominator, we're simply
going to pool, or bring together, the number of trials in each case. Okay,
but, again, because we're hypothesizing that those proportions are equal, it
makes sense to pool them in order to get one estimate, P hat, just
like we did when we were assuming or had reason to believe that our
population variances were equal. Remember we used the pooled estimate of the
variance? Same thing here. Okay, so, then,
that's what appears here in the denominator of the test statistic, p hat. That's
your pooled estimate. P hat times 1 minus p hat times the quantity 1/N1 + 1/N2, square
root thereof, and that's what's in the denominator for the Z random variable.
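
The pooled estimate and the Z test statistic just described can be sketched in Python as follows. This is a minimal illustration, not code from the video: the function name `pooled_two_proportion_z` and the sample counts are hypothetical, chosen only to show the arithmetic.

```python
import math


def pooled_two_proportion_z(y1, n1, y2, n2):
    """Return (pooled p hat, Z statistic) for H0: p1 - p2 = 0.

    y1, y2: number of successes in each sample (hypothetical inputs)
    n1, n2: number of trials in each sample
    """
    p1_hat = y1 / n1
    p2_hat = y2 / n2
    # Pool the successes in the numerator and the trials in the denominator.
    p_hat = (y1 + y2) / (n1 + n2)
    # Denominator of Z: sqrt(p_hat * (1 - p_hat) * (1/n1 + 1/n2)).
    std_err = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    # The hypothesized difference of 0 is written out to mirror the formula.
    z = ((p1_hat - p2_hat) - 0) / std_err
    return p_hat, z


# Made-up counts: 30 breakdowns in 200 sedans, 18 in 200 trucks.
p_hat, z = pooled_two_proportion_z(30, 200, 18, 200)
print(f"pooled p hat = {p_hat:.3f}, Z = {z:.3f}")
```

The computed Z would then be compared against the Z distribution at the chosen significance level.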
Okay, now, in order to use the Z distribution to do
the inference, we have to meet the following two key assumptions, which
should look familiar to you. We say p hat times the quantity N1 + N2 must be greater than or
equal to 5, and, again, this is the pooled estimate that we just discussed,
right, equal to Y1 + Y2 over N1 + N2, and remember P1 hat was Y1 over N1, P2
hat Y2 over N2. Okay, and that's
one assumption. Two: what should also look familiar
to you is that 1 minus P hat, times the quantity N1 + N2, must be greater than or equal to 5.
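
The two assumptions can be checked with a short Python sketch like the one below. The function name `z_assumptions_met` and its `threshold` parameter are hypothetical, introduced only for illustration. Because the pooled p hat is (Y1 + Y2) over (N1 + N2), multiplying it by N1 + N2 gives back the pooled number of successes, so the sketch works with the integer counts directly to avoid rounding issues.

```python
def z_assumptions_met(y1, n1, y2, n2, threshold=5):
    """Check both conditions for using the Z distribution.

    Condition 1: p_hat * (n1 + n2) >= threshold
    Condition 2: (1 - p_hat) * (n1 + n2) >= threshold
    where p_hat = (y1 + y2) / (n1 + n2) is the pooled estimate.
    """
    pooled_successes = y1 + y2                       # equals p_hat * (n1 + n2)
    pooled_failures = (n1 + n2) - pooled_successes   # equals (1 - p_hat) * (n1 + n2)
    return pooled_successes >= threshold and pooled_failures >= threshold


# Made-up counts for illustration only.
print(z_assumptions_met(30, 200, 18, 200))  # plenty of successes and failures
print(z_assumptions_met(1, 20, 2, 25))      # only 3 pooled successes
```
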
Very similar in flavor to what you've seen before when working with
proportions. Okay, and finally, let's talk about the relationship of this to
the confidence interval. Okay, again, our typical null hypothesis is that P1 equals P2, and another way to write that is P1 minus P2 is
zero. So, in other words, no difference in those two
proportions. Okay, so, let's say I calculate the following confidence interval
representing the difference in two population proportions, so -0.6 to 0.2. Okay,
so, in this particular confidence interval, starred
in blue here, 0 is contained in that interval, right? So, zero is plausible
for P1 minus P2. So, we certainly would not reject
zero as a plausible value for P1 minus P2. Okay, so, we would, in this case,
fail to reject the null hypothesis. Okay, however, let's say that this was
the confidence interval that was calculated for the difference in the
proportions: 0.3 to 0.4. Okay, so, in that case, for the confidence interval
in red, zero is not contained in that interval, right? It falls below 0.3. Okay,
so, in this case, zero is not plausible for the difference in the two proportions.
Okay, so, in this case, we would reject zero as plausible for P1 minus P2. So,
we would end up rejecting the null hypothesis in this case. We wish to thank the National
Science Foundation under Grant 2335802 for supporting our work. Thank you for
watching. |