Hypothesis Testing

Welcome to part three of our video series on hypothesis testing. In this video, we are going to discuss using the Z distribution to perform inference on the difference in two means from independent populations. I'm Renee Clark from the Swanson School of Engineering at the University of Pittsburgh. Let's talk first about
the case in which our population variances are known: sigma 1 squared known and sigma 2 squared known. Again, we're performing inference on the difference in two means from independent populations. When our sigmas are known, we can use Z, because Z involves the sigmas in its denominator. However, in order to use Z, the quantity I have circled here, X1 bar minus X2 bar, must be normally distributed so that we can transform it into a Z random variable. When will that be the case?
The first possibility is that each of your sample sizes, n1 and n2, is greater than or equal to 30, in which case the central limit theorem applies. The second possibility is that each of your underlying populations is normally distributed, that is, X1 normally distributed and X2 normally distributed. If either of those two possibilities, or both, is true, then X1 bar is normally distributed and X2 bar is normally distributed. And if each of those sample averages is normally distributed, then the difference in the sample averages will also be normally distributed, because a linear combination of normally distributed random variables is itself normally distributed. Remember, we learned that earlier. In other words, if X1 bar is normal and X2 bar is normal, then X1 bar minus X2 bar is a linear combination of those two random variables, and the linear combination is therefore normal. Let's go back
up to the test statistic. How do we actually calculate the Z test statistic when we're doing inference on the difference in two means? From X1 bar minus X2 bar, we subtract off its expected value, and we'll recall from earlier that the expected value of the difference in the two sample means is simply mu 1 minus mu 2. In the denominator we have the standard deviation of X1 bar minus X2 bar. Recall from earlier that the variance of X1 bar minus X2 bar is sigma 1 squared over n1 plus sigma 2 squared over n2; to get the standard deviation in the denominator, we simply take the square root of that variance. That is how you calculate the Z test statistic in this case.
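As a concrete illustration of that formula, here is a minimal sketch in Python; the function name and all of the numbers are hypothetical, chosen only to exercise the arithmetic.

```python
import math

def two_mean_z(xbar1, xbar2, mu_diff, sigma1_sq, sigma2_sq, n1, n2):
    """Z = ((X1bar - X2bar) - (mu1 - mu2)) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    std_error = math.sqrt(sigma1_sq / n1 + sigma2_sq / n2)
    return ((xbar1 - xbar2) - mu_diff) / std_error

# Hypothetical numbers: sample means 52.1 and 50.0, known variances 9 and 16,
# sample sizes 40 and 50, and expected difference mu1 - mu2 = 0.
z = two_mean_z(52.1, 50.0, 0.0, 9.0, 16.0, 40, 50)
print(round(z, 3))  # about 2.8
```

Under the conditions above (each sample size at least 30, or both populations normal), this z would be compared to the standard normal distribution.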
Same case, sigmas known, but now we want to go through the proof-by-contradiction aspect. When you are doing inference on two means, this is often what your null and alternative hypotheses look like: we hypothesize that mu 1 equals mu 2 versus the alternative that mu 1 is not equal to mu 2. Now, if we are hypothesizing that mu 1 equals mu 2, another way to write that is that mu 1 minus mu 2 equals zero, meaning there is no difference between them; algebraically, if you simply subtract mu 2 from both sides, you get mu 1 minus mu 2 on the left and zero on the right. We call this difference D sub zero, the hypothesized difference. Typically, when we're doing hypothesis tests, the hypothesized difference is zero, because oftentimes what we're testing is that there's no difference in the means. However, the hypothesized difference can be nonzero if you're hypothesizing up front that there is a difference. In this case, there is no difference. We now proceed with our proof by contradiction, because that's how hypothesis tests proceed. What do we do in that case? Remember, in a proof by contradiction, in order to try to prove the alternative, you assume the null to be true. Well, if the null is true, there is no difference between mu 1 and mu 2. That's why we insert a hypothesized difference of zero into the test statistic, and when we do, that term vanishes; it goes to zero. Of course, in the denominator we still have the standard deviation of X1 bar minus X2 bar: the square root of sigma 1 squared over n1 plus sigma 2 squared over n2.
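Carrying that step into code, this hypothetical example inserts D sub zero equal to 0 into the test statistic and converts the result to a two-sided p-value using the standard normal tail. All of the data values are made up for illustration; the p-value uses math.erfc, which for a standard normal Z gives exactly twice the upper-tail probability.

```python
import math

# Hypothetical data: known population variances.
xbar1, xbar2 = 52.1, 50.0          # sample means
sigma1_sq, sigma2_sq = 9.0, 16.0   # known population variances
n1, n2 = 40, 50                    # sample sizes
d0 = 0.0                           # hypothesized difference under H0: mu1 = mu2

# Under H0, the (mu1 - mu2) term equals d0 = 0, so it vanishes from the numerator.
z = ((xbar1 - xbar2) - d0) / math.sqrt(sigma1_sq / n1 + sigma2_sq / n2)

# Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z.
p_value = math.erfc(abs(z) / math.sqrt(2))
print(z, p_value)
```

A small p-value here is the "contradiction": the data are very unlikely if the null hypothesis of no difference were true.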
Let's now talk about the case in which our population variances are unknown, that is, sigma 1 squared and sigma 2 squared unknown. We're still doing inference on the difference in two means, but the population variances are unknown. However, we have the advantage that our sample sizes are each large. In that case, we can still use Z as our test statistic, or the Z distribution. The reason is that when our sample sizes are large, you'll recall that the sample standard deviation is a very good estimate of the population standard deviation. That obviously holds for both populations; another way to write this is that S1 squared is a good estimator for sigma 1 squared and S2 squared is a good estimator for sigma 2 squared. With those S's being good estimators for the sigmas, in our variance equation for the difference in the sample means we can simply replace the sigmas with the corresponding S's. That's why, in the denominator, you see the S's instead of the sigmas.
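The large-sample version of the statistic just swaps the sample variances in for the unknown population variances; everything else is unchanged. Here is a minimal sketch, with a hypothetical function name and toy data far smaller than the sample sizes of at least 30 that the method actually requires, used only to exercise the arithmetic:

```python
import math

def large_sample_z(x1, x2, d0=0.0):
    """Z statistic for two means when the sigmas are unknown but n1 and n2 are
    large: the sample variances S1^2 and S2^2 stand in for sigma1^2, sigma2^2."""
    n1, n2 = len(x1), len(x2)
    xbar1 = sum(x1) / n1
    xbar2 = sum(x2) / n2
    # Sample variances, dividing by n - 1.
    s1_sq = sum((v - xbar1) ** 2 for v in x1) / (n1 - 1)
    s2_sq = sum((v - xbar2) ** 2 for v in x2) / (n2 - 1)
    return ((xbar1 - xbar2) - d0) / math.sqrt(s1_sq / n1 + s2_sq / n2)

# Toy data just to exercise the arithmetic (real use needs n1, n2 >= 30).
z = large_sample_z([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0])
print(round(z, 4))  # roughly -0.74
```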
And of course, with our sample sizes large, we know that that quantity will be normally distributed, by application of the central limit theorem together with the fact that a linear combination of normally distributed random variables is also normal. Again, this is often what our null and alternative hypotheses look like when you are performing inference on the difference in two means. We typically hypothesize no difference in those two means; another way of saying that is that mu 1 minus mu 2 equals zero. That hypothesized difference of zero lets us proceed by proof by contradiction: the zero gets inserted into the test statistic in place of mu 1 minus mu 2, and of course that term vanishes to zero, because we're hypothesizing no difference. We wish to thank the National Science Foundation for supporting our work under Grant 233582. Thank you for watching.