Hypothesis Testing

Welcome to part four of our video series on hypothesis testing. In this video, we are going to discuss the use of the t distribution for performing inference on the difference in two means from independent populations. I'm Renee Clark from the Swanson School of Engineering at the University of Pittsburgh.

When we are doing inference on the difference in two means, mu 1 minus mu 2, from independent populations in the case where the population variances sigma 1 squared and sigma 2 squared are both unknown, and in addition the sample sizes are small (less than 30), what do we do? We recall that we must use the t distribution for inference.

Now, when the population variances are unknown but we have reason to believe they are equal, that is, that the spread of the two populations is the same, we can get a better estimate of the variance using what is known as the pooled estimate of the variance, given by the symbol s sub p squared. The pooled estimate brings together the two sample variances, s1 squared and s2 squared, pooling, or combining, them using their degrees of freedom as weights.
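To make the weighting concrete, here is a minimal sketch in Python of the pooled estimate just described. The function name and all sample numbers are hypothetical, chosen only for illustration.

```python
# Sketch (not from the video): the pooled estimate of the variance,
# s_p^2, combines the two sample variances using their degrees of
# freedom (n - 1) as weights. All numbers below are hypothetical.
def pooled_variance(s1_sq, s2_sq, n1, n2):
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Example: s1^2 = 4.0 with n1 = 10, and s2^2 = 6.0 with n2 = 12
sp_sq = pooled_variance(4.0, 6.0, 10, 12)
print(sp_sq)  # 5.1 -- pulled toward s2^2 because n2 is larger
```

Notice that the larger sample gets more weight, which is exactly what using the degrees of freedom as weights accomplishes.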
Let's now talk about the case of performing inference on the difference in two means where we have independent populations, we don't know the population variances, and we have small samples, but we have reason to believe that the population variances are equal. We are going to use the t distribution, and here is our test statistic, our t random variable. In order to use t, it must be the case that each of the underlying populations is normally distributed: X1 must be normally distributed and X2 must be normally distributed. The central limit theorem does not play a role in the t distribution. Again, because we have reason to believe the sigma squareds are equal, we can use the pooled estimate of the variance, s sub p squared, so you will notice s sub p in the denominator of our test statistic. Our degrees of freedom are given by n1 + n2 minus 2. Calling your attention to the null hypothesis: the typical null hypothesis, mu 1 equal mu 2, can be rewritten as mu 1 minus mu 2 equal zero, that is, the hypothesized difference D sub 0 equal mu 1 minus mu 2 is zero. We then insert that zero into the test statistic equation for the hypothesized difference, and when we insert that zero, the term vanishes.
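The pooled test statistic described above can be sketched as follows. This is an illustrative sketch, not code from the video; the function name and sample numbers are hypothetical, and d0 is the hypothesized difference (zero under the typical null hypothesis, so that term vanishes).

```python
import math

# Sketch of the equal-variance (pooled) two-sample t statistic
# described above. All sample numbers are hypothetical.
def pooled_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, d0=0.0):
    # Pooled variance: sample variances weighted by degrees of freedom
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    sp = math.sqrt(sp_sq)
    # Under H0: mu1 - mu2 = d0 (typically d0 = 0, so the term vanishes)
    t = ((xbar1 - xbar2) - d0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, df

t, df = pooled_t(xbar1=10.0, xbar2=8.0, s1_sq=4.0, s2_sq=4.0, n1=10, n2=10)
print(round(t, 4), df)  # 2.2361 18
```

Note that the degrees of freedom, n1 + n2 minus 2, are whole by construction in this pooled case.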
Now let's talk about the opposite case. We are still performing inference on the difference in two means from independent populations, we still don't know the sigmas, and we have small samples, but in this case we do not have reason to believe that the sigmas are equal. So we cannot use the pooled estimate of the variance; it doesn't make sense to pool the sample variances when we don't have reason to believe the sigmas are equal. We are still using the t distribution, but you will notice that in the denominator there is no s sub p, no pooled standard deviation. Rather, the sample variances appear individually. As before, in using t, the underlying population for X1 as well as X2 must be normally distributed; the central limit theorem does not play a role. The degrees of freedom in this case are calculated with that very messy formula (we won't be doing any such calculations here). If you were to calculate these degrees of freedom, you would likely get a non-whole number; something like 9.54 is possible. In that case, you round down to the nearest whole number whenever the value has a decimal component, so if we were to get 9.54, our degrees of freedom would be 9. Again, the typical null hypothesis is mu 1 equal mu 2, that is, the hypothesized difference D sub 0 equal mu 1 minus mu 2 is zero. To proceed, much as in a proof by contradiction, we insert that zero into the test statistic, and the term vanishes.
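For reference, the "messy" degrees-of-freedom formula for this unequal-variance case (the Welch–Satterthwaite formula) can be sketched as follows, including the round-down step described above. This is an illustrative sketch, not code from the video; the function name and sample numbers are hypothetical.

```python
import math

# Sketch of the unequal-variance (Welch) case described above: the
# sample variances enter the denominator individually, and the degrees
# of freedom come from the "messy" formula, rounded DOWN to a whole
# number. All sample numbers are hypothetical.
def welch_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, d0=0.0):
    a, b = s1_sq / n1, s2_sq / n2
    t = ((xbar1 - xbar2) - d0) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, math.floor(df)  # e.g. a df of 9.54 rounds down to 9

t, df = welch_t(xbar1=10.0, xbar2=8.0, s1_sq=2.0, s2_sq=8.0, n1=8, n2=6)
print(round(t, 4), df)  # the raw df here is fractional; floor gives 6
```

Rounding down is the conservative choice: a smaller degrees of freedom gives a wider t distribution, making the test slightly harder to reject.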
Finally in this video, I want to discuss the relationship to the confidence interval. We are doing inference on the difference in two means with the typical null hypothesis mu 1 equal mu 2, which can also be written as the hypothesized difference, mu 1 minus mu 2, being zero. If we hypothesize that the means are equal, we are essentially hypothesizing no difference between them. This general relationship applies to any distribution you are using: we are talking about t in this video, but it also applies to the use of the Z distribution.

Let's look at an example. Say that, for a given hypothesis test we are running, we calculate an associated confidence interval of 2.95 to 3.65 on the difference between mu 1 and mu 2, with some lower limit and some upper limit. As you'll notice in this confidence interval, which I have starred in blue, 0 is not contained in the interval: 0 is less than the lower limit of 2.95. If 0 is not in the interval, then the hypothesized difference of 0 is not a plausible value for the difference in the means. Since it is not plausible, we reject zero as a plausible value for mu 1 minus mu 2, and ultimately we reject the null hypothesis in this case.
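The decision rule being described can be sketched in a few lines. This is an illustrative sketch, not from the video; the function name is hypothetical, and the two intervals are the ones used in the examples.

```python
# Sketch of the confidence-interval decision rule described above:
# reject H0: mu1 = mu2 exactly when the hypothesized difference (0)
# falls outside the confidence interval for mu1 - mu2.
def reject_h0(ci_lower, ci_upper, d0=0.0):
    return not (ci_lower <= d0 <= ci_upper)

print(reject_h0(2.95, 3.65))   # True  -- 0 is below 2.95, so reject H0
print(reject_h0(-1.25, 2.25))  # False -- interval crosses 0: fail to reject
```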
As a second example, say we calculate a confidence interval of -1.25 to 2.25 in this test; I'll star this one in green. For the confidence interval starred in green, you will see that zero is contained in the interval: the lower limit is negative and the upper limit is positive, so the interval crosses over 0. In this case, 0 is a plausible value for the difference in the means, mu 1 minus mu 2. We are certainly not going to reject, or rather, we are going to fail to reject, zero as a plausible value for the difference in the means. So, ultimately, in this case, we would fail to reject the null hypothesis.

We wish to thank the National Science Foundation, Grant 233582, for supporting our work. Thank you for watching.