V29
Hypothesis Testing

Welcome to part five of our video series in support of hypothesis testing. In this video, we are going to review paired data from dependent populations. We're also going to discuss the use of the T and Z distributions for performing inference on the difference in two means from dependent populations. I'm Renee Clark from the Swanson School of Engineering at the University of Pittsburgh.

Okay, so, in review, paired or dependent data setups occur when the same subject is studied under two different conditions. One of the most well-known or popular experiments of this kind is a before-and-after study, where you record measurements for the same person both before and after your intervention, or whatever you're trying to test. So, for example, you might record scores for the same person both before and after a new fitness program, a new diet regimen, or, perhaps, a new teaching method in the classroom. Okay, another example of a same
subject experiment is one where you test the wear of two different brands of shoes: brand A versus brand B. You might put the brand A shoe on the left foot of a person and the brand B shoe on the right foot of that same person. The benefit of doing that is that the same person treats both brands the same in terms of how much they walk and how they walk, which lets you more fairly assess the wear for each brand. Okay, now the second case of a paired
or dependent data setup occurs with matched-subject studies. Here we have two different persons, but they're matched in some definitive way, such as a brother and sister pair, a husband and wife pair, or puppies from the same litter. In general, paired data setups reduce the unwanted variability that would be inherent in testing your method with two completely independent groups of people. In other words, they reduce the variance.

Okay, so, for the first case,
when we're doing inference on the difference in two means from dependent populations, or paired data, we're going to consider the case where our population variance of the differences is unknown and, in addition, we have a small sample size, or a small number of pairs. In that case, as you've become familiar with, we have to use the T distribution. This is what the t test statistic looks like: t = (d̄ − μ_d) / (s_d / √n). You'll notice that, instead of sigma in the denominator, we have the sample standard deviation of the differences in the denominator. But let's look at the numerator as well. The d̄ in the numerator is the mean, or the average, of the sample differences. Okay, so,
what are the sample differences? Let me point your attention to this table of before and after measurements that were collected in a paired data setup. This particular example has six pairs of data; in general, n represents the number of pairs you have, so in this case our n is six. Each before and after measurement is recorded: X1 represents the before measurement, X2 the after. The individual differences, the d_i, are calculated by taking either X1 − X2 or X2 − X1; it doesn't matter in which order you do it. In this case, X2 − X1 was used, so the first difference of 1 was obtained by 2 − 1, the second difference of 2 was obtained by 5 − 3, and so on.
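Only the first two pairs are spelled out here, so the numbers below are illustrative, chosen to be consistent with the summary values of 3.5 and roughly 1.9 quoted next; with that caveat, the whole computation can be sketched in Python:

```python
from math import sqrt
from statistics import mean, stdev

# Before (X1) and after (X2) measurements for n = 6 pairs.
# Only the first two pairs are given in the video; the rest are
# illustrative values consistent with the summary statistics quoted later.
x1 = [1, 3, 2, 4, 5, 6]    # before
x2 = [2, 5, 5, 8, 10, 12]  # after

d = [b - a for a, b in zip(x1, x2)]  # d_i = X2_i - X1_i
n = len(d)                           # number of pairs

d_bar = mean(d)  # mean of the sample differences (3.5 here)
s_d = stdev(d)   # sample standard deviation of the d_i (about 1.87)

# t statistic with a hypothesized mean difference of zero
t = (d_bar - 0) / (s_d / sqrt(n))
```

If SciPy is available, `scipy.stats.ttest_rel(x2, x1)` reproduces the same t statistic and also reports a p-value.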
Okay, if we were to average these six individual differences, we would get a value of 3.5. We call that d̄: the mean, or average, of the sample differences, the d_i. The s_d that I referred to previously in the denominator is the sample standard deviation of the d_i. If you were to calculate the standard deviation of those six numbers, you would come up with a value of 1.9 in this case. We call that s_d, the sample standard deviation of the differences, and for t, it's this s_d that appears in the denominator.

Now, in order to use t, our differences, the d_i, must be normally distributed.
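One practical way to check this normality requirement on the d_i is a Shapiro–Wilk test; the sketch below assumes SciPy is available (the video itself doesn't prescribe any software), reusing the illustrative differences from the table example:

```python
from scipy import stats

# The six sample differences (only the first two are given explicitly
# in the video; the rest are illustrative).
d = [1, 2, 3, 4, 5, 6]

# Shapiro-Wilk test: the null hypothesis is that the d_i come from
# a normal distribution.
stat, p = stats.shapiro(d)

# A p-value above 0.05 means no evidence against normality at the
# 5% level, so using the t procedure is defensible.
normality_plausible = p > 0.05
```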
But, in proceeding with a proof by contradiction, what we're going to do is insert our hypothesized difference of zero, which will be the case if we are assuming no difference in the means (sometimes we write that as μ_d = 0), into the test statistic for t. And when we insert that value of zero, of course, that term vanishes.

Okay, so, for the case in which,
however, our σ²_d happens to be known, when we are looking at inference for the difference in two means from dependent populations, then, of course, we use the Z distribution. Keep in mind that it's σ_d that appears in the denominator for Z, but otherwise the test statistic looks very similar to that of the t: z = (d̄ − μ_d) / (σ_d / √n).

Now, in order to use Z, the d̄ in the numerator must be normally distributed. When will d̄ be normally distributed? In one of two cases. First, if our n, or number of pairs, is large, then d̄ will be normally distributed by the central limit theorem. Second, if the distribution of the differences themselves is normal, then, automatically, d̄ will be normally distributed. Okay, but, you'll proceed in the same way, proof by
contradiction: if you're hypothesizing no difference in the population means, the hypothesized difference is zero. This gets inserted into the numerator of the test statistic, in which case that term vanishes.

And, finally, for the case in which our population variance of the differences, σ²_d, is unknown, but we have the advantage of a large n when we're doing inference on the difference in two means from dependent populations, then, of course, we can go back to using the Z distribution. Now, typically with the Z distribution we have σ_d in the denominator.
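As explained next, a large n lets the sample standard deviation of the differences stand in for σ_d. A minimal sketch of that large-sample Z test, with illustrative numbers (not from the video), using only the Python standard library:

```python
from math import sqrt
from statistics import NormalDist

# Illustrative large-sample values (not from the video):
n = 64        # number of pairs -- large, so the CLT applies
d_bar = 0.8   # mean of the sample differences
s_d = 2.4     # sample standard deviation of the differences,
              # standing in for the unknown sigma_d because n is large

# Z statistic with a hypothesized mean difference of zero
z = (d_bar - 0) / (s_d / sqrt(n))

# Two-sided p-value from the standard normal distribution
p = 2 * (1 - NormalDist().cdf(abs(z)))
```

With these numbers, z ≈ 2.67 and the two-sided p-value is about 0.008, so at the 5% level we would reject the hypothesis of no difference.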
However, when n is large, s_d, the sample standard deviation of the differences, becomes a good estimator of σ_d, the population value, since we have a large sample. In that case, we can replace σ_d in the denominator with s_d, and that's why you see s_d shown there in the denominator for the case of a large n. And, of course, when your n is large, your d̄ will be normally distributed by the central limit theorem, in which case you can transform that d̄ to a Z random variable. Okay, our typical null hypothesis
assumes no difference in the means, and thus, to proceed with a proof by contradiction, we insert that hypothesized difference of zero into the numerator, in which case that term vanishes.

Finally, I want to talk about the
relationship to a confidence interval when performing inference on the difference in two means; in this particular video series, of course, we've been talking about two dependent means. This relationship applies whether your means are independent or dependent, but let's review it again. Let's say that our hypothesis is that there is no difference in those two means, so our hypothesized difference, μ1 − μ2 (which, in the case of dependent means, we sometimes label μ_d), is zero. Now, let's say I happen to calculate a confidence interval of 0.4 to 1.5 associated with this particular hypothesis test. Zero is not included in that confidence interval, right? Zero falls below the lower limit of 0.4, and therefore zero is not a plausible value for μ_d = μ1 − μ2.
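This inclusion check amounts to a one-liner in Python; a sketch (the function name is my own):

```python
def zero_in_interval(lower, upper):
    # The hypothesized difference of zero is a plausible value
    # only if it lies inside the confidence interval.
    return lower <= 0 <= upper

# The interval from this example, 0.4 to 1.5, does not contain zero,
# so zero is not a plausible value for mu_d.
plausible = zero_in_interval(0.4, 1.5)
```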
So, we would tend to reject zero as a plausible value for μ_d. However, let's say I instead calculated a confidence interval of −0.95 to 1.65. In this case, as you can see, zero is included in the confidence interval, because the lower limit is negative and the upper limit is positive. So zero is a plausible value for μ1 − μ2, or μ_d, and we would not reject it; in this case, we would fail to reject the null hypothesis.

We wish to thank the National Science Foundation under Grant 2335802 for supporting our work. Thank you for watching.