Sampling Distribution Theory

Welcome to part nine of our video series on sampling distribution theory. I'm Renee Clark from the Swanson School of Engineering at Pitt. In this video, we are going to discuss the T distribution, also known as Student's t distribution. We're going to compare the T and Z distributions and further discuss various properties of the T distribution.

Thus far we've talked about the Z distribution, so you're familiar with it. The standardization of x-bar to a Z random variable is Z = (x-bar - mu) / (sigma / sqrt(n)): take x-bar, subtract off its mean, mu, and divide by its standard deviation, sigma over the square root of n, where sigma is the population standard deviation. So the Z distribution makes use of the population standard deviation, sigma. Now, knowing the parameter value sigma might be reasonable if one is highly familiar with a given population or process. But in general, sigma, like other parameters, tends to be unknown, and we must estimate it. Another way to think about this: if mu is unknown, and typically we have to estimate mu, then sigma is likely going to be unknown as well. However, as an academic exercise, and as a place to get started, to this point we've assumed sigma to be known and used the Z distribution.
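As a quick numerical sketch of this standardization (the sample values, mu, and sigma below are made up purely for illustration):

```python
import math

# Hypothetical population parameters, assumed known for the Z distribution
mu = 100.0     # population mean
sigma = 15.0   # population standard deviation (treated as known)

# A made-up sample of size n drawn from that population
sample = [102.0, 98.5, 101.2, 97.8, 103.1, 99.4, 100.9, 98.2]
n = len(sample)
x_bar = sum(sample) / n  # sample mean

# Standardization: Z = (x-bar - mu) / (sigma / sqrt(n))
z = (x_bar - mu) / (sigma / math.sqrt(n))
print(f"x-bar = {x_bar:.4f}, z = {z:.4f}")
```

Notice that only x-bar varies from sample to sample here; sigma is a fixed, known constant.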
Now, in reality, we typically have to estimate sigma, the population standard deviation. We must estimate it using S, the sample standard deviation: we take a sample from the population and use it to estimate sigma. When we do this, what we actually obtain is a random variable distributed according to the T distribution. It looks a lot like Z, the difference being that, instead of using sigma, we use S. So T is obtained by taking your sample mean, subtracting off mu, the population mean, and dividing by S over the square root of n: T = (x-bar - mu) / (S / sqrt(n)). T has n - 1 degrees of freedom, and what's key about T is that the parent population of X must be normally distributed in order to use T. Another way of writing this: X must be distributed normally.

The T distribution looks a lot like the Z distribution. As the picture of the T distribution shows, it is symmetric about zero: 50% of the area, or probability, is to the left of zero, and 50% is to the right. So T is indeed bell-shaped. And, in fact, as your sample size goes to infinity, the T and Z distributions become the same distribution.
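As a sketch of these formulas in code (the sample values below are made up, and SciPy is assumed to be available), we can compute a t statistic from a sample and watch the t quantiles approach the z quantiles as the degrees of freedom grow:

```python
import math
from statistics import mean, stdev
from scipy.stats import t, norm

# Hypothetical sample, assumed drawn from a normal population with mean mu
mu = 50.0
sample = [51.2, 48.7, 50.5, 49.9, 52.3, 47.8, 50.1, 49.4]
n = len(sample)
x_bar = mean(sample)
s = stdev(sample)  # sample standard deviation S (divides by n - 1)

# T = (x-bar - mu) / (S / sqrt(n)), with n - 1 degrees of freedom
t_stat = (x_bar - mu) / (s / math.sqrt(n))
print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")

# As the degrees of freedom grow, t quantiles approach the z quantile
z_crit = norm.ppf(0.975)  # roughly 1.96
for df in (5, 30, 100, 1000):
    print(f"df = {df:4d}: t 97.5th percentile = {t.ppf(0.975, df):.4f} "
          f"(z = {z_crit:.4f})")
```

The printed quantiles shrink toward the z value as df increases, which is the convergence described above.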
Now, in reality, T does not differ much from Z when your sample size is 30 or more.

Some more about the properties of the T distribution. The T distribution is more variable than the Z distribution. Why is this the case? Because T depends on the fluctuation of two statistics from sample to sample, and, as we recall, statistics are random variables. What are these two statistics? X-bar and S, as shown in the formula for T. Compare that to the formula for Z: there is only one statistic, or one random variable, and that's x-bar. That's because sigma, being a parameter, is not a random variable; it's a constant that characterizes your population.

Some additional things about T: the central limit theorem does not relate to T. The central limit theorem relates to the normal distribution, or the Z distribution, and actually requires the use of sigma.

Let's look at a picture of the Z versus the T distribution. I am going to
highlight in green a picture of the Z distribution: that's the Z, or standard normal, distribution in green. In red is a picture of the T distribution with a sample size somewhere close to 30. Just visually, you can see that the Z distribution is pointier, or has a higher peak, than the T distribution. The T distribution, then, is less pointy than Z, which means the T distribution actually has more area in its tails. What do we mean by the tails? The regions far away from the center, or mean. Let's take a point right here on the x-axis. You'll notice, for that point along the x-axis, that the red curve, which is T, sits higher than the green curve, which is Z. That means there's more area under the red curve, or T curve, at that point than under the green curve. In yellow is the area under the green curve, and superimposed in blue is the area under the red curve, so T has more area in the tails, further out from mu. Another way to state that is that T has heavier, or thicker, tails than the Z distribution. This means the probability of getting a value far from mu, or the center, is greater with the T distribution than with the Z distribution. The reason for this greater area, or probability, in the tails is that, again, the T distribution (the red curve) is flatter than the Z distribution (the green curve), so it's like the area is getting pushed out from the center towards the tails.
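This heavier-tail behavior can be checked numerically. As a sketch (SciPy assumed available, and the cutoff of 2 and the degrees of freedom chosen only for illustration):

```python
from scipy.stats import t, norm

# Probability of landing more than 2 units from the center (both tails)
for df in (5, 10, 30):
    tail_t = 2 * t.sf(2.0, df)  # P(|T| > 2) for t with df degrees of freedom
    tail_z = 2 * norm.sf(2.0)   # P(|Z| > 2) for the standard normal
    print(f"df = {df:2d}: P(|T| > 2) = {tail_t:.4f} vs P(|Z| > 2) = {tail_z:.4f}")

# T is also more variable: Var(T) = df / (df - 2) > 1 for df > 2,
# while Var(Z) = 1
for df in (5, 10, 30):
    print(f"df = {df:2d}: Var(T) = {t.var(df):.3f} (Var(Z) = {norm.var():.3f})")
```

For every df, the t tail probability exceeds the z tail probability, and the t variance exceeds 1, consistent with the flatter, heavier-tailed picture.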
The area is getting pushed out from the center, further out into the tails, with the T distribution. That's why there's more area, or higher probability, in the tails with the T distribution relative to the Z distribution.

Thank you to the NSF, under Grant number 2335802, for supporting our work. Thank you for watching.