V14 Sampling Distribution Theory

Welcome to part nine of our video series on sampling distribution theory. I'm Renee Clark from the Swanson School of Engineering at Pitt. In this video, we are going to discuss the T distribution, also known as Student's t distribution. We're going to compare the T and the Z distributions and further discuss various properties of the T distribution.

Okay, so, thus far we've talked about the Z distribution, so you're familiar with it. The standardization of x-bar to a Z random variable is shown there on the right: take x-bar, subtract off its mean, and divide by its standard deviation, sigma over the square root of n (where sigma is the population standard deviation).

Okay, so the Z distribution makes use of the population standard deviation, sigma. Now, knowing the parameter value sigma might be reasonable if one is highly familiar with a given population or process. But in general, sigma, like other parameters, tends to be unknown, and we must estimate it. Another way to think about this: if mu is unknown, and typically we have to estimate mu, then sigma is likely going to be unknown as well. However, as an academic exercise, and as a place to get started, to this point we've assumed sigma to be known and used the Z distribution. In reality, we typically have to estimate sigma, the population standard deviation.

Okay, we must estimate it using S, the sample standard deviation: we take a sample out of the population and use it to estimate sigma. When we do this, what we actually obtain is a random variable that's distributed according to the T distribution. It looks a lot like Z, the difference being that, instead of using sigma, we use S. So, T is obtained by taking your sample mean, subtracting off mu, the population mean, and dividing by S over the square root of n. S, the sample standard deviation, is the estimate of sigma.
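The computation just described can be sketched in Python. The sample values and the population mean mu below are hypothetical, chosen only to illustrate the arithmetic of the T statistic.

```python
import math
import statistics

# Hypothetical sample drawn from a normal population; mu is an assumed population mean.
sample = [101.2, 98.7, 103.4, 99.9, 100.8, 102.1, 97.5, 100.3]
mu = 100.0

n = len(sample)
x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)

# T = (x_bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom
t = (x_bar - mu) / (s / math.sqrt(n))
print(f"x_bar = {x_bar:.4f}, s = {s:.4f}, t = {t:.4f}, df = {n - 1}")
```

Note that `statistics.stdev` already uses the n minus 1 denominator, which matches the n minus 1 degrees of freedom of the resulting T statistic.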

Okay, so T has n minus 1 degrees of freedom, and what's key about T is that your parent population, X, must be normally distributed in order to use T. Another way of writing this is that X must be distributed normally. T looks a lot like the Z distribution. This is a picture, here, of the T distribution. As you can see, it is symmetric about zero, so 50% of the area, or probability, is to the left of zero and 50% is to the right. T is indeed bell-shaped. And, in fact, as your sample size goes to infinity, the T and the Z distributions become the same distribution. In practice, T does not differ much from Z when your sample size is 30 or more.
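The convergence claim can be checked numerically. The sketch below evaluates the standard T and Z density formulas at one point for increasing degrees of freedom; the evaluation point x = 1.0 is an arbitrary choice for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def z_pdf(x):
    """Density of the standard normal (Z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# As the degrees of freedom grow, the T density approaches the Z density.
for df in (2, 10, 30, 100):
    print(df, abs(t_pdf(1.0, df) - z_pdf(1.0)))
```

The printed gaps shrink as the degrees of freedom increase, and by around 30 degrees of freedom the two densities are already close, consistent with the rule of thumb above.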

So, some more about the properties of the T distribution. The T distribution is more variable than the Z distribution. Why is this the case? It's because T depends on the fluctuation of two statistics from sample to sample, and, as we recall, statistics are random variables. What are these two statistics? X-bar and S, as shown in the formula for T. Compare that to the formula for Z: there's only one statistic, or one random variable, and that's x-bar. That's because sigma, being a parameter, is not a random variable; it's a constant that characterizes your population. Some additional things about T: the central limit theorem does not relate to T. The central limit theorem relates just to the normal distribution, or the Z distribution, and actually requires the use of sigma.
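A small simulation can make the extra variability concrete. Assuming a hypothetical normal population with mu = 50 and sigma = 4, we draw many samples and form both the Z statistic (where only x-bar fluctuates) and the T statistic (where both x-bar and S fluctuate), then compare their variances.

```python
import math
import random
import statistics

random.seed(42)  # reproducible illustration

mu, sigma, n = 50.0, 4.0, 8   # hypothetical normal population, small sample size
z_vals, t_vals = [], []
for _ in range(20000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # S fluctuates from sample to sample
    z_vals.append((x_bar - mu) / (sigma / math.sqrt(n)))  # one statistic: x_bar
    t_vals.append((x_bar - mu) / (s / math.sqrt(n)))      # two statistics: x_bar and S

print("variance of Z values:", statistics.variance(z_vals))  # about 1
print("variance of T values:", statistics.variance(t_vals))  # about df/(df-2) = 1.4 for df = 7
```

The simulated T values come out more spread out than the Z values, which is exactly the extra variability contributed by estimating sigma with S.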

Let's look at a picture of the Z versus the T distribution. I am going to highlight in green a picture of the Z distribution; that's the Z, or standard normal, distribution in green. In red is a picture of the T distribution with a sample size somewhere close to 30. Just visually, what you can see is that the Z distribution is pointier, or has a higher peak, than the T distribution.

Okay, so the T distribution, then, is less pointy than Z, and what that means is that the T distribution actually has more area in its tails. What do we mean by the tails? By the tails, we mean the regions far away from the center, or the mean. So, let's take a point right here on the x-axis. You'll notice, for that point along the x-axis, that the red curve, which is T, sits higher than the green curve, which is Z. That means there's more area under the red curve, or the T curve, beyond that point than there is under the green curve. In yellow is the area under the green curve, and superimposed in blue is all the area under the red curve. So, T has more area in the tails, further out from mu.

Okay, another way to state that is that T has heavier, or thicker, tails than the Z distribution. This means that the probability of getting a value far from mu, or from the center, is greater with the T distribution than with the Z distribution. The reason for this greater area, or probability, in the tails is that, again, the T distribution (the red curve) is flatter than the Z distribution (the green curve). It's like the area is getting pushed out from the center, further out into the tails, with the T distribution. That's why there's more area, or higher probability, in the tails with the T distribution relative to the Z distribution.
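The heavier-tail claim can also be checked numerically by integrating the two densities over a tail region. The degrees of freedom (10), the tail cutoff (2.0), and the crude trapezoidal integration below are all illustrative choices, not part of the original discussion.

```python
import math

def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def z_pdf(x):
    """Density of the standard normal (Z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def tail_prob(pdf, a, b=40.0, steps=100000):
    """Trapezoidal integration of a density from a out to b (far into the tail)."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

p_t = tail_prob(lambda x: t_pdf(x, 10), 2.0)
p_z = tail_prob(z_pdf, 2.0)
print("P(T > 2) with 10 df:", p_t)  # larger: more probability in the tail
print("P(Z > 2):", p_z)
```

The T tail probability comes out noticeably larger than the Z tail probability at the same cutoff, which is the heavier-tails property stated above.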

Thank you to the NSF, under Grant Number 2335802, for supporting our work.

Thank you for watching.