V21 Estimation

Welcome to part four of our video series on estimation. In this video, we are going to review topics relevant to estimating the difference in two independent means. These topics include standardizing the difference in the sample means to a z random variable, ensuring normality of this difference, and calculating the pooled estimate of the variance for use with the t distribution. I'm Renee Clark from the Swanson School of Engineering at the University of Pittsburgh.

Let's first review how to transform the difference in two sample means to a z random variable; this is something we covered in our chapter on sampling distribution theory. Focus on the picture on the right, in which we have two independent populations, one and two, and we take a sample of a certain size from each, n1 and n2. In this case, we assume that the population variances are known, so each sigma is known. To transform the difference in the two sample averages, X1 bar minus X2 bar, to a z random variable, we proceed as we always do: we subtract off the mean, or expected value, of the difference, which we remember from sampling distribution theory is simply mu1 - mu2, and then we divide by the standard deviation of the difference in the means. Recall that the variance of the difference in the means is sigma1^2/n1 + sigma2^2/n2, so the standard deviation shown in the denominator is the square root of that. Putting it together, z = (X1 bar - X2 bar - (mu1 - mu2)) / sqrt(sigma1^2/n1 + sigma2^2/n2). Now, in order to make this transformation, we must know that the difference in those two sample means is normally distributed. To transform any random variable to a z random variable, it must be normally distributed.
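As a quick sketch (not part of the original video), the standardization above can be written as a small Python function; the function and variable names are my own.

```python
from math import sqrt

def z_statistic(xbar1, xbar2, mu1, mu2, sigma1, sigma2, n1, n2):
    """Standardize the difference in two sample means when the
    population standard deviations (sigmas) are known."""
    # Standard deviation of X1bar - X2bar: sqrt(sigma1^2/n1 + sigma2^2/n2)
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Subtract the mean of the difference (mu1 - mu2), then divide by se
    return (xbar1 - xbar2 - (mu1 - mu2)) / se
```

For example, with sample means 52 and 50, mu1 - mu2 = 0, sigmas of 3 and 4, and sample sizes 36 and 64, the standard error is sqrt(9/36 + 16/64) = sqrt(0.5), giving z of about 2.83.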

So the question is: under what conditions will this quantity be normally distributed? These are the possibilities. First, if your underlying populations are each normally distributed, in other words, if X1 is normally distributed and X2 is normally distributed, then automatically X1 bar and X2 bar will each be normally distributed; we learned that earlier. The second possibility is if your sample sizes n1 and n2, the sizes used to calculate each of your sample averages, are each sufficiently large, meaning greater than or equal to 30. Then, again, X1 bar and X2 bar will each be normally distributed, this time by the central limit theorem. If either of these cases holds, then we know the difference in the two averages will be normally distributed, because a linear combination of normally distributed random variables is itself normally distributed.
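The decision logic above can be sketched as a tiny Python helper (my own illustration, not from the video). Note that it also allows the mixed case where one population is normal and the other sample is large, since each sample mean only needs to be normal on its own for their difference to be normal.

```python
def difference_is_normal(pop1_normal, pop2_normal, n1, n2):
    """Can we treat X1bar - X2bar as normally distributed?

    Each sample mean is (approximately) normal if its underlying
    population is normal, or if its sample size is at least 30
    (central limit theorem). A linear combination of normal random
    variables is itself normal, so if both sample means qualify,
    so does their difference.
    """
    xbar1_normal = pop1_normal or n1 >= 30
    xbar2_normal = pop2_normal or n2 >= 30
    return xbar1_normal and xbar2_normal
```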

We learned that earlier, so this is a review, and it's what we wanted to arrive at: the difference must be normally distributed so that we can transform it to a z and use the z probability tables in the back of the book.

One last topic I want to cover is what's known as the pooled estimate of the variance, which we're going to use with the t distribution. Recall that we use the t distribution when sigma, the population standard deviation, is unknown, in which case we have to use the sample standard deviation instead.

Now, suppose your population variances are unknown; again, we still have two independent populations. If both variances are unknown but you have some reason to believe they are equal, that is, that the spread of the two populations is the same, then you can use what's known as the pooled estimate of the variance. You'll see the subscript p, which stands for pooled. This pooled estimate takes into account a combination of your sample variances; pooled is just another way to say combined, or brought together. So Sp squared, the pooled estimate of the variance, is actually a weighted average of your two sample variances, S1 squared and S2 squared, that are both shown in the formula.

And Sp^2 is a weighted average, which means that S1^2 and S2^2 are each weighted by their degrees of freedom. What are their degrees of freedom? n1 minus one and n2 minus one. Written out, Sp^2 = ((n1 - 1)S1^2 + (n2 - 1)S2^2) / (n1 + n2 - 2). We use a pooled estimate because it's a better estimate of the common variance than simply using either sample variance individually in the calculation.
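As a minimal sketch of this weighted average (my own illustrative code, not from the video):

```python
def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled estimate Sp^2: a weighted average of the two sample
    variances, each weighted by its degrees of freedom (n - 1)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
```

For instance, with S1^2 = 4 (n1 = 10) and S2^2 = 6 (n2 = 16), Sp^2 = (9*4 + 15*6)/24 = 5.25, which lands between the two sample variances but closer to the one with more degrees of freedom.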

We wish to thank the National Science Foundation under Grant 233582 for supporting our work. Thank you for watching.