V22 Estimation

Welcome to part five of our video series on estimation. In this video, we are going to return to the concept of pairing data, but we're going to talk specifically about why we pair data. Then we're going to discuss the setup that paired data takes for statistical, or inferential, analysis. I'm Renee Clark from the Swanson School of Engineering at the University of Pittsburgh.

So, let's talk about why we pair data. Recall the before-and-after studies that we discussed in a previous video, in which the same person (or it could even be an item) is measured both before and after a certain intervention or method that you are testing out. Let's say that intervention happens to be a new teaching method in the classroom, and you are trying to assess students' abilities with your new method. With pairing, you're able to test that same child's ability both before and after you apply your new method.

Now, if you're able to test that same child's ability both before and after, that's better than testing your method using two completely independent groups, with completely different students in each group, in which you test one group with your method and compare it to a second group in which you didn't use your method. Pairing is better than using two independent groups, and the reason is that, with pairing, you're able to control for variables that differ between students: things such as their prior knowledge of a topic or their natural capabilities. It could also include more social variables, such as parental oversight or socioeconomic status. There are many, many variables that lead to differences from student to student.

But, with pairing, you're able to control for these variables. In essence, what pairing does is eliminate, or control for, these other sources of variability. That's key. You'll recall that with pairing, the experimental unit remains the same both before and after, or without and with, your intervention. So, let's say we are measuring Renee. She is an experimental unit, and we're going to record her measurements both before and after. Each row, or each subject, is an experimental unit, and it remains the same throughout the study, both before and after.

So, this experimental unit remains the same, or has the same variables, both before and after, including variables that are not being tested. Mathematically, pairing reduces the variance: it reduces the variance of the difference between your two variables. X and Y represent your two dependent populations. This is the formula for the variance of the difference between the two populations, and the reason the variance is reduced, mathematically, is that you are subtracting the positive covariance there.
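The formula on the slide is not reproduced in this transcript. Assuming it is the standard identity Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y), the short Python sketch below checks it numerically. The data are simulated, not from the video: the baseline, the noise levels, and the +5 effect are all hypothetical choices, made only so that each subject's before and after scores share a common component.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical paired data (NOT from the video): every subject has a personal
# baseline that appears in both measurements, so x and y tend to move together.
n = 1000
baseline = [random.gauss(70, 10) for _ in range(n)]
x = [b + random.gauss(0, 3) for b in baseline]      # "before" scores
y = [b + 5 + random.gauss(0, 3) for b in baseline]  # "after" scores, assumed +5 effect


def sample_cov(a, b):
    """Sample covariance with the same n - 1 divisor as statistics.variance."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


var_x = statistics.variance(x)
var_y = statistics.variance(y)
cov_xy = sample_cov(x, y)

# Variance of the paired differences, computed directly from the differences.
var_diff = statistics.variance([xi - yi for xi, yi in zip(x, y)])

# The same quantity from the identity Var(X) + Var(Y) - 2 Cov(X, Y).
var_from_identity = var_x + var_y - 2 * cov_xy

print("Var(X) + Var(Y)           :", round(var_x + var_y, 2))
print("2 * Cov(X, Y)             :", round(2 * cov_xy, 2))
print("Var(X - Y), direct        :", round(var_diff, 2))
print("Var(X - Y), from identity :", round(var_from_identity, 2))
```

If the two sets of measurements came from independent groups instead, the covariance term would be close to zero, and the variance of the difference would stay near Var(X) + Var(Y).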

You can see how you're subtracting that; you're subtracting two times that. With paired data, the covariance term is generally positive. Just recall that covariance is a measure of the nature of the linear relationship between two variables. In this case, the two variables are X and Y, and these variables are not independent. They're dependent because they're paired. So, since they're dependent, you would expect them to have a relationship of some sort.

Paired data has a certain setup for statistical analysis. With paired data, we say that we have subjects, or pairs, in rows, however you want to say it. In the example shown here, we have six rows, or six pairs of before and after data. X1 represents the before measurement, and X2 represents the after measurement.

If you recall from an earlier video, the quantity that we're actually going to analyze in a paired data analysis is the difference between the two measurements. We call these differences between X1 and X2 d sub i, where i is simply the row number. The difference is calculated by taking either X1 - X2 or X2 - X1. It doesn't matter in which order you take the difference, as long as you use the same order for every row. For the first row, for example, you see that its d sub i, or its difference, is equal to one, which was obtained by 2 - 1. So, we took X2 - X1. The second row's value of two was 5 - 3, and so on down the table. With this data, we obtain six individual differences, because there were six rows.

What you do next with these differences is average them. It just so happens that these differences are equal to 1, 2, 3, 4, 5, and 6. If you take the average of those six numbers, you get 3.5, which you can see is right in the middle there. We call that average d bar. So, d bar is the average across all the individual differences in that table. Using those same six differences, we also calculate the standard deviation. If you calculate the standard deviation of 1, 2, 3, 4, 5, and 6, you come up with about 1.9. We give that the symbol s sub d, for the standard deviation of the differences. So, s sub d is the standard deviation of all of your individual differences d sub i.
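The calculation just described can be sketched in a few lines of Python. Only the first two rows of the table (1 then 2, and 3 then 5) are read out in the video; the values used here for rows three through six are hypothetical, chosen only so that the differences come out to 3, 4, 5, and 6. The value of about 1.9 corresponds to the sample standard deviation, which divides by n - 1, and that is what `statistics.stdev` computes.

```python
import statistics

# Paired layout: one row per subject, with a before (X1) and an after (X2) value.
# Rows 1 and 2 match the values read out in the video.
# Rows 3-6 are HYPOTHETICAL, picked only to give differences of 3, 4, 5, 6.
x1 = [1, 3, 2, 4, 3, 5]    # before measurements
x2 = [2, 5, 5, 8, 8, 11]   # after measurements

# d_i = X2 - X1 for each row, using the same order in every row.
d = [after - before for before, after in zip(x1, x2)]

d_bar = statistics.mean(d)   # average of the differences (d bar)
s_d = statistics.stdev(d)    # sample standard deviation of the differences (s sub d)

print("differences:", d)           # [1, 2, 3, 4, 5, 6]
print("d bar      :", d_bar)       # 3.5
print("s sub d    :", round(s_d, 1))  # 1.9
```

Taking the differences as X1 - X2 instead would flip the sign of every d sub i and of d bar, but s sub d would be unchanged.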

We wish to thank the National Science Foundation under Grant 233582 for supporting our work. Thank you for watching.