V6 Sampling Distribution Theory

 

Welcome to part one of sampling distribution theory. I'm Renee Clark from the Swanson School of Engineering at Pitt.

In part one of the sampling distribution theory videos, we're going to cover the process of statistical inference and how it unfolds, simple random sampling, stratified random sampling, and the concept of a biased sample.

Okay, this is the process of statistical inference. Statistical inference starts with the population. Recall the definition of the population: the population is all observations of interest to us. Keep in mind that all members or items of the population may not be known or identifiable to us. So, for example, if your population of interest is all people around the world who played golf last year, it may not be possible to identify every single person in the world who played golf last year.

So, therefore, through a random selection process, we take a sample out of the population. Remember what a sample is: simply a subset of the population. A sample is something we can work with, because we know who or what those items are. And what we can do with a sample is calculate a summary measure of it known as a descriptive statistic, or more simply, just a statistic. An example of such a statistic would be the sample mean, which we write as xbar, the sample average. So, for example, with a subset of people, we could calculate their average weight.

Using that sample statistic, we can then perform inference, in which we attempt to make a general conclusion or a general statement about the population, not all of whose members may be identifiable to us, by estimating a parameter of the population. In this case, for example, what is the population mean weight, or the mean weight of all members of the population? The parameter for the population mean is given by the Greek letter mu. So, xbar is the sample average, and mu is the population average. Okay, so, the process of statistical inference goes from population, to sample, to a statistic such as xbar, to a parameter such as mu.
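To make the population, sample, statistic, parameter chain concrete, here is a minimal sketch in Python. The population of weights is simulated and entirely hypothetical (the mean of 80, the spread of 12, and the sizes are assumptions chosen for illustration); it is generated only so that mu can be computed and compared with xbar, which is usually not possible in practice.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: weights (kg) of 10,000 people. A real population
# is often not fully identifiable; it is simulated here only so that the
# parameter mu can be compared with the statistic xbar.
population = [random.gauss(80, 12) for _ in range(10_000)]
mu = statistics.mean(population)          # parameter: population mean

sample = random.sample(population, 100)   # random subset of the population
xbar = statistics.mean(sample)            # statistic: sample mean

print(f"xbar = {xbar:.1f} (statistic), mu = {mu:.1f} (parameter)")
```

In a real study only xbar would be available, and it would serve as the estimate of the unknown mu.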

Okay, so, statistical inference therefore relies on appropriate sampling from the population, or taking an appropriate subset from the population. There are two types of sampling we're going to review next. The first is simple random sampling. The second is stratified random sampling.

Simple random samples are best for small populations, when each item can be identified, or we know who or what each item is. So, for example, a population that would be suitable for simple random sampling would be all industrial engineering majors in the Swanson School at Pitt. This is a defined, reasonably sized group of students. With simple random sampling, each item that you're going to sample has an equal probability of being selected, because those selections are made randomly and independently, in which case the sample is representative of the population. And that allows good inference.
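A simple random sample from a small, fully listed population can be sketched as follows. The roster and its size of 120 are hypothetical, not real enrollment figures. Note one assumption in the code: `random.sample` draws without replacement, so every student is equally likely to end up in the sample, but no student can be picked twice.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical roster: a small population in which every member can be listed.
roster = [f"student_{i:03d}" for i in range(1, 121)]

# Draw 20 students at random, without replacement. Each student on the
# roster has the same chance of being included in the sample.
sample = random.sample(roster, 20)

print(sample[:5])
```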

Stratified random sampling is more appropriate for larger, more diverse populations. It divides the population into subgroups known as strata, and the strata are formed based on differences in key characteristics. These key characteristics could include items such as income level or age group or gender, and others as well, depending on the type of data that you're working with. The way that stratified random sampling proceeds is that it draws a simple random sample from each stratum, creating many different simple random samples. You then combine all those different random samples to form your final sample. In this way, your final sample is sure to have items or subjects from every subgroup, or every stratum. And what this ensures is that your final sample is diverse and representative of the population, enabling solid inference. In general, you're going to get a better estimate of the variable that you are trying to estimate in the population.
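The stratified procedure just described (form strata, draw a simple random sample from each stratum, combine) can be sketched in Python as below. The age groups, their sizes, and the 10% sampling fraction applied to every stratum are all assumptions made for illustration; the lecture does not specify how many items to draw from each stratum.

```python
import random
from collections import defaultdict

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population: (person_id, age_group) pairs.
group_sizes = {"18-29": 500, "30-49": 300, "50+": 200}
population = [(f"{g}_{i}", g) for g, n in group_sizes.items() for i in range(n)]

# Step 1: divide the population into strata by the key characteristic.
strata = defaultdict(list)
for person in population:
    strata[person[1]].append(person)

# Step 2: draw a simple random sample from each stratum.
# Step 3: combine those samples into the final sample.
fraction = 0.10  # assumed: the same sampling fraction in every stratum
final_sample = []
for group, members in strata.items():
    k = max(1, round(fraction * len(members)))  # at least one per stratum
    final_sample.extend(random.sample(members, k))

# Every stratum is represented in the final sample.
counts = {g: sum(1 for _, grp in final_sample if grp == g) for g in strata}
print(counts)
```

Using the same fraction in every stratum keeps each subgroup's share of the sample equal to its share of the population, which is one common choice among several.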

Okay, the concept of a biased sample. If I gave you the task to estimate the average grade point average of all undergraduates at the University of Pittsburgh, can you sample only Swanson engineering undergraduates? The answer is no, that's not a good approach. Why not? Well, because there are other schools at Pitt, there are other majors at Pitt, right? We have a School of Nursing, we have a School of Arts and Sciences, we have a School of Business, we have a School of Computing, etc. So, if you were to sample just Swanson engineering students at Pitt, that sample would not be representative of the population of all Pitt students, which is the population you were asked to work with, right? You were asked to estimate the average GPA at Pitt in general, for all Pitt students. If you were to sample just Swanson students, any estimate of the average GPA would essentially be limited to just Swanson students. This isn't good, because your sample, and the estimate that you got from it, really doesn't align with the task that you were given, which was to estimate the average GPA for all undergraduates at Pitt. So, in this case, a biased sample would potentially lead to inference that was not valid.
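The effect of sampling from only one school can be illustrated with a small simulation. Everything here is invented for illustration: the three schools, their mean GPAs, and the spread are assumptions, not real GPA data for any university. The sketch compares an estimate from a sample restricted to one school with an estimate from a simple random sample of the whole population.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is repeatable

# Invented numbers for illustration only; not real GPA data.
school_means = {"school_A": 3.0, "school_B": 3.3, "school_C": 3.5}
population = [
    (school, min(4.0, max(0.0, random.gauss(mean, 0.3))))  # keep GPA in [0, 4]
    for school, mean in school_means.items()
    for _ in range(2000)
]

# Parameter: mean GPA over all students in all schools.
mu = statistics.mean([gpa for _, gpa in population])

# Biased sample: drawn from a single school only.
school_a = [gpa for school, gpa in population if school == "school_A"]
biased_xbar = statistics.mean(random.sample(school_a, 200))

# Simple random sample: drawn from the whole population.
srs_xbar = statistics.mean([gpa for _, gpa in random.sample(population, 200)])

print(f"mu = {mu:.2f}, single-school xbar = {biased_xbar:.2f}, "
      f"whole-population xbar = {srs_xbar:.2f}")
```

The single-school estimate describes that school rather than the whole population, so it misses mu by much more than the estimate from the whole-population sample does.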

Thank you to the NSF for supporting our work under Grant 233582. Thank you.