|
V6
Sampling Distribution Theory
Welcome to part one of sampling
distribution theory. I'm Renee Clark from the Swanson School of Engineering
at Pitt. In part one of the sampling
distribution theory videos, we're going to cover the topics of the process of
statistical inference and how that unfolds, simple random sampling,
stratified random sampling, and the concept of a biased sample.

This is the process of statistical inference. Statistical inference starts
with the population. Recall the definition of the population: the population
is all observations of interest to us. Keep in mind that not all members or
items of the population may be known or identifiable to us. So, for
example, if your population of interest is all people around the world who
played golf last year, it may not be
possible to identify every single person in the world who played golf last
year. Therefore, through a random selection process, we take
a sample out of the population. Remember what a sample is: simply a
subset of the population. A sample is something we can work with, because
we know who or what those items are. What we can do with a sample
is calculate a summary measure of it known as a descriptive
statistic, or more simply, just a statistic. An example of such a
statistic would be the sample mean, which we
denote as xbar, the sample average.
So, with a subset of people, we could calculate, for example,
their average weight. Using that sample statistic, we can then perform
inference, in which we attempt to make a general conclusion or a
general statement about the population, not all of which may be identifiable
to us, by attempting to estimate a parameter of the population. In this case,
for example, what is the population mean
weight, or the mean weight of all members of the population? The
parameter value for the population mean is given by the Greek letter mu. So,
xbar is the sample average, and mu is the population average. So,
the process of statistical inference goes from population to sample to
statistic, such as xbar, to parameter, such as mu. Statistical inference
therefore relies on appropriate sampling from the population,
or taking an appropriate subset of the population.

There are two types of sampling we're going to review next. The first is simple random
sampling. The second is stratified random sampling.

Simple random sampling is best
for small populations, when each item can be identified,
that is, when we know who or what each item is. An example of a
population that would be suitable for simple random sampling would be all
industrial engineering majors in the Swanson School at Pitt. This is a
defined, reasonably sized group of students. With
simple random sampling, each item in the population has an equal
probability of being selected, because the selections are made
randomly and independently, in which case the
sample is representative of the population.
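
As a rough illustration of simple random sampling, and of using xbar to estimate mu, here is a minimal Python sketch. The population of weights is synthetic, and the names `population` and `simple_random_sample` are hypothetical, chosen only for this example. It assumes the draw is made without replacement using the standard library's `random.sample`, which gives every item the same chance of being selected.

```python
import random
import statistics


def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: weights (in pounds) of 200 people. Synthetic data.
data_rng = random.Random(0)
population = [round(data_rng.gauss(170, 25), 1) for _ in range(200)]

sample = simple_random_sample(population, n=30, seed=1)

mu = statistics.mean(population)  # parameter: population mean (usually unknown)
xbar = statistics.mean(sample)    # statistic: sample mean, used to estimate mu

print(f"population mean (mu):  {mu:.1f}")
print(f"sample mean (xbar):    {xbar:.1f}")
```

In practice mu is not available, which is the reason for estimating it with xbar; it is computed here only because the toy population is fully known.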
A representative sample, in turn, allows good inference.

Stratified random sampling is
more appropriate for larger, more diverse populations. It divides
the population into subgroups known as strata. The strata are
formed based on differences in key characteristics, where these key
characteristics could include items such as income level, age group, or
gender, and others as well, depending on the type of data that you're
working with. The way that stratified random sampling proceeds is that
it draws a simple random sample from each stratum,
creating many different simple random samples. You then combine all those
different random samples to form your final sample. In this way,
your final sample is sure to have items or subjects from every
subgroup, or every stratum. What this ensures
is that your final sample is diverse and representative of the population,
enabling solid inference. In general, you're going to get a better
estimate of the variable that you are trying to estimate in
the population.

Next is the concept of a biased sample. If I gave you the task to
estimate the average grade point average of all undergraduates at the
University of Pittsburgh, can you sample only Swanson engineering undergraduates?
The answer is no, that's not a good approach. Why not? Well,
because there are other schools at Pitt, and there are other majors at Pitt,
right? We have a School of Nursing, we have a School of Arts and Sciences, we
have a School of Business, we have a School of Computing, etc. So, if
you were to sample just Swanson engineering students at Pitt, that sample
would not be representative of the population of all Pitt students,
which is what you were asked to work with, right? You were asked to
estimate the average GPA at Pitt in general, for all Pitt students. So,
if you were to sample just Swanson students, any estimate of the average GPA
would essentially be limited to just Swanson students. This
isn't good in this case because your sample, and the estimate that you
got from it, really doesn't align with the task that you were given, which was
to estimate the average GPA for all undergraduates at Pitt. So, in this case,
a biased sample would potentially lead to inference that was not valid.

Thank you to the NSF for
supporting our work under Grant 233582. Thank you. |
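
The stratified sampling procedure and the biased-sample example described in this video can be sketched in Python as follows. This is only an illustration under stated assumptions: the school names, enrollment sizes, and GPA values are invented, not real Pitt data, and `stratified_sample` is a hypothetical helper. The video does not say how many items to draw from each stratum, so the sketch assumes the same fraction is drawn from every stratum.

```python
import random
import statistics
from collections import defaultdict


def stratified_sample(items, stratum_of, fraction, seed=None):
    """Draw a simple random sample from each stratum, then combine them."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    combined = []
    for members in strata.values():
        # Assumption: same sampling fraction per stratum, at least one item each.
        k = max(1, round(fraction * len(members)))
        combined.extend(rng.sample(members, k))
    return combined


def mean_gpa(group):
    """Average GPA of a list of (school, gpa) pairs."""
    return statistics.mean(gpa for _, gpa in group)


# Synthetic population of (school, GPA) pairs. All numbers are made up.
data_rng = random.Random(0)
centers = {"Engineering": 3.1, "Nursing": 3.5, "Arts and Sciences": 3.2, "Business": 3.3}
sizes = {"Engineering": 300, "Nursing": 100, "Arts and Sciences": 500, "Business": 200}
students = [
    (school, min(4.0, max(0.0, data_rng.gauss(centers[school], 0.3))))
    for school in centers
    for _ in range(sizes[school])
]

# Stratified sample: every school is represented in the final sample.
stratified = stratified_sample(students, stratum_of=lambda s: s[0], fraction=0.1, seed=1)

# Biased sample: the same number of students, but drawn from one school only.
engineering_only = [s for s in students if s[0] == "Engineering"]
biased = random.Random(1).sample(engineering_only, len(stratified))

print(f"population mean GPA:        {mean_gpa(students):.2f}")
print(f"stratified sample estimate: {mean_gpa(stratified):.2f}")
print(f"engineering-only estimate:  {mean_gpa(biased):.2f}")
```

The engineering-only estimate describes engineering students rather than all students, which is the mismatch between sample and task that the biased-sample discussion points to.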