|
V35
Hypothesis Testing
Welcome to part 11 of our video
series in support of hypothesis testing. In this video, we are going to
discuss a Q-Q plot, or a quantile-quantile plot and the concept of an
empirical cumulative distribution function. I'm Renee Clark from the Swanson
School of Engineering at the University of Pittsburgh. Okay, first let's discuss a Q-Q
plot. Okay, this is shorthand for quantile-quantile plot. Okay, and a Q-Q
plot… an example of one is shown right there. Q-Q plots are often used to
visually determine the normality of a data set that you might have. Okay, so,
for example, these are used to answer the question: is my data set normally
distributed? You know, does my data look approximately like that? However,
you can really use a Q-Q plot to assess any distribution that you believe, or
want to test, that your data may follow, such as an exponential distribution.
Okay, with a normal Q-Q plot, which we will let software such as Minitab draw
for us, the points are going to lie approximately on a straight line, similar
to the points that you see in this graph, at a 45 degree angle from the origin,
if your data set is normally distributed. Okay,
so, with a Q-Q plot in which you're trying to assess normality, your points
will lie pretty much as shown on a straight line if your data is in fact
normally distributed. Okay, so, a Q-Q plot is a way to visually assess
whether your data approximately follows a normal distribution. Okay, so, in
addition to ways of visually assessing whether your data may or may not
follow a particular distribution, there are formal hypothesis tests that can
be run to do the same thing, okay, and in running these hypothesis
tests, we have to determine your empirical
cumulative distribution function, or ecdf. Okay, so, when we say empirical,
what we mean is the actual or observed data. So, empirical just means
experience, or by experience. Okay, so, this is as opposed to the theoretical
distribution that we're trying to test it against. Okay, so, cumulative
simply means that the frequency or the relative frequency is allowed to
accumulate or add up. Okay, so, on the right, you may remember this graph
that was shown in a prior video. This is a cumulative frequency graph, okay,
because the frequency is allowed to accumulate. So, you'll notice that, as we
go to the right in the graph, the frequency associated with each vertical bar
accumulates, right? So, it takes the form of an upward set of stairs, right,
or almost like a set of steps. Okay, so, if we want to run a
formal hypothesis test to, for example, assess normality or some other
distribution for your data, we're going to need to generate the empirical CDF,
and this is how we do it. Very simple. So, the first step that you want to do, which I'm going to show here in column one, is to take
your observed or empirical data, your actual data, and list it in ascending order. Okay, so, you'll notice that this data goes
from 0 all the way up to 0.42 and it's ascending. Okay, so, that's step one:
sort your data in ascending order. Next, in column two, we're going
to assign an order number to each row. Very simple. There are
actually 20 rows of data here, so each row gets an ascending order
number: 1, 2, 3, 4, 5, all the way up to 20. Okay, step three is associated
with column 3. That's where we specify our sample size- in column 3. There
are 20 pieces of data, 20 rows. So, the value 20 is
listed in all rows in column 3. Okay, then, in column 4, to get our empirical
cumulative distribution function, which is what we're trying to get, we
divide the order number, k, by the sample size, n. Okay, so, for row 1, k over n is 1 over 20, or 0.05. For row
2, k over n is 2 over 20, or 0.1. Okay, now, what you'll notice about column
four is that the value jumps by 0.05 with each row, right? So, column 4 is actually a step function that jumps by 1/n: 0.05, 0.1, 0.15, 0.2,
all the way up to the value one. Okay, so, what's happening here is that this
value is accumulating, right? So, hence our cumulative distribution function.
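As a rough sketch of the four columns just described, here is how the table could be built in Python. The function name `ecdf_table` and the 20 sample values are made up for illustration; they are not the data from the slide, and tied values are not treated specially.

```python
def ecdf_table(data):
    """Build rows of (value, k, n, k/n), one per observation.

    Column 1: the data sorted ascending
    Column 2: the order number k (1, 2, ..., n)
    Column 3: the sample size n, repeated in every row
    Column 4: the empirical CDF value k / n
    """
    ordered = sorted(data)   # step 1: sort ascending
    n = len(ordered)         # step 3: sample size
    # step 2 (order number k) and step 4 (k / n) for each row
    return [(x, k, n, k / n) for k, x in enumerate(ordered, start=1)]


# Hypothetical sample of 20 values between 0 and 0.42 (an assumption,
# not the actual values shown in the video).
sample = [
    0.31, 0.05, 0.42, 0.00, 0.18, 0.27, 0.09, 0.36, 0.14, 0.22,
    0.03, 0.40, 0.11, 0.25, 0.07, 0.33, 0.16, 0.29, 0.20, 0.38,
]

table = ecdf_table(sample)
for value, k, n, ecdf in table:
    print(f"{value:5.2f}  {k:2d}  {n:2d}  {ecdf:4.2f}")
```

With any 20 values, the last column steps up by 1/20 with each row and ends at 1 in the final row, which is the accumulating pattern just described.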
So, it's that step function that’s similar to what
we saw on the previous slide. We wish to thank the National
Science Foundation under Grant 233582 for supporting our work. Thank you for watching. |