V35 Hypothesis Testing

Welcome to part 11 of our video series in support of hypothesis testing. In this video, we are going to discuss a Q-Q plot, or quantile-quantile plot, and the concept of an empirical cumulative distribution function. I'm Renee Clark from the Swanson School of Engineering at the University of Pittsburgh.

Okay, first let's discuss a Q-Q plot, which is shorthand for quantile-quantile plot. An example of one is shown right there. Q-Q plots are often used to visually assess the normality of a data set that you might have. So, for example, these are used to answer the question: is my data set normally distributed? You know, does my data look approximately like that? However, you can really use a Q-Q plot to assess any distribution that you believe, or want to test, that your data may follow, such as an exponential distribution.

Okay, with a normal Q-Q plot, which we will let software such as Minitab draw for us, the points are going to lie approximately on a straight line, similar to the points that you see in this graph, if your data set is normally distributed. That line is at a 45-degree angle through the origin when the data and the theoretical quantiles are on the same standardized scale. So, a Q-Q plot is a way to visually assess whether your data approximately follows a normal distribution. Okay, so, in addition to ways of visually assessing whether your data may or may not follow a particular distribution, there are formal hypothesis tests that can be run to do the same thing, and in running these hypothesis tests, we have to determine the empirical cumulative distribution function, or ECDF.
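As a rough illustration of what the software is doing when it draws a normal Q-Q plot, here is a minimal Python sketch. It is not Minitab's method: the function name `normal_qq_points`, the sample values, and the plotting position (k - 0.5) / n are all assumptions made for this example, and other plotting-position conventions exist. The data are standardized so that a normally distributed sample would fall near the 45-degree line through the origin.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(data):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    xs = sorted(data)                  # observed values in ascending order
    n = len(xs)
    m, s = mean(xs), stdev(xs)         # sample mean and sample standard deviation
    std_normal = NormalDist()          # standard normal: mean 0, sd 1
    points = []
    for k, x in enumerate(xs, start=1):
        p = (k - 0.5) / n              # plotting position (assumed convention)
        theoretical = std_normal.inv_cdf(p)   # quantile expected under normality
        observed = (x - m) / s         # standardized observed value
        points.append((theoretical, observed))
    return points


# Made-up sample for illustration only; not the data shown in the video.
sample = [4.1, 5.3, 4.8, 5.0, 6.2, 3.9, 5.5, 4.6, 5.1, 4.9]
qq_points = normal_qq_points(sample)
for theoretical, observed in qq_points:
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

Plotting the second column against the first gives the Q-Q plot. The closer the pairs are to each other, the closer the points sit to the straight line.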

Okay, so, when we say empirical, what we mean is the actual or observed data. So, empirical just means based on experience. Okay, so, this is as opposed to the theoretical distribution that we're trying to test it against. And cumulative simply means that the frequency or the relative frequency is allowed to accumulate or add up. Okay, so, on the right, you may remember this graph that was shown in a prior video. This is a cumulative frequency graph, because the frequency is allowed to accumulate. So, you'll notice that, as we go to the right in the graph, the frequency associated with each vertical bar accumulates, right? So, it takes the form of an upward set of stairs, or almost like a set of steps.
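To make "accumulate" concrete, here is a small Python sketch. The bar heights are hypothetical and are not the values from the graph in the video.

```python
from itertools import accumulate

# Hypothetical bar heights (frequencies), left to right; not the video's data.
frequencies = [2, 5, 8, 4, 1]

# Each entry is the running total of all frequencies up to and including that bar.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [2, 7, 15, 19, 20]

# Dividing by the overall total gives the cumulative relative frequency,
# which climbs like a staircase and ends at 1.
total = cumulative[-1]
relative_cumulative = [c / total for c in cumulative]
print(relative_cumulative)
```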

Okay, so, if we want to run a formal hypothesis test to, for example, assess normality or some other distribution for your data, we're going to need to generate the empirical CDF, and this is how we do it. Very simple. So, the first step, which I'm going to show here in column one, is to take your observed or empirical data, your actual data, and list it in ascending order. Okay, so, you'll notice that this data goes from 0 all the way up to 0.42 and it's ascending. So, that's step one: sort your data in ascending order.

Next, in column two, we're going to assign an order number, k, to each row. Very simple. There are actually 20 rows of data here, so each row gets an ascending order number: 1, 2, 3, 4, 5, all the way up to 20. Okay, step three is associated with column 3. That's where we specify our sample size, n. There are 20 pieces of data, 20 rows. So, the value 20 is listed in all rows in column 3. Okay, then, in column 4, to get our empirical cumulative distribution function, which is what we're trying to get, we divide k by n. Okay, so, for row 1, k over n is 1 over 20, or 0.05. For row 2, k over n is 2 over 20, or 0.1. Okay, now, what you'll notice about column 4 is that the value jumps by 0.05 with each row, right? So, column 4 is actually a step function that jumps by 1/n: 0.05, 0.1, 0.15, 0.2, all the way up to the value one. Okay, so, what's happening here is that this value is accumulating, right? So, hence our cumulative distribution function. So, it's that step function that's similar to what we saw on the previous slide.
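The four columns described above can be sketched in a few lines of Python. The function name `ecdf_table` and the 20 sample values are assumptions for illustration. The values are made up to run from 0 to 0.42 like the data on the slide, but they are not the actual data from the video.

```python
def ecdf_table(data):
    """Build rows of (value, k, n, k/n) following the four columns described."""
    xs = sorted(data)     # column 1: observed data in ascending order
    n = len(xs)           # column 3: sample size, the same in every row
    # column 2 is the order number k = 1..n; column 4 is the ECDF value k / n
    return [(x, k, n, k / n) for k, x in enumerate(xs, start=1)]


# Hypothetical sample of 20 values between 0 and 0.42 (illustration only).
sample = [0.12, 0.0, 0.31, 0.05, 0.42, 0.08, 0.19, 0.02, 0.27, 0.15,
          0.36, 0.03, 0.22, 0.10, 0.39, 0.07, 0.25, 0.17, 0.33, 0.29]

table = ecdf_table(sample)
for value, k, n, ecdf in table:
    print(f"{value:5.2f}  {k:2d}  {n:2d}  {ecdf:4.2f}")
```

With n = 20, the last column increases by 1/20 = 0.05 per row and reaches 1 in the final row, which is the step function described above.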

We wish to thank the National Science Foundation under Grant 233582 for supporting our work. Thank you for watching.