|
V34
Hypothesis Testing
Welcome to Part 10 of our video
series in support of hypothesis testing. In this video, we are going to cover
a contingency table which is used in a test of independence, and we're going
to discuss the marginal frequencies or totals within a contingency table as
well as the cell probabilities of the contingency table. I'm Renee Clark from
the Swanson School of Engineering at the University of Pittsburgh. Okay, so, what is a contingency
table? An example of a contingency table is shown here on the right. Okay, in
general, a contingency table is a matrix that cross
classifies observations or data, and that's what's shown in the middle here,
based on categorical variables. Okay, the categorical variables are
indicated along the left-hand side and across the
top. Now, as you recall, categorical variables, which have categories, are
qualitative variables, okay, as opposed to quantitative variables. Okay, in
this particular contingency table, the two
categorical variables are Hawaiian island and sweetness rating. Okay, so, Hawaiian island has
three categories, at least with this data set: Maui,
Kauai, and Oahu. Those are the three different categories. Sweetness rating has
two categories in this example, and those two categories are high sweetness
rating and super high sweetness rating. Okay, so, for example, based on this
contingency table, there are 203 high sweetness pineapples from Oahu, okay,
because the count of 203 occurs at the intersection of the high and Oahu
categories. Okay, this particular sample of data has
1,000 total pineapples represented there. Okay, in general, a contingency
table has R rows and C columns, and you'll often see it referred to as an
R by C table. Okay, so, in this example, there are two rows and three columns.
Okay, so, this is a 2x3 contingency table. Okay, now, in a contingency table,
the row and the column totals, which I'm going to circle here, these are the
column totals, these are the row totals. Okay, those totals are called
marginal totals or frequencies (either one). Okay, so, the marginal
probability of a pineapple being from Maui with this data set is the
marginal total associated with Maui, 336, divided by the total
number of pineapples in the data set, 1,000. Okay, so that would be the marginal
probability associated with a pineapple being from Maui in this
example. So, likewise, the marginal probability of having a super high
sweetness level would be 402 over 1,000. Okay, we could label that
as the probability of S, for being a super high
pineapple, which is 402 out of 1,000. Again, this is the marginal probability for the
super high sweetness level. Okay, so, these concepts, and certainly
contingency tables in general, are used in tests of independence, which are
another type of hypothesis test that we will be learning about. Okay, so, let's discuss cell
probabilities associated with our contingency table. In
order to do that, I'm going to bring to your attention, or have you
recall, the following concept or theorem that you learned about in your
probability studies. Okay, and that is the following. If you've got two independent events, okay, the probability
that they both occur, which you could write as the probability of
A and B, or the probability of A intersect B, that probability on the left is actually equal to the product of the individual
probabilities. So, the probability on the left equals the
probability of A times the probability of B, assuming that A and B are
independent events. Now that's theorem 2.11 in your book. Okay, now, if we're
running a test of independence then what we do is we null hypothesize that
our categorical variables, so in this case, in our example, Hawaiian island
or Hawaiian origin and sweetness level or sweetness rating, we null
hypothesize that those two variables are independent variables, okay, in the
same sense that we're talking about right here. Okay, so, assuming,
as we will, that these two variables in our contingency table are independent,
okay, then, if we want to find the probability that two events both occur, okay,
then we proceed as follows. For example, let's take number
one. If we want to determine the probability that a pineapple is from Maui,
and it has a high sweetness level, okay, since we're assuming those two
variables to be independent, that's equal to the product of the probability
of being from Maui and the probability of having a high sweetness level. Okay,
in order to calculate this, we're going to use the marginal frequencies.
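Written out, the calculation for this first cell looks like the following. This is a sketch that assumes the two variables are independent, as the null hypothesis does; the symbols M (from Maui) and H (high sweetness) are labels chosen here for illustration, by analogy with the K and S labels used in this video.

```latex
P(M \cap H) = P(M)\,P(H)
            = \frac{336}{1000} \times \frac{598}{1000}
            = 0.200928
```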
Okay, so, the probability of being from Maui is equal to its marginal total
or frequency of 336 over 1,000 total data points. Okay, for the probability of
being of a high sweetness rating, okay, we're going to use its marginal total
of 598 over 1,000 total observations. Okay, so, recapping then the
probability that a pineapple is from Maui and has a high sweetness rating is
equal to 336 over 1,000 times 598 over 1,000. Okay, so, we are actually multiplying two marginal probabilities here. Okay, let's try another one.
Let's try this one. What's the probability that a pineapple
is from Kauai and has a super high sweetness
rating? Okay, so, we're going to use the marginal probabilities for that,
assuming the events are independent. Probability of K and S is the
probability of K times the probability of S. Okay, what's the probability of
being from Kauai? We're going to use its marginal total of 351 over 1,000 total
observations. Okay, what's the probability of having a super high sweetness level?
402, or its marginal total, over 1,000 total observations. Okay, so, then
this probability is the product of 351/1,000 and 402/1,000. Okay, so, what we
have here are six different probabilities. These are called cell
probabilities because they are associated with each of these six cells of the
contingency table. Okay, so, a cell occurs at the intersection of a
particular value of your column variable, in this case Hawaiian island, and a
particular value of your row variable, sweetness rating, okay, high or super high. These cell probabilities,
which we obtain by taking products of the marginal probabilities
of these variables, assumed to be independent, we will be using in our test of independence.
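As a minimal sketch, the six cell probabilities could be computed from the marginal totals as shown below. The variable and function names are hypothetical, chosen only for illustration. The Oahu marginal total is never stated in the video; it is inferred here as 1,000 minus the Maui and Kauai totals, which is an assumption.

```python
from fractions import Fraction

TOTAL = 1000  # total pineapples in the sample

# Marginal totals. The Maui, Kauai, high, and super high totals are stated
# in the video. The Oahu total is NOT stated; it is inferred (assumption)
# as the remainder of the 1,000 pineapples.
island_totals = {"Maui": 336, "Kauai": 351, "Oahu": TOTAL - 336 - 351}
sweetness_totals = {"high": 598, "super high": 402}


def marginal_probabilities(totals, n):
    """Divide each marginal total by the sample size n."""
    return {category: Fraction(count, n) for category, count in totals.items()}


p_island = marginal_probabilities(island_totals, TOTAL)
p_sweetness = marginal_probabilities(sweetness_totals, TOTAL)

# Under the null hypothesis of independence, each cell probability is the
# product of its row (sweetness) and column (island) marginal probabilities.
cell_probabilities = {
    (sweetness, island): p_s * p_i
    for sweetness, p_s in p_sweetness.items()
    for island, p_i in p_island.items()
}

for (sweetness, island), p in cell_probabilities.items():
    print(f"P({island} and {sweetness}) = {float(p):.6f}")
```

Exact fractions are used so that the six cell probabilities sum to exactly 1, which is a quick check that the marginal totals are consistent with the sample size.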
We wish to thank the National
Science Foundation under Grant 233582 for supporting our work. Thank you for watching. |