V5
Descriptive Statistics

Welcome to our fourth video, or part four, of descriptive statistics. I'm Renee Clark from the Swanson School of Engineering at Pitt. In this part four video, our agenda is as follows. We're going to talk about graphical displays of data: in particular, the histogram, the bar chart, the stem-and-leaf plot, and the cumulative frequency graph. The first step in any data analysis you want to do is to summarize your data graphically, because all good analysts know what their data looks like. The first two displays we're going to talk about are the histogram and the bar chart.
Both are shown on this slide. They look very similar and accomplish a similar thing. The difference is that the histogram is used for continuous data, while the bar chart is used for discrete (count) data or for categorical, also called qualitative, data.

Let's talk about the histogram first. The histogram is used to graphically summarize continuous data. An example of a continuous variable would be weight, for example in pounds. Weight is continuous because its decimal portion can theoretically go on and on: something could weigh 10.5576... pounds. Of course, the practical restriction is whether you can measure out that far, but theoretically speaking, the number of decimal places can go on and on. That's what makes it continuous data.

On the x-axis of a histogram, your variable is represented. Here, that's weight in pounds: 50 lb, 70 lb, and so on up to 250 lb. What the x-axis actually shows is intervals of weight. For example, this interval goes from 90 to 110 lb, and this one goes from 150 to 170 lb. The height of each bar represents the frequency, or number of occurrences, of weights in that interval, for example between 90 and 110 lb. The vertical bars are drawn touching each other because the data is continuous.
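
As a rough sketch of what a histogram tallies, the Python below sorts continuous values into equal-width intervals and counts how many fall in each; those counts are the bar heights. It assumes 20 lb intervals starting at 50 lb, which matches the 90 to 110 and 150 to 170 intervals mentioned above. The weights are made up for illustration, and `histogram_counts` is a hypothetical helper, not a library function.

```python
def histogram_counts(values, start, width, n_bins):
    """Tally continuous values into equal-width intervals (hypothetical helper).

    Bin k covers [start + k*width, start + (k+1)*width). Intervals are
    half-open so a value on a boundary, such as 110, lands in exactly one bar;
    the top edge of the last bin is included so the maximum is not dropped.
    """
    counts = [0] * n_bins
    upper = start + n_bins * width
    for v in values:
        if not start <= v <= upper:
            raise ValueError(f"{v} is outside [{start}, {upper}]")
        k = min(int((v - start) // width), n_bins - 1)  # v == upper goes in the last bin
        counts[k] += 1
    return [((start + k * width, start + (k + 1) * width), c)
            for k, c in enumerate(counts)]


# Made-up weights in pounds (not the data on the slide).
weights = [52.4, 95.0, 101.3, 109.9, 110.0, 128.7, 142.5, 151.2,
           158.8, 160.0, 163.4, 169.9, 175.5, 188.1, 203.6, 229.0]

for (lo, hi), count in histogram_counts(weights, start=50, width=20, n_bins=10):
    print(f"{lo:>3}-{hi:<3} lb | {'#' * count}")
```

Because adjacent intervals share an edge (110 ends one bin and starts the next), there is no gap between bars, which is why a histogram's bars touch.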
Contrast that with a bar chart, which shows the very same thing: your variable is represented on the x-axis. But in a bar chart, it's a discrete or categorical variable. In this case, our discrete variable is the number of children in a household, and in this data it ranges from one child up to seven children per household. For example, in this data set, the number of households with three children was 11. Now the vertical bars are not drawn touching, because this data is discrete. There is no such thing as 5.5 children, or 1.2 children. That's an example of discrete, or count, data. An example of a qualitative, or categorical, variable would be M&M colors: how often do the various colors occur in a bag of M&M's?
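
For discrete or categorical data there is nothing to bin: each distinct value gets its own bar. A minimal Python sketch, assuming made-up household data (not the data on the slide) and a hypothetical helper `bar_chart_lines`, counts categories with `collections.Counter` and prints one separate bar per category.

```python
from collections import Counter


def bar_chart_lines(values):
    """Return one text bar per distinct category, in sorted order (hypothetical helper)."""
    counts = Counter(values)
    return [f"{category!s:>6} | {'#' * counts[category]} ({counts[category]})"
            for category in sorted(counts)]


# Made-up number of children in each of 11 households.
children = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 7]
for line in bar_chart_lines(children):
    print(line)

# The same helper works for qualitative data such as M&M colors.
colors = ["red", "blue", "green", "blue", "brown", "blue", "red"]
for line in bar_chart_lines(colors):
    print(line)
```

Only values that actually occur get a bar: no household here has six children, so there is no bar for 6, and a value like 5.5 can never appear at all. The sketch assumes the categories within one call can be sorted against each other (all numbers or all strings).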
Let's talk a little more about the histogram. As we said, a histogram shows how continuous data are distributed. First, a histogram shows the center of your data. For example, with the weight data shown here, the center is roughly somewhere around 160 lb. Second, it shows the variation, or spread, of your data. This data goes all the way from, I believe, 50 up to around 230 pounds, so the histogram shows how spread out the weight values are. Third, it shows the shape of your data: is it symmetric, or is it skewed? This data is actually fairly symmetric. We see this one outlying interval at the low end, which might skew the data slightly to the left, but in general, this data is fairly symmetric.
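
The three things a histogram shows can also be computed directly. In the hedged Python sketch below, using made-up weights, the median and mean describe the center; the minimum, maximum, and range describe the spread; and the moment coefficient of skewness (the third central moment divided by the second central moment raised to the 1.5 power) gives a rough numeric read on shape: negative for a left skew, positive for a right skew, near zero for a symmetric distribution. The function name `summarize` is hypothetical, and it assumes at least two distinct values so the spread is not zero.

```python
import statistics


def summarize(values):
    """Center, spread, and a rough shape measure for a list of numbers (hypothetical helper)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return {
        "median": statistics.median(values),  # center
        "mean": mean,                         # center
        "min": min(values),                   # spread
        "max": max(values),                   # spread
        "range": max(values) - min(values),   # spread
        "skewness": m3 / m2 ** 1.5,           # shape: < 0 left skew, > 0 right skew
    }


# Made-up weights in pounds (not the data on the slide).
weights = [52.4, 95.0, 101.3, 109.9, 110.0, 128.7, 142.5, 151.2,
           158.8, 160.0, 163.4, 169.9, 175.5, 188.1, 203.6, 229.0]
for name, value in summarize(weights).items():
    print(f"{name:>8}: {value:.2f}")
```

For these made-up weights the skewness comes out slightly negative: the single low value near 50 lb stretches the left tail while the bulk of the data stays fairly symmetric.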
Another graphical summary tool that serves essentially the same function as a histogram or bar chart is the stem-and-leaf plot. In fact, you can think of a stem-and-leaf plot as a histogram or a bar chart turned on its side, and we'll talk about that in just a second. But first, how is a stem-and-leaf plot set up? Let's say you have some data, and this particular data ranges from 15 up to 41. The first thing to do is determine the appropriate stems to represent this data. Since the values range from 15 up to 41, the appropriate stems in this case are the single digits 1, 2, 3, and 4, which represent the first (tens) digit of your values. The leaf then represents the second (units) digit. For example, with 15, the 1 is the stem and the 5 goes into the leaf. Likewise, for 16, the 1 is the stem and the 6 is another leaf on that same stem. For 32, the 3 is the stem and the 2 is one of the leaves off the stem of 3.
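
The setup just described, with the tens digit as the stem and the units digit as the leaf, can be sketched in Python. Assumptions: the values are non-negative integers below 100, and because the full data set isn't shown, the sketch uses the values mentioned in the lecture (15, 16, 32, the maximum of 41, and the 20s with leaves 1, 3, 3, 6, 6) plus two made-up values, 34 and 37. `stem_and_leaf` is a hypothetical helper.

```python
def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits).

    Assumes non-negative integers below 100. Stems with no values are kept
    with an empty list so gaps in the data stay visible.
    """
    plot = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot.setdefault(stem, []).append(leaf)
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


data = [23, 15, 32, 26, 41, 21, 16, 37, 23, 26, 34]  # 34 and 37 are made up
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# 1 | 5 6
# 2 | 1 3 3 6 6
# 3 | 2 4 7
# 4 | 1
```

Sorting first keeps each row's leaves in increasing order, which makes the plot easier to read.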
Now picture this stem-and-leaf plot rotated counterclockwise by 90°. When you do, you'll notice that the rotated plot looks like a histogram or a bar chart. Take the stem value of 2 in this case: its leaves are 1, 3, 3, 6, and 6, so there are five of them, and that count represents the frequency with which values occur with stem 2. Another way to say that: how many values in our data set are in the 20s? There are five of them, and that gives stem 2 a relative height of five compared to the other stems.
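
To see the rotation concretely, here is a hedged Python sketch that starts from a stem-and-leaf plot (the lecture's values plus the made-up 34 and 37) and redraws it turned 90° counterclockwise: stems run left to right along the bottom, and each stem's leaves stack upward, so each column's height is that stem's frequency, just like a bar. Both helper names are hypothetical, and the layout assumes single-digit stems and leaves.

```python
def stem_frequencies(plot):
    """Number of leaves on each stem, i.e. the bar heights."""
    return {stem: len(leaves) for stem, leaves in plot.items()}


def rotated_view(plot):
    """Text rows for a stem-and-leaf plot rotated 90 degrees counterclockwise."""
    stems = list(plot)
    height = max(len(leaves) for leaves in plot.values())
    rows = []
    for level in range(height - 1, -1, -1):  # top row first; the first leaf sits at the bottom
        cells = [str(plot[s][level]) if level < len(plot[s]) else " " for s in stems]
        rows.append(" ".join(cells).rstrip())
    rows.append("-" * (2 * len(stems) - 1))  # baseline under the columns
    rows.append(" ".join(str(s) for s in stems))  # stems become the x-axis
    return rows


plot = {1: [5, 6], 2: [1, 3, 3, 6, 6], 3: [2, 4, 7], 4: [1]}
print(stem_frequencies(plot))  # {1: 2, 2: 5, 3: 3, 4: 1}
print("\n".join(rotated_view(plot)))
```

The column over stem 2 is the tallest, five leaves high, which is exactly the "five values in the 20s" count.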
The final graph I want to go over is the cumulative frequency graph, another graphical summary tool. On the left is a cumulative frequency table. Let's say it represents the number of visits a person made to a certain store. Say 35 people made 1 to 5 visits to that store, so at this point the cumulative frequency is just 35. Now let's look at the bin, or bucket, of 6 to 10 visits. Let's say 70 people made 6 to 10 visits. When you add the 70 to the 35, you get 105; that's our cumulative frequency at that point. Now take 11 to 15 visits. Let's say 105 people made 11 to 15 visits. When you add that 105 to the 105 already in the cumulative column, the cumulative frequency becomes 210 (105 + 105). And so on down the line. At the final bin, when you add that 100 to the running total of 250, you get a final cumulative frequency of 350.
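
The running total in that table is a cumulative sum, which a minimal Python sketch can reproduce with `itertools.accumulate`. The 16 to 20 row isn't read out in the lecture; its frequency of 40 is inferred here from the cumulative totals (250 - 210), so treat it as an assumption.

```python
from itertools import accumulate

bins = ["1-5", "6-10", "11-15", "16-20", "21-25"]  # number of store visits
freqs = [35, 70, 105, 40, 100]  # people per bin; 40 is inferred as 250 - 210

cumulative = list(accumulate(freqs))  # running total of the frequency column
for label, f, c in zip(bins, freqs, cumulative):
    print(f"{label:>5} visits | frequency {f:>3} | cumulative {c:>3}")
```

The last cumulative value always equals the total number of people in the table, which is a handy check on the arithmetic.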
If you were to graph this data, number of visits versus its cumulative frequency, the first bin of 1 to 5 visits sits at 35, while the final bin of 21 to 25 visits is up here at 350. Because each bin's frequency is added to the running total, the cumulative frequency never goes down, so when you graph it in that manner, the cumulative frequency graph takes on the appearance of a set of stairs going upward. Picture yourself walking up those stairs: that's what a cumulative frequency graph looks like.
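
To see why the graph looks like stairs, the hedged sketch below draws the same cumulative frequencies (again assuming the inferred 40 for 16 to 20 visits) as text columns, one per bin, each filled up to its running total rounded up to the next multiple of 50. Since a running total of non-negative counts never decreases, each column is at least as tall as the one before it. `staircase_rows` and its `step` and `width` parameters are hypothetical; a plotting library's step plot would draw the same shape as a line.

```python
from itertools import accumulate


def staircase_rows(freqs, step=50, width=6):
    """Draw cumulative frequencies as text columns, one column per bin.

    A column is filled at a given level when its cumulative total exceeds the
    level below, i.e. each total is rounded up to the next multiple of `step`.
    """
    cumulative = list(accumulate(freqs))
    top = -(-max(cumulative) // step) * step  # ceiling to a multiple of step
    rows = []
    for level in range(top, 0, -step):  # highest level first
        cells = ["#" * width if c > level - step else " " * width for c in cumulative]
        rows.append(f"{level:>4} |" + "".join(cells).rstrip())
    return rows


bins = ["1-5", "6-10", "11-15", "16-20", "21-25"]
freqs = [35, 70, 105, 40, 100]  # 16-20 value of 40 is inferred, not from the lecture
print("\n".join(staircase_rows(freqs)))
print(" " * 6 + "".join(f"{b:^6}" for b in bins))  # bin labels under the columns
```

Reading left to right, every column steps up or stays level, which is the staircase shape described above.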
Thank you to the National Science Foundation under Grant 233 582 for supporting our work. Thank you.