V5 Descriptive Statistics

Welcome to our fourth video, or part four, of descriptive statistics. I'm Renee Clark from the Swanson School of Engineering at Pitt.

Okay, in this part four video, our agenda is as follows. We're going to talk about graphical displays of data: in particular, a histogram, a bar chart, a stem and leaf plot, and a cumulative frequency graph.

Okay, so, the first step in any data analysis that you want to do is graphically summarizing your data, because all good analysts know what their data looks like. The first two displays we're going to talk about are the histogram and the bar chart. Both are shown on this slide, and they look very similar and actually accomplish a similar thing. The difference is that one, the histogram, is used for continuous data, and the other, the bar chart, is used for discrete (count) data or for categorical (qualitative) data.

Okay, so let's talk about the histogram first. The histogram is used to graphically summarize continuous data. An example of a continuous variable, or continuous data, would be weight, for example in pounds, right? Weight is continuous because its decimal portion can theoretically go on and on and on. You can have something that weighs 10.5576... pounds. Of course, the restriction is whether you can measure out that far. But, theoretically speaking, the number of decimals can go on and on and on. Okay, it's continuous data.

With a histogram, your data or your variable is represented on the x-axis, right? So, these are weights in pounds. Here's 50 lb, 70 lb, 250 lb. What is shown on your x-axis is actually intervals for weight. So, for example, this interval goes from 90 lb to 110 lb. This interval goes from 150 lb to 170 lb. Then the height of the bar represents the frequency, or the number of occurrences, of weights between, for example, 90 and 110 lb. And the vertical bars are shown as touching because the data is continuous, right?

Okay, contrast that with a bar chart, which shows the very same thing. In other words, your variable is represented on the x-axis, but it's going to be a discrete variable or a categorical variable. In this case, our discrete variable is number of children in a household, right? And, in this data, it ranges from one child up to seven children per household. So, for example, the number of households having three children, at least for this data, was 11. Now, the vertical bars are not shown as touching because this data is discrete, right? There is, for example, no concept of 5.5 children or 1.2 children. You don't have 1.2 children, right? That's an example of discrete or count data. An example of a qualitative or categorical variable would be M&M colors, right? How often do the various M&M colors occur in a bag of M&M's?
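As a rough sketch of the counting behind these two displays, the Python below sorts continuous weights into touching intervals and tallies a discrete variable value by value. The data values and the function names `histogram_counts` and `bar_counts` are made up for illustration and are not the data on the slide; only the 20 lb interval width starting at 50 lb follows the intervals mentioned above.

```python
from collections import Counter

def histogram_counts(values, start, width):
    """Count continuous values into touching intervals [lo, lo + width)."""
    counts = Counter()
    for v in values:
        # Find the lower edge of the interval that contains v.
        lo = start + width * int((v - start) // width)
        counts[(lo, lo + width)] += 1
    return dict(sorted(counts.items()))

def bar_counts(values):
    """Count how often each distinct discrete value or category occurs."""
    return dict(sorted(Counter(values).items()))

# Hypothetical weights in pounds (continuous data), not the slide's data.
weights = [95.2, 101.7, 109.9, 110.0, 154.3, 168.8, 162.1]
# Hypothetical number of children per household (discrete data).
children = [1, 3, 3, 2, 3, 1, 7]

weight_bins = histogram_counts(weights, start=50, width=20)
child_bars = bar_counts(children)

# Text version of each chart: one '#' per occurrence.
for (lo, hi), n in weight_bins.items():
    print(f"{lo}-{hi} lb: {'#' * n}")
for k, n in child_bars.items():
    print(f"{k} children: {'#' * n}")
```

Note the design choice in this sketch: a weight of exactly 110.0 lb falls in the 110 to 130 interval rather than the 90 to 110 interval, because each interval includes its lower edge and excludes its upper edge. Other conventions are possible; what matters is that every value lands in exactly one interval.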

Okay, so, let's talk a little bit more about a histogram. A histogram, as we said, shows how continuous data are distributed. A histogram shows the center of your data. So, for example, with this weight data shown here, the center is roughly somewhere around 160 lb. It also shows the variation, or the spread, of your data. This data goes all the way from, I believe, 50 up to around 230 pounds, so it shows how spread out your weight values are. It also shows the shape of your data. Is your data symmetric, or is your data skewed? Well, this data is actually fairly symmetric. We see one outlying interval here, which might tend to skew the data slightly to the left. But, in general, this data is fairly symmetric.

Okay, another graphical summary tool that serves essentially the same function as a histogram or bar chart is known as a stem and leaf plot. Actually, you can think of a stem and leaf plot as either a histogram or a bar chart turned on its side, and we'll talk about that in just a second. But, how is a stem and leaf plot set up? Let's say you have some data, and this particular data ranges from 15 up to 41. The first thing you want to do is determine what might be the appropriate stems to represent this data. Since we have data ranging from 15 up to 41, the appropriate stems, in this case, are the single digits 1, 2, 3, and 4, which represent the first digit of your values. And then the leaf represents the second digit.

So, for example, with 15, the one is the stem, and the five then goes into the leaf. Likewise, for 16, the one is the stem, and the six is another leaf on that same stem. For 32, three represents the stem, and two represents one of the leaves off of that stem of three. Okay, now picture this stem and leaf plot as rotated counterclockwise by 90°. When you do so, you'll notice how the rotated stem and leaf plot looks like a histogram or a bar chart, right? Let's take the stem value of two in this case. The number of leaves (1, 3, 3, 6, 6, so there are five of them) represents the frequency with which values occur with stem two. Another way to say that is: how many values in our data set are in the 20s? There are five of them: 1, 2, 3, 4, 5. And that gives us a relative height here of five compared to the other stems.
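The stem and leaf construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list below contains just the values named in this video (15, 16, 32, 41, and the five values in the 20s), not the full data set on the slide, and `stem_and_leaf` is a hypothetical helper name. It also assumes two-digit whole numbers, so the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit whole numbers by tens digit (stem); ones digits are the leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 32 -> stem 3, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

# Partial, illustrative data: only the values mentioned in the video.
data = [15, 16, 21, 23, 23, 26, 26, 32, 41]
plot = stem_and_leaf(data)

# One row per stem; the row length plays the role of the bar height.
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

In the printed result, the row for stem 2 carries the five leaves 1, 3, 3, 6, 6, which is the same count of five that gives that stem its relative height when the plot is turned on its side.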

Okay, and the final graph I want to go over is called the cumulative frequency graph, another graphical summary tool. On the left is a cumulative frequency table, and let's say it represents the number of visits made by a person to a certain store. Let's say 35 people made 1 to 5 visits to that store. At this point, that accumulates to just 35, right? But let's look at the bin, or the bucket, of 6 to 10 visits. Let's say 70 people made 6 to 10 visits. When you cumulate the 70 and the 35, you get 105. That's our cumulative frequency at that point. Now let's take 11 to 15 visits. Let's say 105 people made 11 to 15 visits. When you cumulate that 105 with the 105 in the cumulative column, your cumulative frequency is now 210: 105 + 105. And so on down the line. Right here, at this final one, when you cumulate that 100 with the 250, you get a final cumulative frequency of 350. If you were to graph this data, number of visits versus cumulative frequency, the first bin of 1 to 5 is down here at 35, but for our final bin of 21 to 25, the cumulative frequency is up here at 350. When you graph it in that manner, the cumulative frequency graph actually takes the appearance of a set of stairs going upward. So, picture yourself walking up those stairs. That's what a cumulative frequency graph looks like.
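The running totals in this table can be reproduced with a short Python sketch. One value here is an assumption rather than something stated in the video: the frequency of 40 for the 16 to 20 bin is inferred from the cumulative values on either side of it (250 minus 210), on the assumption that it is the only bin between 11 to 15 and 21 to 25.

```python
from itertools import accumulate

bins = ["1-5", "6-10", "11-15", "16-20", "21-25"]
# Frequencies from the video; the 40 for bin 16-20 is inferred (250 - 210).
frequencies = [35, 70, 105, 40, 100]

# Each cumulative value is the sum of all frequencies up to and including that bin.
cumulative = list(accumulate(frequencies))

print("visits  freq  cumulative")
for label, f, c in zip(bins, frequencies, cumulative):
    print(f"{label:>6}  {f:>4}  {c:>10}")
```

Because every frequency is zero or more, the cumulative column can never go down from one bin to the next, which is why the graph looks like stairs that only go upward.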

Thank you to the National Science Foundation under Grant 233 582 for supporting our work. Thank you.