|
V4
Descriptive Statistics

Welcome back to the descriptive
statistics videos. This is part three of the videos on descriptive
statistics. I'm Renee Clark from the Swanson School of Engineering. In this
part three video, our agenda items are defining what a data distribution
is. We then move on to the concept of symmetry, or a symmetric
distribution, and then we talk about the concepts of skewness and kurtosis,
both of which are descriptive statistics which describe or measure the shape
of a distribution. Okay, so first, what is a data distribution? At the bottom of
the screen are some pictures of various data distributions. Okay, so,
a data distribution is the shape of the graph you get when all the possible
values of your variable (for example, perhaps we're looking at the variable
height) are plotted on the x-axis. So, a variable such as height would be
plotted on the x-axis.
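As a rough illustration of this idea, the sketch below counts how often each value of a variable occurs and prints a simple text histogram: the values play the role of the x-axis and the counts play the role of the frequency axis. The heights are made-up numbers in inches, and the names `heights_in`, `frequency_table`, and `print_text_histogram` are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical heights in inches (illustrative values, not real measurements).
heights_in = [72, 74, 75, 75, 76, 76, 76, 77, 77,
              78, 78, 78, 78, 79, 80, 80, 82, 84]


def frequency_table(values):
    """Return (value, count) pairs sorted by value.

    The value corresponds to the x-axis; the count is the frequency.
    """
    counts = Counter(values)
    return sorted(counts.items())


def print_text_histogram(values):
    """Print one row per distinct value, with one '*' per occurrence."""
    for value, count in frequency_table(values):
        print(f"{value:>3} | {'*' * count}")


print_text_histogram(heights_in)
```

The outline traced by the rows of stars is the shape of the distribution for this small made-up sample.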
Maybe those values range from 6 to 7 feet, right? Maybe we are measuring
basketball players, for example. Now, how often each x value occurs, that is,
each value of your variable, is shown on the y-axis. So the y-axis of the
graph represents a frequency: how often each particular value of height
occurs. So, 6 ft heights may occur with one frequency, whereas 7 ft heights
may occur with another. Okay, the concept of symmetry: a
distribution will be symmetric if a vertical line through its center divides
it into two halves that are mirror images of one another. Okay, so, picture
those two halves kind of folding nicely together if
that vertical line serves as an axis, okay? So, the distribution that is
pictured on the right is the normal distribution, right? It is indeed
symmetric, okay, and that's because half of the data is to the left of the
center line, right? Half the data, or half the area, is to the left and the
other half is to the right of the
center line. Okay, the opposite of symmetry or being symmetric is known as being skewed or
skew. Okay, if a distribution is not symmetric, then it is skewed. Okay, we
have two possibilities for skewed. A distribution can be skewed to the right,
meaning it has a long right tail such as shown here. Or a distribution can be
skewed to the left such that it has a long left tail
such as shown here. Okay, what would be examples of data that might be skewed
left or skewed right? In terms of skewed left, time-to-fail data is often
skewed left. So, if our variable is time to fail, which is shown on the
x-axis, you hope that many fewer items, which are represented as your
frequency on the y-axis, will fail early. As we go along in time and get to a
larger time, we hope, or expect, that many more items will fail at a later
time. Okay, what's an example of data that
might be skewed right? Salary data. So, for example, we expect many more
people to have a lower salary and many fewer people to have a higher salary,
which would be represented off to the right along the
x-axis. Okay, there are descriptive
statistics that measure shape. Okay, so, skewness is a number that measures
the lack of symmetry in a distribution. Okay, such as the distribution that's
shown right there. This is a skewed right distribution. It has a certain lack
of symmetry. Now, as the symmetry of a distribution increases, that skewness
number, or that skewness value, approaches zero. And if your skew value is
exactly zero, then you have a perfectly symmetrical
distribution. Okay, now in terms of a skew value serving as a descriptive
statistic for shape, we will allow software to calculate that for us:
software such as Minitab or Excel, or another statistical package that you
might use. We will then simply interpret it using rules of thumb. Okay, so,
here are some
rules of thumb relative to skewness. We say that a distribution is highly
skewed if its skew value is greater than one or less than negative one.
Another way to say that is that a distribution is highly skewed if its skew
value has an absolute value greater than one. A distribution is said to be
moderately skewed if its skew value is somewhere between 0.5 and 1 or between
-1 and -0.5. Another way to say that is that it's moderately skewed if the
absolute value of its skew is somewhere between 0.5 and 1. In contrast, we say
a distribution is fairly symmetric if its skew is low, meaning somewhere
between -0.5 and 0.5. Another way to say that is that its absolute value is
less than 0.5. Okay,
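The skewness rules of thumb above can be sketched in code. The video leaves the calculation to software and does not give a formula, so the formula here is an assumption: the adjusted Fisher-Pearson sample skewness, one common choice, and a given package may use a slightly different one. The names `sample_skewness`, `classify_skewness`, and `salaries_k` are hypothetical, the salary figures are made up, and treating a value of exactly 0.5 or 1 as "moderately skewed" is also an assumption.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (an assumed formula).

    Statistical packages may compute skewness slightly differently.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation
    if s == 0:
        raise ValueError("all values are identical")
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def classify_skewness(skew):
    """Apply the rules of thumb to the absolute value of the skew."""
    size = abs(skew)
    if size > 1:
        return "highly skewed"
    if size >= 0.5:  # assumption: boundary values count as moderate
        return "moderately skewed"
    return "fairly symmetric"


# Hypothetical salaries in thousands of dollars (made-up values).
salaries_k = [40, 42, 45, 45, 48, 50, 52, 55, 60, 150]

skew = sample_skewness(salaries_k)
print(f"skew = {skew:.2f}: {classify_skewness(skew)}")
```

A positive result corresponds to a long right tail, as in the salary example, and a negative result to a long left tail.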
there is a second descriptive statistic that measures the shape of a
distribution and it's that of kurtosis. Okay, kurtosis measures either the
pointiness or the flatness of a distribution. Okay, if you get a negative
kurtosis value, or software calculates a negative kurtosis value for you,
that means your distribution is relatively flat compared to the normal
distribution. Okay, so, the normal distribution, in this case, is our
comparison. If you get a positive kurtosis value, that says that your
distribution is a little bit taller or has a higher peak than the normal
distribution. Okay, you can associate positive with peak to keep those
straight. Okay, so, over here on the right I'm going to overwrite in red the
normal distribution. So, there's the normal distribution in red. Okay, in
green is a distribution that has a positive kurtosis. Okay, because it's
taller or has a higher peak than the normal. Okay,
and then in orchid I am going to overlay a
distribution that has a negative kurtosis because it's a lot flatter than the
normal distribution in red. Okay, why is all of this
important? Well, because the statistical methods that we're going to use in
this course, or that you're going to encounter in other courses, often
require approximately normal data or approximately normal distributions. Now,
for perfectly normal data, both the skew and the kurtosis values will each be
equal to zero. Okay,
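The sign interpretation of kurtosis described above (negative means flatter than the normal distribution, positive means a higher peak) can be sketched as follows. The video does not give a formula, so this is an assumption: a common bias-corrected sample excess kurtosis, for which the normal distribution scores zero, and other packages may differ. The names `excess_kurtosis` and `describe_kurtosis` and both small data sets are hypothetical.

```python
import statistics


def excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (an assumed formula).

    The normal distribution corresponds to a value of zero.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation
    if s == 0:
        raise ValueError("all values are identical")
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def describe_kurtosis(kurt):
    """Interpret the sign of the kurtosis relative to the normal curve."""
    if kurt < 0:
        return "flatter than the normal distribution"
    if kurt > 0:
        return "higher peak than the normal distribution"
    return "same peakedness as the normal distribution"


flat_data = list(range(1, 11))                    # evenly spread values
peaked_data = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]     # most values at the center

for name, data in [("flat", flat_data), ("peaked", peaked_data)]:
    k = excess_kurtosis(data)
    print(f"{name}: kurtosis = {k:.2f}, {describe_kurtosis(k)}")
```

The evenly spread sample gives a negative value and the sample concentrated at its center gives a positive one.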
with real-world data, that is actually unlikely to occur. However, we can go
with approximate normality, right? So, in the case of kurtosis, if your
kurtosis value is less than or equal to one in absolute value, that provides
a very good approximation to the normal distribution, although within two in
absolute value is also acceptable. It's just as when we were talking about
skew and we talked about fairly symmetrical, right? So, in the case of
kurtosis, we can work within the bounds of approximate normality in order to
use the statistical methods that we're going to be using. But, again, we're
going to let Minitab, Excel, or other software calculate the kurtosis for
us. We will interpret it. Thank you to the National Science
Foundation, grant number 2335802, for supporting our work. Thank you for
listening. |