|
V3
Descriptive Statistics
Hi everybody, Renee Clark from
the University of Pittsburgh Swanson School of Engineering. This is part two
of the descriptive statistics video. In the part two
video, we will discuss two additional types of descriptive statistics, or
statistics for variability: the range and the interquartile range. In
addition, we will also discuss the important and related concept of percentiles.
Okay, in part one of the
video series, we talked about standard deviation and variance as our first
two measures of variability. Two additional measures of variability are the
range and the interquartile range. Okay, you've probably heard of the range
before. It's actually the simplest of the statistics
for variability. It's defined as the difference between the largest and the smallest
data points in a set of data that you might have. Okay, so let's say we have a set
of data such as this. It has 10 data points. If we wanted to determine the
range for this, we would make sure our data points are ordered. Okay, and
then we would take the largest value in our data set and subtract off our
smallest value (in this case two). Our range for this set of data would be 54.
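
As a minimal sketch of the range calculation just described: the slide's actual data values are not in this transcript, so the list below is a hypothetical stand-in, chosen only so that it has 10 points, a smallest value of 2, and a range of 54.

```python
# Hypothetical data (not the slide's actual values): 10 points whose
# smallest value is 2 and whose range is 54, matching the numbers stated.
data = [9, 56, 2, 14, 7, 21, 5, 19, 12, 17]

# Order the data, then subtract the smallest value from the largest.
ordered = sorted(data)
data_range = ordered[-1] - ordered[0]

print(ordered)     # [2, 5, 7, 9, 12, 14, 17, 19, 21, 56]
print(data_range)  # 54
```

Sorting is not strictly required to get the range (max minus min gives the same answer), but it mirrors the ordering step used in the video.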
Now, a large issue with the range is that it is extremely sensitive
to outliers. So, in pops the interquartile
range, which is much less sensitive to outliers and therefore more robust. Okay, it's
defined as the spread, or the amount of the spread, of the middle 50% of your
data. Okay, so let's step through how you would determine the interquartile
range. Okay, so, we're going to take the same set of data here as we had
initially. Okay, it's ordered. There are 10 pieces of data. Okay, to
determine the interquartile range, which we abbreviate IQR, what you need
to do is determine the middle point of your data. Okay, that actually is right here where this red line is. Okay,
because with 10 pieces of data, there are five items below the vertical line
and five items above. So, the five and five add to 10. This is your middle
point. We will define Q1, or the first quartile, as the median or the middle
of the lower half of this data. So, that's right there. We will define Q3, or
the third quartile, as the median or the middle of the upper half of the data.
That will be right there. Okay, the interquartile range is then Q3 - Q1 or,
in this case, 19 - 7 or 12. Notice that there's a large difference between
our two different measures of variability. The range equals 54, the
interquartile range is 12. Okay, and that's because of this outlier in the
data which really affects the value of the range. So, when you have outliers,
it's better to use the interquartile range in this case to measure your variability.
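
The steps above can be sketched in code. This is an illustration under stated assumptions, not the slide's actual numbers: the data list is a hypothetical stand-in chosen to reproduce the stated results (Q1 = 7, Q3 = 19, range = 54), and the helper name `iqr_median_of_halves` is made up for this example. It uses the median-of-halves method described in the video; for an odd number of points it leaves the middle value out of both halves, which is one common convention and an assumption here.

```python
from statistics import median

def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # lower half of the ordered data
    upper = ordered[n - half:]   # upper half (skips the middle value if n is odd)
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1

# Hypothetical data with one outlier (56); not the slide's actual values.
with_outlier = [2, 5, 7, 9, 12, 14, 17, 19, 21, 56]
# Same data with the outlier replaced by a more typical value.
without_outlier = [2, 5, 7, 9, 12, 14, 17, 19, 21, 23]

for data in (with_outlier, without_outlier):
    q1, q3, iqr = iqr_median_of_halves(data)
    data_range = max(data) - min(data)
    print(f"Q1={q1}, Q3={q3}, IQR={iqr}, range={data_range}")
# Q1=7, Q3=19, IQR=12, range=54
# Q1=7, Q3=19, IQR=12, range=21
```

Replacing the single outlier drops the range from 54 to 21 while the IQR stays at 12, which is the robustness the video is pointing to. Library quartile functions may follow a different quartile convention and can return slightly different values for the same data.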
Okay, on to the related, very
important concept of percentiles or a percentile. Okay, so, if I tell you
your exam score is at the 80th percentile, then what that should tell you is
that 80% of the exam scores are below yours. Okay, so I have some exam data
here on the right, and there are 10 scores, or 10 pieces of exam
data (sample size of 10). The 80th percentile is right here at the score
of 95. Okay, that's the 80th percentile. Okay, and that's because 8 out of
10 of the scores, or 80% of them, are below the score of 95. Okay, so, 80th
percentile is actually pretty good, right? If you're in the 80th percentile, quite good. Okay, now, quartiles (which we talked about on
the previous slide) are specific types of percentiles. Okay, remember
quartiles. Q1 is the first quartile, Q3 is the third quartile, and when we
subtracted Q1 from Q3, we got the interquartile range,
which represented the spread of the middle 50% of the data. Okay, but
quartiles are special types of percentiles in that they divide your ordered
data set into four parts, each having the same number of data points. Okay, so, picture
here that you have some ordered data. Okay, the first quartile, or Q1,
is that point in your data such that 25% of the data
points are below it. Okay, so, likewise Q3, or the third quartile, okay, is
the point in your data set such that 75% of the data points are below Q3. Okay,
25 + 25 + 25 will give you the 75%. Okay, so, you
can see why we call Q3 minus Q1 the spread of the middle 50%, right? Because
when you subtract the 25% below Q1 from the 75% below Q3, you'll get
that middle 50%. Okay, Q2 here in your data is the point in your data such
that 50% of the points are below it, right? So, it would be 25 + 25. Q2 is
also known as the median or the middle point. Thank you for watching. |