V3 Descriptive Statistics

Hi everybody, Renee Clark from the University of Pittsburgh Swanson School of Engineering. This is part two of the descriptive statistics video. In part two, we will discuss two additional types of descriptive statistics, or statistics for variability: the range and the interquartile range. We will also discuss the important, related concept of percentiles.

In part one of the video series, we talked about the standard deviation and the variance as our first two measures of variability. Two additional measures of variability are the range and the interquartile range. You've probably heard of the range before. It's the simplest of the statistics for variability: it's defined as the difference between the largest and the smallest data points in a set of data.

Let's say we have a set of data such as the one on this slide, with 10 data points. To determine the range, we first make sure the data points are ordered. Then we take the largest value in the data set (here, 56) and subtract the smallest value (in this case, 2). The range for this set of data is 54. A major issue with the range, however, is that it is extremely sensitive to outliers, because it depends only on the two most extreme values.
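To make the calculation concrete, here is a minimal Python sketch. The slide's actual values aren't in the transcript, so the list below is hypothetical, made up only to share the example's smallest value (2) and largest value (56); the function name `data_range` is also just for illustration.

```python
# Hypothetical data: NOT the slide's values. Chosen only so that the
# smallest value is 2 and the largest is 56, as in the example.
data = [2, 5, 7, 9, 12, 14, 16, 19, 21, 56]

def data_range(values):
    """Range = largest value minus smallest value."""
    ordered = sorted(values)          # order the data first
    return ordered[-1] - ordered[0]   # largest minus smallest

print(data_range(data))  # 54

# The range is driven by one extreme value: replace the outlier 56
# with 22 and the range drops from 54 to 20.
print(data_range([2, 5, 7, 9, 12, 14, 16, 19, 21, 22]))  # 20
```

Sorting isn't strictly required (`max(values) - min(values)` gives the same result), but it mirrors the "order the data first" step described above.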

This is where the interquartile range comes in: it is much less sensitive, or more robust, to outliers. It's defined as the spread of the middle 50% of your data. Let's step through how to determine the interquartile range, which we abbreviate IQR, using the same ordered set of 10 data points as before. First, find the middle point of the data, marked by the red line on the slide. With 10 data points, five items fall below that line and five fall above it, so this is the middle point. We define Q1, the first quartile, as the median, or middle, of the lower half of the data. We define Q3, the third quartile, as the median of the upper half of the data. The interquartile range is then Q3 - Q1, which in this case is 19 - 7 = 12.

Notice the large difference between our two measures of variability: the range equals 54, but the interquartile range is only 12. That's because of the outlier in this data, which strongly affects the range but has little effect on the IQR. So when your data has outliers, it's better to use the interquartile range to measure variability.
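The median-of-halves procedure above can be sketched in Python as follows. This is a sketch under assumptions: the data are the same hypothetical values used for the range (chosen so that Q1 = 7 and Q3 = 19, matching the lecture), the helper names `quartiles` and `iqr` are made up for illustration, and for an odd number of points it adopts one common convention of leaving the middle value out of both halves.

```python
from statistics import median

# Hypothetical data (same made-up values as for the range), chosen so
# that Q1 = 7 and Q3 = 19, matching the lecture's example.
data = [2, 5, 7, 9, 12, 14, 16, 19, 21, 56]

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for an odd number of points, the middle value is left
    out of both halves (other conventions exist).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # lower half of the ordered data
    upper = ordered[half + n % 2:]      # upper half (skips middle if n is odd)
    return median(lower), median(ordered), median(upper)

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

q1, q2, q3 = quartiles(data)
print(q1, q2, q3)   # 7 13.0 19
print(iqr(data))    # 12

# Make the outlier far more extreme: the IQR stays at 12.
print(iqr([2, 5, 7, 9, 12, 14, 16, 19, 21, 560]))  # 12
```

Be aware that textbooks and software compute quartiles in several slightly different ways (Python's `statistics.quantiles`, for example, interpolates between data points), so for small samples a tool may report quartiles that differ a little from the hand method described here.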

Now on to a related and very important concept: percentiles. If I tell you your exam score is at the 80th percentile, that means 80% of the exam scores are below yours. On the right, I have some exam data: 10 scores, so a sample size of 10. The 80th percentile is right here, at the score of 95.

That's because 8 out of 10 scores, or 80% of them, are below the score of 95. So being at the 80th percentile is actually quite good. Now, quartiles, which we talked about on the previous slide, are specific types of percentiles. Recall that Q1 is the first quartile and Q3 is the third quartile, and subtracting Q1 from Q3 gave us the interquartile range, which represents the spread of the middle 50% of the data. Quartiles are special percentiles in that they divide an ordered data set into four parts, each containing the same number of data points.
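The percentile reading used in the exam example can be checked by counting, as in this small sketch. The slide's scores aren't in the transcript, so the list below is hypothetical, made up so that 95 is the 9th-lowest of 10 scores; the function name `percent_below` is also illustrative. It follows the lecture's "percentage of scores below yours" interpretation; other definitions (for example, counting ties as half) also exist.

```python
# Hypothetical exam scores: NOT the slide's data. Made up so that
# exactly 8 of the 10 scores are below 95.
scores = [62, 70, 74, 78, 81, 85, 88, 90, 95, 98]

def percent_below(values, x):
    """Percentage of values strictly below x."""
    below = sum(1 for v in values if v < x)   # count scores lower than x
    return 100 * below / len(values)

print(percent_below(scores, 95))  # 80.0 -> the 80th percentile
```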

The first quartile, Q1, is the point in your data such that 25% of the data points are below it. Likewise, Q3, the third quartile, is the point in your data set such that 75% of the data points are below it; 25% + 25% + 25% gives you that 75%. You can see why we call Q3 - Q1 the spread of the middle 50%: when you subtract the 25% below Q1 from the 75% below Q3, you're left with the middle 50%. Finally, Q2 is the point in your data such that 50% of the points are below it (25% + 25%). Q2 is also known as the median, or middle point.

Thank you for watching.