|
V36
Simple Linear Regression
Welcome to part one of our video
series in support of simple linear regression. In this video, we are going to
do an introduction to regression, including the definition of this
statistical technique and uses for it. We'll review the equation of a
straight line and also introduce the concept of a residual. I'm Renee Clark
from the Swanson School of Engineering at the University of Pittsburgh. Okay, so, what is regression? It's
a statistical method or a technique that explores the relationship or
association between two or more variables. Okay, so, what are some examples
of relationships that you might be able to explore with regression? Okay,
what is the relationship between the score that is received on a statistics exam
and the hours spent studying or preparing for that exam? So, the two
variables are score and hours spent. Or, what's the relationship
between one's happiness and one's wealth? Okay, now, regression has what we
call a random, or a probabilistic, aspect to it, right, because you can envision
how people with the same wealth may not be equally happy or have the same
degree of happiness. Okay, so, let's discuss various
views on or uses of regression. Okay, regression works with what we're going
to call XY data, like the data shown in this table or spreadsheet here. So,
what are some of the views on or uses of regression? Okay, the first is that
regression expresses a mathematical relationship between variables. So, in
this case, our variables would be X and Y, okay, and it does this by
determining an equation or a function between X and Y. Okay, regression
produces a model, or is a type of modeling technique, where a model can be
thought of as an abstraction of a real-world process in an attempt to
represent reality. Okay, regression is used to
predict or to estimate the values of one variable, say y, from the
values of another, say x. Okay, with regression we can also measure the degree
or the strength of the relationship between the variables. Okay, and finally,
regression is a type of data mining technique, okay, where we are able to
extract patterns that exist in the data. Okay, so, the goal in linear
regression is to fit a linear model to a set of XY data points, as shown in
that table there. With linear regression, the model that is produced
is a straight line. So, picture some X versus Y data in this
scatter plot. The model that we would be attempting
to produce with a linear regression takes the form of a straight line that
summarizes those data points. Okay, the equation of a straight
line, do you recall what that is from math classes? It is y = mx + b,
where m represents the slope of the line and b represents the y-intercept, or
where that line crosses the y-axis. Okay, the result of a regression
analysis is a summarization of your XY data points in the graph via an
equation called the line of best fit, also known as the fitted line.
As I said above, the goal is a best-fit line, that is, a straight line
that comes as close as possible to
all of your data points. Okay, related to this is the
concept of a residual. Okay, so, in order to discuss residuals, let's start
with our plotted point, which we call x sub i, y sub i. So, this is our
plotted point that may have come out of our XY table, similar to what I
showed you on the previous screen. Okay, now, the residual is denoted by e
sub i, and that's shown right here in the graph. This is an XY graph or
scatter plot, and the residual is the vertical distance between the plotted
point and the fitted line. Okay, I'm going to just retrace the fitted line
here in green. This is the fitted line, which we call y hat. I will label
it here as the fitted line. Okay, so, again, that residual is
the vertical distance between the plotted point, which is right there, and
the fitted line. So, the residual is what I have shown there in yellow. Okay,
so, mathematically, the residual, e sub i, is y sub i minus y hat sub i. Okay, so,
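
As a minimal sketch of that residual calculation in Python: the XY data, the slope m, and the y-intercept b below are made-up values chosen only for illustration. They are not from the video, and the line is not the result of an actual regression fit.

```python
# Hypothetical XY data points (illustrative values, not from the video).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.5, 3.5, 6.5, 7.5]

# Assumed slope and y-intercept of a straight line y = mx + b.
# These are picked by hand here, not estimated by regression.
m, b = 2.0, 0.0


def fitted_value(x_i, m, b):
    """Height of the line (y hat) at a given x value."""
    return m * x_i + b


def residuals(x, y, m, b):
    """Residual for each point: observed y minus the line's height at that x."""
    return [y_i - fitted_value(x_i, m, b) for x_i, y_i in zip(x, y)]


e = residuals(x, y, m, b)
print(e)  # [0.5, -0.5, 0.5, -0.5]
```

A positive residual means the plotted point sits above the line, and a negative residual means it sits below the line.
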
again, y sub i, at this x sub i here, is the vertical height of
your plotted point. That's y sub i. And y hat sub i is the vertical height of your fitted
line at that x sub i point. So, the residual is simply the difference
between the two. The residual represents the error in the fit of the
fitted line, or y hat,
to your plotted point, x sub i, y sub i. Okay, you want your residuals
to be small, right, because you want that fitted line to come as close as
possible to all of your plotted points. So, the smaller
the residuals, the better the fit of that line to your data, and
you want a good fit of the line to the data. We wish to thank the National
Science Foundation under Grant 233582 for supporting our work. Thank you for
watching. |