|
V38
Simple Linear Regression
Welcome to part three of our
video series in support of simple linear regression. In this video, we are
going to discuss transformations to linearize a relationship between X and Y,
if necessary. I'm Renee Clark from the Swanson School of Engineering at the
University of Pittsburgh. Okay, so, we've been talking
about simple linear regression to this point, and what we've talked about is
that a linear regression is only appropriate if your data, your XY data, is
approximately linearly related. Okay, but the question arises: what
if X and Y are not linearly related? Okay, perhaps it’s evident via
a scatter plot that you might make of your XY data,
similar to the plot on the right, that shows not a
linear relationship between X and Y, but actually more of an exponential
relationship. Okay, or perhaps you know that X and Y are not linearly related
from prior experience or theory. So, the question is, if this is the
case, is there anything we can do? Can we run a linear regression on
this data? Okay, the answer is possibly. Okay, you may be able to transform
your X and/or Y data to create a linear relationship. Okay, if you proceed to try to
create a linear model with curved or nonlinear data, you're going to get
a poor fit, and so that is not a good approach to take. Okay, so, if X and Y
are not linearly related, okay, you can do a transformation, and a
transformation re-expresses your X and Y variables, or your X and Y data. Okay,
we call those re-expressed variables X star and Y star. Okay, if either one
of those, or both, is transformed, we call them X star and Y star. Okay, so, an example of a
transformation that you might do is taking the
natural log of Y, okay, which of course we would call Y star. Okay, then we
proceed to run the regression of Y star, or natural log of Y, versus X. Okay,
so, let's look at an example here. On the left, we've got some XY data that I
have labeled with one, and if I were to graph that XY data, it would appear
as such. Okay, so, as you can see, it does not have a linear relationship. X
and Y are not linearly related. It's more of a… it's a curved relationship. Okay,
so, now look at the data on the right. The X values remain the same, but the
Y values have been transformed by taking the log base 10 of
them. Okay, so, you'll see that the Y values differ. If I then go and
plot X on the x-axis and log Y on the y-axis, it actually
results in a linear relationship between X and transformed Y, or log Y.
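
As a rough illustration of the example just described, here is a minimal Python sketch. The data values are made up for illustration and are not the values shown in the video, and NumPy's `polyfit` is used only as one convenient way to fit a least-squares line. The helper name `r_squared` is hypothetical.

```python
import numpy as np

# Hypothetical XY data (NOT the values from the video's slide):
# y grows by roughly a constant factor for each unit increase in x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 8.3, 15.8, 32.5, 63.0])


def r_squared(x, y):
    """Return R^2 of the least-squares straight line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)       # degree-1 (linear) fit
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)              # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)       # total variation
    return 1.0 - ss_res / ss_tot


# Transformed response: Y star = log base 10 of Y. X is left unchanged.
y_star = np.log10(y)

r2_raw = r_squared(x, y)        # straight line through the curved data
r2_log = r_squared(x, y_star)   # straight line through X versus log Y

# Coefficients of the regression of Y star on X.
slope, intercept = np.polyfit(x, y_star, 1)

print(f"R^2, Y vs X:       {r2_raw:.3f}")
print(f"R^2, log10(Y) vs X: {r2_log:.3f}")
print(f"Fitted line: log10(Y) = {intercept:.3f} + {slope:.3f} * X")
```

With data like these, the fit on the transformed values should be noticeably closer to a straight line than the fit on the raw values. Note that the fitted line predicts log Y, so a prediction on the original scale would need to be converted back with 10 raised to that value.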
Okay, so, some common
transformations to achieve linearity include logarithmic transformations,
either base 10 or base e (of course, we know base e is the natural log), or
reciprocal transformations. Okay, and either X can be transformed with
one of these, Y can be transformed, or perhaps both. Okay? Okay, so, another
example. Let's say you create a scatter plot of your data, and let's say you
get, you know, a plot that looks curved as such. Okay? Okay, and
you suspect, based on that, that X and Y are, perhaps, exponentially related.
Not linearly related, but exponentially related, okay, in which case
the functional form between X and Y looks like the following: Y
equals beta 0 times e to the beta 1 X. Okay, if X and Y are exponentially related,
then the appropriate transformation in this case is to take the natural log
of Y, okay, which again we call Y star. We'll then regress Y star, or natural
log of Y, on X, and the parameters will be estimated via the usual method of
least squares estimation. Okay, so, these are some common transformations
that can linearize the relationship between X and Y. Okay, and, if you look in this
column here, you'll see that, as we said, they involve the natural log, or
log base 10, or reciprocals of X, Y, or both. So, as you can see, in some
cases, both variables are transformed, like in those two cases. Or, in other
cases, just one of the variables is transformed. Okay, and this is all driven
by the form of the relationship between X and Y. You know, perhaps the
relationship between them is exponential, okay, or it could be a power
relationship, as given by that functional form. It could be a reciprocal
relationship, or it could be a hyperbolic relationship. Okay, the thing
with transformations is that it's actually quite a trial-and-error
process, okay, for a couple of reasons. You may not know the true
relationship between X and Y, okay, so you don't really know
which functional form holds, right? Is it exponential? Is it power? Is
it reciprocal? Is it hyperbolic? Is it something else? In addition, it may be
hard to differentiate among the possible
relationships. Let's look at the graphs
on the right. So, here, the exponential relationship can look something
like this. But notice that
a power relationship between X and Y is also sort of curved in a similar way.
Okay, going back to the exponential, it could also be curved this way. Well, the
power relationship, you know, is similar as well, as is the reciprocal, right? So,
it can be hard to differentiate visually what the true
relationship or functional form may be between X and Y. So, actually, what's done is, again, trial and error. It's
typical to try various transformations, such as these, of your X and Y data, go
ahead and plot the transformed data, and then choose the transformation that
appears to be most linear, or to create the most linear
relationship between your transformed variables, and then run your regression
based on your transformed variables. Okay, we wish to thank the
National Science Foundation under Grant 2335802 for supporting our work. Thank you for watching. |