V38 Simple Linear Regression

Welcome to part three of our video series in support of simple linear regression. In this video, we are going to discuss transformations to linearize a relationship between X and Y, if necessary. I'm Renee Clark from the Swanson School of Engineering at the University of Pittsburgh.

Okay, so, we've been talking about simple linear regression to this point, and what we've said is that a linear regression is only appropriate if your XY data is approximately linearly related. But the question arises: what if X and Y are not linearly related? Perhaps it's evident from a scatter plot of your XY data, similar to the plot on the right, which shows not a linear relationship between X and Y but more of an exponential relationship. Or perhaps you know from prior experience or theory that X and Y are not linearly related. So, the question is, if this is the case, is there anything we can do? Can we run a linear regression on this data? The answer is possibly. You may be able to transform your X and/or Y data to create a linear relationship.

Okay, if you proceed to fit a linear model to curved, or nonlinear, data, you're going to get a poor fit, so that is not a good approach to take. If X and Y are not linearly related, you can instead do a transformation, which re-expresses your X and Y variables, or your X and Y data. We call those re-expressed variables X star and Y star, whether just one of them or both is transformed.

Okay, so, an example of a transformation that you might do is taking the natural log of Y, which we would call Y star. We then proceed to run the regression of Y star, or natural log of Y, versus X. So, let's look at an example here. On the left, we've got some XY data that I have labeled with one, and if I graph that XY data, it appears as shown. As you can see, X and Y are not linearly related; it's a curved relationship. Now look at the data on the right. The X values remain the same, but the Y values have been transformed by taking the log base 10 of them, so you'll see that the Y values differ. If I then plot X on the x-axis and log Y on the y-axis, it actually results in a linear relationship between X and transformed Y, or log Y.

Okay, so, some common transformations to achieve linearity include logarithmic transformations, either base 10 or base e (base e, of course, is the natural log), and reciprocal transformations. Either X can be transformed with one of these, Y can be transformed, or perhaps both.

So, another example. Let's say you create a scatter plot of your data and you get a plot that looks curved, as shown. You suspect, based on that, that X and Y are perhaps exponentially related: not linearly related, but exponentially related. In that case, the functional form between X and Y looks like the following: y equals beta 0 times e to the beta 1 x. If X and Y are exponentially related, then the appropriate transformation is to take the natural log of Y, which again we call Y star. We'll then regress Y star, or natural log of Y, on X, and the parameters will be estimated via the usual method of least squares.

Okay, so, these are some common transformations that can linearize X and Y.

If you look in this column here, you'll see that, as we said, they involve the natural log, log base 10, or reciprocals of X, Y, or both. As you can see, in some cases both variables are transformed, like in those two cases; in other cases, just one of the variables is transformed. This is all driven by the form of the relationship between X and Y. Perhaps the relationship between them is exponential, or it could be a power relationship, as given by that functional form. It could be a reciprocal relationship, or it could be a hyperbolic relationship.

The thing with transformations is that it's actually quite a trial-and-error process, for a couple of reasons. You may not know the true relationship between X and Y, so you don't really know which functional form holds. Is it exponential? Is it power? Is it reciprocal? Is it hyperbolic? Is it something else? In addition, it may be hard to differentiate among the possible relationships.
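For reference, the four relationships named above can each be rewritten as a straight line in the transformed variables. The slide itself is not reproduced in this transcript, so the functional forms below are an assumption: they are the forms commonly tabulated in textbooks for these four names, and the slide's exact notation may differ.

```latex
\begin{align*}
\text{Exponential: } & y = \beta_0 e^{\beta_1 x}
  && \Rightarrow\; \ln y = \ln\beta_0 + \beta_1 x
  && (y^* = \ln y) \\
\text{Power: } & y = \beta_0 x^{\beta_1}
  && \Rightarrow\; \log y = \log\beta_0 + \beta_1 \log x
  && (y^* = \log y,\; x^* = \log x) \\
\text{Reciprocal: } & y = \beta_0 + \beta_1 \frac{1}{x}
  && \Rightarrow\; y = \beta_0 + \beta_1 x^*
  && (x^* = 1/x) \\
\text{Hyperbolic: } & y = \frac{x}{\beta_0 + \beta_1 x}
  && \Rightarrow\; \frac{1}{y} = \beta_1 + \beta_0 \frac{1}{x}
  && (y^* = 1/y,\; x^* = 1/x)
\end{align*}
```

Under these assumed forms, the power and hyperbolic cases are the two in which both variables are transformed, while the exponential and reciprocal cases transform only one.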

Let's look at the graphs on the right. Here, the exponential relationship can look something like this. But notice that a power relationship between X and Y is also curved in a similar way. Going back to the exponential, it could also be curved this way, and the power relationship looks similar as well, as does the reciprocal. So, it can be hard to tell visually what the true relationship, or functional form, may be between X and Y. What's actually done, again, is trial and error. It's typical to try various transformations, such as these, of your X and Y data, plot the transformed data, choose the transformation that appears to create the most linear relationship between your transformed variables, and then run your regression on those transformed variables.

Okay, we wish to thank the National Science Foundation under Grant 2335802 for supporting our work.

Thank you for watching.