Okay. In our previous videos, we talked about how to assess whether or not your data meet the assumptions of equal variance and normality for a general linear model. We also talked about how to determine whether or not any violations of those assumptions that you might see are violations you actually need to worry about. In other words, we talked about how to determine whether or not any violations of the assumptions are serious enough that we need to do something about them. In this video, we're going to talk about one of the most common ways of dealing with serious assumption violations, and that is to transform our data. That's what we're going to focus on in this video and in several videos to follow. Data transformation essentially involves changing the scale of our measurements, and exactly what that means will become clearer over the next few videos. The goal of this video is simply to show you, in a practical sense, what this actually means: how do you go about actually transforming your data when you're analyzing them in R, for example? Similar approaches can be used in any other statistical package.

So to do this, we're going to use RStudio, and we're going to use the same dataset and the same code that we used in a video which I assume you've just watched, where we used some code to randomly generate data that did or did not meet the assumptions of a general linear model. The code I have here is the code we used to generate a dataset that did not meet the assumption of equal variance. So I'm just going to highlight that code, run it, and let's have a quick look at it. We can look at the dataset itself just by highlighting the dataset's name and saying run. And you can see here's the dataset. This is just to remind you of what these data look like. We have one column called fields, which has an A, B, or C in it. And then we have another column which I've called yield.hetero, to remind us that these data represent yield data in an experiment, but data that have been set up to violate the assumption of equal variance: we have heterogeneous variance. Okay, so that's what the data we're going to be working with look like.

We can make a box plot of these data just to remind ourselves of what they really look like. Again, we have yield.hetero as our dependent variable and fields as our independent variable, and then we just name our data frame and run. And here's our output. You can see that as our means get larger, the variation increases too: the mean value for treatment A, or field type A, is around 100, and that mean increases as you go from A to B to C, and the variation in these box plots also tends to increase as you go from A to B to C.

What do the residuals look like for a model of these data? Again, just to quickly remind you, we did the same thing in our previous video. We're running a linear model where we're saying yield.hetero is our dependent variable, fields is our independent variable, and then we're naming our data. We're saving the output of this linear model in an object called m2, and then we say plot(m2) in order to visualize the residuals. So we'll run that, and you can see here pretty much what we expected based on the box plot: as you move from left to right, there is a change in the spread of our residuals.
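As a rough sketch, the workflow described so far might look like this in R. The exact data-generation code from the earlier video isn't shown here, so the seed, means, and standard deviations below are my assumptions, as is the data frame name my.data; the column names fields and yield.hetero follow the transcript.

```r
# Sketch of the setup described above. The means and standard deviations
# are assumptions; the column names follow the transcript.
set.seed(42)
fields <- rep(c("A", "B", "C"), each = 20)
# Heterogeneous variance: the spread grows along with the mean
yield.hetero <- c(rnorm(20, mean = 100, sd = 5),
                  rnorm(20, mean = 150, sd = 15),
                  rnorm(20, mean = 200, sd = 30))
my.data <- data.frame(fields, yield.hetero)

# Box plot: both the mean and the spread increase from A to B to C
boxplot(yield.hetero ~ fields, data = my.data)

# Fit the linear model and inspect the diagnostic plots
m2 <- lm(yield.hetero ~ fields, data = my.data)
plot(m2)  # first plot: residuals vs fitted; spread widens left to right
```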
So as you move from left to right, the spread in the residuals becomes more exaggerated. This would be a cause for concern, and we'd want to do something about it. We'll skip the rest of our plots. As I've already implied, the thing we're going to try is transforming the data, and we can transform the data by adding a simple mathematical function directly to our lm function. Let me show you what this function is; I'll just illustrate it in this bottom left-hand panel. Say we have the value 4. If we want to take the natural log of 4, we use the log function and say log(4), and the natural log of 4 is about 1.386. So this function, log, will take the natural log of any number you give it and return that value.

We're going to try taking the natural log of all of these data, and we'll do that simply by adding this function, log, with brackets, around the term for our dependent variable. So what is this going to do? Let's go back and look at our dependent variable. If we say log and then the name of our dependent variable, what's going to happen is that R is going to take the natural log of every single one of these data points separately, and then R will use those logged values as the dependent variable in our linear model. In other words, we're applying the log function to every single data point found in this column called yield.hetero, and it's those logged data that will be analyzed by the lm function.

So what happens when we do this? I'm just going to change the name of our object to something different, m2.b, and we're going to plot the residuals of this output, where the analysis was conducted on log-transformed data. Say run. Now I can look at the plot, and you can see that we get a fairly different perspective on our data. Now we almost have the opposite trend: we've got a fair amount of spread for our smaller values, for the treatment with the smallest mean, and as you move from left to right, the spread actually decreases. So we can see that log-transforming our data has changed our residual plots, but not in a way that really makes us happy, because it looks like we've essentially just flipped the nature of our problem.

So let's try something else. Instead of taking the log of all of our data points, let's try taking the square root. The square root function is sqrt. I'll just change this again, so now we're looking at a new object, and let's see what we get. Let's run. Now, this looks better. Our analysis is now based on data where we've taken the square root of every single one of our original data points, and we've run our analysis on those square-rooted data. When we look at the residual plot, you can see that we are much happier with the assumption of equal variance. I'm happy with this: the amount of variation in the residuals is pretty comparable for all three of our treatments. Let's see what happens for normality. You can see that for normality it's not perfect, but it's certainly not bad at all. Okay?
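Here's a sketch of the two transformed models described above. The object name m2.c for the square-root model is my assumption, since the video doesn't name it, and the data frame name my.data is carried over from the earlier sketch.

```r
# The log function returns the natural logarithm
log(4)  # 1.386294

# Log-transform the dependent variable directly inside the model formula.
# Here this over-corrects: the residual spread now shrinks from left to right.
m2.b <- lm(log(yield.hetero) ~ fields, data = my.data)
plot(m2.b)

# Square-root transform instead (object name m2.c is assumed).
# The residual spread is now comparable across all three treatments.
m2.c <- lm(sqrt(yield.hetero) ~ fields, data = my.data)
plot(m2.c)
```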
So by having taken the square root of our data, we have now allowed our data to meet the assumptions of the model in a satisfactory way. And this process of trying more than one function to transform our data is a very common thing to do. In one of our next videos, we're going to talk about the different transformations that are commonly used and the cases where each is most applicable. But the truth is, it's often a case of trial and error. I hope this has been helpful, and I'll see you in a moment with our next video. Thank you.
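Since finding a good transformation is often trial and error, one convenient pattern is to fit several candidate transformations in a loop and compare their residual plots side by side. This is just a sketch of that idea under the same assumed names as above, not code from the video:

```r
# Compare residuals-vs-fitted plots for several candidate transformations
transforms <- list(none = identity, log = log, sqrt = sqrt)
par(mfrow = c(1, 3))  # three plots side by side
for (nm in names(transforms)) {
  my.data$y.tr <- transforms[[nm]](my.data$yield.hetero)
  m <- lm(y.tr ~ fields, data = my.data)
  plot(fitted(m), resid(m), main = nm,
       xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
}
par(mfrow = c(1, 1))
```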