Line of Best Fit 04:37 minutes

Video Transcript

Valentine Verne is on a treasure hunt in the deep blue ocean. He has heard many tales of treacherous plants and animals standing between sailors and a storied treasure.

Valentine uses his radar to map out where the obstacles are. There are two routes that Valentine can take. He needs to find the line of best fit so that he won’t get too close to any of the sea critters in his way. To do this, he must understand Lines of Best Fit. Let’s take a look at Valentine’s journey.

The line of best fit

In scatter plots, the line of best fit is a line that is as close as possible to all points on the graph, with roughly as many points above the line as below. An easy way to figure out which of two given lines is the line of better fit is to calculate the residuals of the two lines. Residuals?! Residuals are the differences between the y-values of each point, in our case the obstacles, and the y-values of the line in question at the same x-values. The two different routes are shown here. Valentine knows that the lower the sum of the squares of these differences in y-values, the better the line fits the data. Let’s take a look at the obstacles in Valentine’s way.

Determining the residuals

Valentine’s route is shown here by the equation 'y' is equal to 0.55x plus 3.2. To figure out if the route is a good fit, we'll use a table. First, we need the coordinates of all points, as well as the ‘y’ values we get when the ‘x’ coordinates are substituted into the equation of the line. Next, we determine the residuals, which are the differences between each pair of y-values: the point's y-value minus the line's y-value. The last step is to sum the squares of the residuals. The line with the smaller sum is the line of better fit.

Here, we use the x- and y-coordinates of the obstacles, for example, (1, 2). To determine the corresponding y-value of the route, we plug the x-value, 1, into our equation for the route, y=0.55x+3.2. Doing so gives us 3.75. Now it's time to determine the residual for this point, which again, is the difference between the two y-values. 2 minus 3.75 is -1.75. As always, the last step is to square the residual, giving us 3.0625.
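The arithmetic for this single obstacle can be checked with a short Python sketch. The function names `route_y` and `residual` are made up for illustration; only the point (1, 2) and the equation y=0.55x+3.2 come from the transcript.

```python
# Route 1 from the transcript: y = 0.55x + 3.2
# (function names are hypothetical, chosen for this illustration)

def route_y(x, slope=0.55, intercept=3.2):
    """Return the y-value of the route at the given x."""
    return slope * x + intercept

def residual(point, slope=0.55, intercept=3.2):
    """Return the point's y-value minus the route's y-value at the same x."""
    x, y = point
    return y - route_y(x, slope, intercept)

obstacle = (1, 2)
r = residual(obstacle)

# Rounding hides tiny floating-point noise in the decimals.
print(round(route_y(obstacle[0]), 4))  # 3.75
print(round(r, 4))                     # -1.75
print(round(r ** 2, 4))                # 3.0625
```

The sign of the residual tells us the obstacle lies below the route; squaring removes the sign so that points above and below the line both count toward the total.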

We repeat this process for each obstacle in our list: write down its coordinates, plug the x-value into the equation of the route to find the corresponding y-value, determine the residual, or the difference of the two y-values, and of course, square the result. Finally, we sum all the squares, which gives us 34.0975.

Now, let's have a look at the second route to see if it is a better fit. The equation for this line is y=0.4x plus 4.65. Remember: the obstacles still have the same coordinates. So, the first column of the table contains the coordinates for the obstacles, just as before.

But beware:
When finding the y-values that correspond to these x-values, we have to plug the values for 'x' into the equation for the new line. Next, we again determine the differences between the two y-values to get the residuals, and square the results. As always, our final step is to sum the squares.

We're left with 27.695, which is less than the 34.0975 we obtained from the first route. Comparing the two sums, Valentine has to choose the line with the lower sum of squared residuals, since the line of better fit is always the one with the LEAST sum of squares.
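The whole comparison can be sketched in Python as follows. This is only an illustration: the transcript does not list the full set of obstacle coordinates, so the points below (apart from (1, 2)) are invented and will not reproduce the sums 34.0975 and 27.695. The function names are also hypothetical.

```python
def sum_of_squared_residuals(points, slope, intercept):
    """Sum the squared differences between each point's y and the line's y."""
    total = 0.0
    for x, y in points:
        line_y = slope * x + intercept   # y-value of the route at this x
        total += (y - line_y) ** 2       # squared residual
    return total

def better_fit(points, line_a, line_b):
    """Return whichever (slope, intercept) pair has the smaller sum of squares."""
    ssr_a = sum_of_squared_residuals(points, *line_a)
    ssr_b = sum_of_squared_residuals(points, *line_b)
    return line_a if ssr_a <= ssr_b else line_b

# ASSUMPTION: made-up obstacle coordinates, not Valentine's actual data.
points = [(1, 2), (3, 7), (5, 5), (8, 9)]

route_1 = (0.55, 3.2)   # y = 0.55x + 3.2
route_2 = (0.4, 4.65)   # y = 0.4x + 4.65

for name, route in (("route 1", route_1), ("route 2", route_2)):
    print(name, round(sum_of_squared_residuals(points, *route), 4))

print("better fit:", better_fit(points, route_1, route_2))
```

Note that the same list of points is passed to both calls; only the slope and intercept change, which mirrors the warning above about plugging the x-values into the new line's equation.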

He chooses the second line and off he goes!