
## Goodness of Fit of a Straight Line to Data

Once the scatter diagram of the data has been drawn and the model assumptions described in the previous sections at least visually verified (and perhaps the correlation coefficient \(r\) computed to quantitatively verify the linear trend), the next step in the analysis is to find the straight line that best fits the data. We will explain how to measure how well a straight line fits a collection of points by examining how well the line \(y=\frac{1}{2}x-1\) fits the data set

\[\begin{array}{c|c c c c c} x & 2 & 2 & 6 & 8 & 10 \\ \hline y & 0 & 1 & 2 & 3 & 3 \end{array}\]

(which will be used as a running example for the next three sections). We will write the equation of this line as \(\hat{y}=\frac{1}{2}x-1\), with an accent on the \(y\) to indicate that the \(y\)-values computed using this equation are not from the data. We will do this with all lines approximating data sets. The line \(\hat{y}=\frac{1}{2}x-1\) was selected as one that seems to fit the data reasonably well.

The idea for measuring the goodness of fit of a straight line to data is illustrated in Figure 10.4.1, in which the graph of the line \(\hat{y}=\frac{1}{2}x-1\) has been superimposed on the scatter plot for the sample data set.

Figure 10.4.1: **Plot of the Five-Point Data and the Line \(\hat{y}=\frac{1}{2}x-1\)**

To each point in the data set there is associated an "error," the positive or negative vertical distance from the point to the line: positive if the point is above the line and negative if it is below the line. The error can be computed as the actual \(y\)-value of the point minus the \(y\)-value \(\hat{y}\) that is "predicted" by inserting the \(x\)-value of the data point into the formula for the line:

\[\text{error at data point } (x,y) = (\text{true } y) - (\text{predicted } y) = y - \hat{y}\]
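The error rule can be checked with a minimal Python sketch for the five-point data set and the line \(\hat{y}=\frac{1}{2}x-1\). The names `predict` and `errors` are illustrative only. The error column it produces should agree with the table that follows.

```python
# Five-point running example from the text
xs = [2, 2, 6, 8, 10]
ys = [0, 1, 2, 3, 3]


def predict(x):
    """Predicted y from the line y-hat = (1/2)x - 1."""
    return 0.5 * x - 1


# Error at each data point: (true y) - (predicted y)
errors = [y - predict(x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, errors):
    print(f"x={x:>2}  y={y}  y_hat={predict(x):g}  error={e:g}")
```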

The computation of the error for each of the five points in the data set is presented in Table 10.4.1.

Table 10.4.1: The Errors in Fitting Data with a Straight Line

| \(x\) | \(y\) | \(\hat{y}=\frac{1}{2}x-1\) | \(y-\hat{y}\) | \((y-\hat{y})^2\) |
|---|---|---|---|---|
| 2 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 1 | 1 |
| 6 | 2 | 2 | 0 | 0 |
| 8 | 3 | 3 | 0 | 0 |
| 10 | 3 | 4 | −1 | 1 |
| \(\sum\) | | | 0 | 2 |

A first idea for measuring the goodness of fit of the line to the data would be simply to add the errors at every point, but the example shows that this cannot work well in general. The line does not fit the data perfectly (no line can), yet because positive and negative errors cancel, the sum of the errors (the fourth column of numbers) is zero. Instead, goodness of fit is measured by the sum of the squares of the errors. Squaring eliminates the minus signs, so no cancellation can occur. For the data and line in Figure 10.4.1 the sum of the squared errors (the last column of numbers) is \(2\). This number measures the goodness of fit of the line to the data: the smaller it is, the better the fit.
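The following Python sketch makes the cancellation concrete. Besides the line \(\hat{y}=\frac{1}{2}x-1\), it tries a hypothetical horizontal line \(\hat{y}=1.8\) (the mean of the \(y\)-values), chosen only for illustration. For both lines the errors sum to zero, yet the sums of squared errors are quite different. The function name `error_sums` is our own.

```python
xs = [2, 2, 6, 8, 10]
ys = [0, 1, 2, 3, 3]


def error_sums(slope, intercept):
    """Return (sum of errors, sum of squared errors) for y-hat = slope*x + intercept."""
    errors = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(errors), sum(e * e for e in errors)


# Line from the text, and a hypothetical flat line at the mean of y (illustration only)
sum_a, sse_a = error_sums(0.5, -1)
sum_b, sse_b = error_sums(0.0, 1.8)

print(f"y-hat = x/2 - 1: sum of errors = {sum_a:.4f}, SSE = {sse_a:.4f}")
print(f"y-hat = 1.8:     sum of errors = {sum_b:.4f}, SSE = {sse_b:.4f}")
```

Because the plain sum of errors is zero for both lines, it cannot distinguish between them. The sum of squared errors can.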

## The Least Squares Regression Line

Given any collection of pairs of numbers (except when all the \(x\)-values are the same) and the corresponding scatter diagram, there always exists exactly one straight line that fits the data better than any other, in the sense of minimizing the sum of the squared errors. It is called the least squares regression line. Moreover, there are formulas for its slope and \(y\)-intercept.

Definition: Least Squares Regression Line

Given a collection of pairs \((x,y)\) of numbers (in which not all the \(x\)-values are the same), there is a line \(\hat{y}=\hat{\beta}_1x+\hat{\beta}_0\) that best fits the data in the sense of minimizing the sum of the squared errors. It is called the *least squares regression line*. Its slope \(\hat{\beta}_1\) and \(y\)-intercept \(\hat{\beta}_0\) are computed using the formulas

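\[\hat{\beta}_1=\frac{SS_{xy}}{SS_{xx}}\]

and

\[\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\]

where

\[SS_{xx}=\sum x^2-\frac{1}{n}\left(\sum x\right)^2\]

and

\[SS_{xy}=\sum xy-\frac{1}{n}\left(\sum x\right)\left(\sum y\right)\]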

\(\bar{x}\) is the mean of all the \(x\)-values, \(\bar{y}\) is the mean of all the \(y\)-values, and \(n\) is the number of pairs in the data set.

The equation

\[\hat{y}=\hat{\beta}_1x+\hat{\beta}_0\]

specifying the least squares regression line is called the least squares regression equation.
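The formulas in the definition translate directly into code. The Python function below is a sketch, and its name `least_squares_line` is our own. It computes \(SS_{xx}\) and \(SS_{xy}\) from column sums exactly as defined above. It rejects data in which all the \(x\)-values are equal, because in that case \(SS_{xx}=0\) and no unique line exists.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line.

    Uses the formulas
        SS_xx = sum(x^2) - (sum x)^2 / n
        SS_xy = sum(xy)  - (sum x)(sum y) / n
        slope = SS_xy / SS_xx,  intercept = y_bar - slope * x_bar
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    if len(set(xs)) == 1:
        # All x-values equal: SS_xx = 0, so the slope is undefined
        raise ValueError("not all x-values may be the same")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n

    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


slope, intercept = least_squares_line([2, 2, 6, 8, 10], [0, 1, 2, 3, 3])
print(f"y-hat = {slope:.5f}x + ({intercept:.5f})")
```

The check for identical \(x\)-values uses a set rather than testing whether \(SS_{xx}\) equals zero. In floating point, that difference can come out slightly nonzero through rounding.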

Remember from Section 10.3 that the line with the equation \(y=\beta_1x+\beta_0\) is called the population regression line. The numbers \(\hat{\beta}_1\) and \(\hat{\beta}_0\) are statistics that estimate the population parameters \(\beta_1\) and \(\beta_0\).

We will compute the least squares regression line for the five-point data set, then for a more practical example that will be another running example for the introduction of new concepts in this and the next three sections.

Example 10.4.2

Find the least squares regression line for the five-point data set

\[\begin{array}{c|c c c c c} x & 2 & 2 & 6 & 8 & 10 \\ \hline y & 0 & 1 & 2 & 3 & 3 \end{array}\]

and verify that it fits the data better than the line \(\hat{y}=\frac{1}{2}x-1\) considered in Section 10.4.1 above.

**Solution**:

In actual practice the regression line is computed using a statistical software package. To clarify the meaning of the formulas, we display the computations in tabular form.

| | \(x\) | \(y\) | \(x^2\) | \(xy\) |
|---|---|---|---|---|
| | 2 | 0 | 4 | 0 |
| | 2 | 1 | 4 | 2 |
| | 6 | 2 | 36 | 12 |
| | 8 | 3 | 64 | 24 |
| | 10 | 3 | 100 | 30 |
| \(\sum\) | 28 | 9 | 208 | 68 |

In the last line of the table we have the sum of the numbers in each column. Using them we compute:

\[\bar{x}=\frac{\sum x}{n}=\frac{28}{5}=5.6\]

\[\bar{y}=\frac{\sum y}{n}=\frac{9}{5}=1.8\]

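Next,

\[SS_{xx}=\sum x^2-\frac{1}{n}\left(\sum x\right)^2=208-\frac{1}{5}(28)^2=51.2\]

\[SS_{xy}=\sum xy-\frac{1}{n}\left(\sum x\right)\left(\sum y\right)=68-\frac{1}{5}(28)(9)=17.6\]

so that

\[\hat{\beta}_1=\frac{SS_{xy}}{SS_{xx}}=\frac{17.6}{51.2}=0.34375\]

and

\[\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}=1.8-(0.34375)(5.6)=-0.125\]

The least squares regression line for these data is

\[\hat{y}=0.34375x-0.125\]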
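To double-check the hand computation, the same steps can be carried out in exact rational arithmetic using Python's standard `fractions.Fraction`, so that no rounding enters. The sketch follows the column sums and formulas step by step. The variable names are illustrative.

```python
from fractions import Fraction

xs = [2, 2, 6, 8, 10]
ys = [0, 1, 2, 3, 3]
n = len(xs)

# Column sums (last line of the table)
sum_x = sum(xs)                              # 28
sum_y = sum(ys)                              # 9
sum_x2 = sum(x * x for x in xs)              # 208
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 68

x_bar = Fraction(sum_x, n)  # 28/5 = 5.6
y_bar = Fraction(sum_y, n)  # 9/5 = 1.8

ss_xx = sum_x2 - Fraction(sum_x ** 2, n)     # 208 - 784/5 = 256/5 = 51.2
ss_xy = sum_xy - Fraction(sum_x * sum_y, n)  # 68 - 252/5 = 88/5 = 17.6

beta1 = ss_xy / ss_xx          # 11/32 = 0.34375
beta0 = y_bar - beta1 * x_bar  # -1/8 = -0.125

print(f"beta1 = {beta1} = {float(beta1)}")
print(f"beta0 = {beta0} = {float(beta0)}")
```

Here the coefficients come out as exact fractions, \(\hat{\beta}_1=\frac{11}{32}\) and \(\hat{\beta}_0=-\frac{1}{8}\). This shows that the decimals in the text are exact and not rounded.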

The computations that measure how well this line fits the sample data are given in Table 10.4.2. The sum of the squared errors is the sum of the numbers in the last column, which is \(0.75\). This is much less than \(2\), the sum of the squared errors for the fit of the line \(\hat{y}=\frac{1}{2}x-1\) to this data set.

Table 10.4.2: The Errors in Fitting Data with the Least Squares Regression Line

| \(x\) | \(y\) | \(\hat{y}=0.34375x-0.125\) | \(y-\hat{y}\) | \((y-\hat{y})^2\) |
|---|---|---|---|---|
| 2 | 0 | 0.5625 | −0.5625 | 0.31640625 |
| 2 | 1 | 0.5625 | 0.4375 | 0.19140625 |
| 6 | 2 | 1.9375 | 0.0625 | 0.00390625 |
| 8 | 3 | 2.6250 | 0.3750 | 0.14062500 |
| 10 | 3 | 3.3125 | −0.3125 | 0.09765625 |
| \(\sum\) | | | 0 | 0.75 |
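The comparison between the two fits can be reproduced with a short sketch. Besides recomputing both sums of squared errors, it shifts the least squares slope and intercept by a few small, hypothetical amounts. This illustrates, but does not prove, that every nearby line has a larger sum of squared errors.

```python
xs = [2, 2, 6, 8, 10]
ys = [0, 1, 2, 3, 3]


def sse(slope, intercept):
    """Sum of squared errors for the line y-hat = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


sse_ls = sse(0.34375, -0.125)  # least squares regression line
sse_guess = sse(0.5, -1.0)     # line chosen by eye in Section 10.4.1

# Illustrative perturbations of the least squares coefficients (step sizes are arbitrary)
nudges = [(ds, db) for ds in (-0.05, 0, 0.05) for db in (-0.2, 0, 0.2) if (ds, db) != (0, 0)]
worse = all(sse(0.34375 + ds, -0.125 + db) > sse_ls for ds, db in nudges)

print(f"SSE, least squares line: {sse_ls}")
print(f"SSE, y-hat = x/2 - 1:    {sse_guess}")
print(f"All nudged lines worse:  {worse}")
```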

Example 10.4.3

Table 10.4.3 shows the age in years and the retail value in hundreds of dollars of a random sample of ten automobiles of the same make and model.

- (a) Construct the scatter diagram.
- (b) Compute the linear correlation coefficient \(r\). Interpret its value in the context of the problem.
- (c) Compute the least squares regression line. Plot it on the scatter diagram.
- (d) Interpret the meaning of the slope of the least squares regression line in the context of the problem.
- (e) Suppose a four-year-old car of this make and model is selected at random. Use the regression equation to predict its retail value.
- (f) Suppose a \(20\)-year-old car of this make and model is selected at random. Use the regression equation to predict its retail value. Interpret the result.
- (g) Comment on the validity of using the regression equation to predict the price of a brand new car of this make and model.

Table 10.4.3: Data on Age and Value of Used Automobiles of a Specific Make and Model

| \(x\) (years) | 2 | 3 | 3 | 3 | 4 | 4 | 5 | 5 | 5 | 6 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(y\) (hundreds of dollars) | 28.7 | 24.8 | 26.0 | 30.5 | 23.8 | 24.6 | 23.8 | 20.4 | 21.6 | 22.1 |

**Solution**:

(a) **Scatter Diagram for Age and Value of Used Automobiles**

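(b) We must first compute \(SS_{xx}\), \(SS_{xy}\), and \(SS_{yy}\), which means computing \(\sum x\), \(\sum y\), \(\sum x^2\), \(\sum y^2\), and \(\sum xy\). Using a computing device we obtain

\[\sum x=40 \qquad \sum y=246.3 \qquad \sum x^2=174 \qquad \sum y^2=6154.15 \qquad \sum xy=956.5\]

Thus

\[SS_{xx}=\sum x^2-\frac{1}{n}\left(\sum x\right)^2=174-\frac{1}{10}(40)^2=14\]

\[SS_{xy}=\sum xy-\frac{1}{n}\left(\sum x\right)\left(\sum y\right)=956.5-\frac{1}{10}(40)(246.3)=-28.7\]

\[SS_{yy}=\sum y^2-\frac{1}{n}\left(\sum y\right)^2=6154.15-\frac{1}{10}(246.3)^2=87.781\]

so that

\[r=\frac{SS_{xy}}{\sqrt{SS_{xx}\cdot SS_{yy}}}=\frac{-28.7}{\sqrt{(14)(87.781)}}\approx-0.819\]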
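The sums and the correlation coefficient in part (b) can be reproduced with a short Python sketch that uses only the standard library. It assumes the correlation formula \(r=SS_{xy}/\sqrt{SS_{xx}SS_{yy}}\), with \(SS_{yy}=\sum y^2-\frac{1}{n}\left(\sum y\right)^2\) defined in the same way as \(SS_{xx}\).

```python
import math

# Age in years (x) and retail value in hundreds of dollars (y), from Table 10.4.3
ages = [2, 3, 3, 3, 4, 4, 5, 5, 5, 6]
values = [28.7, 24.8, 26.0, 30.5, 23.8, 24.6, 23.8, 20.4, 21.6, 22.1]
n = len(ages)

sum_x = sum(ages)
sum_y = sum(values)
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in values)
sum_xy = sum(x * y for x, y in zip(ages, values))

# Computational formulas for the sums of squares
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(f"SS_xx = {ss_xx:.3f}, SS_xy = {ss_xy:.3f}, SS_yy = {ss_yy:.3f}")
print(f"r = {r:.3f}")
```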

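(c) Using the values of \(\sum x\) and \(\sum y\) computed in part (b),

\[\bar{x}=\frac{\sum x}{n}=\frac{40}{10}=4\]

\[\bar{y}=\frac{\sum y}{n}=\frac{246.3}{10}=24.63\]

Thus, using the values of \(SS_{xx}\) and \(SS_{xy}\) from part (b),

\[\hat{\beta}_1=\frac{SS_{xy}}{SS_{xx}}=\frac{-28.7}{14}=-2.05\]

and

\[\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}=24.63-(-2.05)(4)=32.83\]

The equation of the least squares regression line for these sample data is

\[\hat{y}=-2.05x+32.83\]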
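Part (c) and the predictions the problem asks for can be sketched as follows. For variety, \(SS_{xx}\) and \(SS_{xy}\) are computed here in the algebraically equivalent deviation form \(\sum(x-\bar{x})^2\) and \(\sum(x-\bar{x})(y-\bar{y})\), rather than from column sums. The helper `predict_value` is hypothetical. It prints a warning for ages outside the observed range of 2 to 6 years, because the line was fitted only to data in that range.

```python
ages = [2, 3, 3, 3, 4, 4, 5, 5, 5, 6]
values = [28.7, 24.8, 26.0, 30.5, 23.8, 24.6, 23.8, 20.4, 21.6, 22.1]
n = len(ages)

x_bar = sum(ages) / n
y_bar = sum(values) / n

# Deviation form; equal to the computational formulas for SS_xx and SS_xy
ss_xx = sum((x - x_bar) ** 2 for x in ages)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, values))

beta1 = ss_xy / ss_xx          # slope
beta0 = y_bar - beta1 * x_bar  # y-intercept


def predict_value(age):
    """Predicted retail value (hundreds of dollars) for a car of the given age in years."""
    if not min(ages) <= age <= max(ages):
        print(f"warning: age {age} is outside the observed range {min(ages)} to {max(ages)}")
    return beta1 * age + beta0


print(f"y-hat = {beta1:.2f}x + {beta0:.2f}")
print(f"4-year-old car:  {predict_value(4):.2f} hundred dollars")
print(f"20-year-old car: {predict_value(20):.2f} hundred dollars")
```

The 20-year-old case produces a negative predicted value. This is a sign that the equation should not be trusted far outside the range of the data.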

Figure 10.4.3 shows the scatter diagram with the graph of the least squares regression line superimposed.
