Appendix 6

 

The Method of Least Squares, Also Known As Linear Regression.

(Not for the weak of heart, but hang in there, it has a very happy ending.)

 

            Quite often data consist of pairs of measurements (x_i, y_i) of an independent variable x_i and a dependent variable y_i.  Suppose theory suggests that these data pairs can be fit by a straight line, so that you expect a straight line to result from a graph of y_i versus x_i.  What you must then find is a pair of coefficients (A and B) such that y = Ax + B for any arbitrary x.  An established method exists for determining the "best fit" to such data.  This is the method of least squares.

 

A "best fit" is determined when the difference between the data (yi) and the function Axi + B is minimized by an optimum choice of A (the slope) and B (the y intercept).  This difference is denoted by yi = yi -(Axi + B).  This minimization procedure is accomplished by minimizing a function called "chi-square" (x2).  Where

 

 

Where syi is the standard deviation of the sample of n measurements of yi.  If you rewrite the quantity    yi as yi -(Axi + B), then x2 becomes

x2 =·[(yi - Axi - B)/syi]2

The function χ² is universally considered to be the appropriate measure of the "goodness of fit".  Therefore, when χ² is minimized, a best fit is obtained.
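
Just to make this concrete, here is a minimal Python sketch that evaluates χ² for a given choice of A and B; the function name chi_square and the sample numbers are invented for this appendix:

    import numpy as np

    def chi_square(x, y, sigma_y, A, B):
        """Evaluate chi-square for the trial line y = A*x + B."""
        residuals = y - (A * x + B)              # the differences Delta y_i
        return np.sum((residuals / sigma_y)**2)

    # Hypothetical data pairs, for illustration only
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 7.8])
    sigma_y = np.full_like(y, 0.2)               # equal uncertainty in each y_i
    print(chi_square(x, y, sigma_y, A=2.0, B=0.0))

The smaller the value of χ², the closer the line passes to the data points, measured in units of their uncertainties.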

            As always, to minimize a function one must set its derivatives equal to zero.  The derivatives of χ² with respect to A and B are used, since it is these coefficients which, in effect, must be varied to obtain the best fit.  Taking these derivatives (and assuming, for simplicity, that all of the σ_yi are equal to a common value σ) one obtains the following.

\frac{\partial \chi^2}{\partial A} = -\frac{2}{\sigma^2} \sum_{i=1}^{n} x_i \left( y_i - Ax_i - B \right) = 0

and

\frac{\partial \chi^2}{\partial B} = -\frac{2}{\sigma^2} \sum_{i=1}^{n} \left( y_i - Ax_i - B \right) = 0

From these two equations, a pair of simultaneous equations results:

 

\sum_{i=1}^{n} y_i = A \sum_{i=1}^{n} x_i + nB \qquad \text{and} \qquad \sum_{i=1}^{n} x_i y_i = A \sum_{i=1}^{n} x_i^2 + B \sum_{i=1}^{n} x_i

The solution of this pair of equations is:

A = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\Delta}                                                   (6.3)

and

B = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}                                 (6.4)

where

\Delta = n \sum x_i^2 - \left( \sum x_i \right)^2                                                    (6.5)
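
If you would like to check this algebra without grinding through it by hand, a computer algebra system will happily solve the pair of simultaneous equations for you.  Here is a minimal sketch using the Python library SymPy (assuming it is installed); the symbols S_x, S_y, S_xx, and S_xy are shorthand names invented here for the four sums:

    import sympy as sp

    A, B, n = sp.symbols('A B n')
    Sx, Sy, Sxx, Sxy = sp.symbols('S_x S_y S_xx S_xy')   # stand-ins for the sums

    # The pair of simultaneous equations from setting the derivatives to zero
    eq1 = sp.Eq(Sy, A * Sx + n * B)           # sum(y_i) = A*sum(x_i) + n*B
    eq2 = sp.Eq(Sxy, A * Sxx + B * Sx)        # sum(x_i*y_i) = A*sum(x_i^2) + B*sum(x_i)

    sol = sp.solve([eq1, eq2], [A, B])
    print(sol[A])    # equivalent to Eqn. (6.3)
    print(sol[B])    # equivalent to Eqn. (6.4)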

 

            You may use equations (6.3) to (6.5) each time you wish to compute the slope and y-intercept from a set of x and y data pairs.  Several simplifying assumptions have gone into these results; however, this derivation will not be developed further.  Suffice it to say that Eqns. (6.3) to (6.5) will do an adequate job of producing a best-fit straight line throughout the labs in this course.
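
To show how little work this is on a computer, here is a minimal Python sketch of Eqns. (6.3) to (6.5); the function name least_squares_line and the sample data are invented for this appendix:

    import numpy as np

    def least_squares_line(x, y):
        """Best-fit slope A and y-intercept B from Eqns. (6.3)-(6.5)."""
        n = len(x)
        delta = n * np.sum(x**2) - np.sum(x)**2                              # Eqn. (6.5)
        A = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / delta              # Eqn. (6.3)
        B = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / delta   # Eqn. (6.4)
        return A, B

    # Hypothetical data pairs, for illustration only
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])
    A, B = least_squares_line(x, y)
    print(A, B)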

            To obtain the error in the coefficients A and B, it is necessary to follow the usual procedure of propagating uncertainties.  This is quite laborious, so only the results will be quoted here.

 

If

\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - Ax_i - B \right)^2

and Δ retains its earlier definition, then

\sigma_A^2 = \frac{n\sigma^2}{\Delta} \qquad \text{and} \qquad \sigma_B^2 = \frac{\sigma^2 \sum x_i^2}{\Delta}
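
In code, and again assuming equal uncertainties in the y_i, these results look like this (the function name coefficient_variances is invented here):

    import numpy as np

    def coefficient_variances(x, y, A, B):
        """Variances of the slope A and intercept B of a best-fit line."""
        n = len(x)
        delta = n * np.sum(x**2) - np.sum(x)**2           # the same Delta as Eqn. (6.5)
        sigma2 = np.sum((y - A * x - B)**2) / (n - 2)     # variance of the scatter about the line
        var_A = n * sigma2 / delta
        var_B = sigma2 * np.sum(x**2) / delta
        return var_A, var_B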

 

            Obviously, just using the formulas presented here for computing A, B, σ_A², and σ_B² can become quite a chore.  Today, many scientific calculators contain built-in algorithms for computing a best-fit straight line.  All of these algorithms utilize the method of least squares, and the use of such a calculator is highly recommended.  You should refer to the calculator's owner's manual for explicit instructions on how to enter the pairs of x and y values into its memory.  After the data are entered, computing the coefficients A and B is as simple as computing the sine of an angle.  I do not know whether or not any of these calculators automatically calculate σ_A² and σ_B², but some of them can be programmed to make these calculations.  However, we have a nifty program on our computers called Graphical Analysis III that will calculate them for you!
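
If you are working at a computer instead, a standard library routine will do the same job.  For example, NumPy's polyfit (assuming NumPy is available) returns the same A and B as Eqns. (6.3) and (6.4):

    import numpy as np

    # Hypothetical data pairs, for illustration only
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])

    A, B = np.polyfit(x, y, 1)    # fit a degree-1 polynomial: y = A*x + B
    print(A, B)

Passing cov=True to np.polyfit additionally returns the covariance matrix of the fit, from whose diagonal σ_A² and σ_B² can be read off.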

                                   

*****************************************************