Error Estimates for Fits
------------------------
With the Hyperbolic flag, `H a F' [`efit'] performs the same fitting
operation as `a F', but reports the coefficients as error forms
instead of plain numbers. Fitting our two data matrices from earlier
(first the one whose last y value is 13, then the one with 14) to a
line with `H a F' gives the results:
3. + 2. x
(2.6 +/- 0.382970843103) + (2.2 +/- 0.115470053838) x
In the first case the estimated errors are zero because the linear fit
is perfect. In the second case, the errors are nonzero but moderately
small, because the data are still very close to linear.
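For a straight-line fit, the error estimates above can be reproduced with the textbook least-squares formulas. The Python sketch below assumes the two data matrices had x values 1 through 5 and y values 5, 7, 9, 11, 13 (or 14 in place of 13); this is our reconstruction of the earlier example, chosen because it reproduces the numbers shown. With no errors in the input, the unknown measurement error is estimated from the residuals as `s^2 = chi^2 / (N - M)'. The function name `line_fit_with_errors` is hypothetical, and this is not Calc's own code.

```python
import math

def line_fit_with_errors(xs, ys):
    """Fit y = a + b x by ordinary least squares and estimate the standard
    errors of a and b from the scatter of the residuals (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # No input errors: use s^2 = chi^2 / (N - M), with M = 2 parameters,
    # as the estimate of the (common) variance of each data point.
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    err_a = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    err_b = math.sqrt(s2 / sxx)
    return (a, err_a), (b, err_b)

xs = [1, 2, 3, 4, 5]
print(line_fit_with_errors(xs, [5, 7, 9, 11, 13]))  # a = 3, b = 2, both errors 0
print(line_fit_with_errors(xs, [5, 7, 9, 11, 14]))
```

The first call yields zero errors because every residual vanishes; the second yields roughly 2.6 +/- 0.383 and 2.2 +/- 0.1155, matching `H a F'.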
It is also possible for the *input* to a fitting operation to
contain error forms. The data values must either all include errors
or all be plain numbers. Error forms can go anywhere, but they
generally go on the numbers in the last row of the data matrix. If
the last row contains error forms `y_i +/- sigma_i', then the
`chi^2' statistic becomes
chi^2 = sum(((y_i - (a + b x_i)) / sigma_i)^2, i, 1, N)
so that data points with larger error estimates contribute less to the
fitting operation.
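The weighted `chi^2' formula translates directly into a small Python function. This is a sketch; `chi_squared` is a hypothetical name, not a Calc command. The usage line shows why noisier points count for less: a residual twice as large contributes no more than a unit residual when its `sigma_i' is also twice as large.

```python
def chi_squared(a, b, xs, ys, sigmas):
    """chi^2 = sum(((y_i - (a + b x_i)) / sigma_i)^2) for the line a + b x."""
    return sum(((y - (a + b * x)) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))

# Line y = x; residuals 1 and 2, but the second point has sigma 2,
# so each point contributes exactly 1 to chi^2.
print(chi_squared(0, 1, [1, 2], [2, 4], [1, 2]))  # → 2.0
```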
If there are error forms on other rows of the data matrix, all the
errors for a given data point are combined; the square root of the sum
of the squares of the errors forms the `sigma_i' used for the data
point.
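A sketch of this combination rule in Python, assuming each data point's errors arrive as a plain list of numbers (`combined_sigma` is a hypothetical helper, not part of Calc):

```python
import math

def combined_sigma(errors):
    """Combine all error estimates attached to one data point into a
    single sigma_i: the square root of the sum of their squares."""
    return math.sqrt(sum(e * e for e in errors))

# A point whose x value has error 3.0 and whose y value has error 4.0:
print(combined_sigma([3.0, 4.0]))  # → 5.0
```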
Both `a F' and `H a F' can accept error forms in the input matrix,
although if you are concerned about error analysis you will probably
use `H a F' so that the output also contains error estimates.
If the input contains error forms but all the `sigma_i' values are the
same, it is easy to see that the resulting fitted model will be the
same as if the input did not have error forms at all (`chi^2' is
simply scaled uniformly by `1 / sigma^2', which doesn't affect where
it has a minimum). But there *will* be a difference in the estimated
errors of the coefficients reported by `H a F'.
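The Python sketch below illustrates this point with the standard textbook formulas for a weighted straight-line fit. These formulas are assumed here rather than taken from Calc's source. If every point has the same sigma, `a' and `b' are unchanged, while the reported errors scale directly with sigma instead of being estimated from the residuals. The data are x = 1 to 5 and y = 5, 7, 9, 11, 14 (our assumed reconstruction of the earlier example); the sigma values are hypothetical.

```python
import math

def weighted_line_fit(xs, ys, sigmas):
    """Weighted least-squares fit of y = a + b x that minimizes
    chi^2 = sum(((y_i - (a + b x_i)) / sigma_i)^2).  The coefficient
    errors come from the sigma_i alone, not from the residuals."""
    w = [1.0 / s ** 2 for s in sigmas]
    S = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = S * Sxx - Sx * Sx
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    return (a, math.sqrt(Sxx / delta)), (b, math.sqrt(S / delta))

xs, ys = [1, 2, 3, 4, 5], [5, 7, 9, 11, 14]
print(weighted_line_fit(xs, ys, [1.0] * 5))  # a = 2.6, b = 2.2
print(weighted_line_fit(xs, ys, [0.5] * 5))  # same a, b; errors halved
```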
Consult any text on statistical modelling of data for a discussion of
where these error estimates come from and how they should be
interpreted.
With the Inverse flag, `I a F' [`xfit'] produces even more
information. The result is a vector of six items:
1. The model formula with error forms for its coefficients or
parameters. This is the result that `H a F' would have produced.
2. A vector of "raw" parameter values for the model. These are the
polynomial coefficients or other parameters as plain numbers, in
the same order as the parameters appeared in the final prompt of
the `I a F' command. For polynomials of degree `d', this vector
will have length `M = d+1' with the constant term first.
3. The covariance matrix `C' computed from the fit. This is an MxM
symmetric matrix; the diagonal elements `C_j_j' are the variances
`sigma_j^2' of the parameters. The other elements are
covariances `sigma_i_j^2' that describe the correlation between
pairs of parameters. (A related set of numbers, the "linear
correlation coefficients" `r_i_j', are defined as
`sigma_i_j^2 / (sigma_i sigma_j)'.)
4. A vector of `M' "parameter filter" functions whose meanings are
described below. If no filters are necessary this will instead
be an empty vector; this is always the case for the polynomial
and multilinear fits described so far.
5. The value of `chi^2' for the fit, calculated by the formulas
shown above. This gives a measure of the quality of the fit;
statisticians consider `chi^2 = N - M' to indicate a moderately
good fit (where again `N' is the number of data points and `M' is
the number of parameters).
6. A measure of goodness of fit expressed as a probability `Q'.
This is computed from the `utpc' probability distribution
function using `chi^2' with `N - M' degrees of freedom. A value
of 0.5 implies a good fit; some texts suggest that values as low
as `Q = 0.1', or even 0.001, can still signify an acceptable fit.
Note that `chi^2' statistics assume the errors in your inputs
follow a normal (Gaussian) distribution; if they don't, you may
have to accept smaller values of `Q'.
The `Q' value is computed only if the input included error
estimates. Otherwise, Calc will report the symbol `nan' for `Q'.
The reason is that in this case the `chi^2' value has effectively
been used to estimate the original errors in the input, and thus
there is no redundant information left over to use for a
confidence test.
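For a straight-line fit without input errors, the covariance matrix and correlation coefficients of items 3 and 1 can be sketched in Python as `C = s^2 (A^T A)^-1', where `A' has rows `[1, x_i]'. Scaling by the residual variance `s^2' is an assumption, though it is consistent with the error estimates that `H a F' reports for the 14 data set (x = 1 to 5, our reconstruction). Both function names are hypothetical.

```python
import math

def line_fit_covariance(xs, ys):
    """2x2 covariance matrix of (a, b) for an unweighted fit y = a + b x,
    computed as s^2 (A^T A)^-1 with s^2 = chi^2 / (N - 2)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    # (A^T A)^-1 = [[sxx, -sx], [-sx, n]] / det
    return [[s2 * sxx / det, -s2 * sx / det],
            [-s2 * sx / det, s2 * n / det]]

def correlation(cov, i, j):
    """Linear correlation coefficient r_i_j = C_i_j / sqrt(C_i_i C_j_j)."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])

C = line_fit_covariance([1, 2, 3, 4, 5], [5, 7, 9, 11, 14])
print(math.sqrt(C[0][0]), math.sqrt(C[1][1]))  # the +/- errors of a and b
print(correlation(C, 0, 1))                    # strongly negative, about -0.9
```

The square roots of the diagonal give the same 0.383 and 0.1155 shown earlier.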
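Calc's `utpc' gives the upper-tail probability of the chi-square distribution. Because `N - M' is always an integer, a closed form of the regularized upper incomplete gamma function `Q(nu/2, chi^2/2)' is enough to sketch the `Q' computation in Python. `upper_tail_chi2' is a hypothetical name, and this is not Calc's implementation. The usage example assumes, hypothetically, that each y value in the 14 data set carried an error of 0.3.

```python
import math

def upper_tail_chi2(chi2, nu):
    """Probability that a chi-square variable with nu degrees of freedom
    exceeds chi2 (the role of Calc's utpc), for integer nu >= 1."""
    x = chi2 / 2.0
    if nu % 2 == 0:
        # Q(n, x) = exp(-x) * sum_{j=0}^{n-1} x^j / j!
        term, total = 1.0, 1.0
        for j in range(1, nu // 2):
            term *= x / j
            total += term
        return math.exp(-x) * total
    # Q(n + 1/2, x) = erfc(sqrt(x))
    #                 + exp(-x) * sum_{j=1}^{n} x^(j - 1/2) / Gamma(j + 1/2)
    total = 0.0
    term = math.sqrt(x) / math.gamma(1.5)
    for j in range(1, (nu - 1) // 2 + 1):
        total += term
        term *= x / (j + 0.5)
    return math.erfc(math.sqrt(x)) + math.exp(-x) * total

# Hypothetical: every sigma_i = 0.3, so chi^2 = 0.4 / 0.09,
# with N - M = 5 - 2 = 3 degrees of freedom.
print(upper_tail_chi2(0.4 / 0.09, 3))
```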