Department of Economics

Wake Forest University

**Table of Contents**

- 1. About gretl
- 2. The NIST reference datasets
- 3. Gretl's performance

Gretl is an open-source, cross-platform econometrics program. Its development is hosted on SourceForge.

The U.S. National Institute of Standards and Technology (NIST) publishes a set of statistical reference datasets. The object of this project is to "improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software".

As of September 2010 the website for the project can be found at:

`http://itl.nist.gov/div898/strd/general/main.html`

while the datasets are at

`http://itl.nist.gov/div898/strd/general/dataarchive.html`

For testing `gretl` I have made use
of the datasets pertaining to Linear Regression and Univariate
Summary Statistics (the others deal with ANOVA and nonlinear
regression).

I quote from the NIST text "Certification Method & Definitions" regarding their certified computational results (emphasis added):

> For all datasets, multiple precision calculations (accurate to 500 digits) were made using the preprocessor and FORTRAN subroutine package of Bailey (1995, available from NETLIB). Data were read in exactly as multiple precision numbers and all calculations were made with this very high precision. The results were output in multiple precision, and only then rounded to fifteen significant digits.
>
> These multiple precision results are an idealization. They represent what would be achieved if calculations were made without roundoff or other errors. Any typical numerical algorithm (i.e. not implemented in multiple precision) will introduce computational inaccuracies, and will produce results which differ slightly from these certified values.

It is not to be expected that results obtained from ordinary statistical packages will agree exactly with NIST's multiple precision benchmark figures. But the benchmark provides a very useful test for egregious errors and imprecision.
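To see why NIST went to 500-digit arithmetic, consider how easily double precision loses information when nearly equal large numbers are subtracted. The sketch below (with made-up observations in the spirit of NIST's `NumAcc` datasets, not the actual data) contrasts a naive one-pass variance formula in double precision with the same formula in 50-digit decimal arithmetic:

```python
from decimal import Decimal, getcontext

# Made-up observations (not NIST data): they differ only far out in
# the significant digits, so squaring them invites cancellation.
data = ["10000001.1", "10000001.2", "10000001.3"]

# Naive one-pass formula in double precision:
# var = (sum(x^2) - n*mean^2) / (n - 1)
xs = [float(s) for s in data]
n = len(xs)
mean_f = sum(xs) / n
var_naive = (sum(x * x for x in xs) - n * mean_f * mean_f) / (n - 1)

# The same formula carried out in 50-digit decimal arithmetic
getcontext().prec = 50
ds = [Decimal(s) for s in data]
mean_d = sum(ds) / n
var_mp = (sum(d * d for d in ds) - n * mean_d * mean_d) / (n - 1)

print(var_naive)  # typically far from the truth: cancellation in doubles
print(var_mp)     # the true sample variance, 0.01
```

The high-precision route recovers the exact variance; the double-precision one-pass formula does not, which is precisely the kind of "computational inaccuracy" the certified values are designed to expose.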

Table 1 below shows the performance of
both gretl's standard regression facility and the gretl plugin
based on the GNU Multiple Precision (GMP) library. In the Gretl
column, the "min. correct significant digits" figure
shows, for each model, the least number of correct significant
digits in the gretl results when the various statistics associated
with the model (regression coefficients and standard errors, sum
of squared residuals, standard error of residuals,
*F* statistic and *R*^{2}) are compared with the
NIST certified values. The GMP plugin column simply records
whether the gretl results were correct to at least 12 significant
figures for all the statistics. For these tests gretl was compiled
using gcc 2.95.3 with the -O2 optimization flag, linked against
glibc-2.2.5, and run on an IBM ThinkPad with a Pentium III processor.
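One way to make the "min. correct significant digits" measure concrete is the log relative error (LRE) commonly used in accuracy studies of this kind. The function below is a hypothetical sketch of such a comparison, not the exact procedure used to build the table:

```python
import math

def correct_digits(value, certified):
    """Correct significant digits of `value` against `certified`,
    measured as the log relative error (LRE), capped at 15 because
    the NIST values are certified to fifteen significant digits."""
    if value == certified:
        return 15.0
    if certified == 0.0:
        lre = -math.log10(abs(value))  # fall back to absolute error
    else:
        lre = -math.log10(abs(value - certified) / abs(certified))
    return max(0.0, min(lre, 15.0))

# An estimate agreeing with the benchmark to roughly nine digits:
print(correct_digits(3.14159265, 3.14159265358979))  # ≈ 8.94
```

The per-model figure in the table corresponds to the minimum of such values over all the statistics being checked.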

**Table 1. NIST linear regression tests**

| Dataset | Model | Gretl (min. correct significant digits) | GMP plugin (correct to at least 12 digits?) |
|---|---|---|---|
| Norris | Simple linear regression | 9 | Yes |
| Pontius | Quadratic | 8 | Yes |
| NoInt1 | Simple regression, no intercept | 9 (but see text) | Yes |
| NoInt2 | Simple regression, no intercept | 9 (but see text) | Yes |
| Filip | 10th degree polynomial | 0 (see text) | Yes |
| Longley | Multiple regression, six independent variables | 8 | Yes |
| Wampler1 | 5th degree polynomial | 7 | Yes |
| Wampler2 | 5th degree polynomial | 9 | Yes |
| Wampler3 | 5th degree polynomial | 7 | Yes |
| Wampler4 | 5th degree polynomial | 7 | Yes |
| Wampler5 | 5th degree polynomial | 7 | Yes |

As can be seen from the table,
`gretl` does a good job of tracking the
certified results. With the `Filip` data set,
where the model is a 10th-degree polynomial in *x*, the regression
is so ill-conditioned that gretl's standard calculation did not
match any of the certified significant digits (hence the 0 in
Table 1), while the GMP plugin still agreed with the certified
values to at least 12 digits.
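The difficulty with `Filip`-type problems is the extreme ill-conditioning of the design matrix that a high-degree polynomial produces. The classic Hilbert matrix exhibits the same pathology on a small scale. The sketch below (an illustration of the general phenomenon, not gretl's algorithm) solves a 10×10 Hilbert system whose exact solution is a vector of ones, once in double precision and once in exact rational arithmetic via Python's `fractions` module:

```python
from fractions import Fraction

def solve(A, b):
    """Gaussian elimination with partial pivoting; works for
    floats or Fractions alike."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

n = 10
H_exact = [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]
b_exact = [sum(row) for row in H_exact]   # so the true solution is all ones
H_float = [[float(v) for v in row] for row in H_exact]
b_float = [float(v) for v in b_exact]

x_exact = solve(H_exact, b_exact)          # recovers [1, 1, ..., 1] exactly
x_float = solve(H_float, b_float)          # loses many significant digits
print(max(abs(v - 1) for v in x_float))
```

Exact arithmetic is immune to the conditioning problem, much as the GMP plugin's extended precision lets it track the certified `Filip` results where double precision cannot.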

In the `NoInt1` and
`NoInt2` datasets there is a methodological
disagreement over the calculation of the coefficient of
determination, *R*^{2}, where the regression does not have an
intercept. In this case `gretl` reports the square
of the correlation coefficient between the fitted and actual
values of the dependent variable, while the NIST
figure is the uncentered statistic, 1 − ESS/Σ*y*^{2}, where ESS
denotes the sum of squared residuals. To replicate the NIST figure
I used the gretl command

```
genr r2alt = 1 - $ess/sum(y * y)
```

and the numbers thus obtained were in agreement with the
certified values, up to `gretl`'s
precision.
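The two definitions are easy to contrast outside gretl. The following sketch (with invented data, in Python rather than gretl script) fits a regression through the origin and computes both the NIST-style uncentered *R*^{2} and the squared correlation between fitted and actual values:

```python
# Invented data (not a NIST dataset) for a regression through the origin
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.9, 4.1, 5.8, 8.3, 9.9]

# OLS slope with no intercept: b = sum(x*y) / sum(x^2)
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
fitted = [b * xi for xi in x]
ess = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # sum of squared residuals

# NIST-style figure: uncentered R-squared, 1 - ESS / sum(y^2)
r2_nist = 1.0 - ess / sum(yi * yi for yi in y)

# gretl's figure in this case: squared correlation of fitted and actual
def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((c - mv) ** 2 for c in v)) ** 0.5

r2_gretl = corr(fitted, y) ** 2
print(r2_nist, r2_gretl)  # the two definitions disagree
```

Neither number is "wrong"; they answer different questions, which is why the table flags `NoInt1` and `NoInt2` with "but see text".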

As for the univariate summary statistics, the certified values given by NIST are for the sample mean, sample standard deviation and sample lag-1 autocorrelation coefficient. NIST note that the latter statistic "may have several definitions". The certified value takes the sum of cross-products of deviations from the mean over observations 2 to *n*, divided by the sum of squared deviations over the full sample, and I replicated it with the following gretl script:

```
genr y1 = y(-1)
genr ybar = mean(y)
genr devy = y - ybar
genr devy1 = y1 - ybar
genr ssy = sum(devy * devy)
smpl 2 ;
genr ssyy1 = sum(devy * devy1)
genr rnist = ssyy1 / ssy
```

The figure `rnist` was then compared with the
certified value.
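For readers without gretl to hand, the same lag-1 calculation can be sketched in Python (with invented data); note that the numerator runs over *t* = 2, …, *n* while the denominator uses the full sample:

```python
# Invented series (not a NIST dataset)
y = [0.2, 0.5, 0.4, 0.9, 0.7, 1.1, 0.8]

n = len(y)
ybar = sum(y) / n
devy = [v - ybar for v in y]           # deviations from the mean

ssy = sum(d * d for d in devy)                           # full-sample denominator
ssyy1 = sum(devy[t] * devy[t - 1] for t in range(1, n))  # cross-products, t = 2..n
rnist = ssyy1 / ssy
print(rnist)  # ≈ 0.247 for these data
```

This mirrors the gretl script above: `smpl 2 ;` restricts the sample for the numerator only, since `ssy` is computed before the restriction.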

With this modification, all the summary statistics were in
agreement (to the precision given by
`gretl`) for all datasets
(`PiDigits`, `Lottery`,
`Lew`, `Mavro`,
`Michelso`, `NumAcc1`,
`NumAcc2`, `NumAcc3` and
`NumAcc4`).