Assessing gretl's accuracy: the NIST datasets

Allin Cottrell

Department of Economics
Wake Forest University

Table of Contents
1. About gretl
2. The NIST reference datasets
3. Gretl's performance

1. About gretl

Gretl is an open-source, cross-platform econometrics program. Its development is hosted on SourceForge.

2. The NIST reference datasets

The U.S. National Institute of Standards and Technology (NIST) publishes a set of statistical reference datasets. The object of this project is to "improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software".

As of September 2010 the website for the project, including the datasets themselves, can be found at http://www.itl.nist.gov/div898/strd/.

For testing gretl I have made use of the datasets pertaining to Linear Regression and Univariate Summary Statistics (the others deal with ANOVA and nonlinear regression).

I quote from the NIST text "Certification Method & Definitions" regarding their certified computational results (emphasis added):

For all datasets, multiple precision calculations (accurate to 500 digits) were made using the preprocessor and FORTRAN subroutine package of Bailey (1995, available from NETLIB). Data were read in exactly as multiple precision numbers and all calculations were made with this very high precision. The results were output in multiple precision, and only then rounded to fifteen significant digits. These multiple precision results are an idealization. They represent what would be achieved if calculations were made without roundoff or other errors. Any typical numerical algorithm (i.e. not implemented in multiple precision) will introduce computational inaccuracies, and will produce results which differ slightly from these certified values.

It is not to be expected that results obtained from ordinary statistical packages will agree exactly with NIST's multiple precision benchmark figures. But the benchmark provides a very useful test for egregious errors and imprecision.

3. Gretl's performance

Table 1 below shows the performance of both gretl's standard regression facility and the gretl plugin based on the GNU Multiple Precision (GMP) library. In the Gretl column, the "min. correct significant digits" figure shows, for each model, the smallest number of correct significant digits in the gretl results when the various statistics associated with the model (regression coefficients and standard errors, sum of squared residuals, standard error of residuals, F statistic and R2) are compared with the NIST certified values. The GMP plugin column simply records whether the gretl results were correct to at least 12 significant figures for all the statistics. For these tests gretl was compiled using gcc 2.95.3 with the -O2 optimization flag, linked against glibc 2.2.5, and run on an IBM ThinkPad with a Pentium III processor.

Table 1. NIST linear regression tests

Dataset    Model                                          Gretl (min. correct    GMP plugin (correct to
                                                          significant digits)    at least 12 digits?)
-----------------------------------------------------------------------------------------------------
Norris     Simple linear regression                       9                      Yes
NoInt1     Simple regression, no intercept                9 (but see text)       Yes
NoInt2     Simple regression, no intercept                9 (but see text)       Yes
Filip      10th degree polynomial                         0 (see text)           Yes
Longley    Multiple regression, six independent variables 8                      Yes
Wampler1   5th degree polynomial                          7                      Yes
Wampler2   5th degree polynomial                          9                      Yes
Wampler3   5th degree polynomial                          7                      Yes
Wampler4   5th degree polynomial                          7                      Yes
Wampler5   5th degree polynomial                          7                      Yes
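The "min. correct significant digits" measure is conventionally computed as the minimum log relative error (LRE) across all the certified statistics for a model. A minimal sketch of the LRE calculation, in Python rather than gretl script (the function name is my own):

```python
import math

def lre(q, c):
    """Log relative error: approximate number of correct significant
    digits in computed value q relative to certified value c."""
    if q == c:
        return 15.0  # agreement to the full 15 certified digits
    if c == 0:
        return -math.log10(abs(q))  # fall back to log absolute error
    # cap at 15, since the certified values carry only 15 digits
    return min(-math.log10(abs(q - c) / abs(c)), 15.0)

# Illustrative values only (not actual NIST/gretl output):
print(lre(1.00211681802045, 1.00211681802045))  # identical -> 15.0
print(lre(1.002116818, 1.00211681802045))       # roughly 10-11 digits
```

A model's score in Table 1, rounded down, would then be the minimum of `lre` over its coefficients, standard errors and summary statistics.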

As can be seen from the table, gretl does a good job of tracking the certified results. With the Filip data set, where the model is

	  \[y_t=\beta_0+\beta_1 x_t+\beta_2 x^2_t+\beta_3 x^3_t+\cdots+\beta_{10} x^{10}_t+\varepsilon_t\]

gretl refuses to produce estimates due to a high degree of multicollinearity (the popular commercial econometrics program Eviews 3.1 also baulks at this regression). Other than that, the program produces accurate coefficient estimates in all cases.
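The near-singularity behind this failure is easy to see: over a narrow range of x, adjacent powers of x are almost perfectly correlated, so the cross-products matrix of the 10th-degree fit is close to singular. A small illustration in Python (the x values are invented to mimic Filip's approximate range; they are not the actual NIST data):

```python
# Made-up regressor values spanning roughly Filip's range of x
n = 82
xs = [-9.0 + 6.0 * i / (n - 1) for i in range(n)]

def corr(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

# Adjacent high powers of x are nearly collinear regressors
r = corr([x ** 9 for x in xs], [x ** 10 for x in xs])
print(r)  # very close to -1 on this range
```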

In the NoInt1 and NoInt2 datasets there is a methodological disagreement over the calculation of the coefficient of determination, R2, where the regression does not have an intercept. gretl reports the square of the correlation coefficient between the fitted and actual values of the dependent variable in this case, while the NIST figure is

	\[R^2 = 1 - \frac{\mathrm{ESS}}{\sum^T_{t=1} y^2_t}\]

There is no universal agreement among statisticians on the "correct" formula (see for instance the discussion in Ramanathan, 2002, pp. 163–4). Eviews 3.1 produces a different figure again (one which is negative for the NoInt test files). The figure chosen by NIST was reproduced for these regressions within gretl using the command

genr r2alt = 1 - $ess/sum(y * y)

and the numbers thus obtained were in agreement with the certified values, up to gretl's precision.
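The gap between the two definitions can be made concrete with a small no-intercept regression. A sketch in Python (the data imitate the simple pattern of NoInt1, but the exercise is purely illustrative):

```python
# A regression through the origin on a simple made-up dataset
xs = [float(v) for v in range(60, 71)]
ys = [float(v) for v in range(130, 141)]

# OLS slope with no intercept: b = sum(x*y) / sum(x*x)
b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
fitted = [b * x for x in xs]
ess = sum((y - f) ** 2 for y, f in zip(ys, fitted))

# NIST's definition: uncentered total sum of squares in the denominator
r2_nist = 1 - ess / sum(y * y for y in ys)

# gretl's definition: squared correlation of fitted and actual values
my, mf = sum(ys) / len(ys), sum(fitted) / len(fitted)
sxy = sum((y - my) * (f - mf) for y, f in zip(ys, fitted))
syy = sum((y - my) ** 2 for y in ys)
sff = sum((f - mf) ** 2 for f in fitted)
r2_gretl = sxy ** 2 / (syy * sff)

print(r2_nist, r2_gretl)  # the two definitions disagree
```

Here the fitted values are a perfect linear function of the actuals, so gretl's definition gives an R2 of 1 even though the residuals are nonzero; the NIST definition penalizes those residuals against the uncentered sum of squares.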

As for the univariate summary statistics, the certified values given by NIST are for the sample mean, sample standard deviation and sample lag-1 autocorrelation coefficient. NIST note that the latter statistic "may have several definitions". The certified value is computed as

	\[r_1 = \frac{\sum^T_{t=2}(y_t - \bar{y})(y_{t-1} - \bar{y})}
	             {\sum^T_{t=1}(y_t - \bar{y})^2}\]

while gretl gives the correlation coefficient between y_t and y_{t-1}. For the purposes of comparison, the NIST figure was computed within gretl as follows:

      genr y1 = y(-1) 
      genr ybar = mean(y) 
      genr devy = y - ybar 
      genr devy1 = y1 - ybar 
      genr ssy = sum(devy * devy) 
      smpl 2 ; 
      genr ssyy1 = sum(devy * devy1) 
      genr rnist = ssyy1 / ssy

The figure rnist was then compared with the certified value.
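The same comparison can be sketched outside gretl. In Python (the series below is invented, not one of the NIST datasets):

```python
# A short made-up series for comparing the two definitions
y = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5, 5.0]
T = len(y)
ybar = sum(y) / T

# NIST definition: deviations about the full-sample mean, with the
# denominator summed over the full sample
num = sum((y[t] - ybar) * (y[t - 1] - ybar) for t in range(1, T))
den = sum((v - ybar) ** 2 for v in y)
r_nist = num / den

# gretl's choice: sample correlation between y_t and y_{t-1}
a, b = y[1:], y[:-1]
ma, mb = sum(a) / len(a), sum(b) / len(b)
sa = sum((v - ma) ** 2 for v in a) ** 0.5
sb = sum((v - mb) ** 2 for v in b) ** 0.5
r_corr = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (sa * sb)

print(r_nist, r_corr)  # the definitions can differ noticeably
```

In short samples like this one the two figures can diverge substantially, which is why the comparison against the certified values had to use NIST's own formula.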

With this modification, all the summary statistics were in agreement (to the precision given by gretl) for all datasets (PiDigits, Lottery, Lew, Mavro, Michelso, NumAcc1, NumAcc2, NumAcc3 and NumAcc4).