
In more compact form this is

X(t) = Φ(t) X(t−1) + U(t), t = 1, 2, ..., (3.1)

where X(t) is the state vector and Φ(t) is the state transition matrix. The stochastic elements of the transition from one state to the next are contained in U(t).

The clocks are never read directly, but, using sophisticated electronics, the time differences between clocks can be read with a precision of less than a nanosecond. "Perfect" time, which we can never observe, cancels out, leaving an observation of the clock error differences: z1(t) = x1(t) − x2(t) and z2(t) = x1(t) − x3(t). We write the observation equation in matrix form as

Z(t) = H(t) X(t) + V(t), (3.2)

where, with three clocks and the states ordered (x1, y1, w1, x2, y2, w2, x3, y3, w3),

H(t) = [ 1 0 0 −1 0 0  0 0 0 ]
       [ 1 0 0  0 0 0 −1 0 0 ].

Z(t) is the vector of observed clock time differences and H(t) is the observation matrix. V(t) is a vector of observational errors. We write H(t) as a time-varying matrix because, in the multivariate case, partially missing data is represented by deleting rows from H(t) thus changing the dimension of Z(t), H(t), and V(t). In some cases it is possible to recover from a problem with the reference clock by defining a new (temporary) reference and changing the patterns of ± 1 and zeros in H(t). This is described in detail in Jones and Tryon [4].
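To make the bookkeeping concrete, here is a minimal sketch of Φ(t) and H(t) for the three-clock example. The per-clock transition in the comments is the form implied by the clock model above; the code and all names are ours, a sketch rather than the paper's program.

```python
import numpy as np

def transition_matrix(delta, n_clocks=3):
    # Per-clock states (x, y, w) = (time, frequency, drift), assumed form:
    #   x(t) = x(t-1) + delta*y(t-1) + noise
    #   y(t) = y(t-1) + delta*w(t-1) + noise
    #   w(t) = w(t-1) + noise
    block = np.array([[1.0, delta, 0.0],
                      [0.0, 1.0, delta],
                      [0.0, 0.0, 1.0]])
    return np.kron(np.eye(n_clocks), block)   # block-diagonal Phi(t)

def observation_matrix(n_clocks=3):
    # Row i observes z_i = x_1 - x_(i+1): +1 on the reference clock's
    # time state, -1 on the compared clock's time state.
    H = np.zeros((n_clocks - 1, 3 * n_clocks))
    H[:, 0] = 1.0
    for i in range(n_clocks - 1):
        H[i, 3 * (i + 1)] = -1.0
    return H
```

Deleting rows of H for partially missing data, as described above, is then ordinary array slicing.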

For small δ(t) the covariance matrix of the transition errors, U(t), is of the diagonal form δ(t)Q, where

Q = diag(σ²ε1, σ²η1, σ²α1, σ²ε2, σ²η2, σ²α2, σ²ε3, σ²η3, σ²α3),

the entries being the variances of the random walk components of time, frequency, and drift for each clock. Note that the matrix Q contains all the parameters to be estimated except the drifts w.

If the time offset of each clock is measured to the nearest nanosecond and is otherwise error free, the observational error vector V(t) has covariance matrix

R(t) = (1/12) [ 2 1 ]
              [ 1 2 ],

since the variance of a uniformly distributed random variable from −0.5 to 0.5 is 1/12; the off-diagonal entries arise because both observed differences contain the roundoff error of the same reference clock. The dimension of R(t) will also vary with time if there are partially missing data.
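Continuing the sketch above, δ(t)Q and R(t) might be built as follows (the parameter names are assumed):

```python
def process_cov(delta, var_eps, var_eta, var_alpha):
    # delta*Q, with Q diagonal: per clock, the variances of the time,
    # frequency, and drift random-walk increments, in state order.
    q = np.ravel(np.column_stack([var_eps, var_eta, var_alpha]))
    return delta * np.diag(q)

def roundoff_cov(n_obs):
    # R(t): 2/12 on the diagonal, 1/12 off it, because every observed
    # difference shares the reference clock's roundoff error.
    return (np.eye(n_obs) + np.ones((n_obs, n_obs))) / 12.0
```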


4. The Kalman Recursion

First, we adopt some notation. By X(t|s), s ≤ t, we denote the best estimate of the state vector at sample time t, based on all the data up to and including sample time s. We define P(t|s) to be the covariance matrix of X(t|s). Similarly, Z(t|s) will be the predicted observation at time t based on all the data up to time s.

The recursion proceeds through the following steps (Kalman [3]).

1. Given the state vector and its covariance matrix at any sample time, say t, we can predict the state vector one observation interval into the future using the transition eq (3.1). The prediction is

X(t+1|t) = Φ(t+1) X(t|t), (4.1)

since the expected value of the stochastic component of the transition is zero. The covariance matrix of the prediction is

P(t+1|t) = Φ(t+1) P(t|t) Φ'(t+1) + δ(t+1) Q, (4.2)

where the first term on the right is the covariance matrix of the linear operation in the state transition and the second term is the covariance of the additive stochastic component.

2. From the observation matrix and the predicted state we can predict the next observation,

Z(t+1|t) = H(t+1) X(t+1|t). (4.3)

3. We next compare the prediction with the actual data (when it becomes available in a real-time application). The difference is the recursive residual, or innovation,

I(t+1) = Z(t+1) − Z(t+1|t), (4.4)

which has covariance matrix

C(t+1) = H(t+1) P(t+1|t) H'(t+1) + R(t+1). (4.5)

The residuals are critically important. Later, they will be used for checking the fit of the model and estimating the parameters. For now, we will use the information in the residual to complete the recursion.

4. Let

A(t+1) = P(t+1|t) H'(t+1) C⁻¹(t+1). (4.6)

This is sometimes called the Kalman gain. Using the Kalman gain we can update the predicted state and its covariance:

X(t+1|t+1) = X(t+1|t) + A(t+1) I(t+1) (4.7)

and

P(t+1|t+1) = P(t+1|t) − A(t+1) H(t+1) P(t+1|t). (4.8)

This completes the recursion.
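The four steps translate directly into code. Below is a minimal sketch, continuing the one above, that follows eqs (4.1)-(4.8) literally and inverts C(t+1) explicitly for clarity; section 6 describes how the inverse is avoided in practice.

```python
def kalman_step(x, P, z, Phi, H, Q, R, delta):
    # Step 1: predict state and covariance, eqs (4.1)-(4.2)
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + delta * Q
    # Step 2: predict the observation, eq (4.3)
    z_pred = H @ x_pred
    # Step 3: innovation and its covariance, eqs (4.4)-(4.5)
    innov = z - z_pred
    C = H @ P_pred @ H.T + R
    # Step 4: Kalman gain and update, eqs (4.6)-(4.8)
    A = P_pred @ H.T @ np.linalg.inv(C)
    x_new = x_pred + A @ innov
    P_new = P_pred - A @ H @ P_pred
    return x_new, P_new, innov, C
```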

5. Maximum Likelihood Modeling and Estimation

In this section we will compare the classic use of the Kalman filter with our method of application. In the classical application, unknown parameters to be estimated can appear only in the state vector. All other quantities which appear in the recursion must be known. A single pass of the recursion through the data (often in real time) provides the latest estimate of the state vector and its covariance at each point in time. This is exactly how the Kalman filter is applied to forming a time scale. Given the model and estimates of the parameters, the recursion gives the updated state vector after each new observation. The first, fourth, and seventh, etc., elements of the state vector are estimates of the clock errors, from which, along with their uncertainties from the state covariance matrix, we can estimate time. Our interest, however, is in evaluating how well the clock model fits real data, and in estimating the parameters. To do this, we make use of two facts concerning the residuals, I(t), t = 1, 2, ..., N.


1. If the model is correct, and the recursion is run through the data with the correct values of the unknown parameters, the residuals will form a Gaussian white noise series. Standard statistical tests can then be used to determine if the residuals are in fact Gaussian white noise. If not, the structure of the residuals can often give some clues as to the defects in the model.

2. From the residuals and their covariance matrix, obtained by running the recursion with any fixed values of the unknown parameters, the likelihood function for those parameter values can be computed. Specifically, the −2 ln likelihood function (ignoring the additive constant) is

L = Σt [ ln |C(t)| + I'(t) C⁻¹(t) I(t) ]. (5.1)

The maximum likelihood estimates of the parameters are those for which L is minimized. Starting from initial guesses, we can use iterative nonlinear function minimization methods to find the estimates. Evaluating the function L for new trial values of the parameters requires one pass of the Kalman recursion through the data. Note that the parameters to be estimated may appear anywhere in the structure of the Kalman recursion.
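A sketch of that estimation loop, reusing kalman_step from above and assuming a caller-supplied build_matrices(params, t) (a hypothetical helper returning Φ, H, Q, R, and δ for occasion t):

```python
from scipy.optimize import minimize

def neg2_log_lik(params, data, build_matrices, x0, P0):
    # One pass of the Kalman recursion, accumulating eq (5.1).
    x, P = x0.copy(), P0.copy()
    L = 0.0
    for t, z in enumerate(data, start=1):
        Phi, H, Q, R, delta = build_matrices(params, t)
        x, P, innov, C = kalman_step(x, P, z, Phi, H, Q, R, delta)
        sign, logdet = np.linalg.slogdet(C)
        L += logdet + innov @ np.linalg.solve(C, innov)
    return L

# Iterative minimization from initial guesses; each function evaluation
# costs one filter pass through the data.
# fit = minimize(neg2_log_lik, p0, args=(data, build_matrices, x0, P0),
#                method="Nelder-Mead")
```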

6. Computational Procedures

It is not necessary to invert the residual covariance matrix, C(t), when computing the likelihood function or updating the predicted state vector and its covariance. If the upper triangular portion of C(t+1) is augmented by H(t+1) P(t+1|t) and I(t+1) to form the partitioned matrix

[ C(t+1) | H(t+1) P(t+1|t) | I(t+1) ],

a single call to a Cholesky factorization routine (Graybill [5], p. 232), which factors

C(t+1) = T'(t+1) T(t+1), (6.1)

where T(t+1) is upper triangular, will replace the partitioned matrix by

[ T(t+1) | T'⁻¹(t+1) H(t+1) P(t+1|t) | T'⁻¹(t+1) I(t+1) ]. (6.2)

Everything needed now follows directly: ln |C(t+1)| is twice the sum of the logarithms of the diagonal elements of T(t+1); the quadratic form I'(t+1) C⁻¹(t+1) I(t+1) is the sum of squares of the elements of the third block; and the update terms A(t+1) I(t+1) and A(t+1) H(t+1) P(t+1|t) in eqs (4.7) and (4.8) are the transpose of the second block times the third block, and the transpose of the second block times itself, respectively.
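With modern library routines the same device can be sketched as follows, continuing the code above (SciPy's triangular solvers; the blocks are handled separately rather than as one augmented array):

```python
from scipy.linalg import cholesky, solve_triangular

def likelihood_terms(innov, C, HP):
    # Factor C = T'T, eq (6.1), with T upper triangular.
    T = cholesky(C, lower=False)
    u = solve_triangular(T.T, innov, lower=True)  # T'^-1 I(t+1)
    W = solve_triangular(T.T, HP, lower=True)     # T'^-1 H(t+1) P(t+1|t)
    logdet_C = 2.0 * np.sum(np.log(np.diag(T)))   # ln |C(t+1)|
    quad = u @ u                                  # I'(t+1) C^-1 I(t+1)
    state_update = W.T @ u                        # A(t+1) I(t+1)
    cov_update = W.T @ W                          # A(t+1) H(t+1) P(t+1|t)
    return logdet_C, quad, state_update, cov_update
```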

Missing data are easily handled by the Kalman filter procedure. However, there are two distinct situations: first, all the data may be missing at one or several successive observation times; second, and occurring only with multivariate data, only part of the data may be available on a given occasion, for example when one or more, but not all, of the clock differences are missing.

The first case is most easily dealt with. In the present model, the unequal spacing feature can account for any size gap in the data by proper choice of δ(t). This, however, is a feature of random walk based models and may not apply to other models. A more general procedure for missing data is the following: Suppose k ≥ 1 successive observations are missing. Simply repeat step one in the recursion k additional times to obtain the predicted state, X(t+k+1|t), and its covariance, P(t+k+1|t), for the next observation time for which there are data. Then proceed with the remainder of the recursion. There is no residual and no contribution to the likelihood function when there are no data (Jones [6]).
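A sketch of that general procedure, with phi_of standing for whatever function builds Φ from an interval δ (our name; transition_matrix above would serve):

```python
def predict_across_gap(x, P, phi_of, Q, deltas):
    # Repeat step 1 of the recursion once per missing occasion; no
    # residual and no likelihood contribution accrue inside the gap.
    for delta in deltas:
        Phi = phi_of(delta)
        x = Phi @ x
        P = Phi @ P @ Phi.T + delta * Q
    return x, P
```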

In the second case, partial loss of data, i.e., missing elements of the data vector Z(t), can be represented by eliminating the corresponding rows of the observation matrix, H(t). Rows of the observational error vector V(t) and the corresponding rows and columns of the error covariance matrix R(t) are eliminated in the same way. This causes no problem with the remaining steps of the recursion, which proceed normally. The information available in the data is propagated through the recursion to correctly update the state vector prediction and its estimated covariance matrix.

The matrix operations in the recursion are, for reasons of efficiency, best coded to take advantage of the specific structure (many zeros) of each matrix. Using available general purpose subroutines to perform the required matrix operations would increase the cost of computer resources enormously. These calculations can be adapted to a time-varying H(t). One approach is to use indicator variables for missing data to form the various reduced matrices directly. Another approach is to note that pre- and/or post-multiplying by H(t) with rows missing is equivalent to performing the operation with the original H matrix and then eliminating the appropriate rows and/or columns of the product matrix and closing it up. For example, the residual covariance matrix

C(t+1) = H(t+1) P(t+1|t) H'(t+1) + R(t+1)

with, say, the second and fourth elements of the data vector missing can be obtained by using the original H and R but then eliminating the second and fourth rows and columns from C(t+1). It is straightforward to write general purpose subroutines to eliminate rows and/or columns of a matrix and close up the spaces. Again, this can be done in response to indicator variables denoting missing observations.
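Driven by indicator variables, the elimination can be written compactly (a sketch; 'present' marks the non-missing clock differences):

```python
def reduce_for_missing(H, R, z, present):
    # 'present' is a boolean indicator vector over observation rows.
    idx = np.flatnonzero(present)
    return H[idx], R[np.ix_(idx, idx)], z[idx]
```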

To start the recursion the initial state X(0|0) and its covariance P(0|0) are needed. Both depend on the application. The time states could be set to zero and the clocks set to agree with International Atomic Time as defined by the BIH. The corresponding variances on the diagonal of P(0|0) would be determined by the accuracy of the intercomparison. If a strictly internal application such as parameter estimation or a time interval measurement is desired, the time states could be set to agree with the difference readings at t = 0, with observational error variances set to the appropriate value for roundoff error. The frequency states might be measured by intercomparison with the frequency standards and the variance determined from the measurement uncertainty. When a measurement is not possible, a best guess at the frequency offset can be used, with a corresponding guess at its uncertainty used for the variance.

7. Data Analysis

The data used for this analysis were collected over a period of 333 days beginning on February 16, 1979, during which seven clocks listed below were in essentially uninterrupted normal operation. Clock differences were recorded daily to the nearest nanosecond.

[Table listing the seven clocks not recoverable from the scan.]

In addition to occasional variability in the time of data collection, two successive days were missing entirely. On three other occasions, single clocks had obvious read errors which, when discarded, resulted in partially missing data. Several dozen other observations, which had been flagged as possible errors by the ad hoc procedure then in use, were examined visually, judged to be not serious, and retained in the analysis.

Table 1 gives the results of the computations. First we note that the inclusion of deterministic trends (model II) results in a drop of 41 in the −2 ln likelihood function value over the no-trend model (model I). The reduction is distributed as χ²(6) under the hypothesis of model I, so this is highly significant: there is less than one chance in a thousand that this is a spurious result. Three of the seven clocks had significant trends. The inclusion of random walk drifts (model III), however, makes no improvement. Furthermore, the estimated drift random-walk standard deviations are all approximately zero and the estimated initial values of the drift, w(0), are about the same as the deterministic drifts. The conclusion is that several frequency trends are present in this data and that they are constant over the year, rather than wandering as a random walk. It should be noted that one of the drifting clocks had just begun operation while the remaining two were nearing their end of life and failed shortly after the test period. The remaining clocks were in their "middle years." The standard errors of the parameter estimates given in the table are obtained from the estimated covariance matrix as approximated by twice the inverse of the Hessian of the −2 ln likelihood function, L. In model II it is necessary to constrain the solution so that the sum of the drifts is zero. This means that we cannot detect a common drift from clock difference readings. Because of this constraint, a standard error for clock #8 is not available. Due to the large cost of computing the 28 × 28 Hessian for model III, which we have rejected anyway, the standard errors of the estimates have not been computed.
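As a check on the quoted significance, the tail probability of a drop of 41 against χ²(6) can be computed directly (a sketch using SciPy):

```python
from scipy.stats import chi2

# Probability of a drop of 41 in -2 ln L from 6 extra parameters
# if model I were in fact correct.
p_value = chi2.sf(41.0, df=6)   # about 3e-7, far below 0.001
```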

Figures 1a-f are integrated periodograms of the six series of residuals. The parallel lines represent five percent significance test limits for white noise. That is, only five percent of actual white noise series so tested will wander outside the lines. In this data set, only one pair of clocks (601 minus 167) shows a slight deviation from white noise. We conclude that this model fits very well. Figures 2a-f are histograms of the six residual series. Also given are three different tests for normality. All the series are well approximated by the normal (Gaussian) distribution, which verifies that the assumed likelihood function was reasonable.
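An integrated periodogram test of this kind can be sketched as follows (the 5% band constant 1.36/√m is the usual Kolmogorov-Smirnov approximation; the function and its names are ours, not the paper's program):

```python
def integrated_periodogram(resid):
    # Cumulative (integrated) periodogram of a residual series, with an
    # approximate 5% white-noise band of +/- 1.36/sqrt(m) about the diagonal.
    x = np.asarray(resid, dtype=float)
    x = x - x.mean()
    n = len(x)
    power = np.abs(np.fft.rfft(x))[1:n // 2 + 1] ** 2   # drop frequency zero
    m = len(power)
    cum = np.cumsum(power) / power.sum()
    diag = np.arange(1, m + 1) / m                      # white-noise diagonal
    excursion = np.max(np.abs(cum - diag))
    return cum, excursion, excursion > 1.36 / np.sqrt(m)
```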

We note in conclusion that, having selected the constant drift model, we could reduce the dimension of the state space model by one-third by rewriting the clock model as

xn(t) = xn(t−1) + δ(t) yn(t−1) + εn(t)
yn(t) = yn(t−1) + δ(t) wn + ηn(t),

so that the constant drift wn of each clock enters the frequency transition as a parameter rather than as a third state variable.