Thus, letting $\xi_t$ play the role of $X$ and $y_t$ the role of $Z$, we have from (7)
$$\hat a_t = a_t + \frac{V_t b\,(y_t - b' a_t)}{b' V_t b + \sigma^2} \qquad\text{and}\qquad \hat V_t = V_t - \frac{V_t b\, b' V_t}{b' V_t b + \sigma^2}. \tag{8}$$
From (2), it follows that
$$a_{t+1} = A \hat a_t \qquad\text{and}\qquad V_{t+1} = A \hat V_t A' + \Sigma. \tag{9}$$
The "updating" equations (8) describe how the forecast of the state vector at time $t$ is changed when $y_t$ is observed. Together with the "prediction" equations (9), they imply the recursion (4).
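As a concrete illustration, the following is a minimal numerical sketch of one pass through the updating equations (8) followed by the prediction equations (9). The function name `kalman_step` and the parameter names (`Sigma` for the state-noise covariance $\Sigma$, `sigma2` for $\sigma^2$) are hypothetical, not taken from the text; the assumed model, consistent with (8) and (9), is $y_t = b'\xi_t + u_t$ with $\xi_{t+1} = A\xi_t + \eta_{t+1}$.

```python
import numpy as np

def kalman_step(a_t, V_t, y_t, A, b, Sigma, sigma2):
    """One filter iteration: updating step (8), then prediction step (9).

    a_t, V_t : mean and variance of the state given Y_{t-1}.
    Returns the corresponding moments for t+1 given Y_t, along with
    the innovation e_t and its variance b'V_t b + sigma^2.
    """
    denom = b @ V_t @ b + sigma2          # b'V_t b + sigma^2
    e_t = y_t - b @ a_t                   # innovation y_t - b'a_t
    # Updating equations (8): condition the time-t forecast on y_t.
    a_upd = a_t + (V_t @ b) * (e_t / denom)
    V_upd = V_t - np.outer(V_t @ b, b @ V_t) / denom
    # Prediction equations (9): push the updated moments forward.
    a_next = A @ a_upd
    V_next = A @ V_upd @ A.T + Sigma
    return a_next, V_next, e_t, denom
```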
In models where the state variables have an economic interpretation, it is sometimes desirable to estimate $\xi_t$ using all the available data. Starting with $a_T$ and $V_T$ computed with the Kalman filter, one can iterate backwards to compute $E(\xi_t \mid Y_T)$. The relevant recursion, called the "smoothing" algorithm, is derived and discussed in Harvey's book.
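The recursion itself is not reproduced in the text. Purely as a reference point, here is a minimal sketch of one standard statement of it, the fixed-interval (Rauch-Tung-Striebel) backward pass, written in the same assumed notation as the sketch above; whether this matches Harvey's exact presentation is not verified here.

```python
def rts_smoother(a_upd, V_upd, a_pred, V_pred, A):
    """Backward (smoothing) pass: compute E(xi_t | Y_T) for t = T, ..., 1.

    a_upd, V_upd   : updated moments E(xi_t | Y_t) and their variances.
    a_pred, V_pred : predicted moments E(xi_t | Y_{t-1}) and variances.
    All sequences are Python lists indexed 0..T-1 (index t holds time t+1).
    """
    T = len(a_upd)
    a_sm, V_sm = [None] * T, [None] * T
    a_sm[-1], V_sm[-1] = a_upd[-1], V_upd[-1]   # at t = T, nothing to do
    for t in range(T - 2, -1, -1):
        J = V_upd[t] @ A.T @ np.linalg.inv(V_pred[t + 1])   # smoother gain
        a_sm[t] = a_upd[t] + J @ (a_sm[t + 1] - a_pred[t + 1])
        V_sm[t] = V_upd[t] + J @ (V_sm[t + 1] - V_pred[t + 1]) @ J.T
    return a_sm, V_sm
```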
6 Matrices that Diagonalize the Covariance Matrix for y
Again, let $Y_t$ denote the vector $(y_1, \dots, y_t)$. Note that $a_t$ is a linear function of the data in $Y_{t-1}$, and hence the prediction error $e_t = y_t - b' a_t$ is a linear function of the data in $Y_t$. If $t > s$,
$$E(e_t e_s) = E\!\left[e_s\, E(b'\xi_t - b' a_t + u_t \mid Y_{t-1})\right] = 0.$$
If $t = s$,
$$E e_t^2 = E\!\left[\operatorname{var}(b'\xi_t + u_t \mid Y_{t-1})\right] = \sigma^2 + b' V_t b.$$
Thus, the $\{e_t\}$ are a set of uncorrelated, but heteroskedastic, random variables. Denoting the vector of the $y$'s by $y$ and the vector of the $e$'s by $e$, we have $e = Gy$, where $G$ is a nonrandom triangular matrix such that $Eee' = G(Eyy')G'$ is diagonal. Thus, as noted in Section 4, the Kalman filter can be viewed as an algorithm for exactly diagonalizing the covariance matrix of $y$.
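This can be checked numerically. For a scalar AR(1) state observed with noise, one can assemble the exact covariance matrix of $(y_1,\dots,y_T)$, build the rows of $G$ from the filter (each $e_t$ is a fixed linear combination of $y_1,\dots,y_t$ with unit coefficient on $y_t$), and confirm that $G(Eyy')G'$ comes out diagonal with entries $\sigma^2 + b'V_t b$. The parameter values below are arbitrary, and the loop simply tracks the loadings of $a_t$ on past observations.

```python
import numpy as np

# Scalar AR(1) state observed with noise (b = 1): hypothetical parameters.
A_, Sigma_, sigma2_, T = 0.8, 1.0, 0.5, 6

# Exact covariance of (y_1,...,y_T): cov(xi_t, xi_s) = A^|t-s| Sigma/(1-A^2).
v_xi = Sigma_ / (1 - A_**2)
Gamma = np.array([[v_xi * A_**abs(t - s) for s in range(T)] for t in range(T)])
Gamma += sigma2_ * np.eye(T)

G = np.zeros((T, T))   # row t holds the loadings of e_t on (y_1, ..., y_T)
c = np.zeros(T)        # loadings of a_t on past observations
V = v_xi               # V_1 = stationary variance of the state
for t in range(T):
    G[t] = -c
    G[t, t] = 1.0                            # e_t = y_t - a_t
    K = V / (V + sigma2_)                    # update gain from (8)
    c = A_ * (1 - K) * c                     # a_{t+1} = A((1-K) a_t + K y_t)
    c[t] = A_ * K
    V = A_**2 * (V - V**2 / (V + sigma2_)) + Sigma_   # recursion (4)

print(np.round(G @ Gamma @ G.T, 6))          # diagonal, entries sigma^2 + V_t
```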
For ARMA models, an alternative to calculating the exact Gaussian likelihood is to approximate the likelihood by conditioning on the first few $y$'s and $\varepsilon$'s. After conditioning, the remaining $y$'s can be written as an invertible linear function of a finite number of current and lagged innovations. Thus, approximating the likelihood by conditioning is equivalent to finding a triangular linear transform of the data having a scalar covariance matrix and is closely related to the linear transform employed by the Kalman filter. More precisely, suppose one used as the initial variance matrix $V_1$, not the stationary variance given in equation (5), but instead some variance satisfying
$$V_1 = \Sigma + A V_1 A' - \frac{A V_1 b\, b' V_1 A'}{b' V_1 b + \sigma^2}.$$
Then the iteration scheme (4) produces a constant matrix $V_t$, and the term $b' V_t b + \sigma^2$ appearing in the likelihood (6) does not depend on $t$. If that term does not depend on the unknown ARMA coefficients either, the Gaussian maximum likelihood estimator minimizes the sum of squared innovations $\sum_t (y_t - b' a_t)^2$. Thus, using the Kalman filter after setting initial conditions to produce a constant $V_t$ matrix is equivalent to conditioning on initial values and computing nonlinear least squares estimates.
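A sketch of this equivalence, reusing the hypothetical `kalman_step` from above: iterate the variance recursion to its fixed point (a $V_1$ satisfying the displayed equation), so that $V_t$, and hence $b'V_t b + \sigma^2$, stays constant, then accumulate the sum of squared innovations that nonlinear least squares would minimize. The function names are again invented for illustration.

```python
def steady_state_V(A, b, Sigma, sigma2, tol=1e-12, max_iter=100_000):
    """Iterate V <- Sigma + A V A' - A V b b' V A' / (b'V b + sigma^2)
    to a fixed point, i.e. a V_1 that the recursion (4) leaves unchanged."""
    V = np.eye(A.shape[0])
    for _ in range(max_iter):
        denom = b @ V @ b + sigma2
        V_new = Sigma + A @ V @ A.T - A @ np.outer(V @ b, b @ V) @ A.T / denom
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

def sum_of_squared_innovations(y, A, b, Sigma, sigma2):
    """With V_1 at the fixed point, V_t is constant, and Gaussian MLE
    reduces to minimizing this sum over the ARMA coefficients."""
    V = steady_state_V(A, b, Sigma, sigma2)
    a = np.zeros(A.shape[0])
    sse = 0.0
    for y_t in y:
        a, V, e_t, _ = kalman_step(a, V, y_t, A, b, Sigma, sigma2)
        sse += e_t ** 2
    return sse
```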
There is still one more statistical procedure that involves a linear transformation approximately diagonalizing the covariance matrix of $y$. If the $y$'s are a stationary stochastic process, the $T \times T$ Fourier matrix $F$, with elements $f_{kt} = e^{2\pi i k t/T}$, not depending on any unknown parameters, approximately diagonalizes any stationary covariance matrix. The variable $z = Fy$