Pergamon PII: sooo5-1098(%)00149-5
Aufomorica,'Vol. 33,No. 1.p~. 109-112, 1997
6 1997 Elsevier Science Ltd
Printed in Great Britain. All rights reserved
cmls-lo!%/97 517.00 + 0.00
Technical Communique
A One-measurement Form of Simultaneous
Stochastic Approximation*
JAMES C. SPALLt
Key Words-Optimization: gradient estimation; simultaneous perturbation; SPSA.
Abstmet-The simultaneous perturbation stochastic ap-
proximation (SPSA) algorithm has proven very effective for
difficult multivariate optimization problems where it is not
possible to obtain direct gradient information. As discussed
to date, SPSA is based on a highly efficient gradient
approximation requiring only two measurements of the loss
function independent of the number of parameters being
estimated. This note presents a form of SPSA that requires
only one function measurement (for any dimension). Theory
is presented that identifies the class of problems for which
this one-measurement form will be asymptotically superior to
the standard two-measurement form. 0 1997 Elsevier
Science Ltd. All rights reserved.
1. Introduction
The simultaneous perturbation stochastic approximation
(SPSA) algorithm has recently attracted considerable
attention for multivariate optimization problems where it is
difficult or impossible to obtain a gradient of the objective
function with respect to the parameters being optimized (see
e.g. Alessandri and Parasini, 1995; Hill and Fu, 1995; Smith
and Chin, 1995; Maeda et al., 1995; Rezayat, 1995). SPSA is
for problems in the multivariate Kiefer-Wolfowitz SA
setting, where only (possibly noisy) measurements of the
objective (say, loss) function are assumed available. The
essential feature of SPSA is its highly efficient gradient
approximation. As described in Spa11 (1987, 1992) the
gradient approximation requires only two loss function
measurements, regardless of the problem dimension. This
contrasts with the 2p function measurements required in
conventional finite-difference methods, where p is the
problem dimension (the number of parameters being
optimized). The central theoretical result in Spa11 (1992) is
that in many practical problems this p-fold savings in
function evaluations per gradient approximation translates
directly into a p-fold savings in function evaluations to solve
the optimization problem (i.e., the algorithms take the same
number of iterations to achieve a given level of mean-square
accuracy in the optimization parameters, but SPSA takes p
times fewer measurements per iteration).
This note introduces a variant of SPSA that requires only
one function evaluation to construct the gradient approxima-
tion. Theory is presented that provides guidelines on when it
is advantageous to use this one-measurement form versus the
‘standard’ two-measurement form mentioned above. The
implications of the theory are illustrated in a numerical study
for a small-scale problem involving a multivariate polynomial
loss function. For convenience, we shall refer to the one-
and two-measurement forms of SPSA as SPSAl and SPSA2
respectively.
*Received 1 April 1996; received in final form 5 August
1996. This paper was not presented at any IFAC meeting.
This paper was recommended for publication in revised form
by Editor Peter Dorato. Corresponding author James C.
Spall. Tel. +l 301 953 5ooO; E-mail [email protected].
t The Johns Hopkins University, Applied Physics Labora-
tory, Laurel, MD 20723-6099, U.S.A.
Perturbation
2. Algorithm
Our fundamental aim is to minimize a loss function L(B),
where 19 is some p-dimensional vector of adjustable
parameters. Consistent with the usual framework for
nonlinear continuous optimization, we seek a minimum B*
such that g(0) = X.(0)/a@ = 0. (This paper deals with the
local unconstrained optimization problem only; for SPSA2,
an extension to the constrained optimization problem is
given in Sadegh (1996) and an extension to the global
problem of multiple roots to g(8) =0 is given in Chin
(1994).) It is assumed that only measurements of L(B)
(typically with additive noise) are available and that no direct
measurements (with or without noise) of g(0) are available.
This is identical to the well-known framework of
Kiefer-Wolfowitz SA (Ruppert, 1983).
SPSA has the standard iterative form
h,+t = 8, - a*k?,(O,),
(1)
where ck represents a scalar gain coefficient and &(.)
represents the SP approximation to the unknown gradient
g(.). In Spa11 (1987, 1992), &.(.) is represented by a
two-measurement approximation. For this paper, we propose
the following gradient approximation based on one
measurement of the loss function:
where
^
y, = z_(e, + gA,) + ek,
ck is a positive scalar,
Ak = (A,,, A,,, . , AC)’ is a Vector of zero-mean indepen-
dent random variables (typically generated by Monte Carlo),
and Q is the measurement noise. Note that l k is not
necessarily independent of 8, or Ak, but it does satisfy a
weaker Martingale-type condition, as given in Section 3
below.
To see intuitively why fk(.) above is reasonable, note, by a
Taylor expansion of the Ith element about an arbitrary
parameter value 0, that
i?kde) =
w + CkAk) + Ek _ Jw) I Ckg(@)TAk
‘dk,
Wb CkAk,
+ C:AH(e)Ak+C:L'"(8k)Ak~Ak~A,+E,
&&,
%2&d
c&'
(3)
where H(0) is the Hessian matrix of L(0) and L’“(8,) is the
third derivative of L with respect to BT evaluated at some
point between 8 and 0 + c,A, (as in Spall, 1992, Lemma 1).
Then, under the same assumptions on the distribution of the
{A*;} given in Spa11 (1992) (particularly independence+
symmetry and finite inverse moments), we have from (2.3)
_qq,,(e)=g,(e)+o(~:) w=i, 2,...,p,
i.e. &(.) is an unbiased estimator of the true (unknown)
gradient to within an O(c:) bias. A fundamental difference in
the two-measurement and one-measurement forms of SPSA
is that for the two-measurement form, contributions due to
109
110
Technical Communiques
L(0) and H(6) (the first and third terms on the right-hand
side of (3)) are identically 0 (versus simply mean 0) as a result
of cancellation effects. This plays a significant role in the
efficiency analysis and suggests why, despite taking twice the
number of measurements, SPSA2 is generally the better
algorithm to use in most practical applications. However,
there are situations where SPSAl will be the more efficient
algorithm, and the theory in Section 3 considers this.
3. Eficiency
This section analyzes the relative efficiency of SPSAl and
SPSA2. The arguments here closely follow those of Spa11
(1992), which were based on an asymptotic distribution
theory in Fabian (1968) (the author is unaware of any formal
distribution theory for the finite-sample case). We shall find
that one cannot make a universal claim that either SPSAl or
SPSA2 is the more efficient algorithm but that it is possible
to identify classes of problems for which SPSAl will be
(asymptotically) more efficient than SPSA2. The fundamen-
tal measure of efficiency here is the number of loss function
measurements (not number of iterations per se), since it is
loss function evaluations that represent the primary cost in
the optimization process.
For the most part, the regularity conditions here are
identical to those in Spa11 (1992) and hence will not be
repeated.? The only difference is that, because of the
different gradient approximations, two of the conditions on
the measurement noise are altered slightly. In particular, we
assume
E(Q 1 &, AJ = 0
Vk
var ( ek) -+ 0: as k-+ m for some (~2
(frequently var (ek) = uf Vk)
(these contrasts with the corresponding noise conditions on
pp. 333 and 335 respectively) of Spall, 1992. Then, following
the arguments of Proposition 1 in Spa11 (1992) we have the
following convergence result:
8, -+ 6 a.s., k+m
For the asymptotic distribution result that is critical to the
efficiency analysis, we consider gain sequences of the
standard form uk = a/(k + 1 + A)” and ck = c/(k + l)?, a, c,
(Y, y >O, A s-0, and let p = (Y - 2y. Then, following the
assumptions and proof of SPSA2,
kP’*(& - 6*)=N(p, PM,PT), k+m,
(4)
where p and P are as in Spa11 (1992, Proposition 2) (the
detailed definitions are not critical to the analysis here) and
M, (the subscript 1 is used to contrast with the corresponding
expression M2 used below for SPSA2) is given by
M, = uZc~2pZ{u~diag [(2h, -PI)-‘, ,
PA, - P+)-‘I + ue*)*o,
(5)
with EAF,~+ p* as k + 3~ VI, A; is the ith eigenvalue of the
matrix uH(El*), and /3+ = {p if cy = 1; 0 if a < 1).
Since p, p and P are identical in SPSAl and SPSA2, the
difference in efficiency centers on the difference between MI
and M2, where
M2 = fu*c-*p*g* diag [(2h, - p+)-‘, , (24 - P+)-‘I,
(6)
with a* representing the variance of the sum of the two noise
terms entering the SPSA2 gradient approximation (if the
noise terms are uncorrelated then u* = 2~2,). Note from (5)
and (6) that M, will tend to be larger (in the matrix sense)
t Let us take this opportunity to clarify two slightly
ambiguous conditions in Spa11 (1992). In Lemma 1, one of
the statements should read as follows: ‘. . . suppose that VB in
an open ball about 6,whose radius is not a function of k or
W, L@‘(B) I a3L/aeT aeT aeT exists continuously. . . (bold
italics highlight change). In Proposition 1, Condition A3
should read: sup, 11 &Jk 11 -c m a.s.
than Mz owing to the presence of the L(F)*1 contribution
(i.e. d> +a’). However, since SPSAl uses only half the
measurements of SPSA2 per iteration, efficiency gains are
still possible, as discussed below.
We aim to address the following question: To achieve the
sume level of mean-square accuracy in estimating 8, what is
the relative number of loss function measurements needed by
SPSAl and SPSA2? The asymptotic distribution results in (4)
and Spa11 (1992, Proposition 2) provide the machinery for
answering this question, provided that the absolute number
of measurements in both algorithms is sufficiently large. Let
k, and k2 denote the number of iterations and n, and n2 the
number of loss function evaluations in SPSAl and SPSAZ
respectively. Then from (4) and Spa11 (1992, Proposition 2)
we seek the ratio
“‘:krB(tr PM,PT+ pTp) = k;P(trPM2PT+ CLTCL),
*2
since the terms being equated represent the asymptotically
based approximation to E I/ok - 0*1/* for SPSAl and SPSA2
(this, of course, is under the standard conditions-e.g.
uniform integrability-that the second moments of the
asymptotic distributions correspond to the second moments
of the corresponding random process). From the fact that
k, = n,, k2 = $n2,
as the number of measurements for both procedures
becomes large.
The expression (7) provides a powerful means for
analyzing the relative asymptotic efficiency (analogous to
(4.2) in Spa11 (1992) for analyzing the efficiency of SPSA and
the classical Kiefer-Wolfowitz finite-difference method). In
particular, by specifying values of terms that are used in the
algorithm (a, c etc.) and that represent properties of the
measurements (a’., H(B*) etc.), one can determine if nl/n2
5 or > 1 as a guide to when SPSAl or SPSA2 is the more
efficient alaorithm for a given scenario.
To make (7) more concrete, let us consider an important
special case. Suppose u* = 2u$ (as mentioned earlier) and
I
that L(B*) = 0 (this may be true in certain mean-square
minimization or tracking problems or in cases where an
equivalent loss function can be formed by subtracting a
known value of the original loss function at 0*). Further,
suppose that both algorithms run with the same gain
coefficients a and c and that /3 = $ (corresponding to the
asymptotically optimal a = 1 and y = $,; see e.g. Fabian
(1971) or Chin (1997)). Then, from (5) and (6), Ml = 2M2
(since M2 is now identical to M, except that or is replaced by
2~:). Under these conditions, (7) reduces to
3’2
From (8) we find that as pTp/tr PM2PT ranges from 0 to m,
n,/n2 ranges from fi to $. In particular, pTp/tr PM2PT =
0.7024 is the point under or over which n,/nz is greater than
or less than one. Qualitatively, this makes sense, since the or.
contribution for both SPSAl and SPSAZ is identical for a
given number of iterations (see (4)); hence, as the p part
dominates, one would expect SPSAl to be the more efficient
algorithm.
Obviously, the analysis associated with (8) is no longer
valid if L(B*) #O. Then one should appeal to the more
general form in (7). In such settings, the range of problems
for which n,/nz is asymptotically less than one is smaller.
4. Numericul example
This section provides an illustration of SPSAl to a simple
quartic function with p = 5. Let
L(e) = eTe + 0.15 e: + 0.01 i ep,
,=1
i=l
which has 8* = 0. The measurement noise is independently
normally distributed with mean 0 and u< = 0.01, and
Technical Communiques
111
8, = (0.1, 0.1, 0.1, 0.1, O.l)‘r. All simulations were conducted
with MATLAB on a UNIX-based workstation.
We shall consider the setting where (8) applies, and shall
present numerical results for two settings: one where
(asymptotically) n,/nr > 1 and one where nl/nz < 1. Consis-
tent with practical guidelines given in Spa11 (1996), we take c
of magnitude similar to a, (we take c = 0.06 for both SPSAl
and SPSA2, somewhat larger than the c i= a, suggested in
Spa11 (1996) to accommodate the greater variability inherent
in a one-measurement form of the gradient approximation
and the greater rate of decay implies by the associated y
coefficient (y = 0.16667 here versus y = 0.101 suggested in
Spall, 1996)). The ALi values were independently Bernoulli
*l distribued for all k and i. Also, for both algorithms, we
include the additive constant A in the denominator of uk to
enhance numerical stability in the early iterations (i.e. we
take al, = a/(20 + k)), which, of course, has no effect on the
asymptotic theory.
With n =0.17, (8) implies that the asymptotic ratio
n1/n2 = 0.7606, while with a =0.27, the asymptotic ratio
implies the opposite in terms of efficiency, namely
n,/nr = 1.414. We ran numerical studies for both of these
cases. For n =0.17, we took n, =3042 and nZ=4000 (the
round number for n, was selected, and n, was then derived
based on the ratio value 0.7606). The value of a was chosen
to be slightly greater than the lower bound to a for this loss
function of &, as derived from the conditions in Fabian (1%8)
or Spa11 (1992). The theory associated with (8) suggests that
the mean-square errors (MSEs) associated with these two
sample sixes will be equal, and, in fact, that is nearly what
happened. The ratio of observed MSEs (SPSAl/SPSA2) was
0.98, based on an average of the terminal observed MSEs
over 508 realizations for each algorithm. For a = 0.27, we
take n, = 4000 and nz = 2828, consistent with the ratio 1.414
mentioned above. The ratio of observed MSEs here was 0.91
(so SPSAl performed slightly better than asymptotic theory
would predict).
There were some other observations associated with the
relative behavior of SPSAl and SPSA2. Although we found
that SPSAl could outperform SPSA2 in certain cases (as
above), we also found that it was more sensitive to small
changes in the underlying data-generating process and to the
choice of initial conditions. In fact, for certain initial
conditions not close to the solution, SPSAl diverged while
SPSA2 converged, including cases where, based on
asymptotic thery, SPSAl is the more efficient algorithm. The
reason for this relative lack of robustness is apparent by
comparing the Taylor expansion in (3) with the analogous
expansion for SPSA2: for SPSA2, the terms involving L(8)
and H(B) disappear versus merely having mean 0 in SPSAl.
When 8 is far from fJ*, these terms may significantly degrade
the gradient estimate (hence the greater sensitivity to initial
conditions). These numerical results suggest that although
SPSAl can be more efficient than SPSA2 in some
implementations, one should exercise caution in its
implementation, since nonasymptotic effects may have a
greater detrimental impact.
5. Concluding remarks and extensions
This paper has presented an extension of the SPSA
algorithm of Spa11 (1987, 1992). The primary contribution is
to reduce from two to one the number of loss function
measurements needed per iteration (to approximate the
gradient of the loss). Theory has been presented that
identifies the class of problems for which this twofold savings
in measurements per iteration translates into a measurement
savings over the course of the complete optimization process.
In particular, asymptotic results have been presented that
identify settings where the one- and two-measurement
forms of SPSA yield the same level of mean-square accuracy
but where SPSAl takes less than twice the number of
iterations of SPSA2 to achieve this accuracy (and hence uses
fewer loss function measurements).
Maeda, Y. (19%). Time difference simultaneous perturbation
method. Electron. Lett., 32, 1016-1018.
Maeda, Y., H. Hirano and Y. Kanata (1995). A learning rule
of neural networks via simultaneous perturbation and its
hardware implementation. Neural Networks, 8,251-259.
Polvak. B. T. and A. B. Juditskv (1992). Acceleration of
stochastic approximation by averaging.’ SIAM J. Control
Optim., 30,838-855.
Polyak, B. T. and A. B. Tsybakov (1992). On stochastic
approximation with arbitrary noise (the K-W case). Adu.
Soviet Math., l2, 107-113.
Rezayat, F. (1995). On the use of an SPSA-based model-free
controller in quality improvement. Automatica, 31,
913-915.
Ruppert, D. (1983). Kiefer-Wolfowitz procedure. In N. L.
Johnson and S. Katz (Eds), Encyclopedia of Statistical
Sciences. Vol. 4. DD. 379-381. Wilev. New York.
One of the areas for which SPSAl seems most appropriate
Ruppert, D. (1988). Efficient estimators from a slowly
is in feedback control problems. As described in, say, Spa11
and Cristion (1994, 1995) or Rezayat (1995) one can use
convergent Robbins-Monro process. Technical Report
781, School of Operations Research and Industrial
SPSA to build controllers without the need to build a model
Engineering, Cornell University.
of the process. Although this can be very effective for the
control of complex (nonlinear stochastic) systems, there is
the potential for difficulty if the process dynamics change
dramatically in the course of collecting the two measure-
ments for the SPSA2 gradient approximation. SPSAl has
obvious potential advantages in such settings, since it
approximates the gradient based on one instantaneous
measurement. Other topics of potential interest are to
explore the connection of SPSAl and SPSA2 to the one- and
two-measurement algorithms of Polyak and Tsybakov (1992)
and Maeda (1996), to evaluate whether the iterate averaging
idea of Ruppert (1988, 1991) and Polyak and Juditsky (1992)
might enhance convergence properties, and to determine
how constrained optimization might be implemented (Sadegh
(19%) treats the SPSA2 case).
In summary, for most problems, the two-measurement
form of SPSA previously introduced will be the preferred
algorithm. Asymptotic theory suggests that it will generally
be the more efficient algorithm (in terms of total function
evaluations required), and empirical evidence suggests that it
is also a more robust algorithm (to changes in initial
conditions, gain coefficients, noise levels etc.). However,
there is a class of problems for which the one-measurement
form of SPSA is (asymptotically) more efficient, and it is for
this class that the algorithm of this paper should be
considered. This is especially the case in feedback control
applications, where
nonstationarities
may
make the
instantaneous aspect of the one-measurement gradient
approximation especially appealing.
Acknowledgements-This work was supported by the
JHU/APL IRAD Program and U.S. Navy Contract
NOOO39-95-C-0002.
References
Allessandri, A. and T. Parisini (1995). Nonlinear modelhng
and state estimation in a real power plant using neural
networks and stochastic approximation. In Proc. American
Control Conf, Seattle, WA, pp. 1561-1567.
Chin, D. C. (1994). A more efficient global optimization
algorithm based on Styblinski and Tang. Neural Networks,
7,573-574.
Chin, D. C. (1997). Comparative study of stochastic
algorithms for system optimization based on gradient
approximations. IEEE Trans. Syst. Man Cybernetics,
SMC-27, in press.
Fabian, V. (1968). On asymptotic normality in stochastic
approximation. Ann. Math. Statist., 39, 1327-1332.
Fabian, V. (1971). Stochastic approximation. In J. J. Rustagi
(Ed.), Optimizing Methods in Statistics, pp. 439-470,
Academic Press, New York.
Hill, S. D. and M. C. Fu (1995). Transfer optimization via
simultaneous perturbation stochastic approximation. In C.
Alexopoulos, K. Kang, W. R. Lilegdon and D. Goldsman
(Eds), Proc. Winter Simulation Conf, Washington, DC,
pp. 242-249.
112
Technical Communiques
Ruppert, D. (1991). Stochastic approximation. In B. K.
Ghosh and P. K. Sen (Eds), Handbook of Sequential
Analysis, pp. 503-529. Marcel Dekker, New York.
Sadegh, P. (1996). Constrained optimization via stochastic
approximation with a simultaneous perturbation gradient
approximation. Automatica, to appear (also Technical
Report 3/%, Institute of Mathematical Modelling,
Technical University of Denmark, Lyngby).
Smith, R. H.
and D. C. Chin (1995). Evaluation of an
adaptive traffic control technique with underlying system
changes. In C. Alexopoulos, K. Kang, W. R. Lilegdon and
D. Goldsman (Eds), Proc. Winter Simulation Conf.,
Washington, DC, pp. 1124-1130.
Spall, J. C. (1987). A stochastic approximation technique for
generating maximum likelihood parameter estimates. In
Proc. American Control Conf, Minneapolis, MN, pp.
1161-1167.
Spall, J. C. (1992). Multivariate stochastic approximation
using a simultaneous perturbation gradient approximation.
IEEE Trans. Autom. Control, AC-37,332-341.
Spall, J. C. (1996). Implementation of the simultaneous
perturbation algorithm for stochastic optimization.
JHU/APL Technical Report.
Spall, J. C. and J. A. Cristion (1994). Nonlinear adaptive
control using neural networks: estimation based on a
smoothed form of simultaneous perturbation gradient
approximation. Statistica Sinica, 4, l-27.
Spall, J. C. and J. A. Cristion (1995). Model-free control of
nonlinear stochastic systems in discrete time. In Proc. 34th
IEEE Conf. on Decision and Control, New Orleans, LA,
pp. 2199-2204.
APPENDIX
Correction to Expression (5) in
“A One-Measurement Form of Simultaneous Perturbation Stochastic Approximation”
The Issue
As shown in expression (4) of Spall (1997), the covariance matrix in the asymptotic
distribution for the normalized one-measurement SPSAi.e., for
/2
ˆ
()
k
k
β
θθ
is PM
1
P
T
,
where P is an orthogonal matrix (PP
T
= I). (All notation here is directly out of Spall, 1997.)
However, the expression M
1
in (5) of Spall (1997) is not presented correctly for the general case
where L(
θ
) 0. In particular, there is a term +L(
θ
)
2
that should be placed differently in the
expression. The results are summarized below.
Current Expression for M
1
:
{
}
2 22 2 1 1 2
11
diag 2 , , 2
( ) ( ) ()
p
ac L
−−
ε+ +

= ρ σ λ −β λ −β +

θ
MI
. (A1)
Correct Expression for M
1
:
2 22 2 2 1 1
11
diag 2 , , 2[ ( )] ( ) ( )
p
ac L
−−
ε ++

= ρ σ + λ −β λ −β

θM
. (A2)
Note that the issue above does not affect the efficiency analysis in (7) and (8) of Spall (1997),
which shows the relative asymptotic accuracy of one-measurement SPSA and standard two-
measurement SPSA (Spall, 1992).
Derivation
Let us use the main theorem in Fabian (1968) to show that (A2) is the correct expression,
replacing (A1) (which is the same as (5) in Spall, 1997). In particular, following the notation of
Fabian (1968),
( )/2 /2
1
ˆˆ
,( )( )
k k k kk k
k kk
−α α+β −α−β
+
∗∗
−=− +θ θ Γθθ ΦI VT
where
Γ
k
=
12
T
k k kp
a


hh h
and row vector
T
kj
h
represents the jth row of the Hessian
matrix H(
θ
) evaluated at a value of
θ
on the line segment between
ˆ
k
θ
and
θ
(the point of
evaluation may change by row; hence
Γ
k
may not be symmetric for finite k);
Φ
k
=
Φ
= – aI;
V
k
=
;
and T
k
=
/2
ˆ
()
kk
ak
β
θb
.
As discussed Spall (1997), it is known that
ˆ
k
θ
θ
a.s. under standard SPSA conditions
(see Spall, 2003, Sect. 7.3). Likewise, it is known that (a.s.) we have the following convergence
results:
Γ
k
aH(
θ
) and
()
T
kk k
E
VV
22 2 2
()cL
ε

ρ σ+

θ
I
=
Σ
, where
k
=
0
ˆ
{θ
,
1
ˆ
θ
,…,
ˆ
;
k
θ
01 1
, ,..., }
k
∆∆
for k 1.
From Fabian (1968), the covariance matrix in the asymptotic normal distribution for
/2
ˆ
()
k
k
β
θθ
is PM
1
P
T
, where the ijth entries in M
1
are
1
1;
if ,
=
0 if ,
[ ] (2 )
TT
ii i
ij
ij
ij
λ −β =
+
ΦΣΦPP
M
(A3)
and where λ
i
is the ith eigenvalue of aH(
θ
) and β
+
= β if α = 1 and β
+
= 0 if α 1. (The mean
in the asymptotic normal distribution is
µ
, but
µ
is unaffected by the issue with M
1
above.) Given
the diagonal forms above for
Φ
and
Σ
, we have
[]
TT
ii
ΦΣΦPP
=
22 2 2
()cL
ε

ρ σ+

θ
for all i.
Then, from (A3), it is known that the corrected form in (A2) follows.
Acknowledgement
I thank Dr. Mohammed S. Abdulla of the Indian Institute of Management, Kozhikode,
India, for bringing this issue to my attention. The correction above was noted in Abdulla and
Bhatnagar (2006, p. 322). — J. C. Spall
References
Abdulla, M. S. and Bhatnagar, S. (2006), “SPSA Algorithms with Measurement Reuse,”
Proceedings of the Winter Simulation Conference, Monterey, CA, 3–6 December 2006, pp.
320–328.
Fabian, V. (1968), “On Asymptotic Normality in Stochastic Approximation,” Annals of
Mathematical Statistics, vol. 39, pp. 13271332.
Spall, J. C. (1992), “Multivariate Stochastic Approximation Using a Simultaneous Perturbation
Gradient Approximation,” IEEE Transactions on Automatic Control, vol. 37(3), pp. 332
341.
Spall, J. C. (1997), “A One-Measurement Form of Simultaneous Perturbation Stochastic
Approximation,” Automatica, vol. 33, pp. 109–112.
Spall, J. C. (2003), Introduction to Stochastic Search and Optimization: Estimation, Simulation,
and Control, Wiley, Hoboken, NJ.