The Annals of Statistics
2020, Vol. 48, No. 6, 3417–3441
https://doi.org/10.1214/19-AOS1936
© Institute of Mathematical Statistics, 2020
ROBUST MULTIVARIATE NONPARAMETRIC TESTS VIA
PROJECTION AVERAGING
B
Y ILMUN KIM
*
,SIVARAMAN BALAKRISHNAN
AND LARRY WASSERMAN
Department of Statistics & Data Science, Carnegie Mellon University,
*
In this work, we generalize the Cramér–von Mises statistic via projection
averaging to obtain a robust test for the multivariate two-sample problem.
The proposed test is consistent against all fixed alternatives, robust to heavy-
tailed data and minimax rate optimal against a certain class of alternatives.
Our test statistic is completely free of tuning parameters and is computation-
ally efficient even in high dimensions. When the dimension tends to infinity,
the proposed test is shown to have comparable power to the existing high-
dimensional mean tests under certain location models. As a by-product of
our approach, we introduce a new metric called the angular distance which
can be thought of as a robust alternative to the Euclidean distance. Using the
angular distance, we connect the proposed method to the reproducing ker-
nel Hilbert space approach. In addition to the Cramér–von Mises statistic,
we demonstrate that the projection-averaging technique can be used to define
robust multivariate tests in many other problems.
1. Introduction. Let X and Y be random vectors defined on a common probability space
(,
A, P) with distributions P
X
and P
Y
, respectively. Given two mutually independent sam-
ples
X
m
={X
1
,...,X
m
} and Y
n
={Y
1
,...,Y
n
} from P
X
and P
Y
, we want to test
H
0
: P
X
= P
Y
versus H
1
: P
X
= P
Y
.(1.1)
This fundamental problem has received considerable attention in statistics with a wide range
of applications (see, e.g., Thas (2010), for a review). A common statistic for the univariate
two-sample testing is the Cramér–von Mises (CvM) statistic (Anderson (1962)):
mn
m + n
F
X
(t)
F
Y
(t)
2
d
H(t),
where
F
X
(t) and
F
Y
(t) are the empirical distribution functions of X
m
and Y
n
, respectively,
and (m + n)
H(t) = m
F
X
(t) + n
F
Y
(t). Another approach is based on the energy statistic,
which is an estimate of the squared energy distance (Székely and Rizzo (2013)):
E
2
= 2E
|X
1
Y
1
|
E
|X
1
X
2
|
E
|Y
1
Y
2
|
,
where |x| is the absolute value of x R. The energy distance is well defined assuming a
finite first moment and it can be written in a form that is similar to Cramér’s distance (Cramér
(1928)), namely,
E
2
= 2
−∞
F
X
(t) F
Y
(t)
2
dt,
where F
X
(t) and F
Y
(t) are the distribution functions of X and Y , respectively.
Received May 2019; revised December 2019.
MSC2020 subject classifications. Primary 62G10, 62H15, 62H20; secondary 62G35.
Key words and phrases. Energy statistic, high dimension and low sample size, independence testing, maximum
mean discrepancy, permutation tests, U-statistic.
3417
3418 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
The CvM-statistic has several advantages over the energy statistic for univariate two-
sample testing. For instance, the CvM-statistic is distribution-free under H
0
(Anderson
(1962)) and its population counterpart is well defined without any moment assumptions. It
also has an intuitive probabilistic interpretation in terms of probabilities of concordance and
discordance of four independent random variables (Baringhaus and Henze (2017)). Never-
theless, the CvM-statistic has rarely been studied for multivariate testing. A primary reason
is that the CvM-statistic is essentially rank-based, which leads to a challenge to generalize
it in a multivariate space. In contrast, the energy statistic can be easily applied in arbitrary
dimensions as in Baringhaus and Franz (2004)andSzékely and Rizzo (2004). Specifically,
they defined the squared multivariate energy distance by
E
2
d
(P
X
,P
Y
) = 2E
X
1
Y
1
E
X
1
X
2
E
Y
1
Y
2
,(1.2)
where ·is the Euclidean norm in R
d
. The multivariate energy distance maintains the
characteristic property that it is always nonnegative and equal to zero if and only if P
X
= P
Y
.
It can also be viewed as the average of univariate Cramér’s distances of projected random
variables (Baringhaus and Franz (2004)):
E
2
d
(P
X
,P
Y
) =
π(d 1)(
d1
2
)
(
d
2
)
S
d1
R
F
β
X
(t) F
β
Y
(t)
2
dt (β),(1.3)
where λ represents the uniform probability measure on the d-dimensional unit sphere S
d1
=
{x R
d
:x=1}, (·) is the gamma function and the symbol stands for the transpose
operation.
Although the multivariate energy distance can be easily estimated in any dimension, it still
requires the finite moment assumption as in the univariate case. When the underlying dis-
tributions violate this moment condition with potential outliers, the energy statistic becomes
extremely unstable and the resulting test might perform poorly. Given that outlying obser-
vations arise frequently in practice with high-dimensional data, there is a need to develop a
robust counterpart of the energy distance. The primary goal of this work is to introduce a
robust, tuning parameter-free, two-sample testing procedure that is easily applicable in ar-
bitrary dimensions and consistent against all fixed alternatives. Specifically, we modify the
univariate CvM-statistic to generalize it to an arbitrary dimension by averaging over all one-
dimensional projections. In detail, the proposed test statistic is an unbiased estimate of the
squared multivariate CvM-distance defined as follows:
W
2
d
(P
X
,P
Y
) =
S
d1
R
F
β
X
(t) F
β
Y
(t)
2
dH
β
(t) dλ(β),(1.4)
where H
β
(t) = ϑ
X
F
β
X
(t) +ϑ
Y
F
β
Y
(t) and ϑ
X
is a fixed value in (0, 1) and ϑ
Y
= 1 ϑ
X
.
For simplicity and when there is no ambiguity, we may omit the dependency on P
X
, P
Y
and
write W
d
(P
X
,P
Y
) as W
d
.
Throughout this paper, we refer to the process of averaging over all projections as projec-
tion averaging.
1.1. Summary of our results. The proposed multivariate CvM-distance shares some ap-
pealing properties of the energy distance while being robust to heavy-tailed distributions or
outliers. For example, W
d
is invariant to orthogonal transformations and satisfies the charac-
teristic property (Lemma 2.1), meaning that W
d
is nonnegative and equal to zero if and only
if P
X
= P
Y
. More importantly, it is straightforward to estimate W
d
without using any tuning
parameters (Theorem 2.1). Based on an unbiased estimate of W
2
d
, we apply the permutation
test procedure to determine a critical value of the test statistic. Although the permutation ap-
proach has been standard in practical implementations of two-sample testing, its theoretical
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3419
properties have been less explored beyond simple cases (e.g., Pesarin (2001)). Indeed, previ-
ous studies usually consider asymptotic tests in their theory section whereas their actual tests
are calibrated via permutations. We bridge the gap between theory and practice by presenting
both theoretical and empirical results on the permutation test under various scenarios. Our
main results regarding the CvM-distance are summarized as follows:
Closed-form expression (Section 2): Building on Escanciano (2006)andZhu et al. (2017),
we show that the test statistic has a simple closed-form expression.
Asymptotic power (Section 2): We prove that the permutation test based on the proposed
statistic has the same asymptotic power as the oracle and asymptotic tests that assume
knowledge of the underlying distributions (Section 2.2) against fixed and contiguous alter-
natives.
Robustness (Section 3): We show that the permutation test based on the proposed statistic
maintains good power in the contamination model, while the energy test becomes com-
pletely powerless in this setting.
Minimax optimality (Section 4): We analyze the finite-sample power of the proposed per-
mutation test and prove its minimax rate optimality against a class of alternatives that differ
from the null in terms of the CvM-distance. We also show that the energy test is not optimal
in our context.
HDLSS behavior (Section 5): We consider a high-dimension, low-sample size (HDLSS)
regime where the dimension tends to infinity while the sample size is fixed. Under this
regime, we identify sufficient conditions under which the power of the proposed test con-
verges to one. In addition, we show that the proposed test has comparable power to the
high-dimensional mean tests introduced by Chen and Qin (2010)andChakraborty and
Chaudhuri (2017) under certain location models.
Angular distance (Section 6): We introduce the angular distance between two vectors and
use this to show that the multivariate CvM-distance is a special case of the generalized
energy distance (Sejdinovic et al. (2013)). Furthermore, the CvM-distance is the maximum
mean discrepancy (Gretton et al. (2012)) associated with the angular distance.
Beyond the CvM-statistic, the projection-averaging technique can be widely applicable
to other nonparametric statistics. In the second part of this study, we revisit some famous
univariate sign- or rank-based statistics and propose their multivariate counterparts via pro-
jection averaging. Although there has been much effort to extend univariate sign- or rank-
based statistics in a multivariate space (see, e.g., Hettmansperger, Möttönen and Oja (1998),
Liu (2006), Oja (2010), Oja and Randles (2004)), they are either computationally expensive
to implement or less intuitive to understand. Our projection-averaging approach addresses
these issues by providing a tractable calculation form of statistics and by having a direct
interpretation in terms of projections. In Section 7 and also Appendix D.8, we demonstrate
the generality of the projection-averaging approach by presenting multivariate extensions of
several existing univariate statistics.
1.2. Literature review. There are a number of multivariate two-sample testing proce-
dures available in the literature. We list some fundamental methods and recent developments.
Anderson, Hall and Titterington (1994)
proposed the two-sample statistic based on the inte-
grated square distance between two kernel density estimates. The energy statistic was intro-
duced by Baringhaus and Franz (2004)andSzékely and Rizzo (2004) independently. Biswas
and Ghosh (2014) modified the energy statistic to improve the performance of the previ-
ous test for the high-dimensional location-scale and scale problems. Gretton et al. (2012)
introduced a class of distances between two probability distributions, called the maximum
mean discrepancy (MMD), based on a reproducing kernel Hilbert approach. Sejdinovic et al.
3420 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
(2013) showed that the energy distance is a special case of the MMD associated with the
kernel induced by the Euclidean distance. Recently, Pan et al. (2018) proposed a new met-
ric, named the ball divergence, between two probability distributions and connected it to the
MMD. A further review of kernel-based two-sample tests can be found in Harchaoui et al.
(2013).
Another line of work is based on graph constructions. Schilling (1986)andHenze (1988)
introduced a multivariate two-sample test based on the k nearest neighbor (NN) graph.
Mondal, Biswas and Ghosh (2015) pointed out that the previous NN test may suffer from
low power for the high-dimensional location-scale problem and provided an alternative that
addresses this limitation. Another variant of the NN test, which is tailored to imbalanced sam-
ples, can be found in Chen, Dou and Qiao (2013). Friedman and Rafsky (1979) considered
minimum spanning tree (MST) to present a generalization of the univariate run test in Wald
and Wolfowitz (1940). The MST test proposed by Friedman and Rafsky (1979) has recently
been modified by Chen and Friedman (2017)andChen, Chen and Su (2018) to improve
power under scale alternatives and imbalanced samples, respectively. Rosenbaum (2005)
proposed a distribution-free test in finite samples based on cross-matches. More recently,
Biswas, Mukhopadhyay and Ghosh (2014) introduced another distribution-free test based
on the shortest Hamiltonian path. A general theoretical framework for graph-based tests has
been established by Bhattacharya (2018), Bhattacharya (2019). Other recent developments
include Liu and Modarres (2011), Kanamori, Suzuki and Sugiyama (2012), Bera, Ghosh and
Xiao (2013), Lopez-Paz and Oquab (2016), Zhou, Zheng and Zhang (2017), Mukhopadhyay
and Wang (2018
), among others.
The
projection-averaging approach to CvM-type statistics can be found in other statistical
problems. For example, Zhu, Fang and Bhatti (1997)andCui (2002) considered the CvM-
statistic using projection averaging to investigate one-sample goodness-of-fit tests for mul-
tivariate distributions. Escanciano (2006) proposed the CvM-based goodness-of-fit test for
parametric regression models. To the best of our knowledge, however, this is the first study
that investigates the CvM-statistic for the multivariate two-sample problem via projection
averaging.
Our technique to obtain a closed-form expression for projection-averaging statistics is
based on Escanciano (2006). The same principle has been exploited by Zhu et al. (2017)inthe
context of testing for multivariate independence. We further extend the result of Escanciano
(2006) to more general cases and provide an alternative proof using orthant probabilities for
normal distributions.
Outline. The rest of this paper is organized as follows. In Section 2, we introduce our test
statistic and the permutation test procedure. We then study their limiting behaviors under the
conventional fixed dimension asymptotic framework. In Section 3, we compare the power
of the CvM test with that of the energy test and highlight the robustness of the CvM test.
Section 4 establishes minimax rate optimality of the proposed test against a certain class of
alternatives associated with the CvM-distance. In Section 5, we study the asymptotic power of
the CvM test in the HDLSS setting. We introduce the angular distance between two vectors
in Section 6 to show that the CvM-distance is the generalized energy distance built on the
introduced metric. In Section 7, the projection-averaging technique is applied to other sign-
or rank-based statistics and this allows us to provide new multivariate extensions. Simulation
results are reported in Section 8 to demonstrate the competitive power performance of the
proposed approach with finite sample size. All proofs of the main results are deferred to the
Supplementary Material (Kim, Balakrishnan and Wasserman (2020)).
Notation. For two nonzero vectors U
1
,U
2
R
d
, we denote the angle between U
1
and U
2
by
Ang(U
1
,U
2
) = arccos{U
1
U
2
/(U
1
U
2
)}.For1q p,welet(p)
q
= p(p 1) ···(p
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3421
q +1).LetP
0
and P
1
be the probability measures under H
0
and H
1
, respectively. Similarly,
E
0
and E
1
stand for the expectations with respect to P
0
and P
1
. For any two real sequences
{a
n
} and {b
n
},weusea
n
b
n
if there exist constants C,C
> 0 such that C<|a
n
/b
n
| <C
for each n. We write a
n
= O(b
n
) if there exists C>0 such that |a
n
|≤C|b
n
| for each n.
We also write a
n
= o(b
n
) if lim
n→∞
a
n
/b
n
= 0. For a sequence of random variables X
n
,we
use the notation X
n
= O
P
(a
n
) when X
n
is bounded in probability (tight). The acronym i.i.d.
stands for independent and identically distributed and we use the symbol X
1
,...,X
n
i.i.d.
P to
represent that X
1
,...,X
n
are i.i.d. samples from distribution P . We denote the d ×d identity
matrix by I
d
. The symbol 1(·) is used for indicator functions. We write summation over the
set of all k-tuples drawn without replacement from {1,...,n} by
n,=
i
1
,...,i
k
=1
. Throughout this
paper, we assume that all vectors are column vectors and m, n 2.
2. Projection averaging-type Cramér–von Mises statistics. In this section, we start
with the basic properties of the CvM-distance. We then introduce our test statistic and study
its limiting behavior. We end this section with a description of the permutation test and its
large sample properties. Throughout this section, we consider the conventional asymptotic
regime where the dimension is fixed and
m
m + n
ϑ
X
(0, 1) and
n
m + n
ϑ
Y
(0, 1) as N = m +n →∞.(2.1)
Let us first establish the characteristic property of the CvM-distance.
L
EMMA 2.1 (Characteristic property). W
d
is nonnegative and has the characteristic
property:
W
d
(P
X
,P
Y
) = 0 if and only if P
X
= P
Y
.
Note that W
d
involves integration over the unit sphere. One way to approximate this inte-
gral is to consider a subset of S
d1
, namely {β
1
,...,β
k
}, and then to take the sample mean
over k different univariate CvM-statistics (see, e.g., Zhu, Fang and Bhatti (1997)). However,
this approach has an unpleasant trade-off between accuracy and computational time depend-
ing on the choice of k. The problem becomes even worse in high dimensions where one may
need exponentially many projections to achieve a certain accuracy. Our approach, on the other
hand, does not suffer from this computational issue by explicitly calculating the integral over
S
d1
. The explicit form of the integration is mainly due to Escanciano (2006) who provided
the following lemma.
L
EMMA 2.2 (Escanciano (2006)). For any nonzero vectors U
1
,U
2
R
d
,
S
d1
1
β
U
1
0
1
β
U
2
0
(β) =
1
2
1
2π
Ang(U
1
,U
2
).
R
EMARK 2.1. Escanciano (2006) proved Lemma 2.2 using the volume of a spherical
wedge. In the Supplementary Material (Appendix C.2), we provide an alternative proof of
this result based on orthant probabilities for normal distributions. We also extend this result
to integration involving strictly more than two indicator functions in the Supplementary Ma-
terial (Lemma B.8 and Lemma D.1). This extension allows us to generalize the univariate τ
(Bergsma and Dassios (2014)) in Theorem 7.1 and potentially many other univariate statistics
(see Appendix D.8).
BasedonLemma2.2, we give another representation of W
2
d
in terms of the expected angle
involving three independent random vectors. Here and hereafter, we assume that
β
X and β
Y have continuous distribution functions for λ-almost all β S
d1
.
3422 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
This continuity assumption greatly simplifies the alternative expression for W
2
d
and avoids
the possibility that
Ang(·, ·) is not well defined when one of the inputs is a zero vector. This
issue may be handled by defining
Ang(·, ·) differently for those exceptional cases, but we do
not pursue this direction here.
T
HEOREM 2.1 (Closed-form expression). Suppose that X
1
,X
2
i.i.d.
P
X
and, indepen-
dently, Y
1
,Y
2
i.i.d.
P
Y
. Then the squared multivariate CvM-distance can be written as
W
2
d
(P
X
,P
Y
) =
1
3
1
2π
E
Ang
(X
1
Y
1
,X
2
Y
1
)
1
2π
E
Ang
(Y
1
X
1
,Y
2
X
1
)
.
The above result highlights that W
d
(P
X
,P
Y
) is invariant to the choice of ϑ
X
and ϑ
Y
under
the continuity assumption. In the next subsection, we introduce the test statistic and study its
limiting behavior.
2.1. Test statistic and limiting distributions. Theorem 2.1 leads to a natural empirical
estimate of W
2
d
based on a U -statistic. Consider the kernel of order two:
(2.2)
h
CvM
(x
1
,x
2
;y
1
,y
2
) =
1
3
1
2π
Ang(x
1
y
1
,x
2
y
1
)
1
2π
Ang(y
1
x
1
,y
2
x
1
).
We denote its symmetrized version, which is invariant to the order of the first two arguments
as well as the last two arguments, by
h
CvM
(x
1
,x
2
;y
1
,y
2
) =
1
2
h
CvM
(x
1
,x
2
;y
1
,y
2
) +
1
2
h
CvM
(x
2
,x
1
;y
2
,y
1
).
Then our test statistic is defined as follows:
U
CvM
=
m
2
1
n
2
1
1i
1
<i
2
m
1j
1
<j
2
n
h
CvM
(X
i
1
,X
i
2
;Y
j
1
,Y
j
2
).(2.3)
Leveraging the basic theory of U -statistics (e.g., Lee (1990)), it is clear that U
CvM
is an
unbiased estimator of W
2
d
. Additionally, U
CvM
is a degenerate U -statistic under the null hy-
pothesis as we prove in the Supplementary Material (Appendix C.5). Hence we can apply
the asymptotic theory for a degenerate two-sample U -statistic (Chapter 3 of Bhat (1995)) to
obtain the following result.
T
HEOREM 2.2 (Asymptotic null distribution of U
CvM
). For e ach k =1, 2,..., let λ
k
be
the eigenvalue with the corresponding eigenfunction φ
k
satisfying the integral equation
E
E
h
CvM
(x
1
,X
2
;Y
1
,Y
2
) | X
2
φ
k
(X
2
)
= λ
k
φ
k
(x
1
).(2.4)
Then U
CvM
has the limiting null distribution under the limiting regime (2.1) given by
NU
CvM
d
−→ ϑ
1
X
ϑ
1
Y
k=1
λ
k
ξ
2
k
1
,
where ξ
k
i.i.d.
N(0, 1) and
d
−→ stands for convergence in distribution.
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3423
Under a fixed alternative hypothesis where P
X
and P
Y
do not change with m and n,the
proposed test statistic converges weakly to a normal distribution. We build on Hoeffding’s de-
composition of a two-sample U -statistic (e.g., page 40 of Lee (1990)) to prove the following
result.
T
HEOREM 2.3 (Asymptotic distribution of U
CvM
under fixed alternatives). Let us define
σ
2
h
X
= V
E
h
CvM
(X
1
,X
2
;Y
1
,Y
2
) | X
1

and
σ
2
h
Y
= V
E
h
CvM
(X
1
,X
2
;Y
1
,Y
2
) | Y
1

,
where V(·) is the variance operator. Then under the limiting regime (2.1) and fixed alternative
P
X
= P
Y
, we have
N
U
CvM
W
2
d
d
−→ N
0, 4ϑ
1
X
σ
2
h
X
+4ϑ
1
Y
σ
2
h
Y
.
From the previous two theorems, it is clear to see that NU
CvM
is stochastically bounded
under the null hypothesis whereas it diverges to infinity under fixed alternatives. Thus one
can expect that any reasonable test based on the proposed test statistic is consistent (meaning
that the power converges to one as N →∞) against all fixed alternatives. In fact, the prob-
lem of distinguishing two fixed distributions is too easy in large sample situations and many
of nonparametric tests are known to be consistent in this asymptotic regime. We therefore
turn now to a more challenging scenario where a distance between P
X
and P
Y
diminishes
as the sample size increases. To this end, we make a standard assumption that the underly-
ing distributions belong to quadratic mean differentiable (QMD) families (e.g., Bhattacharya
(2019)).
D
EFINITION 2.1 (Quadratic mean differentiable families, page 484 of Lehmann and Ro-
mano (2005)). Let {P
θ
} be a family of probability distributions on (R
d
, B) where
B is the Borel σ -field associated with R
d
and is an open subset of R
p
. Assume each P
θ
is absolutely continuous with respect to Lebesgue measure and set p
θ
(t) = dP
θ
(t)/dt.The
family {P
θ
} is quadratic mean differentiable at θ
0
if there exists a vector of real-valued
functions η(·
0
) =
1
(·
0
),...,η
p
(·
0
))
such that
R
d
p
θ
0
+b
(t)
p
θ
0
(t) b
η(t,θ
0
)
2
dt = o
b
2
as b→0.
The QMD families include a broad class of parametric distributions such as exponential
families in natural form. By focusing on the QMD families, we are particularly interested
in asymptotically nondegenerate situations where the limiting sum of the type I and type II
errors of the optimal test is nontrivial, that is, bounded by the nominal level α and one. It has
been shown that when P
θ
0
and P
θ
N
belong to the QMD families, this nondegenerate situation
occurs when θ
0
θ
N
N
1/2
(Chapter 13.1 of Lehmann and Romano (2005)). Hence we
consider a sequence of contiguous alternatives where θ
N
= θ
0
+ bN
1/2
for some b R
p
and establish the asymptotic behavior of U
CvM
under the given scenario. Our result builds on
the prior work by Chikkagoudar and Bhat (2014) and extends it to multivariate cases.
T
HEOREM 2.4 (Asymptotic distribution of U
CvM
under contiguous alternatives). As-
sume {P
θ
} is quadratic mean differentiable at θ
0
with derivative η(·
0
) and is
an open subset of R
p
. Define the Fisher information matrix to be the matrix I(θ) with (i, j )
entry
I
i,j
) = 4
R
d
η
i
(t, θ
j
(t, θ) dt,
3424 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
and assume that I(θ
0
) is nonsingular. Suppose we observe X
m
i.i.d.
P
θ
0
and Y
n
i.i.d.
P
θ
0
+bN
1/2
for b R
p
. Then under the limiting regime (2.1),
NU
CvM
d
−→ ϑ
1
X
ϑ
1
Y
k=1
λ
k

ξ
k
+ϑ
1/2
X
a
k
2
1
,
where a
k
=
R
d
2{b
η(x,θ
0
)}p
1/2
θ
0
(x
k
(x) dP
θ
0
(x).
The above theorem implies that if there exists k 1 such that a
k
= 0andλ
k
> 0, a test
based on U
CvM
can have asymptotic power greater than α (see page 615 of Lehmann and
Romano (2005)). This is in contrast to the NN test which has a slower consistency rate given
by N
1/4
when d 8 under some regularity conditions (see Bhattacharya (2018)). In the
Supplementary Material, we consider low-dimensional Gaussian location models and illus-
trate that the proposed CvM test dominates the NN test via simulations. In fact, the former
tends to have very close power to Hotelling’s T
2
test, which is known to be optimal under the
low-dimensional Gaussian scenarios (Anderson (2003)).
2.2. Critical value and permutation test. So far, we have investigated the limiting behav-
iors of the test statistic under the null and (fixed and contiguous) alternative hypotheses. It
is important to note that the performance of a test depends not only on its test statistic but
also crucially on its critical value. A common approach to determining the critical value is
based on the limiting null distribution of the test statistic. Since we are dealing with a general
composite null, one can define this limiting null distribution in various ways. Two natural
candidates are described as follows:
P
(single)
CvM
: the limiting distribution of NU
CvM
based on i.i.d. samples from the single distri-
bution P
X
.
P
(mix)
CvM
: the limiting distribution of NU
CvM
based on i.i.d. samples from the mixture distri-
bution ϑ
X
P
X
+ϑ
Y
P
Y
.
These two limiting distributions P
(single)
CvM
and P
(mix)
CvM
coincide when P
X
= P
Y
but they are
different in general if P
X
= P
Y
. By invoking Theorem 2.2, we can conclude that P
(single)
CvM
and P
(mix)
CvM
are Gaussian chaos distributions in the low-dimensional setting. The asymptotic
tests then reject the null hypothesis when NU
CvM
is greater than the upper 1 α quantile of
P
(single)
CvM
or P
(mix)
CvM
, denoted by q
(single)
α,CvM
and q
(mix)
α,CvM
, respectively.
Unfortunately, this asymptotic approach is infeasible as the limiting distributions involve
quantities that depend on the underlying distributions and that cannot be easily estimated.
Even if either P
(single)
CvM
or P
(mix)
CvM
is known exactly, the resulting asymptotic test does not have
finite-sample guarantees on the type I error control. For this reason, we advocate for using the
permutation procedure that resolves the issues of the asymptotic approach. More importantly,
asshowninTheorem2.6, the power of the permutation test is asymptotically the same as that
of the asymptotic tests under the conventional asymptotic regime.
Before we describe the permutation procedure, let us introduce the oracle test that serves
as a benchmark for the permutation test. Let T
m,n
be a generic two-sample test statistic. Then
the critical value of the oracle test based on T
m,n
can be determined as follows:
Oracle test.
1. Consider new i.i.d. samples {
Z
1
,...,
Z
N
} from the mixture ϑ
X
P
X
+ϑ
Y
P
Y
.
2. Let T
m,n
(
Z) be the test statistic of interest calculated based on
X
m
={
Z
1
,...,
Z
m
}
and
Y
n
={
Z
m+1
,...,
Z
N
}.
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3425
3. Given a significance level 0 <1, return the critical value c
α,m,n
defined by
c
α,m,n
:= inf
t R : 1 α P
T
m,n
(
Z) t

.(2.5)
It is worth pointing out that the oracle statistic T
m,n
(
Z) has the same distribution as the
test statistic based on the original samples under H
0
, but not necessarily under H
1
. Hence
the oracle test based on c
α,m,n
is exact under H
0
and can be powerful under H
1
.However,
c
α,m,n
relies on the unknown mixture distribution ϑ
X
P
X
+ ϑ
Y
P
Y
, which makes the oracle
test impractical. In sharp contrast, the critical value of the permutation test can be obtained
without knowledge of the mixture distribution as follows:
Permutation test.
1. Let {Z
1
,...,Z
N
}={X
1
,...,X
m
,Y
1
,...,Y
n
} be the pooled samples and Z
=
{Z
(1)
,...,Z
(N)
} where ={(1),...,(N)} is a permutation of {1,...,N}.
2. Let T
m,n
(Z
) be the test statistic of interest calculated based on X
m
={Z
(1)
,...,
Z
(m)
} and Y
n
={Z
(m+1)
,...,Z
(N)
}.
3. Given a significance level 0 <1, return the critical value c
α,m,n
defined by
c
α,m,n
:= inf
t R : 1 α
1
N!
S
N
1
T
m,n
(Z
) t
,(2.6)
where
S
N
is the set of all permutations of {1,...,N}.
In Theorem 2.5, we show that the difference between c
α,m,n
and c
α,m,n
for the proposed
statistic is asymptotically negligible under both the null and alternative hypotheses. This con-
nection in turn implies that the permutation critical value converges to q
(mix)
α,CvM
, which is the
limit of the oracle critical value by construction. Moreover, under the contiguous alterna-
tive, we also establish that q
(single)
α,CvM
is the same as q
(mix)
α,CvM
. Building on this observation, we
formally prove that (i) the permutation test, (ii) the oracle test and (iii) the asymptotic tests
based on P
(single)
CvM
and P
(mix)
CvM
have the same asymptotic power against both contiguous and
fixed alternatives in Theorem 2.6. In doing so, we develop a general asymptotic theory for the
permutation distribution of a two-sample degenerate U-statistic under H
0
. This general result
is established based on Hoeffding’s conditions (Hoeffding (1952)) and extended to H
1
via the
coupling argument (Chung and Romano (2013)). The details can be found in Appendix A.
Let us denote by c
α,CvM
and c
α,CvM
the critical values of the oracle test and the permutation
test based on the scaled CvM-statistic, that is, NU
CvM
, as described in the procedures (2.5)
and (2.6), respectively. Then our result on the critical values is stated as follows.
T
HEOREM 2.5 (Asymptotic behavior of the critical values). Consider the conventional
limiting regime in (2.1) with the additional assumption that m/N ϑ
X
= O(N
1/2
). Then
under both the null and (fixed or contiguous) alternative hypotheses,
c
α,CvM
p
−→ q
(mix)
α,CvM
and c
α,CvM
p
−→ q
(mix)
α,CvM
,
where
p
−→ stands for convergence in probability. Moreover, under the null or contiguous
alternative, we further have that q
(mix)
α,CvM
= q
(single)
α,CvM
.
Leveraging the previous result combined with Slutsky’s theorem, we next prove that the
asymptotic power of the oracle test, the permutation test and the asymptotic tests are identical
against any fixed and contiguous alternatives. This clearly highlights an advantage of the
permutation test as it is exact under H
0
and asymptotically as powerful as the oracle and
asymptotic tests under H
1
. More importantly, the permutation test does not require any prior
information on the underlying distributions.
3426 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
THEOREM 2.6 (Asymptotic equivalence of power). The oracle test and the permutation
test control the type I error under the null hypothesis as
P
0
NU
CvM
>c
α,CvM
α and P
0
(NU
CvM
>c
α,CvM
) α.
On the other hand, under the fixed or contiguous alternative hypotheses considered in The-
orem 2.3 and Theorem 2.4 with the additional assumption that m/N ϑ
X
= O(N
1/2
), we
have
P
1
(NU
CvM
>c
α,CvM
) = P
1
NU
CvM
>c
α,CvM
+o(1)
= P
1
NU
CvM
>q
(single)
α,CvM
+o(1) = P
1
NU
CvM
>q
(mix)
α,CvM
+o(1).
R
EMARK 2.2. It is worth pointing out that due to the symmetry of the kernel
h
CvM
,it
is enough to consider
N
m
permutations to obtain the critical value c
α,CvM
for the CvM test.
Nevertheless, except for small sample sizes, the exact permutation procedure is too expensive
to implement in practical applications. A common approach to alleviate this computational
issue is to use Monte Carlo sampling of random permutations and approximate the exact per-
mutation p-value. In more detail, note first that the permutation test function can be written
as 1(
p
CvM
α) where
p
CvM
is the permutation p-value given by
p
CvM
=
1
N!
S
N
1
U
CvM
(Z
) U
CvM
.
Let
(1)
,...,
(B)
be independent and uniformly distributed on S
N
. Then the Monte Carlo
version of the permutation p-value is computed by
p
(B)
CvM
=
1
B +1
B
i=1
1
U
CvM
(Z
(i)
) U
CvM
+1
.
It is well known that 1(
p
(B)
CvM
α) is also a valid level α test for any finite sample size and
p
CvM
p
(B)
CvM
p
−→ 0asB →∞(e.g., page 636 of Lehmann and Romano (2005)). Through-
out this paper, we also adopt this approach for our simulation studies.
3. Robustness. Recall that the energy distance and the CvM-distance can be represented
by integrals of the L
2
2
-type difference between two distribution functions. In view of this, the
main difference between the energy distance and the CvM-distance is in their weight function.
More precisely, the energy distance is defined with dt, which gives a uniform weight to the
whole real line. On the other hand, the CvM-distance is defined with dH
β
(t), which gives
the most weight on high-density regions. As a result, the test based on the CvM-distance is
more robust to extreme observations than the one based on the energy distance. It is also
important to note that the CvM-distance is well defined without any moment conditions,
whereas the energy distance is only well defined assuming a finite first moment. When the
moment condition is violated or there exist extreme observations, the test based on the energy
distance may perform poorly. The purpose of this section is to demonstrate this point both
theoretically and empirically by using contaminated distribution models.
3.1. Theoretical analysis. Suppose that we observe samples from an -contamination
model:
(3.1)
X P
X,N
:= (1 )Q
X
+G
N
and
Y P
Y,N
:= (1 )Q
Y
+G
N
,
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3427
where G
N
can change arbitrarily with N and (0, 1). Suppose that Q
X
and Q
Y
are dif-
ferent so that a given test has high power to distinguish between Q
X
and Q
Y
without con-
taminations. Then it is natural to expect that the power of the same test would not decrease
much for the contamination model when is close to zero. In other words, an ideal test would
maintain robust power against any choice of G
N
as long as Q
X
and Q
Y
are different and
is small. Unfortunately, this is not the case for the energy test. As we shall see, for any arbi-
trary small (but fixed) , there exists a contamination G
N
such that the energy test becomes
asymptotically powerless under mild moment conditions for Q
X
and Q
Y
. On the other hand,
the CvM test is uniformly powerful over any choice of G
N
as sample size tends to infinity.
R
EMARK 3.1. We mainly focus on statistical power to study robustness because one can
always employ the permutation procedure to control the type I error under H
0
: P
X,N
= P
Y,N
.
Let us consider the energy statistic based on a U -statistic:
(3.2)
U
Energy
=
2
mn
m
i=1
n
j=1
X
i
Y
j
−
1
(m)
2
m,=
i
1
,i
2
=1
X
i
1
X
i
2
1
(n)
2
n,=
j
1
,j
2
=1
Y
j
1
Y
j
2
.
Then the main result of this subsection is stated as follows.
T
HEOREM 3.1 (Robustness under contaminations). Suppose we observe samples X
m
and Y
n
from the contaminated model in (3.1) with an arbitrary small but fixed contamination
ratio . Assume that Q
X
and Q
Y
are fixed but Q
X
= Q
Y
while N changes. In addition,
assume that Q
X
and Q
Y
have their finite second moments. Consider the tests based on U
CvM
and U
Energy
given by
φ
CvM
:= 1(U
CvM
>c
α,CvM
) and φ
Energy
:= 1(U
Energy
>c
α,Eng
),
where c
α,CvM
and c
α,Eng
are α level permutation critical values of U
CvM
and U
Energy
, re-
spectively. Then for any (Q
X
,Q
Y
), there exists a certain G
N
such that the energy test be-
comes asymptotically powerless under the asymptotic regime in (2.1), whereas the CvM test
is asymptotically powerful uniformly over all possible G
N
. More precisely,
lim
m,n→∞
inf
G
N
E
1
[φ
Energy
]≤α and lim
m,n→∞
inf
G
N
E
1
[φ
CvM
]=1.(3.3)
R
EMARK 3.2. In Theorem 3.1, we made the assumption that Q
X
and Q
Y
are fixed
and have finite second moments. We also assumed the asymptotic regime in (2.1). These
assumptions are mainly for the energy test and are not necessary for the CvM test. In fact, the
same result can be derived for the CvM test given that there is a positive sequence b
m,n
→∞
increasing arbitrary slowly with m, n such that W
d
(Q
X
,Q
Y
) b
m,n
(1/
m + 1/
n) (see
Theorem 4.2).
3.2. Empirical analysis. To illustrate Theorem 3.1 with finite sample size, we carried out
simulation studies using the contamination model in (3.1). In our simulation, we take Q
X
and
Q
Y
to have multivariate normal distributions with different location parameters or different
scale parameters. In both examples, we take G
N
to have a multivariate normal distribution
given by
G
N
:= N
(0,...,0)
2
I
d
,
where σ controls the scale of the contamination G
N
.
3428 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
FIG.1. Empirical power of NN, FR, Energy, BG, Hotelling, CQ, LRT, LC and CvM tests under the contamination
models with = 0.05. See Examples 3.1 and 3.2 for details.
EXAMPLE 3.1 (Location difference). For the location alternative, we compare two mul-
tivariate normal distributions, where the means are different but the covariance matrices are
identical. Specifically, we set
Q
X
= N
(0.5,...,0.5)
,I
d
, and Q
Y
= N
(0.5,...,0.5)
,I
d
,
with = 0.05. We then change σ = 1, 40, 80, 120, 160, 200 and 240 to investigate the ro-
bustness of the tests against contamination with large scale parameter σ .
E
XAMPLE 3.2 (Scale difference). Similar to the location alternative, we again choose
multivariate normal distributions which differ in their scale but not in their location parame-
ters. In detail, we have
Q
X
= N
(0,...,0)
, 0.1
2
×I
d
and Q
Y
= N
(0,...,0)
,I
d
,
with = 0.05. Again, we change σ = 1, 40, 80, 120, 160, 200 and 240 to assess the effect of
contamination with large scale parameter σ .
In addition to the energy test, we further considered three nonparametric tests in our sim-
ulation studies, namely, the k-nearest neighbor test by Schilling (1986) with k =3, the MST
test proposed by Friedman and Rafsky (1979) and the interpoint distance test by Biswas and
Ghosh (2014). For future reference, we refer to them as the NN test, the FR test and the BG
test, respectively. We also added the high-dimensional mean test by Chen and Qin (2010)
and Hotelling’s T
2
test (e.g., page 188 of Anderson (2003)) for the location alternative and
the high-dimensional covariance test by Li and Chen (2012) and the conventional likelihood
ratio test (e.g., page 412 of Anderson (2003)) for the scale alternative. We refer to them as
the CQ test, Hotelling’s test, the LC test and the LRT test, respectively.
Experiments were run 1000 times to estimate the power of different tests with m = n = 40
and d = 10 at significance level α = 0.05. The p-value of each test was computed using 500
permutations as in Remark 2.2. As can be seen from Figure 1, the power of the CvM test
is consistently robust to the value of σ , which supports our theoretical result. The power of
the energy test, on the other hand, drops down significantly as σ increases for both location
and scale differences. As explained in the proof of Theorem 3.1, this poor performance was
attributed to the fact that the energy statistic is very much dominated by extreme observations
from G
N
when σ is large. The graph-based tests, that is, the NN and FR tests, also show
a robust power performance against the contamination models. Intuitively speaking, they
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3429
perform robustly under the given scenarios as their test statistics, which count the number of
edges in a graph, do not vary a lot even in the presence of outliers; but as far as we know,
there is no theoretical support for this result in the current literature. Moreover, these graph-
based tests typically exhibit poorer consistency rates (Bhattacharya (2018)) compared to the
proposed CvM test. The other four tests (Hotelling’s test, the LRT test, the LC test and the
CQ test) perform poorly for large σ , which may be explained similarly as to why the energy
test has low power in these examples.
4. Minimax optimality. Although our choice of the U -statistic was a natural one to
estimate W
2
d
, it remains unclear whether one can come up with a better test statistic for
testing whether H
0
: W
d
= 0orH
0
: W
d
> 0. One might also wonder whether there exists
a testing procedure that leads to significantly higher power than the permutation test while
controlling the type I error. In this section, we shall show that the answer is negative from
a minimax point of view. In particular, we prove that the permutation test based on U
CvM
is
minimax rate optimal against a class of alternatives associated with the CvM-distance.
To formulate the minimax problem, let us define the set of two multivariate distributions
which are at least far apart in terms of the CvM-distance, that is,
F() :=
(P
X
,P
Y
) : W
d
(P
X
,P
Y
)
.
For a given significance level α (0, 1),letT
m,n
) be the set of measurable functions φ :
{
X
m
, Y
n
}→{0, 1} such that
T
m,n
) =
φ :P
0
=1) α
.
We then define the minimax type II error as follows:
1 β
m,n
() = inf
φT
m,n
)
sup
P
X
,P
Y
F ()
P
1
=0).(4.1)
Our primary interest is in finding the minimax separation
m,n
satisfying
m,n
= inf
:1 β
m,n
() ζ
,
for some 0 <1 α. We start by establishing a lower bound for the minimax separation
m,n
based on Neyman–Pearson lemma.
T
HEOREM 4.1 (Lower bound). Fo r 0 <1 α, there exists some constant b =
b(α, ζ) independent of the dimension such that
m,n
= b(m
1/2
+ n
1/2
) and the minimax
type II error is lower bounded by ζ , that is,
1 β
m,n
(
m,n
) ζ.
The above result shows that if
m,n
is of lower order than m
1/2
+n
1/2
,thennotesthas
the type II error that is uniformly smaller than the nominal level α. We now prove that this
lower bound is tight by establishing a matching upper bound. In particular, the upper bound
is obtained by the permutation test based on U
CvM
, highlighting that the proposed approach
is minimax rate optimal.
T
HEOREM 4.2 (Upper bound). Recall the CvM test φ
CvM
given in Theorem 3.1. Fo r a
sufficiently large c>0, let
m,n
be the radius of interest defined by
m,n
:= c
1
m
+
1
n
.(4.2)
3430 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
Then there exists ζ (0, 1 α) such that the type II error of φ
CvM
is uniformly bounded by
ζ , that is,
sup
P
X
,P
Y
F (
m,n
)
P
1
CvM
= 0)<ζ.
R
EMARK 4.1. We would like to emphasize that no assumption has been made in The-
orem 4.2 regarding the ratio of the sample sizes. This implies that the proposed test can be
consistent against general alternatives even when the two sample sizes are highly unbalanced
as m/n 0orm/n →∞.
As a straightforward consequence of Theorem 3.1, we also show that the energy test, which
is our main competitor, is not minimax rate optimal in our context.
P
ROPOSITION 4.1 (Nonoptimality of the energy test). Recall the energy test φ
Energy
given in Theorem 3.1. Then there exists a pair of distributions that belongs to F (
m,n
) such
that the energy test becomes asymptotically powerless, that is,
lim
m,n→∞
inf
P
X
,P
Y
F (
m,n
)
P
1
Energy
= 1) α.
In the next section, we turn our attention to the asymptotic regime where the sample size
is fixed and the dimension tends to infinity and study the limiting behavior of the CvM test.
5. High dimension, low sample size analysis. The high dimension, low sample size
(HDLSS) regime has received increasing attention in recent years and has been frequently
employed to give statistical insights into high-dimensional two-sample testing (e.g., Biswas
and Ghosh (2014), Biswas, Mukhopadhyay and Ghosh (2014), Chakraborty and Chaudhuri
(2017), Mondal, Biswas and Ghosh (2015)). Focusing on this HDLSS regime, the goal of
this section is twofold: First, we provide sufficient conditions under which the proposed test
is consistent in HDLSS situations (Section 5.1). Second, we show that U
CvM
has the same
asymptotic behavior as the high-dimensional mean test statistics proposed by Chen and Qin
(2010)andChakraborty and Chaudhuri (2017) under certain location models (Section 5.2).
Along with these mean test statistics, we further establish the equivalence among U
CvM
,the
energy statistic and the MMD statistic with the Gaussian kernel. The latter connection was
motivated by Ramdas et al. (2015) who showed that the energy statistic, the MMD statistic
and the mean test statistic by Chen and Qin (2010) are asymptotically equivalent in different
scenarios.
5.1. HDLSS consistency. Let us denote E(X) = μ
X
, E(Y ) = μ
Y
, V(X) =
X
and
V(Y ) =
Y
where
X
and
Y
are positive definite matrices. Before presenting the main
results, we state the two assumptions.
(A1). V
Z
1
Z
2
2
= O(d), and V

Z
1
Z
3
Z
2
Z
3

= O(d),
where Z
1
, Z
2
, Z
3
are independent and each Z
i
follows either P
X
or P
Y
.
(A2). d
1
tr(
X
) σ
2
X
,d
1
tr(
Y
) σ
2
Y
,d
1
μ
X
μ
Y
2
2
δ
2
XY
,
where 0 <
σ
2
X
, σ
2
Y
< and 0 δ
2
XY
< .
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3431
Assumption (A1) implies that component variables are weakly dependent. Under the
distributional assumptions (including multivariate normal distributions) made in Bai and
Saranadasa (1996)andChen and Qin (2010), Assumption (A1) is satisfied when
(5.1)
X
μ
Y
)
(
X
+
Y
)(μ
X
μ
Y
) = O(d) and
tr
(
X
+
Y
)
2
= O(d).
The details of this derivation can be found in Appendix D.1. Assumption (A2) is common
in the HDLSS literature (e.g., Hall, Marron and Neeman (2005)) and facilitates the analysis.
Under these two conditions, the following theorem establishes the HDLSS consistency of the
proposed test where we assume that the nominal level satisfies α>1/{(m + n)!/(m!n!)} for
m = n and α>2/{(m + n)!/(m!n!)} for m = n.
T
HEOREM 5.1 (HDLSS consistency). Suppose (A1) and (A2) hold. Assume that σ
2
X
=
σ
2
Y
or δ
2
XY
> 0. Then the permutation test based on U
CvM
is consistent under the HDLSS
regime, that is, lim
d→∞
E
1
[φ
CvM
]=1.
5.2. HDLSS asymptotic equivalence of CvM-statistic and others.Next,wefocuson
mean difference alternatives with equal covariance matrices. There are many types of high-
dimensional mean inference procedures in the literature (see Hu and Bai (2016), for a recent
review). For example, Chen and Qin (2010) suggest a test statistic based on an unbiased
estimator of μ
X
μ
Y
2
. Specifically, their test statistic is given by
U
CQ
=
1
(m)
2
(n)
2
m,=
i
1
,i
2
=1
n,=
j
1
,j
2
=1
(X
i
1
Y
j
1
)
(X
i
2
Y
j
2
).
More recently, Chakraborty and Chaudhuri (2017) define a test statistic based on spatial ranks
as
U
WMW
=
1
(m)
2
(n)
2
m,=
i
1
,i
2
=1
n,=
j
1
,j
2
=1
(X
i
1
Y
j
1
)
X
i
1
Y
j
1
(X
i
2
Y
j
2
)
X
i
2
Y
j
2
.
They proved that U
CQ
and U
WMW
are asymptotically equivalent under a certain HDLSS
setting. Independently, the equivalence between U
CQ
, U
Energy
and the MMD statistic with
the Gaussian kernel was established by Ramdas et al. (2015) under different settings. Let us
denote the MMD statistic with the Gaussian kernel by
U
MMD
=
1
(m)
2
m,=
i
1
,i
2
=1
exp
1
2ς
2
d
X
i
1
X
i
2
2
+
1
(n)
2
n,=
j
1
,j
2
=1
exp
1
2ς
2
d
Y
j
1
Y
j
2
2
2
mn
m
i=1
n
j=1
exp
1
2ς
2
d
X
i
Y
j
2
,
where ς
2
d
is the bandwidth parameter. Here, we combine and further extend these results
by presenting sufficient conditions under which U
CvM
, U
Energy
, U
MMD
, U
CQ
and U
WMW
are
asymptotically equivalent. To establish the result, we need two more assumptions:
(A3). V

Z
1
Z
2
Z
3
Z
4

= O(d) where
Z
1
, Z
2
, Z
3
, Z
4
are independent and each Z
i
follows either P
X
or P
Y
.
(A4).
X
=
Y
and μ
X
μ
Y
2
= O(
d).
3432 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
Assumption (A3) is required for studying U
CQ
and U
WMW
. Like Assumption (A1),As-
sumption (A3) is also satisfied under condition (5.1). Notice that U
CQ
and U
WMW
are only
sensitive to location parameters whereas U
CvM
, U
Energy
and U
MMD
are sensitive to both lo-
cation and scale parameters. This suggests that the equal covariance assumption in (A4) is
crucial for our result and cannot be dropped. The condition μ
X
μ
Y
2
= O(
d) is also
important for our analysis and was also considered in Chakraborty and Chaudhuri (2017).
Under the given assumptions, we make repeated use of Taylor expansions to establish the
equivalence among the test statistics stated as follows.
T
HEOREM 5.2 (HDLSS equivalence). Suppose (A1), (A2), (A3) and (A4) hold.
Let be an arbitrary permutation of {1,...,N} and
σ
2
d
= d
1
tr(
X
). We denote
by U
CvM
, U
Energy
, U
MMD
, U
CQ
and U
WMW
, the CvM, Energy, MMD, CQ and WMW
test statistics, respectively, calculated based on
X
m
={Z
(1)
,...,Z
(m)
} and Y
n
=
{Z
(m+1)
,...,Z
(N)
}. Assume that the bandwidth parameter of the Gaussian kernel sat-
isfies ς
2
d
d. Then under the HDLSS asymptotics, we have that
(5.2)
dU
CvM
=
1
2π
3dσ
2
d
U
CQ
+O
P
d
1/2
,
U
Energy
=
1
2dσ
d
U
CQ
+O
P
d
1/2
,
dU
WMW
=
1
dσ
2
d
U
CQ
+O
P
d
1/2
,
dU
MMD
=
d
ς
2
d
e
dσ
2
d
2
d
U
CQ
+O
P
d
1/2
.
Note that the asymptotic equivalence established in (5.2) holds for any permutation. Lever-
aging this result, we show that the permutation critical values of the test statistics are asymp-
totically the same as well.
C
OROLLARY 5.1 (Permutation critical values). Consider the same assumptions made in
Theorem 5.2. Let c
α,CvM
, c
α,Eng
, c
α,MMD
, c
α,CQ
and c
α,WMW
be the 1 α quantile of the per-
mutation distribution of 2π
3dσ
2
d
U
CvM
,
2σ
d
U
Energy
, ς
2
d
e
dσ
2
d
2
d
U
MMD
/
d, U
CQ
/
d
and
dσ
2
d
U
WMW
, respectively. Then
c
α,CvM
= c
α,Eng
+O
P
d
1/2
= c
α,MMD
+O
P
d
1/2
= c
α,CQ
+O
P
d
1/2
= c
α,WMW
+O
P
d
1/2
.
From the previous results, we expect that the considered permutation tests have compara-
ble power in the limit as further illustrated by our simulation results in Section 8.Wealso
refer the reader to Appendix D.5 where we present an explicit expression for the limiting
power function of the asymptotic tests with extra assumptions. We would like to empha-
size, however, that when the moment assumption is violated, the power of these tests can
be entirely different. For instance, our simulation results in Section 8 demonstrate that the
CQ, energy and MMD tests perform poorly when X and Y have Cauchy distributions with
different location parameters. In contrast, the CvM and WMW tests maintain robust power
against the same Cauchy alternative, which highlights a benefit of the current approach in
high dimensions.
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3433
6. Connection to the generalized energy distance and MMD. Recall that the energy
distance is defined with the Euclidean distance under the finite first moment condition. By
considering a semimetric space (Z) of negative type, Sejdinovic et al. (2013) generalized
the energy distance by
E
2
ρ
= 2E
ρ(X
1
,Y
1
)
E
ρ(X
1
,X
2
)
E
ρ(Y
1
,Y
2
)
.
They further established the equivalence between the generalized energy distance and the
MMD with a kernel induced by ρ(·, ·). Given a distance-induced kernel k(·, ·), the squared
MMD is given by
MMD
2
k
= E
k(X
1
,X
2
)
+E
k(Y
1
,Y
2
)
2E
k(X
1
,Y
1
)
.
In this section, we show that the multivariate CvM-distance is a member of the generalized
energy distance by the use of the angular distance, and thus also a member of the MMD. Let
M
X
and M
Y
be the support of X and Y , respectively, and let M = M
X
M
Y
R
d
.Then
we define the angular distance as follows.
D
EFINITION 6.1 (Angular distance). Let Z
be a random vector having mixture distribu-
tion (1/2)P
X
+(1/2)P
Y
.Forz, z
M, denote the scaled angle between z Z
and z
Z
by
ρ
Angle
z, z
;Z
=
1
π
Ang
z Z
,z
Z
.
The angular distance is defined as the expected value of the scaled angle:
ρ
Angle
z, z
= E
ρ
Angle
z, z
;Z

.(6.1)
As shown in Appendix D.6, ρ
Angle
is a metric of negative type defined on M. With this key
property and the identity given in the next proposition, we may conclude that the multivari-
ate CvM-distance is a special case of the generalized energy distance based on the angular
distance.
P
ROPOSITION 6.1 (Another view of the CvM-distance). Let us consider the angular
distance defined in (6.1). Then
2W
2
d
= 2E
ρ
Angle
(X
1
,Y
1
)
E
ρ
Angle
(X
1
,X
2
)
E
ρ
Angle
(Y
1
,Y
2
)
.
R
EMARK 6.1. The angular distance can be generalized by taking the expectation with
respect to a different measure. For instance, when the expectation is taken with respect to
Lebesgue measure, the generalized angular distance is proportional to the Euclidean distance
(see Appendix D.7). The main difference between the Euclidean distance and the proposed
angular distance is that the latter takes into account information from the underlying distri-
bution and is less sensitive to outliers. In this respect, the introduced angular distance can be
viewed as a robust alternative for the Euclidean distance.
R
EMARK 6.2. As one of the reviewers pointed out, there might be several ways to
enhance the power of the proposed test by modifying the multivariate CvM-distance. For
instance, by the characteristic property of W
2
d
, it can be seen that H
0
holds if and only
if
T
xy
= T
xx
and T
xy
= T
yy
where T
xy
= E[ρ
Angle
(X
1
,Y
1
)], T
xx
= E[ρ
Angle
(X
1
,X
2
)] and
T
yy
= E[ρ
Angle
(Y
1
,Y
2
)]. Motivated by this observation, another test statistic can be intro-
duced based on an estimate of (
T
xy
T
xx
)
2
+(T
xy
T
yy
)
2
(and other variants are possible,
see Appendix D.4). As demonstrated in Appendix E, the test based on this new statistic tends
to be more sensitive to scale differences than the CvM test.
3434 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
7. Other multivariate extensions via projection averaging. The projection-averaging
approach used for the multivariate CvM-statistic is general and can be applied to many other
univariate robust statistics. In this section and also Appendix D.8, we illustrate the utility
of the projection-averaging approach by considering several examples including Kendall’s
tau, Spearman’s rho and the sign covariance (Bergsma and Dassios (2014)). Let us begin
with one-sample and two-sample robust statistics. Given a pair of random variables (X, Y ),
denote the difference between two random variables by Z = X Y . The univariate sign test
statistic is an estimate of
T
sign
:= P(Z > 0) 1/2 and it is used to test whether
H
0
: P(Z > 0) = 1/2versusH
1
: P(Z > 0) = 1/2.
The projection-averaging technique extends
T
sign
to a multivariate case as follows.
P
ROPOSITION 7.1 (One-sample sign test statistic). For i .i.d. random vectors Z
1
, Z
2
from
a multivariate distribution P
Z
where Z R
d
, the projection-averaging approach generalizes
T
sign
as
S
d1
P
β
Z
1
> 0
1
2
2
(β) =
1
4
1
2π
E
Ang
(Z
1
,Z
2
)
.(7.1)
Given univariate two samples
X
m
={X
1
,...,X
m
} and Y
n
={Y
1
,...,Y
n
}, the Wilcoxon–
Mann–Whitney test is designed for testing whether
H
0
: P(X > Y ) = 1/2versusH
1
: P(X > Y ) = 1/2,
basedonanestimateof
T
WMW
:= P(X > Y ) 1/2. The next proposition extends T
WMW
to
a multivariate case via projection averaging.
P
ROPOSITION 7.2 (Two-sample Wilcoxon–Mann–Whitney test statistic). Let X
1
,
X
2
i.i.d.
P
X
and, independently, Y
1
,Y
2
i.i.d.
P
Y
where X
1
,Y
1
R
d
. The projection-averaging
approach generalizes
T
WMW
as
S
d1
P
β
X
1
Y
1
1
2
2
(β) =
1
4
1
2π
E
Ang
(X
1
Y
1
,X
2
Y
2
)
.(7.2)
R
EMARK 7.1. The first-order Taylor approximation of the inverse cosine function shows
that the representations given on the right-hand side of (7.1)and(7.2) are related to the spa-
tial sign-statistics introduced by Wang, Peng and Li (2015)andChakraborty and Chaud-
huri (2017), respectively. In fact, when U -statistics are used to estimate (7.1)and(7.2), the
projection-averaging statistics and the spatial sign-statistics are asymptotically equivalent un-
der some regularity conditions (see Appendix D.3 for details).
The same technique can be further applied to some robust statistics for independence test-
ing. To test for independence between two random variables, Kendall’s tau statistic is given
as an estimate of τ := 4P(X
1
<X
2
,Y
1
<Y
2
) 1. We present a multivariate extension of τ
as follows.
T
HEOREM 7.1 (Kendall’s tau). Fo r i .i.d. pairs of random vectors (X
1
,Y
1
),...,(X
4
,Y
4
)
from a joint distribution P
XY
where X R
p
and Y R
q
, the multivariate extension of τ via
projection averaging is given by
S
p1
S
q1
4P
α
(X
1
X
2
)<0
(Y
1
Y
2
)<0
1
2
(α)(β)
= E

2
2
π
Ang(X
1
X
2
,X
3
X
4
)
·
2
2
π
Ang(Y
1
Y
2
,Y
3
Y
4
)

1.
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3435
Recently, Bergsma and Dassios (2014) introduced a modification of Kendall’s tau, which
is zero if and only if random variables are independent under some mild conditions. Let us
denote the univariate Bergsma–Dassios sign covariance by
τ
= E
a
sign
(X
1
,X
2
,X
3
,X
4
) · a
sign
(Y
1
,Y
2
,Y
3
,Y
4
)
,(7.3)
with a
sign
(z
1
,z
2
,z
3
,z
4
) = sign(|z
1
z
2
|+|z
3
z
4
|−|z
1
z
3
|−|z
2
z
4
|). Motivated by
the projection-averaging approach, we propose the multivariate τ
as follows.
D
EFINITION 7.1 (Multivariate τ
). Suppose (X
1
,Y
1
),...,(X
4
,Y
4
) are i.i.d. random
vectors from a joint distribution P
XY
where X R
p
and Y R
q
. We define the multivariate
τ
by
τ
p,q
=
S
p1
S
q1
E
a
sign
α
X
1
X
2
X
3
X
4
×a
sign
β
Y
1
Y
2
Y
3
Y
4

dλ(α) dλ(β).
Since the kernel of τ
is sign-invariant, that is, a
sign
(z
1
,z
2
,z
3
,z
4
) = a
sign
(z
1
, z
2
, z
3
,
z
4
), it is easy to see that τ
p,q
becomes the univariate τ
when p = q = 1. Also note that
since X and Y are independent if and only if α
X and β
Y are independent for all α S
p1
and β S
q1
, the characteristic property of τ
p,q
follows by that of the univariate τ
.
Next, we present a closed-form expression for τ
p,q
. For nonzero U
1
,U
2
,U
3
R
d
,letus
define g
d
(U
1
,U
2
,U
3
) and h
d
(Z
1
,Z
2
,Z
3
,Z
4
) by
g
d
(U
1
,U
2
,U
3
) =
1
2
1
4π
Ang
(U
1
,U
2
) + Ang(U
1
,U
3
) + Ang(U
2
,U
3
)
and
h
d
(Z
1
,Z
2
,Z
3
,Z
4
)
= g
d
(Z
1
Z
2
,Z
2
Z
3
,Z
3
Z
4
) + g
d
(Z
2
Z
1
,Z
1
Z
3
,Z
3
Z
4
)
+g
d
(Z
1
Z
2
,Z
2
Z
4
,Z
4
Z
3
) + g
d
(Z
2
Z
1
,Z
1
Z
4
,Z
4
Z
3
).
We note that, in contrast to other applications, Lemma 2.2 is not enough to have an expres-
sion for τ
p,q
without involving integrations over the unit sphere. To this end, we generalize
Lemma 2.2 with three indicator functions (see Lemma B.8) and present an alternative expres-
sion for τ
p,q
as follows.
T
HEOREM 7.2 (Closed-form expression for τ
p,q
). For i.i.d. random vectors (X
1
,Y
1
),
...,(X
4
,Y
4
) from a joint distribution P
XY
where X R
p
and Y R
q
, τ
p,q
can be written as
τ
p,q
= E
h
p
(X
1
,X
2
,X
3
,X
4
) · h
q
(Y
1
,Y
2
,Y
3
,Y
4
)
+E
h
p
(X
1
,X
2
,X
3
,X
4
) · h
q
(Y
3
,Y
4
,Y
1
,Y
2
)
2E
h
p
(X
1
,X
2
,X
3
,X
4
) · h
q
(Y
1
,Y
3
,Y
2
,Y
4
)
.
Theorem 7.2 leads to a straightforward empirical estimate of τ
p,q
based on a U -statistic.
This is also true for the other multivariate generalizations introduced in this section and the
Supplementary Material (Appendix D.8). Using these estimates, some theoretical and empiri-
cal properties of the proposed measures can be further investigated. These topics are reserved
for future work.
3436 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
8. Simulations. In this section, we report numerical results to support the argument in
Section 5 as well as to compare the performance of the CvM test with other competing non-
parametric tests against heavy-tailed alternatives. Along with the energy, MMD, NN, FR and
BG tests described before, we consider the cross-match test (Rosenbaum (2005)), the multi-
variate run test (Biswas, Mukhopadhyay and Ghosh (2014)), the modified k-NN test (Mondal,
Biswas and Ghosh (2015)) and the ball divergence test (Pan et al. (2018)) for comparison. We
refer to them as the CM test, run test, MBG test and ball test, respectively. In our simulations,
we used the Gaussian kernel with the median heuristic (Gretton et al. (2012)) for the MMD
test and we set the number of nearest neighbors as k = 3 for both NN test and MBG test.
Since finding the shortest Hamiltonian path for the run test is NP-complete, we employed
Kruskal’s algorithm (Kruskal (1956)) as suggested by Biswas, Mukhopadhyay and Ghosh
(2014).
Throughout our experiments, the significance level was set at 0.05 and the permutation
procedure was used to determine the p-value of each test with 200 permutations as in Re-
mark 2.2. The simulations were repeated 500 times to approximate the power of different
tests. We set the sample size and the dimension by m, n = 20 and d = 200 for the balanced
cases and by m = 35, n = 5andd = 200 for the imbalanced cases.
First, we consider several examples where the powers of the five tests (CvM, energy,
MMD, CQ and WMW tests) in Section 5 are approximately equivalent to each other. Specif-
ically, we use multivariate normal distributions with different means
μ
(0)
= (0,...,0)
(1)
= (0.15,...,0.15)
and
μ
(2)
=
0.045( 1,...,1

d/2elements
, 0,...,0

d/2elements
)
and covariance matrices:
1. Identity matrix (denoted by I )whereσ
i,i
= 1andσ
i,j
= 0fori = j .
2. Banded matrix (denoted by
Band
)whereσ
i,i
= 1, σ
i,j
= 0.6for|i j |=1, σ
i,j
= 0.3
for |i j |=2andσ
i,j
= 0otherwise.
3. Autocorrelation matrix (denoted by
Auto
)whereσ
i,i
= 1andσ
i,j
= 0.2
|ij|
when
i = j .
4. Block diagonal matrix (denoted by
Block
)wherethe5×5 main diagonal blocks A are
defined by a
i,i
= 1anda
i,j
= 0.2wheni = j , and the off-diagonal blocks are zeros.
Then we generate random samples from X N(μ
(0)
,) and either Y N(μ
(1)
,) or
Y N(μ
(2)
,). The results are summarized in Table 1. As can be seen from the table,
the empirical powers of the considered tests are very close under the given setting, which
supports our theoretical results in Section 5. We also observe that the other nonparametric
tests, not considered in Section 5, are significantly less powerful than the proposed test in all
normal location alternatives.
In our second experiment, we consider several examples where the moment conditions
are not satisfied. We focus on random samples generated from multivariate Cauchy distri-
butions. Let Cauchy , s) refer to the univariate Cauchy distribution where γ , s are the lo-
cation parameter and the scale parameter, respectively. Let X = (X
(1)
,...,X
(d)
) and Y =
(Y
(1)
,...,Y
(d)
) be random vectors where X
(i)
i.i.d.
Cauchy(0, 1) and Y
(i)
i.i.d.
Cauchy, s)
for i = 1,...,d. We first consider location differences where γ is not zero but the scale pa-
rameters are identical, that is, s = 1. Similarly, we consider scale differences where the scale
parameter s changes, but the location parameters are identical, that is, γ = 0.
From the results presented in Table 2 and Table 3, it is seen that, unlike the multivari-
ate normal cases, there are significant differences between power performance among CvM,
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3437
TABLE 1
Empirical power of the considered tests against the normal location models at α = 0.05
I
d
Band
Block
Auto
m = 20, n = 20 μ
(1)
μ
(2)
μ
(1)
μ
(2)
μ
(1)
μ
(2)
μ
(1)
μ
(2)
CvM 0.662 0.646 0.418 0.406 0.572 0.584 0.452 0.442
Energy 0.656 0.650 0.420 0.408 0.576 0.584 0.452 0.444
MMD 0.658 0.638 0.412 0.398 0.568 0.570 0.458 0.444
CQ 0.656 0.650 0.416 0.412 0.578 0.580 0.454 0.448
WMW 0.668 0.646 0.420 0.402 0.568 0.580 0.458 0.444
NN 0.288 0.288 0.164 0.154 0.242 0.238 0.176 0.174
FR 0.168 0.170 0.090 0.084 0.158 0.116 0.112 0.088
MBG 0.050 0.050 0.050 0.052 0.048 0.044 0.060 0.046
Ball 0.240 0.254 0.186 0.198 0.262 0.250 0.216 0.226
CM 0.042 0.054 0.028 0.040 0.052 0.050 0.038 0.034
BG 0.070 0.060 0.074 0.074 0.074 0.078 0.084 0.078
Run 0.160 0.153 0.101 0.105 0.146 0.128 0.110 0.102
energy, MMD, CQ and WMW tests. In particular, the tests based on the energy, MMD and
CQ statistics have relatively low power against the heavy-tail location alternatives, whereas
the tests based on the CvM and WMW statistics show better performance than the others.
Turning to the scale problems, it can be seen that the CQ and WMW tests are not sensitive
to detect scale differences, which makes sense because they are specifically designed for lo-
cation problems. On the other hand, the CvM, energy and MMD tests perform reasonably
well in these alternatives. Among the omnibus nonparametric tests, the MMD, energy and
ball tests have competitive power against the scale differences, but not against the location
differences in general. The MBG test is only powerful against the scale differences where the
sample sizes are balanced. The CM and run tests are uniformly outperformed by the CvM
test under all scenarios. The NN and FR tests perform strongly against the location alterna-
tives especially for the balanced case, but not against the scale alternatives. When the sample
TABLE 2
Empirical power of the considered tests against multivariate Cauchy distributions with m = n = 20 at α = 0.05
where γ , s represent the location and scale parameter, respectively. The three highest power estimates in each
column are highlighted in boldface
Location Scale
m = 20, n = 20 γ = 2 γ = 3 γ = 4 γ = 5 s = 2 s = 3 s = 4 s = 5
CvM 0.124 0.252 0.596 0.842 0.560 0.926 0.988 1.000
Energy 0.060 0.066 0.102 0.134 0.316 0.602 0.766 0.866
MMD 0.056 0.064 0.110 0.162 0.448 0.772 0.890 0.970
CQ 0.138 0.268 0.360 0.456 0.046 0.070 0.042 0.068
WMW 0.324 0.698 0.912 0.988 0.052 0.064 0.062 0.056
NN 0.288 0.662 0.884 0.976 0.214 0.194 0.256 0.224
FR 0.178 0.462 0.706 0.888 0.028 0.034 0.048 0.036
MBG 0.060 0.044 0.050 0.074 0.564 0.904 0.964 0.992
Ball 0.064 0.064 0.076 0.098 0.606 0.936 0.994 1.000
CM 0.030 0.078 0.128 0.226 0.056 0.170 0.334 0.490
BG 0.048 0.038 0.048 0.040 0.238 0.394 0.560 0.632
Run 0.059 0.129 0.274 0.422 0.220 0.506 0.767 0.864
3438 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
T
ABLE 3
Empirical power of the considered tests against multivariate Cauchy distributions with m = 35 and n = 5 at
α = 0.05 where γ , s represent the location and scale parameter, respectively. The three highest power estimates
in each column are highlighted in boldface
Location Scale
m = 35, n = 5 γ = 5 γ = 6 γ = 7 γ = 8 s = 3 s = 4 s = 5 s = 6
CvM 0.340 0.498 0.652 0.758 0.570 0.806 0.928 0.952
Energy 0.110 0.146 0.212 0.262 0.436 0.632 0.794 0.858
MMD 0.108 0.148 0.192 0.240 0.552 0.808 0.926 0.968
CQ 0.284 0.380 0.454 0.544 0.178 0.210 0.262 0.290
WMW 0.796 0.890 0.942 0.960 0.110 0.126 0.134 0.148
NN 0.144 0.294 0.376 0.558 0.118 0.150 0.154 0.182
FR 0.226 0.360 0.464 0.588 0.078 0.092 0.104 0.112
MBG 0.010 0.000 0.008 0.000 0.092 0.130 0.176 0.214
Ball 0.072 0.088 0.098 0.122 0.238 0.406 0.594 0.762
CM 0.082 0.176 0.190 0.262 0.030 0.080 0.092 0.126
BG 0.058 0.052 0.058 0.052 0.320 0.386 0.506 0.514
Run 0.088 0.150 0.198 0.228 0.106 0.174 0.248 0.326
sizes are unbalanced, the performance of the NN and FR tests are degraded a little bit, which
can be explained by Chen, Dou and Qiao (2013)andChen, Chen and Su (2018). The CvM
test, on the other hand, performs consistently well against the heavy-tail location and scale
alternatives and its performance appears immune to the sample proportion.
In summary, the proposed test has almost identical power as the high-dimensional mean
tests against the light-tail location alternatives, whereas it outperforms many popular non-
parametric competitors under the heavy-tail location and scale alternatives. More simulation
results in both high and low dimensions can be found in Appendix E of the Supplementary
Material.
9. Concluding remarks. In this work, we extended the univariate Cramér–von Mises
statistic for two-sample testing to the multivariate case using projection averaging and
demonstrated its robustness, minimax rate optimality and high-dimensional power proper-
ties. We applied the same projection technique to other robust statistics and presented their
multivariate extensions. We also introduced the angular distance that is closely connected to
the Euclidean distance (Remark 6.1) but is more robust to outliers by incorporating infor-
mation from the underlying distribution. Given that the use of distances is of fundamental
importance in many statistical applications (including clustering, classification and regres-
sion), we hope that this work will stimulate further study of the angular distance applied to
other statistical problems serving as a robust alternative for the Euclidean distance.
Acknowledgments. The authors would like to thank the Associate Editor and two refer-
ees for their valuable comments which helped to improve the manuscript.
SUPPLEMENTARY MATERIAL
Supplement to “Robust multivariate nonparametric tests via projection averaging”
(DOI: 10.1214/19-AOS1936SUPP; .pdf). This supplemental file includes the technical proofs
omitted in the main text, and some additional results.
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3439
REFERENCES
ANDERSON, T. W. (1962). On the distribution of the two-sample Cramér–von Mises criterion. Ann. Math. Stat.
33 1148–1159. MR0145607 https://doi.org/10.1214/aoms/1177704477
A
NDERSON, T. W. (2003). An Introduction to Multivariate Statistical Analysis,3rded.Wiley Series in Probability
and Statistics. Wiley-Interscience, Hoboken, NJ. MR1990662
A
NDERSON,N.H.,HALL,P.andTITTERINGTON, D. M. (1994). Two-sample test statistics for measuring
discrepancies between two multivariate probability density functions using kernel-based density estimates.
J. Multivariate Anal. 50 41–54. MR1292607 https://doi.org/10.1006/jmva.1994.1033
B
AI,Z.andSARANADASA, H. (1996). Effect of high dimension: By an example of a two sample problem. Statist.
Sinica 6 311–329. MR1399305
B
ARINGHAUS,L.andFRANZ, C. (2004). On a new multivariate two-sample test. J. Multivariate Anal. 88 190–
206. MR2021870 https://doi.org/10.1016/S0047-259X(03)00079-4
B
ARINGHAUS,L.andHENZE, N. (2017). Cramér–von Mises distance: Probabilistic interpretation, confi-
dence intervals, and neighbourhood-of-model validation. J. Nonparametr. Stat. 29 167–188. MR3635009
https://doi.org/10.1080/10485252.2017.1285029
B
ERA,A.K.,GHOSH,A.andXIAO, Z. (2013). A smooth test for the equality of distributions. Econometric
Theory 29 419–446. MR3042761 https://doi.org/10.1017/S0266466612000370
B
ERGSMA,W.andDASSIOS, A. (2014). A consistent test of independence based on a sign covariance related to
Kendall’s tau. Bernoulli 20 1006–1028. MR3178526 https://doi.org/10.3150/13-BEJ514
B
HAT, B. V. (1995). Theory of U-statistics and its applications. Ph.D. thesis, Karnatak Univ.
B
HATTACHARYA, B. B. (2018). Two-sample tests based on geometric graphs: Asymptotic distribution and de-
tection thresholds. Preprint. Available at arXiv:1512.00384v3.
B
HATTACHARYA, B. B. (2019). A general asymptotic framework for distribution-free graph-based two-sample
tests. J. R. Stat. Soc. Ser. B. Stat. Methodol. 81 575–602. MR3961499
B
ISWAS,M.andGHOSH, A. K. (2014). A nonparametric two-sample test applicable to high dimensional data.
J. Multivariate Anal. 123 160–171. MR3130427 https://doi.org/10.1016/j.jmva.2013.09.004
B
ISWAS,M.,MUKHOPADHYAY,M.andGHOSH, A. K. (2014). A distribution-free two-sample run test applica-
ble to high-dimensional data. Biometrika 101 913–926. MR3286925 https://doi.org/10.1093/biomet/asu045
C
HAKRABORTY,A.andCHAUDHURI, P. (2017). Tests for high-dimensional data based on means, spatial signs
and spatial ranks. Ann. Statist. 45 771–799. MR3650400 https://doi.org/10.1214/16-AOS1467
C
HEN,H.,CHEN,X.andSU, Y. (2018). A weighted edge-count two-sample test for multivariate and object data.
J. Amer. Statist. Assoc. 113 1146–1155. MR3862346 https://doi.org/10.1080/01621459.2017.1307757
C
HEN,L.,DOU,W.W.andQIAO, Z. (2013). Ensemble subsampling for imbalanced multivariate two-sample
tests. J. Amer. Statist. Assoc. 108 1308–1323. MR3174710 https://doi.org/10.1080/01621459.2013.800763
C
HEN,H.andFRIEDMAN, J. H. (2017). A new graph-based two-sample test for multivariate and object data.
J. Amer. Statist. Assoc. 112 397–409. MR3646580 https://doi.org/10.1080/01621459.2016.1147356
C
HEN,S.X.andQIN, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set
testing. Ann. Statist. 38 808–835. MR2604697 https://doi.org/10.1214/09-AOS716
C
HIKKAGOUDAR,M.S.andBHAT, B. V. (2014). Limiting distribution of two-sample degenerate U-statistic
under contiguous alternatives and applications. J. Appl. Statist. Sci. 22 127–139. MR3616873
C
HUNG,E.andROMANO, J. P. (2013). Exact and asymptotically robust permutation tests. Ann. Statist. 41 484–
507. MR3099111 https://doi.org/10.1214/13-AOS1090
C
RAMÉR, H. (1928). On the composition of elementary errors. Skand. Aktuarietidskr. 11 141–180.
C
UI, H. (2002). Average projection type weighted Cramér–von Mises statistics for testing some distributions. Sci.
China Ser. A 45 562–577. MR1911172
E
SCANCIANO, J. C. (2006). A consistent diagnostic test for regression models using projections. Econometric
Theory 22 1030–1051. MR2328527 https://doi.org/10.1017/S0266466606060506
F
RIEDMAN,J.H.andRAFSKY, L. C. (1979). Multivariate generalizations of the Wald–Wolfowitz and Smirnov
two-sample tests. Ann. Statist. 7 697–717. MR0532236
G
RETTON,A.,BORGWARDT,K.M.,RASCH,M.J.,SCHÖLKOPF,B.andSMOLA, A. (2012). A kernel two-
sample test. J. Mach. Learn. Res. 13 723–773. MR2913716
H
ALL,P.,MARRON,J.S.andNEEMAN, A. (2005). Geometric representation of high dimension, low sample
size data. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 427–444. MR2155347 https://doi.org/10.1111/j.1467-9868.
2005.00510.x
H
ARCHAOUI,Z.,BACH,F.,CAPPE,O.andMOULINES, E. (2013). Kernel-based methods for hypothesis testing:
A unified view. IEEE Signal Process. Mag. 30 87–97.
H
ENZE, N. (1988). A multivariate two-sample test based on the number of nearest neighbor type coincidences.
Ann. Statist. 16 772–783. MR0947577 https://doi.org/10.1214/aos/1176350835
3440 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN
H
ETTMANSPERGER,T.P.,MÖTTÖNEN,J.andOJA, H. (1998). Affine invariant multivariate rank tests for sev-
eral samples. Statist. Sinica 8 785–800. MR1651508
H
OEFFDING, W. (1952). The large-sample power of tests based on permutations of observations. Ann. Math. Stat.
23 169–192. MR0057521 https://doi.org/10.1214/aoms/1177729436
H
U,J.andBAI, Z. (2016). A review of 20 years of naive tests of significance for high-dimensional mean
vectors and covariance matrices. Sci. China Math. 59 2281–2300. MR3578957 https://doi.org/10.1007/
s11425-016-0131-0
K
ANAMORI,T.,SUZUKI,T.andSUGIYAMA, M. (2012). f -divergence estimation and two-sample homogene-
ity test under semiparametric density-ratio models. IEEE Trans. Inform. Theory 58 708–720. MR2917977
https://doi.org/10.1109/TIT.2011.2163380
K
IM,I.,BALAKRISHNAN,S.andWASSERMAN, L. (2020). Supplement to “Robust multivariate nonparametric
tests via projection averaging.https://doi.org/10.1214/19-AOS1936SUPP.
K
RUSKAL,J.B.JR. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem.
Proc. Amer. Math. Soc. 7 48–50. MR0078686 https://doi.org/10.2307/2033241
L
EE, A. J. (1990). U-Statistics: Theory and Practice. Statistics: Textbooks and Monographs 110.Dekker,New
York. MR1075417
L
EHMANN,E.L.andROMANO, J. P. (2005). Testing Statistical Hypotheses,3rded.Springer Texts in Statistics.
Springer, New York. MR2135927
L
I,J.andCHEN, S. X. (2012). Two sample tests for high-dimensional covariance matrices. Ann. Statist. 40
908–940. MR2985938 https://doi.org/10.1214/12-AOS993
L
IU, R. Y. (2006). Data Depth: Robust Multivariate Analysis, Computational Geometry, and Applications 72.
Amer. Math. Soc., Providence.
L
IU,Z.andMODARRES, R. (2011). A triangle test for equality of distribution functions in high dimensions.
J. Nonparametr. Stat. 23 605–615. MR2836279 https://doi.org/10.1080/10485252.2010.485644
L
OPEZ-PAZ,D.andOQUAB, M. (2016). Revisiting classifier two-sample tests. Preprint. Available at
arXiv:1610.06545.
M
ONDAL,P.K.,BISWAS,M.andGHOSH, A. K. (2015). On high dimensional two-sample tests based on nearest
neighbors. J. Multivariate Anal. 141 168–178. MR3390065 https://doi.org/10.1016/j.jmva.2015.07.002
M
UKHOPADHYAY,S.andWANG, K. (2018). A nonparametric approach to high-dimensional K-sample compar-
ison problem. Preprint. Available at arXiv:1810.01724.
O
JA, H. (2010). Multivariate Nonparametric Methods with R: An Approach Based on Spatial Signs and Ranks.
Lecture Notes in Statistics 199. Springer, New York. MR2598854 https://doi.org/10.1007/978-1-4419-0468-3
O
JA,H.andRANDLES, R. H. (2004). Multivariate nonparametric tests. Statist. Sci. 19 598–605. MR2185581
https://doi.org/10.1214/088342304000000558
P
AN,W.,TIAN,Y.,WANG,X.andZHANG, H. (2018). Ball divergence: Nonparametric two sample test. Ann.
Statist. 46 1109–1137. MR3797998 https://doi.org/10.1214/17-AOS1579
P
ESARIN, F. (2001). Multivariate Permutation Tests: With Applications in Biostatistics. Wiley, Chichester.
MR1855501
R
AMDAS,A.,REDDI,S.J.,POCZOS,B.,SINGH,A.andWASSERMAN, L. (2015). Adaptivity and computation-
statistics tradeoffs for kernel and distance based high dimensional two sample testing. Preprint. Available at
arXiv:1508.00655.
R
OSENBAUM, P. R. (2005). An exact distribution-free test comparing two multivariate distributions based on ad-
jacency. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 515–530. MR2168202 https://doi.org/10.1111/j.1467-9868.
2005.00513.x
S
CHILLING, M. F. (1986). Multivariate two-sample tests based on nearest neighbors. J. Amer. Statist. Assoc. 81
799–806. MR0860514
S
EJDINOVIC,D.,SRIPERUMBUDUR,B.,GRETTON,A.andFUKUMIZU, K. (2013). Equivalence of
distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41 2263–2291. MR3127866
https://doi.org/10.1214/13-AOS1140
S
ZÉKELY,G.J.andRIZZO, M. L. (2004). Testing for equal distributions in high dimension. Interstate 5.
S
ZÉKELY,G.J.andRIZZO, M. L. (2013). Energy statistics: A class of statistics based on distances. J. Statist.
Plann. Inference 143 1249–1272. MR3055745 https://doi.org/10.1016/j.jspi.2013.03.018
T
HAS, O. (2010). Comparing Distributions. Springer Series in Statistics. Springer, New York. MR2547894
https://doi.org/10.1007/b99044
W
ALD,A.andWOLFOWITZ, J. (1940). On a test whether two samples are from the same population. Ann. Math.
Stat. 11 147–162. MR0002083 https://doi.org/10.1214/aoms/1177731909
W
ANG,L.,PENG,B.andLI, R. (2015). A high-dimensional nonparametric multivariate test for mean vector.
J. Amer. Statist. Assoc. 110 1658–1669. MR3449062 https://doi.org/10.1080/01621459.2014.988215
Z
HOU,W.-X.,ZHENG,C.andZHANG, Z. (2017). Two-sample smooth tests for the equality of distributions.
Bernoulli 23 951–989. MR3606756 https://doi.org/10.3150/15-BEJ766
NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3441
ZHU,L.-X.,FANG,K.-T.andBHATTI, M. I. (1997). On estimated projection pursuit-type Crámer–von Mises
statistics. J. Multivariate Anal. 63 1–14. MR1491563 https://doi.org/10.1006/jmva.1997.1673
Z
HU,L.,XU,K.,LI,R.andZHONG, W. (2017). Projection correlation between two random vectors. Biometrika
104 829–843. MR3737307 https://doi.org/10.1093/biomet/asx043