Robust multivariate nonparametric tests via projection averaging

The Annals of Statistics

2020, Vol. 48, No. 6, 3417–3441

https://doi.org/10.1214/19-AOS1936

ROBUST MULTIVARIATE NONPARAMETRIC TESTS VIA

PROJECTION AVERAGING

Y ILMUN KIM

,SIVARAMAN BALAKRISHNAN

†

AND LARRY WASSERMAN

‡

Department of Statistics & Data Science, Carnegie Mellon University,

[email protected];

†

[email protected];

‡

[email protected]

In this work, we generalize the Cramér–von Mises statistic via projection

averaging to obtain a robust test for the multivariate two-sample problem.

The proposed test is consistent against all ﬁxed alternatives, robust to heavy-

tailed data and minimax rate optimal against a certain class of alternatives.

Our test statistic is completely free of tuning parameters and is computation-

ally efﬁcient even in high dimensions. When the dimension tends to inﬁnity,

the proposed test is shown to have comparable power to the existing high-

dimensional mean tests under certain location models. As a by-product of

our approach, we introduce a new metric called the angular distance which

can be thought of as a robust alternative to the Euclidean distance. Using the

angular distance, we connect the proposed method to the reproducing ker-

nel Hilbert space approach. In addition to the Cramér–von Mises statistic,

we demonstrate that the projection-averaging technique can be used to deﬁne

robust multivariate tests in many other problems.

1. Introduction. Let X and Y be random vectors deﬁned on a common probability space

(,

A, P) with distributions P

and P

, respectively. Given two mutually independent sam-

ples

={X

,...,X

} and Y

={Y

,...,Y

} from P

and P

, we want to test

: P

= P

versus H

: P

= P

.(1.1)

This fundamental problem has received considerable attention in statistics with a wide range

of applications (see, e.g., Thas (2010), for a review). A common statistic for the univariate

two-sample testing is the Cramér–von Mises (CvM) statistic (Anderson (1962)):

m + n



∞





(t) −



(t)





H(t),

where



(t) and



(t) are the empirical distribution functions of X

and Y

, respectively,

and (m + n)



H(t) = m



(t) + n



(t). Another approach is based on the energy statistic,

which is an estimate of the squared energy distance (Székely and Rizzo (2013)):

= 2E



−Y



−E



−X



−E



−Y



where |x| is the absolute value of x ∈ R. The energy distance is well deﬁned assuming a

ﬁnite ﬁrst moment and it can be written in a form that is similar to Cramér’s distance (Cramér

(1928)), namely,

= 2



∞

−∞



(t) − F

(t)



dt,

where F

(t) and F

(t) are the distribution functions of X and Y , respectively.

Received May 2019; revised December 2019.

MSC2020 subject classiﬁcations. Primary 62G10, 62H15, 62H20; secondary 62G35.

Key words and phrases. Energy statistic, high dimension and low sample size, independence testing, maximum

mean discrepancy, permutation tests, U-statistic.

3417

3418 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

The CvM-statistic has several advantages over the energy statistic for univariate two-

sample testing. For instance, the CvM-statistic is distribution-free under H

(Anderson

(1962)) and its population counterpart is well deﬁned without any moment assumptions. It

also has an intuitive probabilistic interpretation in terms of probabilities of concordance and

discordance of four independent random variables (Baringhaus and Henze (2017)). Never-

theless, the CvM-statistic has rarely been studied for multivariate testing. A primary reason

is that the CvM-statistic is essentially rank-based, which leads to a challenge to generalize

it in a multivariate space. In contrast, the energy statistic can be easily applied in arbitrary

dimensions as in Baringhaus and Franz (2004)andSzékely and Rizzo (2004). Speciﬁcally,

they deﬁned the squared multivariate energy distance by

) = 2E



X

−Y





−E



X

−X





−E



Y

−Y





,(1.2)

where ·is the Euclidean norm in R

. The multivariate energy distance maintains the

characteristic property that it is always nonnegative and equal to zero if and only if P

= P

It can also be viewed as the average of univariate Cramér’s distances of projected random

variables (Baringhaus and Franz (2004)):

) =

√

π(d −1)(

d−1

)

(

)



d−1







(t) − F



(t)



dt dλ(β),(1.3)

where λ represents the uniform probability measure on the d-dimensional unit sphere S

d−1

{x ∈ R

:x=1}, (·) is the gamma function and the symbol  stands for the transpose

operation.

Although the multivariate energy distance can be easily estimated in any dimension, it still

requires the ﬁnite moment assumption as in the univariate case. When the underlying dis-

tributions violate this moment condition with potential outliers, the energy statistic becomes

extremely unstable and the resulting test might perform poorly. Given that outlying obser-

vations arise frequently in practice with high-dimensional data, there is a need to develop a

robust counterpart of the energy distance. The primary goal of this work is to introduce a

robust, tuning parameter-free, two-sample testing procedure that is easily applicable in ar-

bitrary dimensions and consistent against all ﬁxed alternatives. Speciﬁcally, we modify the

univariate CvM-statistic to generalize it to an arbitrary dimension by averaging over all one-

dimensional projections. In detail, the proposed test statistic is an unbiased estimate of the

squared multivariate CvM-distance deﬁned as follows:

) =



d−1







(t) − F



(t)



(t) dλ(β),(1.4)

where H

(t) = ϑ



(t) +ϑ



(t) and ϑ

is a ﬁxed value in (0, 1) and ϑ

= 1 −ϑ

For simplicity and when there is no ambiguity, we may omit the dependency on P

, P

and

write W

) as W

Throughout this paper, we refer to the process of averaging over all projections as projec-

tion averaging.

1.1. Summary of our results. The proposed multivariate CvM-distance shares some ap-

pealing properties of the energy distance while being robust to heavy-tailed distributions or

outliers. For example, W

is invariant to orthogonal transformations and satisﬁes the charac-

teristic property (Lemma 2.1), meaning that W

is nonnegative and equal to zero if and only

if P

= P

. More importantly, it is straightforward to estimate W

without using any tuning

parameters (Theorem 2.1). Based on an unbiased estimate of W

, we apply the permutation

test procedure to determine a critical value of the test statistic. Although the permutation ap-

proach has been standard in practical implementations of two-sample testing, its theoretical

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3419

properties have been less explored beyond simple cases (e.g., Pesarin (2001)). Indeed, previ-

ous studies usually consider asymptotic tests in their theory section whereas their actual tests

are calibrated via permutations. We bridge the gap between theory and practice by presenting

both theoretical and empirical results on the permutation test under various scenarios. Our

main results regarding the CvM-distance are summarized as follows:

• Closed-form expression (Section 2): Building on Escanciano (2006)andZhu et al. (2017),

we show that the test statistic has a simple closed-form expression.

• Asymptotic power (Section 2): We prove that the permutation test based on the proposed

statistic has the same asymptotic power as the oracle and asymptotic tests that assume

knowledge of the underlying distributions (Section 2.2) against ﬁxed and contiguous alter-

natives.

• Robustness (Section 3): We show that the permutation test based on the proposed statistic

maintains good power in the contamination model, while the energy test becomes com-

pletely powerless in this setting.

• Minimax optimality (Section 4): We analyze the ﬁnite-sample power of the proposed per-

mutation test and prove its minimax rate optimality against a class of alternatives that differ

from the null in terms of the CvM-distance. We also show that the energy test is not optimal

in our context.

• HDLSS behavior (Section 5): We consider a high-dimension, low-sample size (HDLSS)

regime where the dimension tends to inﬁnity while the sample size is ﬁxed. Under this

regime, we identify sufﬁcient conditions under which the power of the proposed test con-

verges to one. In addition, we show that the proposed test has comparable power to the

high-dimensional mean tests introduced by Chen and Qin (2010)andChakraborty and

Chaudhuri (2017) under certain location models.

• Angular distance (Section 6): We introduce the angular distance between two vectors and

use this to show that the multivariate CvM-distance is a special case of the generalized

energy distance (Sejdinovic et al. (2013)). Furthermore, the CvM-distance is the maximum

mean discrepancy (Gretton et al. (2012)) associated with the angular distance.

Beyond the CvM-statistic, the projection-averaging technique can be widely applicable

to other nonparametric statistics. In the second part of this study, we revisit some famous

univariate sign- or rank-based statistics and propose their multivariate counterparts via pro-

jection averaging. Although there has been much effort to extend univariate sign- or rank-

based statistics in a multivariate space (see, e.g., Hettmansperger, Möttönen and Oja (1998),

Liu (2006), Oja (2010), Oja and Randles (2004)), they are either computationally expensive

to implement or less intuitive to understand. Our projection-averaging approach addresses

these issues by providing a tractable calculation form of statistics and by having a direct

interpretation in terms of projections. In Section 7 and also Appendix D.8, we demonstrate

the generality of the projection-averaging approach by presenting multivariate extensions of

several existing univariate statistics.

1.2. Literature review. There are a number of multivariate two-sample testing proce-

dures available in the literature. We list some fundamental methods and recent developments.

Anderson, Hall and Titterington (1994)

proposed the two-sample statistic based on the inte-

grated square distance between two kernel density estimates. The energy statistic was intro-

duced by Baringhaus and Franz (2004)andSzékely and Rizzo (2004) independently. Biswas

and Ghosh (2014) modiﬁed the energy statistic to improve the performance of the previ-

ous test for the high-dimensional location-scale and scale problems. Gretton et al. (2012)

introduced a class of distances between two probability distributions, called the maximum

mean discrepancy (MMD), based on a reproducing kernel Hilbert approach. Sejdinovic et al.

3420 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

(2013) showed that the energy distance is a special case of the MMD associated with the

kernel induced by the Euclidean distance. Recently, Pan et al. (2018) proposed a new met-

ric, named the ball divergence, between two probability distributions and connected it to the

MMD. A further review of kernel-based two-sample tests can be found in Harchaoui et al.

(2013).

Another line of work is based on graph constructions. Schilling (1986)andHenze (1988)

introduced a multivariate two-sample test based on the k nearest neighbor (NN) graph.

Mondal, Biswas and Ghosh (2015) pointed out that the previous NN test may suffer from

low power for the high-dimensional location-scale problem and provided an alternative that

addresses this limitation. Another variant of the NN test, which is tailored to imbalanced sam-

ples, can be found in Chen, Dou and Qiao (2013). Friedman and Rafsky (1979) considered

minimum spanning tree (MST) to present a generalization of the univariate run test in Wald

and Wolfowitz (1940). The MST test proposed by Friedman and Rafsky (1979) has recently

been modiﬁed by Chen and Friedman (2017)andChen, Chen and Su (2018) to improve

power under scale alternatives and imbalanced samples, respectively. Rosenbaum (2005)

proposed a distribution-free test in ﬁnite samples based on cross-matches. More recently,

Biswas, Mukhopadhyay and Ghosh (2014) introduced another distribution-free test based

on the shortest Hamiltonian path. A general theoretical framework for graph-based tests has

been established by Bhattacharya (2018), Bhattacharya (2019). Other recent developments

include Liu and Modarres (2011), Kanamori, Suzuki and Sugiyama (2012), Bera, Ghosh and

Xiao (2013), Lopez-Paz and Oquab (2016), Zhou, Zheng and Zhang (2017), Mukhopadhyay

and Wang (2018

), among others.

The

projection-averaging approach to CvM-type statistics can be found in other statistical

problems. For example, Zhu, Fang and Bhatti (1997)andCui (2002) considered the CvM-

statistic using projection averaging to investigate one-sample goodness-of-ﬁt tests for mul-

tivariate distributions. Escanciano (2006) proposed the CvM-based goodness-of-ﬁt test for

parametric regression models. To the best of our knowledge, however, this is the ﬁrst study

that investigates the CvM-statistic for the multivariate two-sample problem via projection

averaging.

Our technique to obtain a closed-form expression for projection-averaging statistics is

based on Escanciano (2006). The same principle has been exploited by Zhu et al. (2017)inthe

context of testing for multivariate independence. We further extend the result of Escanciano

(2006) to more general cases and provide an alternative proof using orthant probabilities for

normal distributions.

Outline. The rest of this paper is organized as follows. In Section 2, we introduce our test

statistic and the permutation test procedure. We then study their limiting behaviors under the

conventional ﬁxed dimension asymptotic framework. In Section 3, we compare the power

of the CvM test with that of the energy test and highlight the robustness of the CvM test.

Section 4 establishes minimax rate optimality of the proposed test against a certain class of

alternatives associated with the CvM-distance. In Section 5, we study the asymptotic power of

the CvM test in the HDLSS setting. We introduce the angular distance between two vectors

in Section 6 to show that the CvM-distance is the generalized energy distance built on the

introduced metric. In Section 7, the projection-averaging technique is applied to other sign-

or rank-based statistics and this allows us to provide new multivariate extensions. Simulation

results are reported in Section 8 to demonstrate the competitive power performance of the

proposed approach with ﬁnite sample size. All proofs of the main results are deferred to the

Supplementary Material (Kim, Balakrishnan and Wasserman (2020)).

Notation. For two nonzero vectors U

∈ R

, we denote the angle between U

and U

Ang(U

) = arccos{U



/(U

U

)}.For1≤q ≤ p,welet(p)

= p(p −1) ···(p −

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3421

q +1).LetP

and P

be the probability measures under H

and H

, respectively. Similarly,

and E

stand for the expectations with respect to P

and P

. For any two real sequences

} and {b

},weusea

 b

if there exist constants C,C



> 0 such that C<|a

| <C



for each n. We write a

= O(b

) if there exists C>0 such that |a

|≤C|b

| for each n.

We also write a

= o(b

) if lim

n→∞

= 0. For a sequence of random variables X

,we

use the notation X

= O

) when X

is bounded in probability (tight). The acronym i.i.d.

stands for independent and identically distributed and we use the symbol X

,...,X

i.i.d.

∼ P to

represent that X

,...,X

are i.i.d. samples from distribution P . We denote the d ×d identity

matrix by I

. The symbol 1(·) is used for indicator functions. We write summation over the

set of all k-tuples drawn without replacement from {1,...,n} by



n,=

,...,i

. Throughout this

paper, we assume that all vectors are column vectors and m, n ≥ 2.

2. Projection averaging-type Cramér–von Mises statistics. In this section, we start

with the basic properties of the CvM-distance. We then introduce our test statistic and study

its limiting behavior. We end this section with a description of the permutation test and its

large sample properties. Throughout this section, we consider the conventional asymptotic

regime where the dimension is ﬁxed and

m + n

→ ϑ

∈ (0, 1) and

m + n

→ ϑ

∈ (0, 1) as N = m +n →∞.(2.1)

Let us ﬁrst establish the characteristic property of the CvM-distance.

EMMA 2.1 (Characteristic property). W

is nonnegative and has the characteristic

property:

) = 0 if and only if P

= P

Note that W

involves integration over the unit sphere. One way to approximate this inte-

gral is to consider a subset of S

d−1

, namely {β

,...,β

}, and then to take the sample mean

over k different univariate CvM-statistics (see, e.g., Zhu, Fang and Bhatti (1997)). However,

this approach has an unpleasant trade-off between accuracy and computational time depend-

ing on the choice of k. The problem becomes even worse in high dimensions where one may

need exponentially many projections to achieve a certain accuracy. Our approach, on the other

hand, does not suffer from this computational issue by explicitly calculating the integral over

d−1

. The explicit form of the integration is mainly due to Escanciano (2006) who provided

the following lemma.

EMMA 2.2 (Escanciano (2006)). For any nonzero vectors U

∈ R



d−1





≤ 0







≤ 0



dλ(β) =

−

2π

Ang(U

EMARK 2.1. Escanciano (2006) proved Lemma 2.2 using the volume of a spherical

wedge. In the Supplementary Material (Appendix C.2), we provide an alternative proof of

this result based on orthant probabilities for normal distributions. We also extend this result

to integration involving strictly more than two indicator functions in the Supplementary Ma-

terial (Lemma B.8 and Lemma D.1). This extension allows us to generalize the univariate τ

∗

(Bergsma and Dassios (2014)) in Theorem 7.1 and potentially many other univariate statistics

(see Appendix D.8).

BasedonLemma2.2, we give another representation of W

in terms of the expected angle

involving three independent random vectors. Here and hereafter, we assume that



X and β



Y have continuous distribution functions for λ-almost all β ∈S

d−1

3422 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

This continuity assumption greatly simpliﬁes the alternative expression for W

and avoids

the possibility that

Ang(·, ·) is not well deﬁned when one of the inputs is a zero vector. This

issue may be handled by deﬁning

Ang(·, ·) differently for those exceptional cases, but we do

not pursue this direction here.

HEOREM 2.1 (Closed-form expression). Suppose that X

i.i.d.

∼ P

and, indepen-

dently, Y

i.i.d.

∼ P

. Then the squared multivariate CvM-distance can be written as

) =

−

2π



Ang

−Y

)



−

2π



Ang

−X

)



The above result highlights that W

) is invariant to the choice of ϑ

and ϑ

under

the continuity assumption. In the next subsection, we introduce the test statistic and study its

limiting behavior.

2.1. Test statistic and limiting distributions. Theorem 2.1 leads to a natural empirical

estimate of W

based on a U -statistic. Consider the kernel of order two:

(2.2)

CvM

) =

−

2π

Ang(x

−y

)

−

2π

Ang(y

−x

We denote its symmetrized version, which is invariant to the order of the ﬁrst two arguments

as well as the last two arguments, by



CvM

) =

CvM

) +

CvM

Then our test statistic is deﬁned as follows:

CvM





−1





−1



1≤i

≤m



1≤j

≤n



CvM

).(2.3)

Leveraging the basic theory of U -statistics (e.g., Lee (1990)), it is clear that U

CvM

is an

unbiased estimator of W

. Additionally, U

CvM

is a degenerate U -statistic under the null hy-

pothesis as we prove in the Supplementary Material (Appendix C.5). Hence we can apply

the asymptotic theory for a degenerate two-sample U -statistic (Chapter 3 of Bhat (1995)) to

obtain the following result.

HEOREM 2.2 (Asymptotic null distribution of U

CvM

). For e ach k =1, 2,..., let λ

the eigenvalue with the corresponding eigenfunction φ

satisfying the integral equation







CvM

) | X



)



= λ

).(2.4)

Then U

CvM

has the limiting null distribution under the limiting regime (2.1) given by

CvM

−→ ϑ

−1

∞



k=1



−1



where ξ

i.i.d.

∼ N(0, 1) and

−→ stands for convergence in distribution.

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3423

Under a ﬁxed alternative hypothesis where P

and P

do not change with m and n,the

proposed test statistic converges weakly to a normal distribution. We build on Hoeffding’s de-

composition of a two-sample U -statistic (e.g., page 40 of Lee (1990)) to prove the following

result.

HEOREM 2.3 (Asymptotic distribution of U

CvM

under ﬁxed alternatives). Let us deﬁne

= V







CvM

) | X



and

= V







CvM

) | Y



where V(·) is the variance operator. Then under the limiting regime (2.1) and ﬁxed alternative

= P

, we have

√



CvM

−W



−→ N



0, 4ϑ

−1

+4ϑ

−1



From the previous two theorems, it is clear to see that NU

CvM

is stochastically bounded

under the null hypothesis whereas it diverges to inﬁnity under ﬁxed alternatives. Thus one

can expect that any reasonable test based on the proposed test statistic is consistent (meaning

that the power converges to one as N →∞) against all ﬁxed alternatives. In fact, the prob-

lem of distinguishing two ﬁxed distributions is too easy in large sample situations and many

of nonparametric tests are known to be consistent in this asymptotic regime. We therefore

turn now to a more challenging scenario where a distance between P

and P

diminishes

as the sample size increases. To this end, we make a standard assumption that the underly-

ing distributions belong to quadratic mean differentiable (QMD) families (e.g., Bhattacharya

(2019)).

EFINITION 2.1 (Quadratic mean differentiable families, page 484 of Lehmann and Ro-

mano (2005)). Let {P

,θ ∈ } be a family of probability distributions on (R

, B) where

B is the Borel σ -ﬁeld associated with R

and  is an open subset of R

. Assume each P

is absolutely continuous with respect to Lebesgue measure and set p

(t) = dP

(t)/dt.The

family {P

,θ ∈ } is quadratic mean differentiable at θ

if there exists a vector of real-valued

functions η(·,θ

) = (η

(·,θ

),...,η

(·,θ

))



such that







(t) −



(t) − b



η(t,θ

)



dt = o



b



as b→0.

The QMD families include a broad class of parametric distributions such as exponential

families in natural form. By focusing on the QMD families, we are particularly interested

in asymptotically nondegenerate situations where the limiting sum of the type I and type II

errors of the optimal test is nontrivial, that is, bounded by the nominal level α and one. It has

been shown that when P

and P

belong to the QMD families, this nondegenerate situation

occurs when θ

−θ

N

−1/2

(Chapter 13.1 of Lehmann and Romano (2005)). Hence we

consider a sequence of contiguous alternatives where θ

= θ

+ bN

−1/2

for some b ∈ R

and establish the asymptotic behavior of U

CvM

under the given scenario. Our result builds on

the prior work by Chikkagoudar and Bhat (2014) and extends it to multivariate cases.

HEOREM 2.4 (Asymptotic distribution of U

CvM

under contiguous alternatives). As-

sume {P

,θ ∈ } is quadratic mean differentiable at θ

with derivative η(·,θ

) and  is

an open subset of R

. Deﬁne the Fisher information matrix to be the matrix I(θ) with (i, j )

entry

i,j

(θ) = 4



(t, θ)η

(t, θ) dt,

3424 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

and assume that I(θ

) is nonsingular. Suppose we observe X

i.i.d.

∼ P

and Y

i.i.d.

∼ P

+bN

−1/2

for b ∈ R

. Then under the limiting regime (2.1),

CvM

−→ ϑ

−1

∞



k=1



+ϑ

1/2



−1



where a



2{b



η(x,θ

)}p

−1/2

(x)φ

(x) dP

(x).

The above theorem implies that if there exists k ≥ 1 such that a

= 0andλ

> 0, a test

based on U

CvM

can have asymptotic power greater than α (see page 615 of Lehmann and

Romano (2005)). This is in contrast to the NN test which has a slower consistency rate given

by N

−1/4

when d ≤ 8 under some regularity conditions (see Bhattacharya (2018)). In the

Supplementary Material, we consider low-dimensional Gaussian location models and illus-

trate that the proposed CvM test dominates the NN test via simulations. In fact, the former

tends to have very close power to Hotelling’s T

test, which is known to be optimal under the

low-dimensional Gaussian scenarios (Anderson (2003)).

2.2. Critical value and permutation test. So far, we have investigated the limiting behav-

iors of the test statistic under the null and (ﬁxed and contiguous) alternative hypotheses. It

is important to note that the performance of a test depends not only on its test statistic but

also crucially on its critical value. A common approach to determining the critical value is

based on the limiting null distribution of the test statistic. Since we are dealing with a general

composite null, one can deﬁne this limiting null distribution in various ways. Two natural

candidates are described as follows:

• P

(single)

CvM

: the limiting distribution of NU

CvM

based on i.i.d. samples from the single distri-

bution P

• P

(mix)

CvM

: the limiting distribution of NU

CvM

based on i.i.d. samples from the mixture distri-

bution ϑ

+ϑ

These two limiting distributions P

(single)

CvM

and P

(mix)

CvM

coincide when P

= P

but they are

different in general if P

= P

. By invoking Theorem 2.2, we can conclude that P

(single)

CvM

and P

(mix)

CvM

are Gaussian chaos distributions in the low-dimensional setting. The asymptotic

tests then reject the null hypothesis when NU

CvM

is greater than the upper 1 − α quantile of

(single)

CvM

or P

(mix)

CvM

, denoted by q

(single)

α,CvM

and q

(mix)

α,CvM

, respectively.

Unfortunately, this asymptotic approach is infeasible as the limiting distributions involve

quantities that depend on the underlying distributions and that cannot be easily estimated.

Even if either P

(single)

CvM

or P

(mix)

CvM

is known exactly, the resulting asymptotic test does not have

ﬁnite-sample guarantees on the type I error control. For this reason, we advocate for using the

permutation procedure that resolves the issues of the asymptotic approach. More importantly,

asshowninTheorem2.6, the power of the permutation test is asymptotically the same as that

of the asymptotic tests under the conventional asymptotic regime.

Before we describe the permutation procedure, let us introduce the oracle test that serves

as a benchmark for the permutation test. Let T

m,n

be a generic two-sample test statistic. Then

the critical value of the oracle test based on T

m,n

can be determined as follows:

• Oracle test.

1. Consider new i.i.d. samples {



,...,



} from the mixture ϑ

+ϑ

2. Let T

m,n

(



Z) be the test statistic of interest calculated based on



,...,



}

and



m+1

,...,



NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3425

3. Given a signiﬁcance level 0 <α<1, return the critical value c

∗

α,m,n

deﬁned by

∗

α,m,n

:= inf



t ∈ R : 1 − α ≤P



m,n

(



Z) ≤ t



.(2.5)

It is worth pointing out that the oracle statistic T

m,n

(



Z) has the same distribution as the

test statistic based on the original samples under H

, but not necessarily under H

. Hence

the oracle test based on c

∗

α,m,n

is exact under H

and can be powerful under H

.However,

∗

α,m,n

relies on the unknown mixture distribution ϑ

+ ϑ

, which makes the oracle

test impractical. In sharp contrast, the critical value of the permutation test can be obtained

without knowledge of the mixture distribution as follows:

• Permutation test.

1. Let {Z

,...,Z

}={X

,...,X

,...,Y

} be the pooled samples and Z



(1)

,...,Z

(N)

} where  ={(1),...,(N)} is a permutation of {1,...,N}.

2. Let T

m,n



) be the test statistic of interest calculated based on X



={Z

(1)

,...,

(m)

} and Y



={Z

(m+1)

,...,Z

(N)

3. Given a signiﬁcance level 0 <α<1, return the critical value c

α,m,n

deﬁned by

α,m,n

:= inf



t ∈ R : 1 − α ≤



 ∈S



m,n



) ≤ t





,(2.6)

where

is the set of all permutations of {1,...,N}.

In Theorem 2.5, we show that the difference between c

∗

α,m,n

and c

α,m,n

for the proposed

statistic is asymptotically negligible under both the null and alternative hypotheses. This con-

nection in turn implies that the permutation critical value converges to q

(mix)

α,CvM

, which is the

limit of the oracle critical value by construction. Moreover, under the contiguous alterna-

tive, we also establish that q

(single)

α,CvM

is the same as q

(mix)

α,CvM

. Building on this observation, we

formally prove that (i) the permutation test, (ii) the oracle test and (iii) the asymptotic tests

based on P

(single)

CvM

and P

(mix)

CvM

have the same asymptotic power against both contiguous and

ﬁxed alternatives in Theorem 2.6. In doing so, we develop a general asymptotic theory for the

permutation distribution of a two-sample degenerate U-statistic under H

. This general result

is established based on Hoeffding’s conditions (Hoeffding (1952)) and extended to H

via the

coupling argument (Chung and Romano (2013)). The details can be found in Appendix A.

Let us denote by c

∗

α,CvM

and c

α,CvM

the critical values of the oracle test and the permutation

test based on the scaled CvM-statistic, that is, NU

CvM

, as described in the procedures (2.5)

and (2.6), respectively. Then our result on the critical values is stated as follows.

HEOREM 2.5 (Asymptotic behavior of the critical values). Consider the conventional

limiting regime in (2.1) with the additional assumption that m/N − ϑ

= O(N

−1/2

). Then

under both the null and (ﬁxed or contiguous) alternative hypotheses,

α,CvM

−→ q

(mix)

α,CvM

and c

∗

α,CvM

−→ q

(mix)

α,CvM

where

−→ stands for convergence in probability. Moreover, under the null or contiguous

alternative, we further have that q

(mix)

α,CvM

= q

(single)

α,CvM

Leveraging the previous result combined with Slutsky’s theorem, we next prove that the

asymptotic power of the oracle test, the permutation test and the asymptotic tests are identical

against any ﬁxed and contiguous alternatives. This clearly highlights an advantage of the

permutation test as it is exact under H

and asymptotically as powerful as the oracle and

asymptotic tests under H

. More importantly, the permutation test does not require any prior

information on the underlying distributions.

3426 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

THEOREM 2.6 (Asymptotic equivalence of power). The oracle test and the permutation

test control the type I error under the null hypothesis as



CvM

∗

α,CvM



≤ α and P

(NU

CvM

α,CvM

) ≤ α.

On the other hand, under the ﬁxed or contiguous alternative hypotheses considered in The-

orem 2.3 and Theorem 2.4 with the additional assumption that m/N − ϑ

= O(N

−1/2

), we

have

(NU

CvM

α,CvM

) = P



CvM

∗

α,CvM



+o(1)

= P



CvM

(single)

α,CvM



+o(1) = P



CvM

(mix)

α,CvM



+o(1).

EMARK 2.2. It is worth pointing out that due to the symmetry of the kernel



CvM

,it

is enough to consider





permutations to obtain the critical value c

α,CvM

for the CvM test.

Nevertheless, except for small sample sizes, the exact permutation procedure is too expensive

to implement in practical applications. A common approach to alleviate this computational

issue is to use Monte Carlo sampling of random permutations and approximate the exact per-

mutation p-value. In more detail, note ﬁrst that the permutation test function can be written

as 1(



CvM

≤ α) where



CvM

is the permutation p-value given by



CvM



 ∈S



CvM



) ≥ U

CvM



Let 

(1)

,...,

(B)

be independent and uniformly distributed on S

. Then the Monte Carlo

version of the permutation p-value is computed by



(B)

CvM

B +1





i=1



CvM



(i)

) ≥ U

CvM





It is well known that 1(



(B)

CvM

≤ α) is also a valid level α test for any ﬁnite sample size and



CvM

−



(B)

CvM

−→ 0asB →∞(e.g., page 636 of Lehmann and Romano (2005)). Through-

out this paper, we also adopt this approach for our simulation studies.

3. Robustness. Recall that the energy distance and the CvM-distance can be represented

by integrals of the L

-type difference between two distribution functions. In view of this, the

main difference between the energy distance and the CvM-distance is in their weight function.

More precisely, the energy distance is deﬁned with dt, which gives a uniform weight to the

whole real line. On the other hand, the CvM-distance is deﬁned with dH

(t), which gives

the most weight on high-density regions. As a result, the test based on the CvM-distance is

more robust to extreme observations than the one based on the energy distance. It is also

important to note that the CvM-distance is well deﬁned without any moment conditions,

whereas the energy distance is only well deﬁned assuming a ﬁnite ﬁrst moment. When the

moment condition is violated or there exist extreme observations, the test based on the energy

distance may perform poorly. The purpose of this section is to demonstrate this point both

theoretically and empirically by using contaminated distribution models.

3.1. Theoretical analysis. Suppose that we observe samples from an -contamination

model:

(3.1)

X ∼ P

X,N

:= (1 − )Q

+G

and

Y ∼ P

Y,N

:= (1 − )Q

+G

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3427

where G

can change arbitrarily with N and  ∈ (0, 1). Suppose that Q

and Q

are dif-

ferent so that a given test has high power to distinguish between Q

and Q

without con-

taminations. Then it is natural to expect that the power of the same test would not decrease

much for the contamination model when  is close to zero. In other words, an ideal test would

maintain robust power against any choice of G

as long as Q

and Q

are different and 

is small. Unfortunately, this is not the case for the energy test. As we shall see, for any arbi-

trary small (but ﬁxed) , there exists a contamination G

such that the energy test becomes

asymptotically powerless under mild moment conditions for Q

and Q

. On the other hand,

the CvM test is uniformly powerful over any choice of G

as sample size tends to inﬁnity.

EMARK 3.1. We mainly focus on statistical power to study robustness because one can

always employ the permutation procedure to control the type I error under H

: P

X,N

= P

Y,N

Let us consider the energy statistic based on a U -statistic:

(3.2)

Energy



i=1



j=1

X

−Y

−

(m)

m,=



X

−X



−

(n)

n,=



Y

−Y

.

Then the main result of this subsection is stated as follows.

HEOREM 3.1 (Robustness under contaminations). Suppose we observe samples X

and Y

from the contaminated model in (3.1) with an arbitrary small but ﬁxed contamination

ratio . Assume that Q

and Q

are ﬁxed but Q

= Q

while N changes. In addition,

assume that Q

and Q

have their ﬁnite second moments. Consider the tests based on U

CvM

and U

Energy

given by

CvM

:= 1(U

CvM

α,CvM

) and φ

Energy

:= 1(U

Energy

α,Eng

where c

α,CvM

and c

α,Eng

are α level permutation critical values of U

CvM

and U

Energy

, re-

spectively. Then for any (Q

), there exists a certain G

such that the energy test be-

comes asymptotically powerless under the asymptotic regime in (2.1), whereas the CvM test

is asymptotically powerful uniformly over all possible G

. More precisely,

lim

m,n→∞

inf

[φ

Energy

]≤α and lim

m,n→∞

inf

[φ

CvM

]=1.(3.3)

EMARK 3.2. In Theorem 3.1, we made the assumption that Q

and Q

are ﬁxed

and have ﬁnite second moments. We also assumed the asymptotic regime in (2.1). These

assumptions are mainly for the energy test and are not necessary for the CvM test. In fact, the

same result can be derived for the CvM test given that there is a positive sequence b

m,n

→∞

increasing arbitrary slowly with m, n such that W

) ≥ b

m,n

(1/

√

m + 1/

√

n) (see

Theorem 4.2).

3.2. Empirical analysis. To illustrate Theorem 3.1 with ﬁnite sample size, we carried out

simulation studies using the contamination model in (3.1). In our simulation, we take Q

and

to have multivariate normal distributions with different location parameters or different

scale parameters. In both examples, we take G

to have a multivariate normal distribution

given by

:= N



(0,...,0)



,σ



where σ controls the scale of the contamination G

3428 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

FIG.1. Empirical power of NN, FR, Energy, BG, Hotelling, CQ, LRT, LC and CvM tests under the contamination

models with  = 0.05. See Examples 3.1 and 3.2 for details.

EXAMPLE 3.1 (Location difference). For the location alternative, we compare two mul-

tivariate normal distributions, where the means are different but the covariance matrices are

identical. Speciﬁcally, we set

= N



(−0.5,...,−0.5)





, and Q

= N



(0.5,...,0.5)





with  = 0.05. We then change σ = 1, 40, 80, 120, 160, 200 and 240 to investigate the ro-

bustness of the tests against contamination with large scale parameter σ .

XAMPLE 3.2 (Scale difference). Similar to the location alternative, we again choose

multivariate normal distributions which differ in their scale but not in their location parame-

ters. In detail, we have

= N



(0,...,0)



, 0.1

×I



and Q

= N



(0,...,0)





with  = 0.05. Again, we change σ = 1, 40, 80, 120, 160, 200 and 240 to assess the effect of

contamination with large scale parameter σ .

In addition to the energy test, we further considered three nonparametric tests in our sim-

ulation studies, namely, the k-nearest neighbor test by Schilling (1986) with k =3, the MST

test proposed by Friedman and Rafsky (1979) and the interpoint distance test by Biswas and

Ghosh (2014). For future reference, we refer to them as the NN test, the FR test and the BG

test, respectively. We also added the high-dimensional mean test by Chen and Qin (2010)

and Hotelling’s T

test (e.g., page 188 of Anderson (2003)) for the location alternative and

the high-dimensional covariance test by Li and Chen (2012) and the conventional likelihood

ratio test (e.g., page 412 of Anderson (2003)) for the scale alternative. We refer to them as

the CQ test, Hotelling’s test, the LC test and the LRT test, respectively.

Experiments were run 1000 times to estimate the power of different tests with m = n = 40

and d = 10 at signiﬁcance level α = 0.05. The p-value of each test was computed using 500

permutations as in Remark 2.2. As can be seen from Figure 1, the power of the CvM test

is consistently robust to the value of σ , which supports our theoretical result. The power of

the energy test, on the other hand, drops down signiﬁcantly as σ increases for both location

and scale differences. As explained in the proof of Theorem 3.1, this poor performance was

attributed to the fact that the energy statistic is very much dominated by extreme observations

from G

when σ is large. The graph-based tests, that is, the NN and FR tests, also show

a robust power performance against the contamination models. Intuitively speaking, they

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3429

perform robustly under the given scenarios as their test statistics, which count the number of

edges in a graph, do not vary a lot even in the presence of outliers; but as far as we know,

there is no theoretical support for this result in the current literature. Moreover, these graph-

based tests typically exhibit poorer consistency rates (Bhattacharya (2018)) compared to the

proposed CvM test. The other four tests (Hotelling’s test, the LRT test, the LC test and the

CQ test) perform poorly for large σ , which may be explained similarly as to why the energy

test has low power in these examples.

4. Minimax optimality. Although our choice of the U -statistic was a natural one to

estimate W

, it remains unclear whether one can come up with a better test statistic for

testing whether H

: W

= 0orH

: W

> 0. One might also wonder whether there exists

a testing procedure that leads to signiﬁcantly higher power than the permutation test while

controlling the type I error. In this section, we shall show that the answer is negative from

a minimax point of view. In particular, we prove that the permutation test based on U

CvM

minimax rate optimal against a class of alternatives associated with the CvM-distance.

To formulate the minimax problem, let us deﬁne the set of two multivariate distributions

which are at least  far apart in terms of the CvM-distance, that is,

F() :=



) : W

) ≥ 



For a given signiﬁcance level α ∈ (0, 1),letT

m,n

(α) be the set of measurable functions φ :

{

, Y

}→{0, 1} such that

m,n

(α) =



φ :P

(φ =1) ≤ α



We then deﬁne the minimax type II error as follows:

1 − β

m,n

() = inf

φ∈T

m,n

(α)

sup

∈F ()

(φ =0).(4.1)

Our primary interest is in ﬁnding the minimax separation 

m,n

satisfying



m,n

= inf



 :1 −β

m,n

() ≤ ζ



for some 0 <ζ <1 − α. We start by establishing a lower bound for the minimax separation



m,n

based on Neyman–Pearson lemma.

HEOREM 4.1 (Lower bound). Fo r 0 <ζ <1 − α, there exists some constant b =

b(α, ζ) independent of the dimension such that 

m,n

= b(m

−1/2

+ n

−1/2

) and the minimax

type II error is lower bounded by ζ , that is,

1 − β

m,n

(

m,n

) ≥ ζ.

The above result shows that if 

m,n

is of lower order than m

−1/2

,thennotesthas

the type II error that is uniformly smaller than the nominal level α. We now prove that this

lower bound is tight by establishing a matching upper bound. In particular, the upper bound

is obtained by the permutation test based on U

CvM

, highlighting that the proposed approach

is minimax rate optimal.

HEOREM 4.2 (Upper bound). Recall the CvM test φ

CvM

given in Theorem 3.1. Fo r a

sufﬁciently large c>0, let 



m,n

be the radius of interest deﬁned by





m,n

:= c



√



.(4.2)

3430 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

Then there exists ζ ∈ (0, 1 −α) such that the type II error of φ

CvM

is uniformly bounded by

ζ , that is,

sup

∈F (



m,n

)

(φ

CvM

= 0)<ζ.

EMARK 4.1. We would like to emphasize that no assumption has been made in The-

orem 4.2 regarding the ratio of the sample sizes. This implies that the proposed test can be

consistent against general alternatives even when the two sample sizes are highly unbalanced

as m/n → 0orm/n →∞.

As a straightforward consequence of Theorem 3.1, we also show that the energy test, which

is our main competitor, is not minimax rate optimal in our context.

ROPOSITION 4.1 (Nonoptimality of the energy test). Recall the energy test φ

Energy

given in Theorem 3.1. Then there exists a pair of distributions that belongs to F (



m,n

) such

that the energy test becomes asymptotically powerless, that is,

lim

m,n→∞

inf

∈F (



m,n

)

(φ

Energy

= 1) ≤ α.

In the next section, we turn our attention to the asymptotic regime where the sample size

is ﬁxed and the dimension tends to inﬁnity and study the limiting behavior of the CvM test.

5. High dimension, low sample size analysis. The high dimension, low sample size

(HDLSS) regime has received increasing attention in recent years and has been frequently

employed to give statistical insights into high-dimensional two-sample testing (e.g., Biswas

and Ghosh (2014), Biswas, Mukhopadhyay and Ghosh (2014), Chakraborty and Chaudhuri

(2017), Mondal, Biswas and Ghosh (2015)). Focusing on this HDLSS regime, the goal of

this section is twofold: First, we provide sufﬁcient conditions under which the proposed test

is consistent in HDLSS situations (Section 5.1). Second, we show that U

CvM

has the same

asymptotic behavior as the high-dimensional mean test statistics proposed by Chen and Qin

(2010)andChakraborty and Chaudhuri (2017) under certain location models (Section 5.2).

Along with these mean test statistics, we further establish the equivalence among U

CvM

,the

energy statistic and the MMD statistic with the Gaussian kernel. The latter connection was

motivated by Ramdas et al. (2015) who showed that the energy statistic, the MMD statistic

and the mean test statistic by Chen and Qin (2010) are asymptotically equivalent in different

scenarios.

5.1. HDLSS consistency. Let us denote E(X) = μ

, E(Y ) = μ

, V(X) = 

and

V(Y ) = 

where 

and 

are positive deﬁnite matrices. Before presenting the main

results, we state the two assumptions.

(A1). V





∗

−Z

∗





= O(d), and V



∗

−Z

∗







∗

−Z

∗



= O(d),

where Z

∗

, Z

∗

, Z

∗

are independent and each Z

∗

follows either P

or P

(A2). d

−1

tr(

) → σ

−1

tr(

) → σ

−1

μ

−μ



→ δ

where 0 <

, σ

< ∞ and 0 ≤ δ

< ∞.

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3431

Assumption (A1) implies that component variables are weakly dependent. Under the

distributional assumptions (including multivariate normal distributions) made in Bai and

Saranadasa (1996)andChen and Qin (2010), Assumption (A1) is satisﬁed when

(5.1)

(μ

−μ

)



(

+

)(μ

−μ

) = O(d) and



(

+

)



= O(d).

The details of this derivation can be found in Appendix D.1. Assumption (A2) is common

in the HDLSS literature (e.g., Hall, Marron and Neeman (2005)) and facilitates the analysis.

Under these two conditions, the following theorem establishes the HDLSS consistency of the

proposed test where we assume that the nominal level satisﬁes α>1/{(m + n)!/(m!n!)} for

m = n and α>2/{(m + n)!/(m!n!)} for m = n.

HEOREM 5.1 (HDLSS consistency). Suppose (A1) and (A2) hold. Assume that σ

=

or δ

> 0. Then the permutation test based on U

CvM

is consistent under the HDLSS

regime, that is, lim

d→∞

[φ

CvM

]=1.

5.2. HDLSS asymptotic equivalence of CvM-statistic and others.Next,wefocuson

mean difference alternatives with equal covariance matrices. There are many types of high-

dimensional mean inference procedures in the literature (see Hu and Bai (2016), for a recent

review). For example, Chen and Qin (2010) suggest a test statistic based on an unbiased

estimator of μ

−μ



. Speciﬁcally, their test statistic is given by

(m)

(n)

m,=



n,=



−Y

)



−Y

More recently, Chakraborty and Chaudhuri (2017) deﬁne a test statistic based on spatial ranks

WMW

(m)

(n)

m,=



n,=



−Y

)

X

−Y





−Y

)

X

−Y



They proved that U

and U

WMW

are asymptotically equivalent under a certain HDLSS

setting. Independently, the equivalence between U

, U

Energy

and the MMD statistic with

the Gaussian kernel was established by Ramdas et al. (2015) under different settings. Let us

denote the MMD statistic with the Gaussian kernel by

MMD

(m)

m,=



exp



−

2ς

X

−X





(n)

n,=



exp



−

2ς

Y

−Y





−



i=1



j=1

exp



−

2ς

X

−Y





where ς

is the bandwidth parameter. Here, we combine and further extend these results

by presenting sufﬁcient conditions under which U

CvM

, U

Energy

, U

MMD

, U

and U

WMW

are

asymptotically equivalent. To establish the result, we need two more assumptions:

(A3). V



∗

−Z

∗







∗

−Z

∗



= O(d) where

∗

, Z

∗

, Z

∗

, Z

∗

are independent and each Z

∗

follows either P

or P

(A4). 

= 

and μ

−μ



= O(

√

d).

3432 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

Assumption (A3) is required for studying U

and U

WMW

. Like Assumption (A1),As-

sumption (A3) is also satisﬁed under condition (5.1). Notice that U

and U

WMW

are only

sensitive to location parameters whereas U

CvM

, U

Energy

and U

MMD

are sensitive to both lo-

cation and scale parameters. This suggests that the equal covariance assumption in (A4) is

crucial for our result and cannot be dropped. The condition μ

− μ



= O(

√

d) is also

important for our analysis and was also considered in Chakraborty and Chaudhuri (2017).

Under the given assumptions, we make repeated use of Taylor expansions to establish the

equivalence among the test statistics stated as follows.

HEOREM 5.2 (HDLSS equivalence). Suppose (A1), (A2), (A3) and (A4) hold.

Let  be an arbitrary permutation of {1,...,N} and

= d

−1

tr(

). We denote

by U



CvM

, U



Energy

, U



MMD

, U



and U



WMW

, the CvM, Energy, MMD, CQ and WMW

test statistics, respectively, calculated based on



={Z

(1)

,...,Z

(m)

} and Y



(m+1)

,...,Z

(N)

}. Assume that the bandwidth parameter of the Gaussian kernel sat-

isﬁes ς

 d. Then under the HDLSS asymptotics, we have that

(5.2)

√



CvM

2π

√

3dσ





−1/2





Energy

√

2dσ





−1/2



√



WMW

√

dσ





−1/2



√



MMD

√

−dσ

/ς





−1/2



Note that the asymptotic equivalence established in (5.2) holds for any permutation. Lever-

aging this result, we show that the permutation critical values of the test statistics are asymp-

totically the same as well.

OROLLARY 5.1 (Permutation critical values). Consider the same assumptions made in

Theorem 5.2. Let c

α,CvM

, c

α,Eng

, c

α,MMD

, c

α,CQ

and c

α,WMW

be the 1 −α quantile of the per-

mutation distribution of 2π

√

3dσ

CvM

√

2σ

Energy

, ς

−dσ

/ς

MMD

√

d, U

√

and

√

dσ

WMW

, respectively. Then

α,CvM

= c

α,Eng



−1/2



= c

α,MMD



−1/2



= c

α,CQ



−1/2



= c

α,WMW



−1/2



From the previous results, we expect that the considered permutation tests have compara-

ble power in the limit as further illustrated by our simulation results in Section 8.Wealso

refer the reader to Appendix D.5 where we present an explicit expression for the limiting

power function of the asymptotic tests with extra assumptions. We would like to empha-

size, however, that when the moment assumption is violated, the power of these tests can

be entirely different. For instance, our simulation results in Section 8 demonstrate that the

CQ, energy and MMD tests perform poorly when X and Y have Cauchy distributions with

different location parameters. In contrast, the CvM and WMW tests maintain robust power

against the same Cauchy alternative, which highlights a beneﬁt of the current approach in

high dimensions.

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3433

6. Connection to the generalized energy distance and MMD. Recall that the energy

distance is deﬁned with the Euclidean distance under the ﬁnite ﬁrst moment condition. By

considering a semimetric space (Z,ρ) of negative type, Sejdinovic et al. (2013) generalized

the energy distance by

= 2E



ρ(X

)



−E



ρ(X

)



−E



ρ(Y

)



They further established the equivalence between the generalized energy distance and the

MMD with a kernel induced by ρ(·, ·). Given a distance-induced kernel k(·, ·), the squared

MMD is given by

MMD

= E



k(X

)





k(Y

)



−2E



k(X

)



In this section, we show that the multivariate CvM-distance is a member of the generalized

energy distance by the use of the angular distance, and thus also a member of the MMD. Let

and M

be the support of X and Y , respectively, and let M = M

∪M

⊆ R

.Then

we deﬁne the angular distance as follows.

EFINITION 6.1 (Angular distance). Let Z

∗

be a random vector having mixture distribu-

tion (1/2)P

+(1/2)P

.Forz, z



∈ M, denote the scaled angle between z −Z

∗

and z



−Z

∗

Angle



z, z



∗



Ang



z − Z

∗



−Z

∗



The angular distance is deﬁned as the expected value of the scaled angle:

Angle



z, z





= E



Angle



z, z



∗



.(6.1)

As shown in Appendix D.6, ρ

Angle

is a metric of negative type deﬁned on M. With this key

property and the identity given in the next proposition, we may conclude that the multivari-

ate CvM-distance is a special case of the generalized energy distance based on the angular

distance.

ROPOSITION 6.1 (Another view of the CvM-distance). Let us consider the angular

distance deﬁned in (6.1). Then

= 2E



Angle

)



−E



Angle

)



−E



Angle

)



EMARK 6.1. The angular distance can be generalized by taking the expectation with

respect to a different measure. For instance, when the expectation is taken with respect to

Lebesgue measure, the generalized angular distance is proportional to the Euclidean distance

(see Appendix D.7). The main difference between the Euclidean distance and the proposed

angular distance is that the latter takes into account information from the underlying distri-

bution and is less sensitive to outliers. In this respect, the introduced angular distance can be

viewed as a robust alternative for the Euclidean distance.

EMARK 6.2. As one of the reviewers pointed out, there might be several ways to

enhance the power of the proposed test by modifying the multivariate CvM-distance. For

instance, by the characteristic property of W

, it can be seen that H

holds if and only

= T

and T

= T

where T

= E[ρ

Angle

)], T

= E[ρ

Angle

)] and

= E[ρ

Angle

)]. Motivated by this observation, another test statistic can be intro-

duced based on an estimate of (

−T

)

+(T

−T

)

(and other variants are possible,

see Appendix D.4). As demonstrated in Appendix E, the test based on this new statistic tends

to be more sensitive to scale differences than the CvM test.

3434 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

7. Other multivariate extensions via projection averaging. The projection-averaging

approach used for the multivariate CvM-statistic is general and can be applied to many other

univariate robust statistics. In this section and also Appendix D.8, we illustrate the utility

of the projection-averaging approach by considering several examples including Kendall’s

tau, Spearman’s rho and the sign covariance (Bergsma and Dassios (2014)). Let us begin

with one-sample and two-sample robust statistics. Given a pair of random variables (X, Y ),

denote the difference between two random variables by Z = X − Y . The univariate sign test

statistic is an estimate of

sign

:= P(Z > 0) − 1/2 and it is used to test whether

: P(Z > 0) = 1/2versusH

: P(Z > 0) = 1/2.

The projection-averaging technique extends

sign

to a multivariate case as follows.

ROPOSITION 7.1 (One-sample sign test statistic). For i .i.d. random vectors Z

, Z

from

a multivariate distribution P

where Z ∈ R

, the projection-averaging approach generalizes

sign



d−1







> 0



−



dλ(β) =

−

2π



Ang

)



.(7.1)

Given univariate two samples

={X

,...,X

} and Y

={Y

,...,Y

}, the Wilcoxon–

Mann–Whitney test is designed for testing whether

: P(X > Y ) = 1/2versusH

: P(X > Y ) = 1/2,

basedonanestimateof

WMW

:= P(X > Y ) − 1/2. The next proposition extends T

WMW

a multivariate case via projection averaging.

ROPOSITION 7.2 (Two-sample Wilcoxon–Mann–Whitney test statistic). Let X

i.i.d.

∼ P

and, independently, Y

i.i.d.

∼ P

where X

∈ R

. The projection-averaging

approach generalizes

WMW



d−1







>β





−



dλ(β) =

−

2π



Ang

−Y

)



.(7.2)

EMARK 7.1. The ﬁrst-order Taylor approximation of the inverse cosine function shows

that the representations given on the right-hand side of (7.1)and(7.2) are related to the spa-

tial sign-statistics introduced by Wang, Peng and Li (2015)andChakraborty and Chaud-

huri (2017), respectively. In fact, when U -statistics are used to estimate (7.1)and(7.2), the

projection-averaging statistics and the spatial sign-statistics are asymptotically equivalent un-

der some regularity conditions (see Appendix D.3 for details).

The same technique can be further applied to some robust statistics for independence test-

ing. To test for independence between two random variables, Kendall’s tau statistic is given

as an estimate of τ := 4P(X

) − 1. We present a multivariate extension of τ

as follows.

HEOREM 7.1 (Kendall’s tau). Fo r i .i.d. pairs of random vectors (X

),...,(X

)

from a joint distribution P

where X ∈ R

and Y ∈ R

, the multivariate extension of τ via

projection averaging is given by



p−1



q−1







−X

)<0,β



−Y

)<0



−1



dλ(α)dλ(β)

= E



2 −

Ang(X

−X

)





2 −

Ang(Y

−Y

)



−1.

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3435

Recently, Bergsma and Dassios (2014) introduced a modiﬁcation of Kendall’s tau, which

is zero if and only if random variables are independent under some mild conditions. Let us

denote the univariate Bergsma–Dassios sign covariance by

∗

= E



sign

) · a

sign

)



,(7.3)

with a

sign

) = sign(|z

− z

|+|z

− z

|−|z

− z

|−|z

− z

|). Motivated by

the projection-averaging approach, we propose the multivariate τ

∗

as follows.

EFINITION 7.1 (Multivariate τ

∗

). Suppose (X

),...,(X

) are i.i.d. random

vectors from a joint distribution P

where X ∈ R

and Y ∈ R

. We deﬁne the multivariate

∗

p,q



p−1



q−1



sign





,α



,α



,α





×a

sign





,β



,β



,β





dλ(α) dλ(β).

Since the kernel of τ

∗

is sign-invariant, that is, a

sign

) = a

sign

(−z

, −z

−z

), it is easy to see that τ

∗

p,q

becomes the univariate τ

∗

when p = q = 1. Also note that

since X and Y are independent if and only if α



X and β



Y are independent for all α ∈ S

p−1

and β ∈ S

q−1

, the characteristic property of τ

∗

p,q

follows by that of the univariate τ

∗

Next, we present a closed-form expression for τ

∗

p,q

. For nonzero U

∈ R

,letus

deﬁne g

) and h

) by

) =

−

4π



Ang

) + Ang(U

)



and

)

= g

−Z

) + g

−Z

)

−Z

) + g

−Z

We note that, in contrast to other applications, Lemma 2.2 is not enough to have an expres-

sion for τ

∗

p,q

without involving integrations over the unit sphere. To this end, we generalize

Lemma 2.2 with three indicator functions (see Lemma B.8) and present an alternative expres-

sion for τ

∗

p,q

as follows.

HEOREM 7.2 (Closed-form expression for τ

∗

p,q

). For i.i.d. random vectors (X

...,(X

) from a joint distribution P

where X ∈ R

and Y ∈ R

, τ

∗

p,q

can be written as

∗

p,q

= E



) · h

)





) · h

)



−2E



) · h

)



Theorem 7.2 leads to a straightforward empirical estimate of τ

∗

p,q

based on a U -statistic.

This is also true for the other multivariate generalizations introduced in this section and the

Supplementary Material (Appendix D.8). Using these estimates, some theoretical and empiri-

cal properties of the proposed measures can be further investigated. These topics are reserved

for future work.

3436 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

8. Simulations. In this section, we report numerical results to support the argument in

Section 5 as well as to compare the performance of the CvM test with other competing non-

parametric tests against heavy-tailed alternatives. Along with the energy, MMD, NN, FR and

BG tests described before, we consider the cross-match test (Rosenbaum (2005)), the multi-

variate run test (Biswas, Mukhopadhyay and Ghosh (2014)), the modiﬁed k-NN test (Mondal,

Biswas and Ghosh (2015)) and the ball divergence test (Pan et al. (2018)) for comparison. We

refer to them as the CM test, run test, MBG test and ball test, respectively. In our simulations,

we used the Gaussian kernel with the median heuristic (Gretton et al. (2012)) for the MMD

test and we set the number of nearest neighbors as k = 3 for both NN test and MBG test.

Since ﬁnding the shortest Hamiltonian path for the run test is NP-complete, we employed

Kruskal’s algorithm (Kruskal (1956)) as suggested by Biswas, Mukhopadhyay and Ghosh

(2014).

Throughout our experiments, the signiﬁcance level was set at 0.05 and the permutation

procedure was used to determine the p-value of each test with 200 permutations as in Re-

mark 2.2. The simulations were repeated 500 times to approximate the power of different

tests. We set the sample size and the dimension by m, n = 20 and d = 200 for the balanced

cases and by m = 35, n = 5andd = 200 for the imbalanced cases.

First, we consider several examples where the powers of the ﬁve tests (CvM, energy,

MMD, CQ and WMW tests) in Section 5 are approximately equivalent to each other. Specif-

ically, we use multivariate normal distributions with different means

(0)

= (0,...,0)



,μ

(1)

= (0.15,...,0.15)



and

(2)

√

0.045( 1,...,1

  

d/2elements

, 0,...,0

  

d/2elements

)



and covariance matrices:

1. Identity matrix (denoted by I )whereσ

i,i

= 1andσ

i,j

= 0fori = j .

2. Banded matrix (denoted by 

Band

)whereσ

i,i

= 1, σ

i,j

= 0.6for|i −j |=1, σ

i,j

= 0.3

for |i −j |=2andσ

i,j

= 0otherwise.

3. Autocorrelation matrix (denoted by 

Auto

)whereσ

i,i

= 1andσ

i,j

= 0.2

|i−j|

when

i = j .

4. Block diagonal matrix (denoted by 

Block

)wherethe5×5 main diagonal blocks A are

deﬁned by a

i,i

= 1anda

i,j

= 0.2wheni = j , and the off-diagonal blocks are zeros.

Then we generate random samples from X ∼ N(μ

(0)

,) and either Y ∼ N(μ

(1)

,) or

Y ∼ N(μ

(2)

,). The results are summarized in Table 1. As can be seen from the table,

the empirical powers of the considered tests are very close under the given setting, which

supports our theoretical results in Section 5. We also observe that the other nonparametric

tests, not considered in Section 5, are signiﬁcantly less powerful than the proposed test in all

normal location alternatives.

In our second experiment, we consider several examples where the moment conditions

are not satisﬁed. We focus on random samples generated from multivariate Cauchy distri-

butions. Let Cauchy(γ , s) refer to the univariate Cauchy distribution where γ , s are the lo-

cation parameter and the scale parameter, respectively. Let X = (X

(1)

,...,X

(d)

) and Y =

(1)

,...,Y

(d)

) be random vectors where X

(i)

i.i.d.

∼ Cauchy(0, 1) and Y

(i)

i.i.d.

∼ Cauchy(γ, s)

for i = 1,...,d. We ﬁrst consider location differences where γ is not zero but the scale pa-

rameters are identical, that is, s = 1. Similarly, we consider scale differences where the scale

parameter s changes, but the location parameters are identical, that is, γ = 0.

From the results presented in Table 2 and Table 3, it is seen that, unlike the multivari-

ate normal cases, there are signiﬁcant differences between power performance among CvM,

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3437

TABLE 1

Empirical power of the considered tests against the normal location models at α = 0.05



Band



Block



Auto

m = 20, n = 20 μ

(1)

(2)

(1)

(2)

(1)

(2)

(1)

(2)

CvM 0.662 0.646 0.418 0.406 0.572 0.584 0.452 0.442

Energy 0.656 0.650 0.420 0.408 0.576 0.584 0.452 0.444

MMD 0.658 0.638 0.412 0.398 0.568 0.570 0.458 0.444

CQ 0.656 0.650 0.416 0.412 0.578 0.580 0.454 0.448

WMW 0.668 0.646 0.420 0.402 0.568 0.580 0.458 0.444

NN 0.288 0.288 0.164 0.154 0.242 0.238 0.176 0.174

FR 0.168 0.170 0.090 0.084 0.158 0.116 0.112 0.088

MBG 0.050 0.050 0.050 0.052 0.048 0.044 0.060 0.046

Ball 0.240 0.254 0.186 0.198 0.262 0.250 0.216 0.226

CM 0.042 0.054 0.028 0.040 0.052 0.050 0.038 0.034

BG 0.070 0.060 0.074 0.074 0.074 0.078 0.084 0.078

Run 0.160 0.153 0.101 0.105 0.146 0.128 0.110 0.102

energy, MMD, CQ and WMW tests. In particular, the tests based on the energy, MMD and

CQ statistics have relatively low power against the heavy-tail location alternatives, whereas

the tests based on the CvM and WMW statistics show better performance than the others.

Turning to the scale problems, it can be seen that the CQ and WMW tests are not sensitive

to detect scale differences, which makes sense because they are speciﬁcally designed for lo-

cation problems. On the other hand, the CvM, energy and MMD tests perform reasonably

well in these alternatives. Among the omnibus nonparametric tests, the MMD, energy and

ball tests have competitive power against the scale differences, but not against the location

differences in general. The MBG test is only powerful against the scale differences where the

sample sizes are balanced. The CM and run tests are uniformly outperformed by the CvM

test under all scenarios. The NN and FR tests perform strongly against the location alterna-

tives especially for the balanced case, but not against the scale alternatives. When the sample

TABLE 2

Empirical power of the considered tests against multivariate Cauchy distributions with m = n = 20 at α = 0.05

where γ , s represent the location and scale parameter, respectively. The three highest power estimates in each

column are highlighted in boldface

Location Scale

m = 20, n = 20 γ = 2 γ = 3 γ = 4 γ = 5 s = 2 s = 3 s = 4 s = 5

CvM 0.124 0.252 0.596 0.842 0.560 0.926 0.988 1.000

Energy 0.060 0.066 0.102 0.134 0.316 0.602 0.766 0.866

MMD 0.056 0.064 0.110 0.162 0.448 0.772 0.890 0.970

CQ 0.138 0.268 0.360 0.456 0.046 0.070 0.042 0.068

WMW 0.324 0.698 0.912 0.988 0.052 0.064 0.062 0.056

NN 0.288 0.662 0.884 0.976 0.214 0.194 0.256 0.224

FR 0.178 0.462 0.706 0.888 0.028 0.034 0.048 0.036

MBG 0.060 0.044 0.050 0.074 0.564 0.904 0.964 0.992

Ball 0.064 0.064 0.076 0.098 0.606 0.936 0.994 1.000

CM 0.030 0.078 0.128 0.226 0.056 0.170 0.334 0.490

BG 0.048 0.038 0.048 0.040 0.238 0.394 0.560 0.632

Run 0.059 0.129 0.274 0.422 0.220 0.506 0.767 0.864

3438 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

ABLE 3

Empirical power of the considered tests against multivariate Cauchy distributions with m = 35 and n = 5 at

α = 0.05 where γ , s represent the location and scale parameter, respectively. The three highest power estimates

in each column are highlighted in boldface

Location Scale

m = 35, n = 5 γ = 5 γ = 6 γ = 7 γ = 8 s = 3 s = 4 s = 5 s = 6

CvM 0.340 0.498 0.652 0.758 0.570 0.806 0.928 0.952

Energy 0.110 0.146 0.212 0.262 0.436 0.632 0.794 0.858

MMD 0.108 0.148 0.192 0.240 0.552 0.808 0.926 0.968

CQ 0.284 0.380 0.454 0.544 0.178 0.210 0.262 0.290

WMW 0.796 0.890 0.942 0.960 0.110 0.126 0.134 0.148

NN 0.144 0.294 0.376 0.558 0.118 0.150 0.154 0.182

FR 0.226 0.360 0.464 0.588 0.078 0.092 0.104 0.112

MBG 0.010 0.000 0.008 0.000 0.092 0.130 0.176 0.214

Ball 0.072 0.088 0.098 0.122 0.238 0.406 0.594 0.762

CM 0.082 0.176 0.190 0.262 0.030 0.080 0.092 0.126

BG 0.058 0.052 0.058 0.052 0.320 0.386 0.506 0.514

Run 0.088 0.150 0.198 0.228 0.106 0.174 0.248 0.326

sizes are unbalanced, the performance of the NN and FR tests are degraded a little bit, which

can be explained by Chen, Dou and Qiao (2013)andChen, Chen and Su (2018). The CvM

test, on the other hand, performs consistently well against the heavy-tail location and scale

alternatives and its performance appears immune to the sample proportion.

In summary, the proposed test has almost identical power as the high-dimensional mean

tests against the light-tail location alternatives, whereas it outperforms many popular non-

parametric competitors under the heavy-tail location and scale alternatives. More simulation

results in both high and low dimensions can be found in Appendix E of the Supplementary

Material.

9. Concluding remarks. In this work, we extended the univariate Cramér–von Mises

statistic for two-sample testing to the multivariate case using projection averaging and

demonstrated its robustness, minimax rate optimality and high-dimensional power proper-

ties. We applied the same projection technique to other robust statistics and presented their

multivariate extensions. We also introduced the angular distance that is closely connected to

the Euclidean distance (Remark 6.1) but is more robust to outliers by incorporating infor-

mation from the underlying distribution. Given that the use of distances is of fundamental

importance in many statistical applications (including clustering, classiﬁcation and regres-

sion), we hope that this work will stimulate further study of the angular distance applied to

other statistical problems serving as a robust alternative for the Euclidean distance.

Acknowledgments. The authors would like to thank the Associate Editor and two refer-

ees for their valuable comments which helped to improve the manuscript.

SUPPLEMENTARY MATERIAL

Supplement to “Robust multivariate nonparametric tests via projection averaging”

(DOI: 10.1214/19-AOS1936SUPP; .pdf). This supplemental ﬁle includes the technical proofs

omitted in the main text, and some additional results.

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3439

REFERENCES

ANDERSON, T. W. (1962). On the distribution of the two-sample Cramér–von Mises criterion. Ann. Math. Stat.

33 1148–1159. MR0145607 https://doi.org/10.1214/aoms/1177704477

NDERSON, T. W. (2003). An Introduction to Multivariate Statistical Analysis,3rded.Wiley Series in Probability

and Statistics. Wiley-Interscience, Hoboken, NJ. MR1990662

NDERSON,N.H.,HALL,P.andTITTERINGTON, D. M. (1994). Two-sample test statistics for measuring

discrepancies between two multivariate probability density functions using kernel-based density estimates.

J. Multivariate Anal. 50 41–54. MR1292607 https://doi.org/10.1006/jmva.1994.1033

AI,Z.andSARANADASA, H. (1996). Effect of high dimension: By an example of a two sample problem. Statist.

Sinica 6 311–329. MR1399305

ARINGHAUS,L.andFRANZ, C. (2004). On a new multivariate two-sample test. J. Multivariate Anal. 88 190–

206. MR2021870 https://doi.org/10.1016/S0047-259X(03)00079-4

ARINGHAUS,L.andHENZE, N. (2017). Cramér–von Mises distance: Probabilistic interpretation, conﬁ-

dence intervals, and neighbourhood-of-model validation. J. Nonparametr. Stat. 29 167–188. MR3635009

https://doi.org/10.1080/10485252.2017.1285029

ERA,A.K.,GHOSH,A.andXIAO, Z. (2013). A smooth test for the equality of distributions. Econometric

Theory 29 419–446. MR3042761 https://doi.org/10.1017/S0266466612000370

ERGSMA,W.andDASSIOS, A. (2014). A consistent test of independence based on a sign covariance related to

Kendall’s tau. Bernoulli 20 1006–1028. MR3178526 https://doi.org/10.3150/13-BEJ514

HAT, B. V. (1995). Theory of U-statistics and its applications. Ph.D. thesis, Karnatak Univ.

HATTACHARYA, B. B. (2018). Two-sample tests based on geometric graphs: Asymptotic distribution and de-

tection thresholds. Preprint. Available at arXiv:1512.00384v3.

HATTACHARYA, B. B. (2019). A general asymptotic framework for distribution-free graph-based two-sample

tests. J. R. Stat. Soc. Ser. B. Stat. Methodol. 81 575–602. MR3961499

ISWAS,M.andGHOSH, A. K. (2014). A nonparametric two-sample test applicable to high dimensional data.

J. Multivariate Anal. 123 160–171. MR3130427 https://doi.org/10.1016/j.jmva.2013.09.004

ISWAS,M.,MUKHOPADHYAY,M.andGHOSH, A. K. (2014). A distribution-free two-sample run test applica-

ble to high-dimensional data. Biometrika 101 913–926. MR3286925 https://doi.org/10.1093/biomet/asu045

HAKRABORTY,A.andCHAUDHURI, P. (2017). Tests for high-dimensional data based on means, spatial signs

and spatial ranks. Ann. Statist. 45 771–799. MR3650400 https://doi.org/10.1214/16-AOS1467

HEN,H.,CHEN,X.andSU, Y. (2018). A weighted edge-count two-sample test for multivariate and object data.

J. Amer. Statist. Assoc. 113 1146–1155. MR3862346 https://doi.org/10.1080/01621459.2017.1307757

HEN,L.,DOU,W.W.andQIAO, Z. (2013). Ensemble subsampling for imbalanced multivariate two-sample

tests. J. Amer. Statist. Assoc. 108 1308–1323. MR3174710 https://doi.org/10.1080/01621459.2013.800763

HEN,H.andFRIEDMAN, J. H. (2017). A new graph-based two-sample test for multivariate and object data.

J. Amer. Statist. Assoc. 112 397–409. MR3646580 https://doi.org/10.1080/01621459.2016.1147356

HEN,S.X.andQIN, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set

testing. Ann. Statist. 38 808–835. MR2604697 https://doi.org/10.1214/09-AOS716

HIKKAGOUDAR,M.S.andBHAT, B. V. (2014). Limiting distribution of two-sample degenerate U-statistic

under contiguous alternatives and applications. J. Appl. Statist. Sci. 22 127–139. MR3616873

HUNG,E.andROMANO, J. P. (2013). Exact and asymptotically robust permutation tests. Ann. Statist. 41 484–

507. MR3099111 https://doi.org/10.1214/13-AOS1090

RAMÉR, H. (1928). On the composition of elementary errors. Skand. Aktuarietidskr. 11 141–180.

UI, H. (2002). Average projection type weighted Cramér–von Mises statistics for testing some distributions. Sci.

China Ser. A 45 562–577. MR1911172

SCANCIANO, J. C. (2006). A consistent diagnostic test for regression models using projections. Econometric

Theory 22 1030–1051. MR2328527 https://doi.org/10.1017/S0266466606060506

RIEDMAN,J.H.andRAFSKY, L. C. (1979). Multivariate generalizations of the Wald–Wolfowitz and Smirnov

two-sample tests. Ann. Statist. 7 697–717. MR0532236

RETTON,A.,BORGWARDT,K.M.,RASCH,M.J.,SCHÖLKOPF,B.andSMOLA, A. (2012). A kernel two-

sample test. J. Mach. Learn. Res. 13 723–773. MR2913716

ALL,P.,MARRON,J.S.andNEEMAN, A. (2005). Geometric representation of high dimension, low sample

size data. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 427–444. MR2155347 https://doi.org/10.1111/j.1467-9868.

2005.00510.x

ARCHAOUI,Z.,BACH,F.,CAPPE,O.andMOULINES, E. (2013). Kernel-based methods for hypothesis testing:

A uniﬁed view. IEEE Signal Process. Mag. 30 87–97.

ENZE, N. (1988). A multivariate two-sample test based on the number of nearest neighbor type coincidences.

Ann. Statist. 16 772–783. MR0947577 https://doi.org/10.1214/aos/1176350835

3440 I. KIM, S. BALAKRISHNAN AND L. WASSERMAN

ETTMANSPERGER,T.P.,MÖTTÖNEN,J.andOJA, H. (1998). Afﬁne invariant multivariate rank tests for sev-

eral samples. Statist. Sinica 8 785–800. MR1651508

OEFFDING, W. (1952). The large-sample power of tests based on permutations of observations. Ann. Math. Stat.

23 169–192. MR0057521 https://doi.org/10.1214/aoms/1177729436

U,J.andBAI, Z. (2016). A review of 20 years of naive tests of signiﬁcance for high-dimensional mean

vectors and covariance matrices. Sci. China Math. 59 2281–2300. MR3578957 https://doi.org/10.1007/

s11425-016-0131-0

ANAMORI,T.,SUZUKI,T.andSUGIYAMA, M. (2012). f -divergence estimation and two-sample homogene-

ity test under semiparametric density-ratio models. IEEE Trans. Inform. Theory 58 708–720. MR2917977

https://doi.org/10.1109/TIT.2011.2163380

IM,I.,BALAKRISHNAN,S.andWASSERMAN, L. (2020). Supplement to “Robust multivariate nonparametric

tests via projection averaging.” https://doi.org/10.1214/19-AOS1936SUPP.

RUSKAL,J.B.JR. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem.

Proc. Amer. Math. Soc. 7 48–50. MR0078686 https://doi.org/10.2307/2033241

EE, A. J. (1990). U-Statistics: Theory and Practice. Statistics: Textbooks and Monographs 110.Dekker,New

York. MR1075417

EHMANN,E.L.andROMANO, J. P. (2005). Testing Statistical Hypotheses,3rded.Springer Texts in Statistics.

Springer, New York. MR2135927

I,J.andCHEN, S. X. (2012). Two sample tests for high-dimensional covariance matrices. Ann. Statist. 40

908–940. MR2985938 https://doi.org/10.1214/12-AOS993

IU, R. Y. (2006). Data Depth: Robust Multivariate Analysis, Computational Geometry, and Applications 72.

Amer. Math. Soc., Providence.

IU,Z.andMODARRES, R. (2011). A triangle test for equality of distribution functions in high dimensions.

J. Nonparametr. Stat. 23 605–615. MR2836279 https://doi.org/10.1080/10485252.2010.485644

OPEZ-PAZ,D.andOQUAB, M. (2016). Revisiting classiﬁer two-sample tests. Preprint. Available at

arXiv:1610.06545.

ONDAL,P.K.,BISWAS,M.andGHOSH, A. K. (2015). On high dimensional two-sample tests based on nearest

neighbors. J. Multivariate Anal. 141 168–178. MR3390065 https://doi.org/10.1016/j.jmva.2015.07.002

UKHOPADHYAY,S.andWANG, K. (2018). A nonparametric approach to high-dimensional K-sample compar-

ison problem. Preprint. Available at arXiv:1810.01724.

JA, H. (2010). Multivariate Nonparametric Methods with R: An Approach Based on Spatial Signs and Ranks.

Lecture Notes in Statistics 199. Springer, New York. MR2598854 https://doi.org/10.1007/978-1-4419-0468-3

JA,H.andRANDLES, R. H. (2004). Multivariate nonparametric tests. Statist. Sci. 19 598–605. MR2185581

https://doi.org/10.1214/088342304000000558

AN,W.,TIAN,Y.,WANG,X.andZHANG, H. (2018). Ball divergence: Nonparametric two sample test. Ann.

Statist. 46 1109–1137. MR3797998 https://doi.org/10.1214/17-AOS1579

ESARIN, F. (2001). Multivariate Permutation Tests: With Applications in Biostatistics. Wiley, Chichester.

MR1855501

AMDAS,A.,REDDI,S.J.,POCZOS,B.,SINGH,A.andWASSERMAN, L. (2015). Adaptivity and computation-

statistics tradeoffs for kernel and distance based high dimensional two sample testing. Preprint. Available at

arXiv:1508.00655.

OSENBAUM, P. R. (2005). An exact distribution-free test comparing two multivariate distributions based on ad-

jacency. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 515–530. MR2168202 https://doi.org/10.1111/j.1467-9868.

2005.00513.x

CHILLING, M. F. (1986). Multivariate two-sample tests based on nearest neighbors. J. Amer. Statist. Assoc. 81

799–806. MR0860514

EJDINOVIC,D.,SRIPERUMBUDUR,B.,GRETTON,A.andFUKUMIZU, K. (2013). Equivalence of

distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41 2263–2291. MR3127866

https://doi.org/10.1214/13-AOS1140

ZÉKELY,G.J.andRIZZO, M. L. (2004). Testing for equal distributions in high dimension. Interstate 5.

ZÉKELY,G.J.andRIZZO, M. L. (2013). Energy statistics: A class of statistics based on distances. J. Statist.

Plann. Inference 143 1249–1272. MR3055745 https://doi.org/10.1016/j.jspi.2013.03.018

HAS, O. (2010). Comparing Distributions. Springer Series in Statistics. Springer, New York. MR2547894

https://doi.org/10.1007/b99044

ALD,A.andWOLFOWITZ, J. (1940). On a test whether two samples are from the same population. Ann. Math.

Stat. 11 147–162. MR0002083 https://doi.org/10.1214/aoms/1177731909

ANG,L.,PENG,B.andLI, R. (2015). A high-dimensional nonparametric multivariate test for mean vector.

J. Amer. Statist. Assoc. 110 1658–1669. MR3449062 https://doi.org/10.1080/01621459.2014.988215

HOU,W.-X.,ZHENG,C.andZHANG, Z. (2017). Two-sample smooth tests for the equality of distributions.

Bernoulli 23 951–989. MR3606756 https://doi.org/10.3150/15-BEJ766

NONPARAMETRIC TESTS VIA PROJECTION AVERAGING 3441

ZHU,L.-X.,FANG,K.-T.andBHATTI, M. I. (1997). On estimated projection pursuit-type Crámer–von Mises

statistics. J. Multivariate Anal. 63 1–14. MR1491563 https://doi.org/10.1006/jmva.1997.1673

HU,L.,XU,K.,LI,R.andZHONG, W. (2017). Projection correlation between two random vectors. Biometrika

104 829–843. MR3737307 https://doi.org/10.1093/biomet/asx043