Multidimensional Chebyshev's inequality

In probability theory, the multidimensional Chebyshev's inequality is a generalization of Chebyshev's inequality, which bounds the probability that a random variable deviates from its expected value by more than a specified amount.

Let $X$ be an $N$-dimensional random vector with expected value $\mu = \operatorname{E}[X]$ and covariance matrix

$$V = \operatorname{E}\left[(X-\mu)(X-\mu)^{T}\right].$$

If $V$ is a positive-definite matrix, then for any real number $t > 0$:

$$\Pr\left(\sqrt{(X-\mu)^{T}V^{-1}(X-\mu)} > t\right) \le \frac{N}{t^{2}}.$$
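The bound can be illustrated numerically. The following is a minimal Monte Carlo sketch (not part of the original statement), which assumes, purely for illustration, that $X$ is multivariate normal with an arbitrarily chosen mean and positive-definite covariance matrix; the empirical tail probability should not exceed $\min(N/t^{2},\,1)$.

```python
# Monte Carlo sketch comparing the empirical tail probability of the
# Mahalanobis distance with the N/t^2 bound. The Gaussian choice of X is an
# assumption made only for illustration; the bound holds for any distribution
# with finite covariance.
import numpy as np

rng = np.random.default_rng(0)
N = 3                                    # dimension of the random vector
mu = np.array([1.0, -2.0, 0.5])          # arbitrary mean vector
A = rng.normal(size=(N, N))
V = A @ A.T + N * np.eye(N)              # a positive-definite covariance matrix

samples = rng.multivariate_normal(mu, V, size=200_000)
diff = samples - mu
# Mahalanobis distance sqrt((X - mu)^T V^{-1} (X - mu)) for each sample
mahalanobis = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(V), diff))

for t in (1.5, 2.0, 3.0):
    empirical = np.mean(mahalanobis > t)
    bound = min(N / t**2, 1.0)
    print(f"t={t}: empirical tail {empirical:.4f} <= bound {bound:.4f}")
```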

Proof

Since $V$ is positive-definite, so is $V^{-1}$. Define the random variable

$$y = (X-\mu)^{T}V^{-1}(X-\mu).$$

Since $y$ is nonnegative, Markov's inequality applies:

$$\Pr\left(\sqrt{(X-\mu)^{T}V^{-1}(X-\mu)} > t\right) = \Pr\left(\sqrt{y} > t\right) = \Pr\left(y > t^{2}\right) \le \frac{\operatorname{E}[y]}{t^{2}}.$$

Finally, using the cyclic property of the trace and the linearity of expectation,

$$\begin{aligned}
\operatorname{E}[y] &= \operatorname{E}\left[(X-\mu)^{T}V^{-1}(X-\mu)\right] \\
&= \operatorname{E}\left[\operatorname{trace}\left(V^{-1}(X-\mu)(X-\mu)^{T}\right)\right] \\
&= \operatorname{trace}\left(V^{-1}\operatorname{E}\left[(X-\mu)(X-\mu)^{T}\right]\right) \\
&= \operatorname{trace}\left(V^{-1}V\right) = N.
\end{aligned}$$
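The final identity $\operatorname{E}[y] = N$ can also be checked numerically; the small sketch below again assumes a multivariate normal $X$ only for illustration, since the identity itself holds for any distribution with covariance matrix $V$.

```python
# Sketch checking that the expected squared Mahalanobis distance
# E[(X - mu)^T V^{-1} (X - mu)] equals the dimension N.
import numpy as np

rng = np.random.default_rng(1)
N = 4
mu = np.zeros(N)
A = rng.normal(size=(N, N))
V = A @ A.T + np.eye(N)                  # positive-definite covariance

samples = rng.multivariate_normal(mu, V, size=500_000)
diff = samples - mu
y = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(V), diff)
print(np.mean(y), "should be close to", N)
```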

Infinite dimensions

There is a straightforward extension of the vector version of Chebyshev's inequality to infinite-dimensional settings. Let $X$ be a random variable which takes values in a Fréchet space $\mathcal{X}$ (equipped with seminorms $\|\cdot\|_{\alpha}$). This includes most common settings of vector-valued random variables, e.g., when $\mathcal{X}$ is a Banach space (equipped with a single norm), a Hilbert space, or the finite-dimensional setting as described above.

Suppose that $X$ is of "strong order two", meaning that

$$\operatorname{E}\left(\|X\|_{\alpha}^{2}\right) < \infty$$

for every seminorm $\|\cdot\|_{\alpha}$. This is a generalization of the requirement that $X$ have finite variance, and is necessary for this strong form of Chebyshev's inequality in infinite dimensions. The terminology "strong order two" is due to Vakhania.[1]

Let $\mu \in \mathcal{X}$ be the Pettis integral of $X$ (i.e., the vector generalization of the mean), and let

$$\sigma_{\alpha} := \sqrt{\operatorname{E}\|X-\mu\|_{\alpha}^{2}}$$

be the standard deviation with respect to the seminorm $\|\cdot\|_{\alpha}$. In this setting we can state the following:

General version of Chebyshev's inequality. For every $k > 0$:

$$\Pr\left(\|X-\mu\|_{\alpha} \ge k\sigma_{\alpha}\right) \le \frac{1}{k^{2}}.$$

Proof. The proof is straightforward, and essentially the same as in the finite-dimensional case. If $\sigma_{\alpha} = 0$, then $X$ is constant (and equal to $\mu$) almost surely, so the inequality is trivial.

If

$$\|X-\mu\|_{\alpha} \ge k\sigma_{\alpha},$$

then $\|X-\mu\|_{\alpha} > 0$, so we may safely divide by $\|X-\mu\|_{\alpha}$. The crucial trick in Chebyshev's inequality is to recognize that $1 = \tfrac{\|X-\mu\|_{\alpha}^{2}}{\|X-\mu\|_{\alpha}^{2}}$ on this event.

The following calculations complete the proof:

$$\begin{aligned}
\Pr\left(\|X-\mu\|_{\alpha} \ge k\sigma_{\alpha}\right)
&= \int_{\Omega} \mathbf{1}_{\|X-\mu\|_{\alpha} \ge k\sigma_{\alpha}} \,\mathrm{d}\Pr \\
&= \int_{\Omega} \left(\frac{\|X-\mu\|_{\alpha}^{2}}{\|X-\mu\|_{\alpha}^{2}}\right) \cdot \mathbf{1}_{\|X-\mu\|_{\alpha} \ge k\sigma_{\alpha}} \,\mathrm{d}\Pr \\
&\le \int_{\Omega} \left(\frac{\|X-\mu\|_{\alpha}^{2}}{(k\sigma_{\alpha})^{2}}\right) \cdot \mathbf{1}_{\|X-\mu\|_{\alpha} \ge k\sigma_{\alpha}} \,\mathrm{d}\Pr \\
&\le \frac{1}{k^{2}\sigma_{\alpha}^{2}} \int_{\Omega} \|X-\mu\|_{\alpha}^{2} \,\mathrm{d}\Pr && \text{since } \mathbf{1}_{\|X-\mu\|_{\alpha} \ge k\sigma_{\alpha}} \le 1 \\
&= \frac{1}{k^{2}\sigma_{\alpha}^{2}} \left(\operatorname{E}\|X-\mu\|_{\alpha}^{2}\right) \\
&= \frac{1}{k^{2}\sigma_{\alpha}^{2}} \left(\sigma_{\alpha}^{2}\right) \\
&= \frac{1}{k^{2}}
\end{aligned}$$
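As an illustration of the single-norm (Banach space) case, the sketch below takes $X$ to be, purely as an assumed example, a discretized random path equipped with the supremum norm, estimates $\sigma = \sqrt{\operatorname{E}\|X-\mu\|^{2}}$ from the samples, and compares the empirical tail probability with $1/k^{2}$. The true Pettis integral is approximated here by the pointwise sample mean.

```python
# Monte Carlo sketch of Pr(||X - mu|| >= k*sigma) <= 1/k^2 for a path-valued
# random variable with the sup norm. The Brownian-motion-like path model is an
# illustrative assumption, not part of the original article.
import numpy as np

rng = np.random.default_rng(2)
n_grid, n_samples = 100, 100_000
# cumulative sums of Gaussian increments give Brownian-motion-like paths
increments = rng.normal(scale=1.0 / np.sqrt(n_grid), size=(n_samples, n_grid))
paths = np.cumsum(increments, axis=1)

mu = paths.mean(axis=0)                       # pointwise sample mean (approximate Pettis integral)
norms = np.max(np.abs(paths - mu), axis=1)    # sup-norm of X - mu for each sample
sigma = np.sqrt(np.mean(norms**2))            # sigma = sqrt(E ||X - mu||^2)

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(norms >= k * sigma)
    print(f"k={k}: empirical {empirical:.4f} <= 1/k^2 = {1 / k**2:.4f}")
```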

References

  1. Vakhania, Nikolai Nikolaevich. Probability distributions on linear spaces. New York: North Holland, 1981.