Chapman–Robbins bound

In statistics, the Chapman–Robbins bound or Hammersley–Chapman–Robbins bound is a lower bound on the variance of estimators of a deterministic parameter. It is a generalization of the Cramér–Rao bound; compared to the Cramér–Rao bound, it is both tighter and applicable to a wider range of problems. However, it is usually more difficult to compute.

The bound was independently discovered by John Hammersley in 1950,[1] and by Douglas Chapman and Herbert Robbins in 1951.[2]

Statement

Let $\Theta$ be the set of parameters for a family of probability distributions $\{\mu_\theta : \theta \in \Theta\}$ on $\Omega$.

For any two $\theta, \theta' \in \Theta$, let $\chi^2(\mu_{\theta'}; \mu_\theta)$ be the $\chi^2$-divergence from $\mu_\theta$ to $\mu_{\theta'}$. Then:

Theorem — Given any scalar random variable $\hat g : \Omega \to \mathbb{R}$ and any $\theta \in \Theta$, we have
$$\operatorname{Var}_\theta[\hat g] \;\geq\; \sup_{\theta' \neq \theta \in \Theta} \frac{\left(E_{\theta'}[\hat g] - E_\theta[\hat g]\right)^2}{\chi^2(\mu_{\theta'}; \mu_\theta)}.$$
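For example (a minimal worked case, not taken from the cited references), let $\mu_\theta = \operatorname{Bernoulli}(\theta)$ on $\Omega = \{0, 1\}$ with $\theta \in (0, 1)$, and let $\hat g(x) = x$. Then $E_{\theta'}[\hat g] - E_\theta[\hat g] = \theta' - \theta$ and
$$\chi^2(\mu_{\theta'}; \mu_\theta) = \frac{(\theta' - \theta)^2}{\theta} + \frac{(\theta' - \theta)^2}{1 - \theta} = \frac{(\theta' - \theta)^2}{\theta(1 - \theta)},$$
so every term inside the supremum equals $\theta(1 - \theta) = \operatorname{Var}_\theta[\hat g]$, and the bound is attained.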

A generalization to the multivariable case is:[3]

Theorem — Given any multivariate random variable $\hat g : \Omega \to \mathbb{R}^m$, and any $\theta, \theta' \in \Theta$,
$$\chi^2(\mu_{\theta'}; \mu_\theta) \;\geq\; \left(E_{\theta'}[\hat g] - E_\theta[\hat g]\right)^T \operatorname{Cov}_\theta[\hat g]^{-1} \left(E_{\theta'}[\hat g] - E_\theta[\hat g]\right).$$
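When $\operatorname{Cov}_\theta[\hat g]$ is positive definite, this matrix inequality can be read directionally (a reformulation via the Cauchy–Schwarz inequality, not a separate statement from the cited source): for every $v \neq 0 \in \mathbb{R}^m$,
$$v^T \operatorname{Cov}_\theta[\hat g]\, v \;\geq\; \sup_{\theta' \neq \theta} \frac{\langle v,\, E_{\theta'}[\hat g] - E_\theta[\hat g]\rangle^2}{\chi^2(\mu_{\theta'}; \mu_\theta)},$$
which is the scalar bound applied to the estimator $\langle v, \hat g\rangle$.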

Proof

By the variational representation of chi-squared divergence:[3]

$$\chi^2(P; Q) = \sup_g \frac{\left(E_P[g] - E_Q[g]\right)^2}{\operatorname{Var}_Q[g]}$$
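This representation is itself a consequence of the Cauchy–Schwarz inequality (a sketch, assuming $P \ll Q$ and writing $\rho = dP/dQ$): for any $g$ with finite variance under $Q$,
$$\left(E_P[g] - E_Q[g]\right)^2 = \left(E_Q\!\left[(\rho - 1)\left(g - E_Q[g]\right)\right]\right)^2 \leq E_Q\!\left[(\rho - 1)^2\right] \operatorname{Var}_Q[g] = \chi^2(P; Q)\, \operatorname{Var}_Q[g],$$
with equality when $g$ is an affine function of $\rho$.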
Plug in $g = \hat g$, $P = \mu_{\theta'}$, $Q = \mu_\theta$, to obtain:
$$\chi^2(\mu_{\theta'}; \mu_\theta) \geq \frac{\left(E_{\theta'}[\hat g] - E_\theta[\hat g]\right)^2}{\operatorname{Var}_\theta[\hat g]}$$
Switch the denominator and the left side, then take the supremum over $\theta'$ to obtain the single-variate case. For the multivariate case, define $h = \sum_{i=1}^m v_i \hat g_i$ for any $v \neq 0 \in \mathbb{R}^m$, and plug $g = h$ into the variational representation to obtain:
$$\chi^2(\mu_{\theta'}; \mu_\theta) \geq \frac{\left(E_{\theta'}[h] - E_\theta[h]\right)^2}{\operatorname{Var}_\theta[h]} = \frac{\langle v,\, E_{\theta'}[\hat g] - E_\theta[\hat g]\rangle^2}{v^T \operatorname{Cov}_\theta[\hat g]\, v}$$
Take the supremum over $v \neq 0 \in \mathbb{R}^m$ and use the linear algebra fact that $\sup_{v \neq 0} \frac{v^T w w^T v}{v^T M v} = w^T M^{-1} w$ to obtain the multivariate case.
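(A one-line justification of that fact, assuming $M$ is symmetric positive definite: by Cauchy–Schwarz, $v^T w = (M^{1/2} v)^T (M^{-1/2} w) \leq \|M^{1/2} v\|\, \|M^{-1/2} w\|$, so $\frac{(v^T w)^2}{v^T M v} \leq w^T M^{-1} w$, with equality at $v = M^{-1} w$.)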

Relation to Cramér–Rao bound

Usually, $\Omega = \mathcal{X}^n$ is the sample space of $n$ independent draws of an $\mathcal{X}$-valued random variable $X$ with distribution $\lambda_\theta$ from a family of probability distributions parameterized by $\theta \in \Theta \subseteq \mathbb{R}^m$, $\mu_\theta = \lambda_\theta^{\otimes n}$ is its $n$-fold product measure, and $\hat g : \mathcal{X}^n \to \Theta$ is an estimator of $\theta$. Then, for $m = 1$, the expression inside the supremum in the Chapman–Robbins bound converges to the Cramér–Rao bound of $\hat g$ as $\theta' \to \theta$, assuming the regularity conditions of the Cramér–Rao bound hold. This implies that, when both bounds exist, the Chapman–Robbins version is always at least as tight as the Cramér–Rao bound; in many cases, it is substantially tighter.
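A heuristic sketch of this limit (under the usual regularity conditions; not worked out in the cited references): as $\theta' \to \theta$,
$$\chi^2(\mu_{\theta'}; \mu_\theta) = n I(\theta)\,(\theta' - \theta)^2 + o\!\left((\theta' - \theta)^2\right), \qquad E_{\theta'}[\hat g] - E_\theta[\hat g] = \psi'(\theta)\,(\theta' - \theta) + o(\theta' - \theta),$$
where $I(\theta)$ is the Fisher information of a single observation and $\psi(\theta) = E_\theta[\hat g]$, so the ratio inside the supremum tends to $\frac{\psi'(\theta)^2}{n I(\theta)}$, the Cramér–Rao bound for a (possibly biased) estimator with mean $\psi(\theta)$.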

The Chapman–Robbins bound also holds under much weaker regularity conditions. For example, no assumption is made regarding differentiability of the probability density function $p(x; \theta)$ of $\lambda_\theta$. When $p(x; \theta)$ is non-differentiable in $\theta$, the Fisher information is not defined, and hence the Cramér–Rao bound does not exist.
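As an illustration (a minimal numerical sketch, not taken from the cited references; the closed-form $\chi^2$ expression and the estimator $2X$ are worked out here only for this example), consider a single observation $X \sim \operatorname{Uniform}(0, \theta)$: the support depends on $\theta$, so the Cramér–Rao regularity conditions fail, yet the Chapman–Robbins bound still applies and gives $\operatorname{Var}_\theta[\hat g] \geq \theta^2/4$ for any unbiased $\hat g$.

```python
# Minimal numerical sketch (assumptions as noted above): Chapman–Robbins bound for
# one observation X ~ Uniform(0, theta), where the Cramér–Rao bound does not apply.
import numpy as np

theta = 1.0

# For theta' < theta, mu_{theta'} << mu_theta and
#   chi^2(mu_{theta'}; mu_theta) = theta/theta' - 1;
# for theta' > theta the divergence is infinite, so those theta' contribute nothing.
theta_prime = np.linspace(1e-4, theta - 1e-4, 100_000)
chi2 = theta / theta_prime - 1.0

# For an unbiased estimator g_hat (so E_{theta'}[g_hat] = theta'), each term
# inside the supremum is (theta' - theta)^2 / chi^2 = theta' * (theta - theta').
terms = (theta_prime - theta) ** 2 / chi2
print(f"Chapman-Robbins bound: {terms.max():.4f}")  # -> 0.2500 = theta^2 / 4

# The unbiased estimator g_hat(X) = 2X has variance theta^2 / 3, consistent with the bound.
print(f"Var(2X) = {theta ** 2 / 3:.4f}")            # -> 0.3333
```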


References

  1. ^ Hammersley, J. M. (1950), "On estimating restricted parameters", Journal of the Royal Statistical Society, Series B, 12 (2): 192–240, JSTOR 2983981, MR 0040631
  2. ^ Chapman, D. G.; Robbins, H. (1951), "Minimum variance estimation without regularity assumptions", Annals of Mathematical Statistics, 22 (4): 581–586, doi:10.1214/aoms/1177729548, JSTOR 2236927, MR 0044084
  3. ^ a b Polyanskiy, Yury (2017). "Lecture notes on information theory, chapter 29, ECE563 (UIUC)" (PDF). Lecture notes on information theory. Archived (PDF) from the original on 2022-05-24. Retrieved 2022-05-24.

Further reading

  • Lehmann, E. L.; Casella, G. (1998), Theory of Point Estimation (2nd ed.), Springer, pp. 113–114, ISBN 0-387-98502-6