Chapman–Robbins bound

In statistics, the Chapman–Robbins bound or Hammersley–Chapman–Robbins bound is a lower bound on the variance of estimators of a deterministic parameter. It is a generalization of the Cramér–Rao bound; compared to the Cramér–Rao bound, it is both tighter and applicable to a wider range of problems. However, it is usually more difficult to compute.

The bound was independently discovered by John Hammersley in 1950,[1] and by Douglas Chapman and Herbert Robbins in 1951.[2]

Statement

Let $\Theta$ be the set of parameters for a family of probability distributions $\{\mu_\theta : \theta \in \Theta\}$ on $\Omega$.

For any two $\theta, \theta' \in \Theta$, let $\chi^2(\mu_{\theta'}; \mu_\theta)$ be the $\chi^2$-divergence from $\mu_\theta$ to $\mu_{\theta'}$. Then:

Theorem — Given any scalar random variable $\hat g : \Omega \to \mathbb{R}$ and any $\theta \in \Theta$, we have
$$\operatorname{Var}_\theta[\hat g] \;\geq\; \sup_{\theta' \neq \theta \in \Theta} \frac{\left(E_{\theta'}[\hat g] - E_\theta[\hat g]\right)^2}{\chi^2(\mu_{\theta'}; \mu_\theta)}.$$
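For example (a minimal worked case, not taken from the cited references), let $\mu_\theta = \operatorname{Bernoulli}(\theta)$ on $\Omega = \{0, 1\}$ with $\theta \in (0, 1)$, and let $\hat g(x) = x$. Then $E_{\theta'}[\hat g] - E_\theta[\hat g] = \theta' - \theta$ and
$$\chi^2(\mu_{\theta'}; \mu_\theta) = \frac{(\theta' - \theta)^2}{\theta} + \frac{(\theta' - \theta)^2}{1 - \theta} = \frac{(\theta' - \theta)^2}{\theta(1 - \theta)},$$
so every term inside the supremum equals $\theta(1 - \theta) = \operatorname{Var}_\theta[\hat g]$, and the bound is attained.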

A generalization to the multivariable case is:[3]

Theorem — Given any multivariate random variable $\hat g : \Omega \to \mathbb{R}^m$, and any $\theta, \theta' \in \Theta$,
$$\chi^2(\mu_{\theta'}; \mu_\theta) \;\geq\; \left(E_{\theta'}[\hat g] - E_\theta[\hat g]\right)^T \operatorname{Cov}_\theta[\hat g]^{-1} \left(E_{\theta'}[\hat g] - E_\theta[\hat g]\right).$$
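When $\operatorname{Cov}_\theta[\hat g]$ is positive definite, this matrix inequality can be read directionally (a reformulation via the Cauchy–Schwarz inequality, not a separate statement from the cited source): for every $v \neq 0 \in \mathbb{R}^m$,
$$v^T \operatorname{Cov}_\theta[\hat g]\, v \;\geq\; \sup_{\theta' \neq \theta} \frac{\langle v,\, E_{\theta'}[\hat g] - E_\theta[\hat g]\rangle^2}{\chi^2(\mu_{\theta'}; \mu_\theta)},$$
which is the scalar bound applied to the estimator $\langle v, \hat g\rangle$.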

Proof

By the variational representation of chi-squared divergence:[3]

$$\chi^2(P; Q) = \sup_g \frac{\left(E_P[g] - E_Q[g]\right)^2}{\operatorname{Var}_Q[g]}$$
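This representation is itself a consequence of the Cauchy–Schwarz inequality (a sketch, assuming $P \ll Q$ and writing $\rho = dP/dQ$): for any $g$ with finite variance under $Q$,
$$\left(E_P[g] - E_Q[g]\right)^2 = \left(E_Q\!\left[(\rho - 1)\left(g - E_Q[g]\right)\right]\right)^2 \leq E_Q\!\left[(\rho - 1)^2\right] \operatorname{Var}_Q[g] = \chi^2(P; Q)\, \operatorname{Var}_Q[g],$$
with equality when $g$ is an affine function of $\rho$.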
Plug in $g = \hat g$, $P = \mu_{\theta'}$, $Q = \mu_\theta$, to obtain:
$$\chi^2(\mu_{\theta'}; \mu_\theta) \geq \frac{\left(E_{\theta'}[\hat g] - E_\theta[\hat g]\right)^2}{\operatorname{Var}_\theta[\hat g]}$$
Switch the denominator and the left side, then take the supremum over $\theta'$ to obtain the single-variate case. For the multivariate case, define $h = \sum_{i=1}^m v_i \hat g_i$ for any $v \neq 0 \in \mathbb{R}^m$, and plug $g = h$ into the variational representation to obtain:
$$\chi^2(\mu_{\theta'}; \mu_\theta) \geq \frac{\left(E_{\theta'}[h] - E_\theta[h]\right)^2}{\operatorname{Var}_\theta[h]} = \frac{\langle v,\, E_{\theta'}[\hat g] - E_\theta[\hat g]\rangle^2}{v^T \operatorname{Cov}_\theta[\hat g]\, v}$$
Take the supremum over $v \neq 0 \in \mathbb{R}^m$ and use the linear algebra fact that $\sup_{v \neq 0} \frac{v^T w w^T v}{v^T M v} = w^T M^{-1} w$ to obtain the multivariate case.
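(A one-line justification of that fact, assuming $M$ is symmetric positive definite: by Cauchy–Schwarz, $v^T w = (M^{1/2} v)^T (M^{-1/2} w) \leq \|M^{1/2} v\|\, \|M^{-1/2} w\|$, so $\frac{(v^T w)^2}{v^T M v} \leq w^T M^{-1} w$, with equality at $v = M^{-1} w$.)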

Relation to Cramér–Rao bound

Usually, $\Omega = \mathcal{X}^n$ is the sample space of $n$ independent draws of an $\mathcal{X}$-valued random variable $X$ with distribution $\lambda_\theta$ from a family of probability distributions parameterized by $\theta \in \Theta \subseteq \mathbb{R}^m$, $\mu_\theta = \lambda_\theta^{\otimes n}$ is its $n$-fold product measure, and $\hat g : \mathcal{X}^n \to \Theta$ is an estimator of $\theta$. Then, for $m = 1$, the expression inside the supremum in the Chapman–Robbins bound converges to the Cramér–Rao bound of $\hat g$ as $\theta' \to \theta$, assuming the regularity conditions of the Cramér–Rao bound hold. This implies that, when both bounds exist, the Chapman–Robbins version is always at least as tight as the Cramér–Rao bound; in many cases, it is substantially tighter.
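A heuristic sketch of this limit (under the usual regularity conditions; not worked out in the cited references): as $\theta' \to \theta$,
$$\chi^2(\mu_{\theta'}; \mu_\theta) = n I(\theta)\,(\theta' - \theta)^2 + o\!\left((\theta' - \theta)^2\right), \qquad E_{\theta'}[\hat g] - E_\theta[\hat g] = \psi'(\theta)\,(\theta' - \theta) + o(\theta' - \theta),$$
where $I(\theta)$ is the Fisher information of a single observation and $\psi(\theta) = E_\theta[\hat g]$, so the ratio inside the supremum tends to $\frac{\psi'(\theta)^2}{n I(\theta)}$, the Cramér–Rao bound for a (possibly biased) estimator with mean $\psi(\theta)$.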

The Chapman–Robbins bound also holds under much weaker regularity conditions. For example, no assumption is made regarding differentiability of the probability density function $p(x; \theta)$ of $\lambda_\theta$. When $p(x; \theta)$ is non-differentiable in $\theta$, the Fisher information is not defined, and hence the Cramér–Rao bound does not exist.
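As an illustration (a minimal numerical sketch, not taken from the cited references; the closed-form $\chi^2$ expression and the estimator $2X$ are worked out here only for this example), consider a single observation $X \sim \operatorname{Uniform}(0, \theta)$: the support depends on $\theta$, so the Cramér–Rao regularity conditions fail, yet the Chapman–Robbins bound still applies and gives $\operatorname{Var}_\theta[\hat g] \geq \theta^2/4$ for any unbiased $\hat g$.

```python
# Minimal numerical sketch (assumptions as noted above): Chapman–Robbins bound for
# one observation X ~ Uniform(0, theta), where the Cramér–Rao bound does not apply.
import numpy as np

theta = 1.0

# For theta' < theta, mu_{theta'} << mu_theta and
#   chi^2(mu_{theta'}; mu_theta) = theta/theta' - 1;
# for theta' > theta the divergence is infinite, so those theta' contribute nothing.
theta_prime = np.linspace(1e-4, theta - 1e-4, 100_000)
chi2 = theta / theta_prime - 1.0

# For an unbiased estimator g_hat (so E_{theta'}[g_hat] = theta'), each term
# inside the supremum is (theta' - theta)^2 / chi^2 = theta' * (theta - theta').
terms = (theta_prime - theta) ** 2 / chi2
print(f"Chapman-Robbins bound: {terms.max():.4f}")  # -> 0.2500 = theta^2 / 4

# The unbiased estimator g_hat(X) = 2X has variance theta^2 / 3, consistent with the bound.
print(f"Var(2X) = {theta ** 2 / 3:.4f}")            # -> 0.3333
```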


References

  1. ^ Hammersley, J. M. (1950), "On estimating restricted parameters", Journal of the Royal Statistical Society, Series B, 12 (2): 192–240, JSTOR 2983981, MR 0040631
  2. ^ Chapman, D. G.; Robbins, H. (1951), "Minimum variance estimation without regularity assumptions", Annals of Mathematical Statistics, 22 (4): 581–586, doi:10.1214/aoms/1177729548, JSTOR 2236927, MR 0044084
  3. ^ a b Polyanskiy, Yury (2017). "Lecture notes on information theory, chapter 29, ECE563 (UIUC)" (PDF). Lecture notes on information theory. Archived (PDF) from the original on 2022-05-24. Retrieved 2022-05-24.

Further reading

  • Lehmann, E. L.; Casella, G. (1998), Theory of Point Estimation (2nd ed.), Springer, pp. 113–114, ISBN 0-387-98502-6