Normal-inverse-Wishart distribution

Multivariate four-parameter family of continuous probability distributions
normal-inverse-Wishart
Notation: {\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )}
Parameters: {\displaystyle {\boldsymbol {\mu }}_{0}\in \mathbb {R} ^{D}} location (real vector)
{\displaystyle \lambda >0} (real)
{\displaystyle {\boldsymbol {\Psi }}\in \mathbb {R} ^{D\times D}} inverse scale matrix (positive definite)
{\displaystyle \nu >D-1} (real)
Support: {\displaystyle {\boldsymbol {\mu }}\in \mathbb {R} ^{D};\;{\boldsymbol {\Sigma }}\in \mathbb {R} ^{D\times D}} covariance matrix (positive definite)
PDF: {\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )={\mathcal {N}}({\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},{\tfrac {1}{\lambda }}{\boldsymbol {\Sigma }})\ {\mathcal {W}}^{-1}({\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu )}

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]

Definition

Suppose

{\displaystyle {\boldsymbol {\mu }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Sigma }}\sim {\mathcal {N}}\left({\boldsymbol {\mu }}{\Big |}{\boldsymbol {\mu }}_{0},{\frac {1}{\lambda }}{\boldsymbol {\Sigma }}\right)}

has a multivariate normal distribution with mean {\displaystyle {\boldsymbol {\mu }}_{0}} and covariance matrix {\displaystyle {\tfrac {1}{\lambda }}{\boldsymbol {\Sigma }}}, where

{\displaystyle {\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu \sim {\mathcal {W}}^{-1}({\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu )}

has an inverse Wishart distribution. Then {\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})} has a normal-inverse-Wishart distribution, denoted as

{\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu ).}

Characterization

Probability density function

{\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )={\mathcal {N}}\left({\boldsymbol {\mu }}{\Big |}{\boldsymbol {\mu }}_{0},{\frac {1}{\lambda }}{\boldsymbol {\Sigma }}\right){\mathcal {W}}^{-1}({\boldsymbol {\Sigma }}|{\boldsymbol {\Psi }},\nu )}

The full version of the PDF is as follows:[2]

{\displaystyle f({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|{\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )={\frac {\lambda ^{D/2}|{\boldsymbol {\Psi }}|^{\nu /2}|{\boldsymbol {\Sigma }}|^{-{\frac {\nu +D+2}{2}}}}{(2\pi )^{D/2}2^{\frac {\nu D}{2}}\Gamma _{D}({\frac {\nu }{2}})}}\exp \left\{-{\frac {1}{2}}\operatorname {Tr} ({\boldsymbol {\Psi \Sigma }}^{-1})-{\frac {\lambda }{2}}({\boldsymbol {\mu }}-{\boldsymbol {\mu }}_{0})^{T}{\boldsymbol {\Sigma }}^{-1}({\boldsymbol {\mu }}-{\boldsymbol {\mu }}_{0})\right\}}
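As a sanity check, the closed-form density can be compared numerically against the factorized form, using SciPy's multivariate normal and inverse Wishart densities (the parameter values below are illustrative, not from the article):

```python
# Numerical sketch: the full NIW density above should equal
# N(mu | mu0, Sigma/lambda) * InvWishart(Sigma | Psi, nu).
import numpy as np
from scipy.stats import multivariate_normal, invwishart
from scipy.special import multigammaln

D = 2
mu0 = np.array([0.0, 1.0])          # location
lam, nu = 2.0, 5.0                  # lambda > 0, nu > D - 1
Psi = np.array([[2.0, 0.3], [0.3, 1.0]])  # inverse scale matrix (pos. def.)

# An arbitrary point (mu, Sigma) at which to evaluate both forms
mu = np.array([0.5, 0.5])
Sigma = np.array([[1.5, 0.2], [0.2, 0.8]])

# Factorized form
factorized = (multivariate_normal.pdf(mu, mean=mu0, cov=Sigma / lam)
              * invwishart.pdf(Sigma, df=nu, scale=Psi))

# Closed-form expression, assembled term by term in log space
_, logdet_Psi = np.linalg.slogdet(Psi)
_, logdet_Sig = np.linalg.slogdet(Sigma)
Sinv = np.linalg.inv(Sigma)
diff = mu - mu0
log_full = (D / 2 * np.log(lam) + nu / 2 * logdet_Psi
            - (nu + D + 2) / 2 * logdet_Sig
            - D / 2 * np.log(2 * np.pi) - nu * D / 2 * np.log(2)
            - multigammaln(nu / 2, D)               # multivariate gamma
            - 0.5 * np.trace(Psi @ Sinv)
            - lam / 2 * diff @ Sinv @ diff)

assert np.isclose(factorized, np.exp(log_full))
```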

Here {\displaystyle \Gamma _{D}[\cdot ]} is the multivariate gamma function and {\displaystyle \operatorname {Tr} ({\boldsymbol {\Psi }})} is the trace of the given matrix.

Properties


Marginal distributions

By construction, the marginal distribution over {\displaystyle {\boldsymbol {\Sigma }}} is an inverse Wishart distribution, and the conditional distribution over {\displaystyle {\boldsymbol {\mu }}} given {\displaystyle {\boldsymbol {\Sigma }}} is a multivariate normal distribution. The marginal distribution over {\displaystyle {\boldsymbol {\mu }}} is a multivariate t-distribution.
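Concretely, integrating out {\displaystyle {\boldsymbol {\Sigma }}} gives (a standard result; see, e.g., the Murphy note cited below):

{\displaystyle {\boldsymbol {\mu }}\sim t_{\nu -D+1}\left({\boldsymbol {\mu }}_{0},{\frac {\boldsymbol {\Psi }}{\lambda (\nu -D+1)}}\right)}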

Posterior distribution of the parameters

Suppose the sampling density is a multivariate normal distribution

{\displaystyle {\boldsymbol {y}}_{i}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}

where {\displaystyle {\boldsymbol {y}}} is an {\displaystyle n\times p} matrix and {\displaystyle {\boldsymbol {y}}_{i}} (of length {\displaystyle p}) is row {\displaystyle i} of the matrix.

With the mean and covariance matrix of the sampling distribution unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:

{\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu ).}

The resulting posterior distribution for the mean and covariance matrix is also normal-inverse-Wishart:

{\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}|y)\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{n},\lambda _{n},{\boldsymbol {\Psi }}_{n},\nu _{n}),}

where

{\displaystyle {\boldsymbol {\mu }}_{n}={\frac {\lambda {\boldsymbol {\mu }}_{0}+n{\bar {\boldsymbol {y}}}}{\lambda +n}}}
{\displaystyle \lambda _{n}=\lambda +n}
{\displaystyle \nu _{n}=\nu +n}
{\displaystyle {\boldsymbol {\Psi }}_{n}={\boldsymbol {\Psi }}+{\boldsymbol {S}}+{\frac {\lambda n}{\lambda +n}}({\bar {\boldsymbol {y}}}-{\boldsymbol {\mu }}_{0})({\bar {\boldsymbol {y}}}-{\boldsymbol {\mu }}_{0})^{T},\quad {\text{with}}\quad {\boldsymbol {S}}=\sum _{i=1}^{n}({\boldsymbol {y}}_{i}-{\bar {\boldsymbol {y}}})({\boldsymbol {y}}_{i}-{\bar {\boldsymbol {y}}})^{T}.}
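A minimal numerical sketch of these updates, on synthetic data (prior hyperparameter values are illustrative):

```python
# Posterior hyperparameter updates for the normal-inverse-Wishart prior.
import numpy as np

rng = np.random.default_rng(0)
p, n = 2, 50
y = rng.normal(size=(n, p))          # n x p data matrix; row i is y_i

# Prior hyperparameters (illustrative values)
mu0 = np.zeros(p)
lam, nu = 1.0, p + 2.0
Psi = np.eye(p)

ybar = y.mean(axis=0)                # sample mean vector
S = (y - ybar).T @ (y - ybar)        # scatter matrix sum_i (y_i - ybar)(y_i - ybar)^T

mu_n = (lam * mu0 + n * ybar) / (lam + n)
lam_n = lam + n
nu_n = nu + n
Psi_n = Psi + S + (lam * n) / (lam + n) * np.outer(ybar - mu0, ybar - mu0)
```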


To sample from the joint posterior of {\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}, first draw {\displaystyle {\boldsymbol {\Sigma }}|{\boldsymbol {y}}\sim {\mathcal {W}}^{-1}({\boldsymbol {\Psi }}_{n},\nu _{n})}, then draw {\displaystyle {\boldsymbol {\mu }}|{\boldsymbol {\Sigma }},{\boldsymbol {y}}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }}_{n},{\boldsymbol {\Sigma }}/\lambda _{n})}. To draw from the posterior predictive distribution of a new observation, draw {\displaystyle {\boldsymbol {\tilde {y}}}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }},{\boldsymbol {y}}\sim {\mathcal {N}}_{p}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})}, given the already drawn values of {\displaystyle {\boldsymbol {\mu }}} and {\displaystyle {\boldsymbol {\Sigma }}}.[3]
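Putting the posterior updates and this sampling scheme together, a minimal sketch (synthetic data; prior values are illustrative):

```python
# Draw (mu, Sigma) from the joint posterior, then a posterior predictive
# observation, using SciPy's invwishart and multivariate_normal samplers.
import numpy as np
from scipy.stats import invwishart, multivariate_normal

rng = np.random.default_rng(1)
p, n = 2, 50
y = rng.normal(size=(n, p))

# Prior hyperparameters and their posterior updates
mu0, lam, nu, Psi = np.zeros(p), 1.0, p + 2.0, np.eye(p)
ybar = y.mean(axis=0)
S = (y - ybar).T @ (y - ybar)
mu_n = (lam * mu0 + n * ybar) / (lam + n)
lam_n, nu_n = lam + n, nu + n
Psi_n = Psi + S + lam * n / (lam + n) * np.outer(ybar - mu0, ybar - mu0)

# 1. Sigma | y ~ InvWishart(Psi_n, nu_n)
Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)
# 2. mu | Sigma, y ~ N_p(mu_n, Sigma / lambda_n)
mu = multivariate_normal.rvs(mean=mu_n, cov=Sigma / lam_n, random_state=rng)
# 3. Posterior predictive: y_tilde | mu, Sigma ~ N_p(mu, Sigma)
y_tilde = multivariate_normal.rvs(mean=mu, cov=Sigma, random_state=rng)
```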

Generating normal-inverse-Wishart random variates

Generation of random variates is straightforward:

  1. Sample {\displaystyle {\boldsymbol {\Sigma }}} from an inverse Wishart distribution with parameters {\displaystyle {\boldsymbol {\Psi }}} and {\displaystyle \nu }
  2. Sample {\displaystyle {\boldsymbol {\mu }}} from a multivariate normal distribution with mean {\displaystyle {\boldsymbol {\mu }}_{0}} and covariance matrix {\displaystyle {\tfrac {1}{\lambda }}{\boldsymbol {\Sigma }}}
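The two steps above can be sketched as a small helper function (SciPy provides the inverse Wishart sampler; parameter values in the example call are illustrative):

```python
# Two-step generation of a normal-inverse-Wishart variate.
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def sample_niw(mu0, lam, Psi, nu, rng=None):
    """Draw one (mu, Sigma) pair from NIW(mu0, lam, Psi, nu)."""
    # Step 1: Sigma ~ InvWishart(Psi, nu)
    Sigma = invwishart.rvs(df=nu, scale=Psi, random_state=rng)
    # Step 2: mu | Sigma ~ N(mu0, Sigma / lam)
    mu = multivariate_normal.rvs(mean=mu0, cov=Sigma / lam, random_state=rng)
    return mu, Sigma

mu, Sigma = sample_niw(np.zeros(3), 2.0, np.eye(3), 5.0,
                       rng=np.random.default_rng(0))
```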

Related distributions

  • The normal-Wishart distribution is essentially the same distribution parameterized by precision rather than variance. If {\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\sim \mathrm {NIW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }},\nu )} then {\displaystyle ({\boldsymbol {\mu }},{\boldsymbol {\Sigma }}^{-1})\sim \mathrm {NW} ({\boldsymbol {\mu }}_{0},\lambda ,{\boldsymbol {\Psi }}^{-1},\nu )}.
  • The normal-inverse-gamma distribution is the one-dimensional equivalent.
  • The multivariate normal distribution and inverse Wishart distribution are the component distributions out of which this distribution is made.

Notes

  1. ^ Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."
  2. ^ Prince, Simon J. D. (June 2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. Section 3.8: "Normal inverse Wishart distribution".
  3. ^ Gelman, Andrew, et al. (2014). Bayesian Data Analysis. Vol. 2, p. 73. Boca Raton, FL: Chapman & Hall/CRC.

References

  • Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
  • Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."