Kernel-independent component analysis

In statistics, kernel-independent component analysis (kernel ICA) is an efficient algorithm for independent component analysis that estimates source components by optimizing a generalized-variance contrast function based on representations in a reproducing kernel Hilbert space.[1][2] These contrast functions use the notion of mutual information as a measure of statistical independence.

Main idea

Kernel ICA is based on the idea that correlations between two random variables can be represented in a reproducing kernel Hilbert space (RKHS), denoted by $\mathcal{F}$, associated with a feature map $L_x : \mathcal{F} \to \mathbb{R}$ defined for a fixed $x \in \mathbb{R}$. The $\mathcal{F}$-correlation between two random variables $X$ and $Y$ is defined as

$$\rho_{\mathcal{F}}(X, Y) = \max_{f, g \in \mathcal{F}} \operatorname{corr}\left(\langle L_X, f\rangle, \langle L_Y, g\rangle\right)$$

where the functions $f, g : \mathbb{R} \to \mathbb{R}$ range over $\mathcal{F}$ and

$$\operatorname{corr}\left(\langle L_X, f\rangle, \langle L_Y, g\rangle\right) := \frac{\operatorname{cov}(f(X), g(Y))}{\operatorname{var}(f(X))^{1/2}\,\operatorname{var}(g(Y))^{1/2}}$$

for fixed $f, g \in \mathcal{F}$.[1] Note that the reproducing property implies that $f(x) = \langle L_x, f\rangle$ for fixed $x \in \mathbb{R}$ and $f \in \mathcal{F}$.[3] It then follows that the $\mathcal{F}$-correlation between two independent random variables is zero.
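
In practice, the $\mathcal{F}$-correlation is estimated from a finite sample by (regularized) kernel canonical correlation analysis on centered Gram matrices. The following Python sketch illustrates one such estimator for two one-dimensional samples, assuming a Gaussian kernel; the helper names `centered_gram` and `f_correlation`, the kernel width `sigma`, and the regularization constant `reg` are illustrative choices, not part of the original algorithm's specification.

```python
import numpy as np

def centered_gram(x, sigma=1.0):
    """Centered Gaussian-kernel Gram matrix of a one-dimensional sample."""
    d = x[:, None] - x[None, :]
    K = np.exp(-d ** 2 / (2 * sigma ** 2))
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return H @ K @ H

def f_correlation(x, y, sigma=1.0, reg=1e-2):
    """Regularized empirical F-correlation: the first kernel canonical
    correlation between the samples x and y."""
    n = len(x)
    Kx, Ky = centered_gram(x, sigma), centered_gram(y, sigma)
    Rx = Kx + n * reg * np.eye(n)                # ridge-regularized operators
    Ry = Ky + n * reg * np.eye(n)
    A = np.linalg.solve(Rx, Kx)                  # (Kx + cI)^(-1) Kx
    B = np.linalg.solve(Ry, Ky)                  # (Ky + cI)^(-1) Ky
    M = A @ B.T                                  # (Kx + cI)^(-1) Kx Ky (Ky + cI)^(-1)
    return np.linalg.svd(M, compute_uv=False)[0]  # largest singular value

# Independence gives a near-zero F-correlation; nonlinear dependence does not,
# even when the ordinary (linear) correlation is essentially zero.
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
print(f_correlation(x, rng.standard_normal(300)))  # independent: close to 0
print(f_correlation(x, x ** 2))                    # dependent but uncorrelated: larger
```

The ridge term is not optional in such an estimator: without regularization the empirical kernel canonical correlation degenerates (it equals one whenever the Gram matrices are invertible), so a small regularization constant is added to keep the variance terms in the denominator away from zero.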

This notion of $\mathcal{F}$-correlation is used to define the contrast functions that are optimized in the kernel ICA algorithm. Specifically, if $\mathbf{X} := (x_{ij}) \in \mathbb{R}^{n \times m}$ is a prewhitened data matrix, that is, the sample mean of each column is zero and the sample covariance of the rows is the $m \times m$ identity matrix, kernel ICA estimates an $m \times m$ orthogonal matrix $\mathbf{A}$ so as to minimize the finite-sample $\mathcal{F}$-correlations between the columns of $\mathbf{S} := \mathbf{X}\mathbf{A}'$.
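
A minimal kernel-ICA-style procedure can be sketched as follows, reusing the hypothetical `f_correlation` helper from the previous example: prewhiten the data, then search over orthogonal matrices for the one that minimizes the contrast. For two signals an orthogonal matrix is (up to permutation and sign) a rotation by a single angle, so a grid search suffices for illustration; the published algorithm instead uses gradient-based optimization on the orthogonal group and handles an arbitrary number of components.

```python
import numpy as np

def kernel_ica_2d(X, sigma=1.0, reg=1e-2, n_angles=90):
    """Toy kernel-ICA-style unmixing for two mixed signals (m = 2):
    prewhiten the data, then grid-search the rotation angle whose
    orthogonal matrix minimizes the empirical F-correlation between
    the two recovered columns.  Requires f_correlation() from the
    previous sketch."""
    # Prewhitening: zero-mean columns, identity sample covariance.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    eigval, eigvec = np.linalg.eigh(cov)
    Z = Xc @ eigvec @ np.diag(eigval ** -0.5) @ eigvec.T

    # Angles beyond pi/2 only permute or flip the sources, so [0, pi/2) suffices.
    best_theta, best_score = 0.0, np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        S = Z @ np.array([[c, -s], [s, c]])      # candidate orthogonal matrix A
        score = f_correlation(S[:, 0], S[:, 1], sigma, reg)
        if score < best_score:
            best_theta, best_score = theta, score
    c, s = np.cos(best_theta), np.sin(best_theta)
    return Z @ np.array([[c, -s], [s, c]])       # estimated sources S

# Example: recover two independent uniform sources from a random linear mixture.
rng = np.random.default_rng(1)
S_true = rng.uniform(-1, 1, size=(300, 2))
X_mixed = S_true @ rng.standard_normal((2, 2))
S_est = kernel_ica_2d(X_mixed)
```

As with any ICA method, the recovered columns of $\mathbf{S}$ are determined only up to permutation and scaling of the true sources.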

References

  1. ^ a b Bach, Francis R.; Jordan, Michael I. (2003). "Kernel independent component analysis" (PDF). The Journal of Machine Learning Research. 3: 1–48. doi:10.1162/153244303768966085.
  2. ^ Bach, Francis R.; Jordan, Michael I. (2003). "Kernel independent component analysis". 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03) (PDF). Vol. 4. pp. IV-876-9. doi:10.1109/icassp.2003.1202783. ISBN 978-0-7803-7663-2. S2CID 7691428.
  3. ^ Saitoh, Saburou (1988). Theory of Reproducing Kernels and Its Applications. Longman. ISBN 978-0582035645.

