Recent progresses of neural network unsupervised learning: I. Independent component analyses generalizing PCA

Harold H. Szu

doi:10.1117/12.342876

22 March 1999


Proceedings Volume 3722, Applications and Science of Computational Intelligence II; (1999) https://doi.org/10.1117/12.342876
Event: AeroSense '99, 1999, Orlando, FL, United States

Abstract

The early-vision principle of redundancy reduction, which condenses 10⁸ sensor excitations into sparse edge maps, is understandable from a computer vision viewpoint. Only recently has it been derived using a truly unsupervised learning paradigm of artificial neural networks (ANN). In fact, the edge maps of biological vision (Hubel-Wiesel) are reproduced by seeking the underlying independent component analysis (ICA) among 10² image samples, maximizing the ANN output entropy via ∂H(V)/∂[W] = ∂[W]/∂t. When a pair of newborn eyes or ears meets the hustling and bustling world without supervision, they seek ICA by comparing two sensory measurements (x₁(t), x₂(t))^T ≡ X(t). Assuming a linear and instantaneous mixture model of the external world, X(t) = [A] S(t), where both the mixing matrix [A] ≡ [a₁, a₂] of ICA vectors and the source percentages (s₁(t), s₂(t))^T ≡ S(t) are unknown, we seek the independent sources <S(t) S^T(t)> ≈ [I], where the approximation sign indicates that higher order statistics (HOS) may not be trivial. Without a teacher, the ANN weight matrix [W] ≡ [w₁, w₂] adjusts the outputs V(t) = tanh([W]X(t)) ≈ [W]X(t) until no desired outputs remain except the (Gaussian) 'garbage' (neither YES '1' nor NO '-1', but in the linear 'maybe' range near the origin '0'), defined by the Gaussian covariance <V(t) V(t)^T>_G = [I] = [W][A] <S(t) S^T(t)> [A]^T[W]^T. Thus, the ANN obtains [W][A] ≈ [I] without an explicit teacher and discovers the internal knowledge representation [W] as the inverse of the external world matrix, [A]^-1. To unify the ICA, PCA, ANN, and HOS theories developed since 1991 (advanced by Jutten & Herault, Comon, Oja, Bell-Sejnowski, Amari-Cichocki, Cardoso), the Lyapunov function L(v₁,...,v_n, w₁,...,w_n) = E(v₁,...,v_n) - H(w₁,...,w_n) is constructed as the Helmholtz free energy to prove the convergence of both supervised energy E and unsupervised entropy H learning.
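The second-order part of this argument can be checked numerically. The sketch below is an illustration only, not the paper's algorithm: it assumes two uniform sources, a made-up mixing matrix A, and a Cholesky-based whitening step (the helper names center, second_moment, and whitening_matrix are hypothetical). It shows that a weight matrix [W] can be chosen so that the output covariance <V V^T> equals [I] for the linear outputs V = [W]X. Note that this condition fixes [W] only up to a rotation, which is why higher order statistics are still needed to reach [W][A] ≈ [I].

```python
import math
import random


def center(xs):
    """Subtract the sample mean so the sequence is zero-mean."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def second_moment(xs, ys):
    """Sample average <x y> of two zero-mean sequences."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)


def whitening_matrix(x1, x2):
    """Return a 2x2 W such that V = W X has sample covariance <V V^T> = I.

    Uses the Cholesky factor C = L L^T of the covariance C of X and sets
    W = L^{-1}, so that W C W^T = I.
    """
    c11 = second_moment(x1, x1)
    c12 = second_moment(x1, x2)
    c22 = second_moment(x2, x2)
    l11 = math.sqrt(c11)
    l21 = c12 / l11
    l22 = math.sqrt(c22 - l21 * l21)
    return [[1.0 / l11, 0.0],
            [-l21 / (l11 * l22), 1.0 / l22]]


def apply(M, a, b):
    """Apply a 2x2 matrix M to every sample pair (a[t], b[t])."""
    r1 = [M[0][0] * p + M[0][1] * q for p, q in zip(a, b)]
    r2 = [M[1][0] * p + M[1][1] * q for p, q in zip(a, b)]
    return r1, r2


rng = random.Random(0)
n = 5000

# Two independent sources S(t); uniform noise is an assumption of this sketch.
s1 = center([rng.uniform(-1.0, 1.0) for _ in range(n)])
s2 = center([rng.uniform(-1.0, 1.0) for _ in range(n)])

# Hypothetical mixing matrix [A]; the network never sees it.
A = [[0.8, 0.3],
     [0.4, 0.9]]
x1, x2 = apply(A, s1, s2)          # X(t) = [A] S(t)

W = whitening_matrix(x1, x2)
v1, v2 = apply(W, x1, x2)          # V(t) = [W] X(t), linear range of tanh

cov_v = [[second_moment(v1, v1), second_moment(v1, v2)],
         [second_moment(v2, v1), second_moment(v2, v2)]]
print(cov_v)                        # close to the identity matrix
```

The whitening here is one of infinitely many choices: any rotation of W satisfies the same covariance condition, so second-order statistics alone cannot identify the inverse of A.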
Consequently, rather than using the faithful but dumb computer ('GARBAGE-IN, GARBAGE-OUT'), the smarter neurocomputer will be equipped with unsupervised learning that extracts 'RAW INFO-IN, (until) GARBAGE-OUT' for sensory knowledge acquisition, enhancing Machine IQ. We must go beyond the LMS error energy and apply HOS to ANN. We begin with the Auto-Regression (AR), which extrapolates from the past X(t) to the future u_i(t+1) = w_i^T X(t) by varying the weight vector to minimize the LMS error energy E = <[x(t+1) - u_i(t+1)]²>; the fixed point ∂E/∂w_i = 0 results in an exact Toeplitz matrix inversion under a stationary covariance assumption. We generalize AR with a nonlinear output v_i(t+1) = tanh(w_i^T X(t)) within E = <[x(t+1) - v_i(t+1)]²> and the gradient descent ∂E/∂w_i = -∂w_i/∂t. Further generalization is possible because a specific image or speech signal has a specific histogram whose gray-scale statistics depart from those of a Gaussian random variable, and the departure can be measured by the fourth-order cumulant, the kurtosis K(v_i) = <v_i⁴> - 3<v_i²>² (K ≥ 0, super-Gaussian, for speech; K ≤ 0, sub-Gaussian, for images). Thus, seeking the stationary value via ∂K/∂w_i = ±4 ∂w_i/∂t can de-mix unknown mixtures of noisy images or speech without a teacher. These stationary statistics may be implemented in parallel using the 'factorized pdf code: ρ(v₁, v₂) = ρ(v₁) ρ(v₂)', which occurs in a maximal entropy algorithm improved by the natural gradient of Amari. Real-world applications are given in Part II (Wavelet Appl-VI, SPIE Proc. Vol. 3723), such as remote sensing subpixel composition, speech segmentation by means of ICA de-hyphenation, and cable TV bandwidth enhancement by simultaneously mixing sport and movie entertainment events.
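The role of the kurtosis K(v) = <v⁴> - 3<v²>² in de-mixing can be illustrated with a small sketch. It makes several assumptions that are not in the abstract: the two sources are unit-variance uniform noise (sub-Gaussian, K < 0), the data are already whitened so the unknown mixing reduces to a rotation by a hypothetical angle alpha, and a brute-force search over the de-mixing angle stands in for the gradient flow ∂K/∂w_i = ±4 ∂w_i/∂t. The helper names kurtosis and rotate are hypothetical.

```python
import math
import random


def kurtosis(v):
    """Fourth-order cumulant K(v) = <v^4> - 3 <v^2>^2 of a zero-mean sequence."""
    n = len(v)
    m2 = sum(x * x for x in v) / n
    m4 = sum(x ** 4 for x in v) / n
    return m4 - 3.0 * m2 * m2


def rotate(a, b, theta):
    """Rotate every sample pair (a[t], b[t]) by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    r1 = [c * p + s * q for p, q in zip(a, b)]
    r2 = [-s * p + c * q for p, q in zip(a, b)]
    return r1, r2


rng = random.Random(1)
n = 4000
half_width = math.sqrt(3.0)        # uniform on [-sqrt(3), sqrt(3)] has variance 1

s1 = [rng.uniform(-half_width, half_width) for _ in range(n)]
s2 = [rng.uniform(-half_width, half_width) for _ in range(n)]
m1, m2 = sum(s1) / n, sum(s2) / n
s1 = [x - m1 for x in s1]          # remove the sample means
s2 = [x - m2 for x in s2]

alpha = 0.6                        # hypothetical unknown mixing angle (radians)
x1, x2 = rotate(s1, s2, alpha)     # whitened mixtures seen by the network

# Search theta in [-pi/4, pi/4) in 1-degree steps for the most negative total
# kurtosis; sub-Gaussian sources are least Gaussian when fully de-mixed.
best_theta, best_cost = None, float("inf")
for k in range(90):
    theta = -math.pi / 4 + k * (math.pi / 2) / 90
    u1, u2 = rotate(x1, x2, theta)
    cost = kurtosis(u1) + kurtosis(u2)
    if cost < best_cost:
        best_theta, best_cost = theta, cost

u1, u2 = rotate(x1, x2, best_theta)
print("recovered angle:", best_theta, "expected near:", -alpha)
print("kurtosis of outputs:", kurtosis(u1), kurtosis(u2))
```

For super-Gaussian sources such as speech, the same search would maximize the total kurtosis instead of minimizing it, which corresponds to the ± sign in the abstract's update rule.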

Citation

Harold H. Szu "Recent progresses of neural network unsupervised learning: I. Independent component analyses generalizing PCA", Proc. SPIE 3722, Applications and Science of Computational Intelligence II, (22 March 1999); https://doi.org/10.1117/12.342876

PROCEEDINGS
21 PAGES

KEYWORDS

Sensors

Machine learning

Independent component analysis

Principal component analysis

Neurons

Neural networks

Positron emission tomography
