Two-layer networks project the data onto a linear subspace. In this case, the encoding and decoding portions of the network are each just a single-layer linear network.
This works well in some cases. However, many datasets lie on lower dimensional subspaces that are not linear.
A helix, for example, is 1-D, yet it does not lie on a 1-D linear subspace.
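The helix claim can be checked numerically. The following is a minimal NumPy sketch, not taken from the source: the particular helix parameterization, the number of samples, and the helper name `linear_reconstruction` are illustrative assumptions. It projects helix points onto the best 1-D linear subspace (the first principal direction) and measures how much is lost.

```python
import numpy as np

# Sample a helix: every point is determined by the single parameter t.
# (The parameterization and sample count are illustrative assumptions.)
t = np.linspace(0.0, 4.0 * np.pi, 400)
X = np.column_stack([np.cos(t), np.sin(t), t / (4.0 * np.pi)])

# Centre the data and find its principal directions with an SVD.
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

def linear_reconstruction(k):
    """Project onto the top-k principal directions and map back to 3-D."""
    V = Vt[:k].T                      # (3, k) basis of the linear subspace
    return (Xc @ V) @ V.T + mean

# Mean squared distance between each point and its reconstruction.
err_1d = np.mean(np.sum((X - linear_reconstruction(1)) ** 2, axis=1))
err_3d = np.mean(np.sum((X - linear_reconstruction(3)) ** 2, axis=1))
print(f"mean squared error, 1-D linear subspace: {err_1d:.3f}")
print(f"mean squared error, 3-D linear subspace: {err_3d:.3e}")
```

The 1-D linear projection leaves a large residual even though one number (`t`) is enough to locate any point on the curve, while the full 3-D basis reconstructs the data up to rounding error.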
To solve this problem we can let the encoding and decoding portions each be multilayer networks. In this way we obtain nonlinear projections of the data.
(from Fast Nonlinear Dimension Reduction, Nanda Kambhatla, NIPS93)
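As a rough illustration of multilayer encoding and decoding portions, here is a small NumPy sketch that trains a network with a 1-D bottleneck on helix data by plain gradient descent. It is not the method of the cited paper: the layer sizes (3-8-1-8-3), the tanh hidden units, the linear bottleneck and output layers, the learning rate, and the step count are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Helix data: 3-D points generated by one parameter t (illustrative choice).
t = np.linspace(0.0, 4.0 * np.pi, 200)
X = np.column_stack([np.cos(t), np.sin(t), t / (4.0 * np.pi)])
X = X - X.mean(axis=0)

# Layer sizes 3-8-1-8-3 are an assumption, not taken from the source.
sizes = [3, 8, 1, 8, 3]
W = [rng.normal(0.0, 0.5, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(n) for n in sizes[1:]]

def forward(X):
    """Return the activations of every layer; hidden layers use tanh."""
    acts = [X]
    for i, (Wi, bi) in enumerate(zip(W, b)):
        z = acts[-1] @ Wi + bi
        # The bottleneck (i == 1) and output (i == 3) layers are linear.
        acts.append(np.tanh(z) if i in (0, 2) else z)
    return acts

def loss(X):
    """Mean squared reconstruction error over the data set."""
    return np.mean(np.sum((forward(X)[-1] - X) ** 2, axis=1))

initial = loss(X)
lr = 0.01
for _ in range(2000):
    acts = forward(X)
    # Gradient of the reconstruction error with respect to the output.
    delta = 2.0 * (acts[-1] - X) / len(X)
    for i in reversed(range(len(W))):
        if i in (0, 2):                      # derivative of tanh
            delta = delta * (1.0 - acts[i + 1] ** 2)
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        delta = delta @ W[i].T               # propagate before updating W[i]
        W[i] -= lr * grad_W
        b[i] -= lr * grad_b
final = loss(X)

print(f"reconstruction error before training: {initial:.3f}")
print(f"reconstruction error after training:  {final:.3f}")
```

Here `forward(X)[2]` holds the 1-D code for each point. How far the error falls depends on the initialization, learning rate, and number of steps, so this should be read as a sketch of the idea rather than a tuned implementation.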
In the examples below, the original images were 64x64 grayscale images with 8 bits per pixel. The first 50 principal components were extracted to form the image you see on the left. This was reduced to 5 dimensions using linear PCA to obtain the image in the center. The same image on the left was also reduced to 5 dimensions using a 5-layer (50-40-5-40-50) network to produce the image on the right.
Left: 50 principal components. Center: 5 principal components. Right: 5 nonlinear components.
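The data flow in this experiment can be sketched in NumPy as follows. Everything specific here is an assumption made for illustration: the images are random stand-ins, the network weights are untrained, and the tanh hidden units, linear bottleneck, input scaling, and the names `coeffs50` and `autoencode` are not from the source. The sketch only shows the shapes involved in the 50-component representation, the 5-D linear reduction, and the 50-40-5-40-50 network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 200 random "images" of 64x64 8-bit pixels, flattened to
# 4096 values each. Real images would replace this array.
images = rng.integers(0, 256, size=(200, 64 * 64)).astype(np.float64)

mean = images.mean(axis=0)
centered = images - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)

# Step 1: coefficients of the first 50 principal components.
coeffs50 = centered @ Vt[:50].T                  # shape (200, 50)

# Step 2a: linear reduction to 5 dimensions. The 50 coefficients are already
# uncorrelated and ordered by variance, so linear PCA on them keeps the
# first 5 and discards the rest.
linear_recon50 = np.zeros_like(coeffs50)
linear_recon50[:, :5] = coeffs50[:, :5]

# Step 2b: a 50-40-5-40-50 network. The weights are random and untrained;
# in practice they would be fitted to minimize reconstruction error.
sizes = [50, 40, 5, 40, 50]
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def autoencode(z):
    """Encode 50-D inputs to a 5-D code, then decode back to 50-D."""
    h1 = np.tanh(z @ weights[0])                 # 40 hidden units
    code = h1 @ weights[1]                       # 5-D nonlinear components
    h2 = np.tanh(code @ weights[2])              # 40 hidden units
    return code, h2 @ weights[3]                 # 50-D reconstruction

code, nonlinear_recon50 = autoencode(coeffs50 / coeffs50.std())

# A 50-D reconstruction maps back to pixel space through the same basis.
linear_image = (linear_recon50 @ Vt[:50] + mean).reshape(-1, 64, 64)
print(coeffs50.shape, code.shape, nonlinear_recon50.shape, linear_image.shape)
```

Both reductions store 5 numbers per image; the difference is that the network's 5-D code is a nonlinear function of the 50 coefficients rather than a linear projection of them.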