Summary of Nonlinear Networks and Applications
Backpropagation
- Implementing backprop
- characteristics of cost surfaces
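A minimal backprop sketch, using one hidden layer of sigmoid units with an MSE cost, trained on XOR; the layer sizes, learning rate, and epoch count are illustrative choices, not values prescribed by this outline.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)   # hidden -> output

def forward(X):
    H = sigmoid(X @ W1 + b1)          # hidden activations
    return H, sigmoid(H @ W2 + b2)    # output activations

loss_before = np.mean((forward(X)[1] - T) ** 2)

eta = 0.5
for _ in range(5000):
    H, Y = forward(X)
    # backward pass: deltas are dE/dnet for MSE with sigmoid units
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    # gradient-descent weight updates
    W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(0)

loss_after = np.mean((forward(X)[1] - T) ** 2)
print(loss_before, "->", loss_after)
```

Training on XOR (rather than a separable problem) also illustrates why the cost surface of a nonlinear net is non-convex: gradient descent can stall in flat regions or local minima depending on the random initialization.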
Activation Functions
- linear
- threshold: binary, bipolar
- sigmoid: binary (logistic), bipolar (symmetric, e.g. tanh)
- softmax
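Sketches of the activation functions named above; the formulas are the standard ones, though the bipolar variants shown here are one common convention (tanh for the symmetric sigmoid).

```python
import numpy as np

def linear(z):            return z                       # identity
def binary_threshold(z):  return np.where(z >= 0, 1, 0)  # outputs {0, 1}
def bipolar_threshold(z): return np.where(z >= 0, 1, -1) # outputs {-1, 1}
def binary_sigmoid(z):    return 1 / (1 + np.exp(-z))    # logistic, range (0, 1)
def bipolar_sigmoid(z):   return np.tanh(z)              # symmetric, range (-1, 1)

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()          # components sum to 1

z = np.array([-2.0, 0.0, 2.0])
print(binary_sigmoid(z), softmax(z))
```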
Cost Functions
- Mean Squared Error (MSE)
- Cross Entropy
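The two cost functions can be sketched as follows; the `eps` clipping guard is an implementation detail I've added to avoid log(0), not part of the definition.

```python
import numpy as np

def mse(y, t):
    # Mean Squared Error between outputs y and targets t
    return np.mean((y - t) ** 2)

def cross_entropy(y, t, eps=1e-12):
    # binary cross entropy; y are predicted probabilities in (0, 1)
    y = np.clip(y, eps, 1 - eps)   # guard against log(0)
    return -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))

t = np.array([1.0, 0.0, 1.0])
y = np.array([0.9, 0.2, 0.8])
print(mse(y, t), cross_entropy(y, t))
```

Cross entropy pairs naturally with sigmoid or softmax outputs: its gradient with respect to the output net input is simply (y - t), which avoids the vanishing factor y(1 - y) that MSE carries.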
Improving Generalization
- using noise to improve learning, annealing
- what does it mean to overtrain?
- early stopping
- weight decay
- pruning (e.g. optimal brain damage)
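Early stopping and weight decay can be combined in one training loop, as in this sketch on a linear model; the data, decay strength lambda, and patience window are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.3 * rng.normal(size=60)

Xtr, ytr = X[:40], y[:40]      # training set
Xva, yva = X[40:], y[40:]      # held-out set monitored for early stopping

w = np.zeros(5)
eta, lam = 0.01, 0.01          # learning rate, weight-decay strength
best_w, best_val, patience = w.copy(), np.inf, 0

for epoch in range(500):
    grad = Xtr.T @ (Xtr @ w - ytr) / len(ytr)
    w -= eta * (grad + lam * w)          # weight decay = L2 penalty on w
    val = np.mean((Xva @ w - yva) ** 2)  # validation error
    if val < best_val:
        best_val, best_w, patience = val, w.copy(), 0
    else:
        patience += 1
        if patience >= 20:               # stop before overtraining sets in
            break

print(best_val)
```

Overtraining shows up as validation error rising while training error keeps falling; keeping the best validation-error weights (`best_w`) is what makes this early stopping rather than just a fixed epoch budget.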
Speed-up Techniques
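One representative speed-up is gradient descent with momentum, sketched here on an ill-conditioned quadratic; the 0.9 momentum coefficient is a typical choice, not one fixed by this outline.

```python
import numpy as np

A = np.diag([1.0, 10.0])     # ill-conditioned quadratic: plain GD zig-zags

def grad(w):                 # gradient of f(w) = 0.5 * w^T A w
    return A @ w

w = np.array([1.0, 1.0])
v = np.zeros(2)
eta, mu = 0.05, 0.9          # learning rate, momentum coefficient
for _ in range(200):
    v = mu * v - eta * grad(w)   # velocity accumulates past gradients
    w = w + v                    # damps oscillation along the steep axis

print(w)   # approaches the minimum at [0, 0]
```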
Unsupervised Learning
- Dimension Reduction for Compression using Autoassociative Networks
- Principal Component Analysis (PCA) using 3-layer nets
- Nonlinear PCA using 5-layer nets
- Clustering for Compression
- Kohonen's Self-Organizing Maps (SOMs)
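A sketch of a 1-D Kohonen SOM clustering 2-D data; the map size, learning-rate schedule, and neighborhood schedule are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
# three tight clusters of 2-D points
data = np.vstack([rng.normal(c, 0.1, size=(50, 2))
                  for c in ([0, 0], [1, 0], [0, 1])])

K = 6                                       # nodes in the 1-D map
W = rng.normal(0.5, 0.2, size=(K, 2))       # codebook (weight) vectors
STEPS = 2000
for t in range(STEPS):
    x = data[rng.integers(len(data))]
    win = np.argmin(np.linalg.norm(W - x, axis=1))   # winning node
    eta = 0.5 * (1 - t / STEPS)                      # decaying learning rate
    sigma = max(1.0 * (1 - t / STEPS), 0.1)          # shrinking neighborhood
    h = np.exp(-((np.arange(K) - win) ** 2) / (2 * sigma ** 2))
    W += eta * h[:, None] * (x - W)          # pull winner and neighbors toward x

print(np.round(W, 2))
```

After training, the codebook vectors settle near the cluster centers, so each input can be compressed to the index of its nearest node, which is the "clustering for compression" idea above.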
Misc Terminology
- correlation matrix vs Hessian
- linear separability
- bias
- decision boundary
- clustering
- dimension reduction
- overtraining
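Linear separability can be made concrete with the perceptron learning rule: a single threshold unit fits AND (separable) but can never fit XOR (not separable), since no single line separates XOR's classes.

```python
import numpy as np

def fits_perfectly(X, t, epochs=50):
    # train a single threshold unit with the perceptron rule;
    # returns True iff it ends up classifying every pattern correctly
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, t):
            y = 1 if x @ w >= 0 else 0          # threshold activation
            w += (target - y) * x               # perceptron update
    y = (Xb @ w >= 0).astype(int)
    return bool(np.all(y == t))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(fits_perfectly(X, np.array([0, 0, 0, 1])))   # AND: separable
print(fits_perfectly(X, np.array([0, 1, 1, 0])))   # XOR: not separable
```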
Experimental Design
- What techniques would you use to understand the data? (graphing data, examining the correlation matrix, dimension reduction, ...)
- What type of architecture would you use? (number of layers, number of nodes, activation functions) Why?
- What learning algorithm would you use (speed-up technique)? Why?
- What do you do to ensure the net is trained adequately (but not overtrained)?
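For the first question, examining the correlation matrix might look like this sketch; the synthetic features are illustrative stand-ins for real data columns.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=200)
b = 2 * a + 0.1 * rng.normal(size=200)   # nearly redundant with a
c = rng.normal(size=200)                 # independent feature

R = np.corrcoef(np.vstack([a, b, c]))    # 3x3 correlation matrix
print(np.round(R, 2))
# An off-diagonal entry near 1 (here R[0, 1]) flags redundant
# features, i.e. a candidate for dimension reduction before training.
```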