In contrast, multi-layer network models, such as the multi-layer perceptron (MLP), use the back propagation learning algorithm. Back propagation is based on a gradient descent procedure, which tends to converge at a very slow rate. It is further hampered by the fact that all layers of weights in the network are computed by minimizing an error that is a function of the output alone. This tends to slow the learning, since every weight in the network must be updated on each iteration.
The LRF tries to overcome some of these problems by using a localized representation of the input space to limit the number of units that respond to a given input. This allows the LRF to converge at a faster rate than similar neural network models, since only those receptive fields that respond to an input need to be updated. Another factor that reduces the learning time is that the LRF makes use of self-organized learning techniques, such as K-means, to train the receptive field centers.
The basic network model of the LRF consists of a two-layer topology. The first layer of "receptive field" nodes is trained using a clustering algorithm, such as K-means, or any other algorithm that can determine the receptive field centers. Each node in the first layer computes a receptive field response function, which should approach zero as the distance from the center of the receptive field increases. The second layer of the LRF model sums the weighted outputs of the first layer, which produces the output, or response, of the network. A supervised LMS rule is used to train the weights of the second-layer nodes.
The response function of the LRF network is formulated as follows:
f(x) = SUM_i ( Ti * Ri(x) )
where,
Ri(x) = Q( ||x - xi|| / Wi )
x  - a real-valued vector in the input space,
Ri - the ith receptive field response function,
Q  - a radially symmetric function with a single maximum at the origin, decreasing to zero at large radii,
xi - the center of the ith receptive field,
Wi - the width of the ith receptive field,
Ti - the weight associated with each receptive field.
The receptive field response functions Ri(x) should be formulated such that they decrease rapidly with increasing radius. This ensures that the response functions provide highly localized representations of the input space. The response function used here is modeled after the Gaussian, and uses the trace of the covariance matrix to set the widths of the receptive fields.
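As a rough illustration of such a response function, the sketch below uses a Gaussian-shaped Ri(x) whose width is derived from the trace of each cluster's covariance matrix. The exact scaling and all names used here (response, lrf_output, centers, covs) are assumptions made for the example; they are not part of lrftrain itself.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative cluster centers and covariances, standing in for the "cluster center"
    # and "cluster variance" information produced by vkmeans.
    centers = rng.normal(size=(4, 2))                  # 4 receptive fields in a 2-D feature space
    covs = [np.diag(rng.uniform(0.1, 0.5, size=2)) for _ in centers]

    # Width of each receptive field taken from the trace of its covariance matrix
    # (the exact scaling is an assumption; the text only says the trace is used).
    widths = np.array([np.sqrt(np.trace(c)) for c in covs])

    def response(x, center, width):
        # Gaussian-shaped response: maximum at the receptive field center,
        # falling rapidly toward zero at large radii.
        r = np.linalg.norm(x - center) / width
        return np.exp(-r * r)

    def lrf_output(x, T):
        # Second layer: weighted sum of the first-layer responses, f(x) = SUM_i Ti * Ri(x).
        R = np.array([response(x, c, w) for c, w in zip(centers, widths)])
        return T @ R

    T = rng.normal(size=len(centers))                  # output weights (trained by the LMS rule)
    print(lrf_output(np.array([0.0, 0.0]), T))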
The weights for the output layer are found using the LMS learning rule. The weights are adjusted at each iteration to minimize the total error, which is based on the difference between the network output and the desired result.
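The following sketch shows one plausible form of such an LMS update for the output weights, with the learning rate playing the role of the -meu parameter described below. The update rule and the names used are assumptions for illustration only; the document does not give the exact update used by lrftrain.

    import numpy as np

    def lms_update(T, R, desired, meu):
        # T       : current output-layer weights (one per receptive field response node)
        # R       : receptive field responses Ri(x) for one training vector
        # desired : desired network output for that vector
        # meu     : weight update parameter (cf. the -meu option below)
        error = desired - T @ R        # difference between the desired result and the network output
        return T + meu * error * R     # step that reduces the squared error for this vector

    # Example usage with made-up responses for a single training vector.
    T = np.zeros(4)
    R = np.array([0.90, 0.20, 0.05, 0.01])
    T = lms_update(T, R, desired=1.0, meu=0.1)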
The key element in the success of the LRF is the self-organizing receptive fields. As noted above, the receptive field centers can be determined with a statistical clustering algorithm such as K-means. The inputs to the training phase of the LRF (i.e. "lrftrain") are the outputs from "vkmeans" and the original image. Specifically, these are the original input image, which may be a multi-band image containing all of the feature bands used in the classification, the "cluster number" image, the "cluster center" image, and the "cluster variance" image. The "cluster number" image specifies which cluster each vector belongs to, the "cluster center" image specifies the cluster center locations in the feature space, and the "cluster variance" image specifies the variances of the data associated with each cluster center.
Prior to using the LRF algorithm, it is necessary to run "vkmeans" on the input training image to fix the cluster centers, followed by a supervised classification of the clustered image, which assigns a desired class to each cluster center. NOTE that the image resulting from the supervised classification MUST be appended to the "cluster center" image before running the LRF. This step provides the desired class assignments for the cluster centers that are needed during the training phase of the LRF.
-d is an integer specifying the iteration interval used to print the mean squared error (MSE) to the output statistics file. If this value is left at zero (the default), only the MSE of the first iteration is written to the file. Any other integer will cause the value of the MSE to be written to the statistics file at the iteration interval specified.
-cv is a float value that specifies the convergence value for the algorithm. When the current MSE value reaches the specified convergence value, the algorithm will terminate.
-meu is a float value that specifies the weight update parameter for the learning algorithm. This value can be adjusted from 0 to 1. NOTE: this parameter may have a significant effect on the rate of learning, and it may have to be adjusted several times to get a feel for the optimum learning rate.
-n is an integer that specifies the maximum number of iterations that the algorithm will run before terminating. It is initially set to an arbitrarily large number to allow the algorithm to complete the learning phase.
-delta is a float value that specifies the minimum change in the MSE value from one iteration to the next. This parameter may be used to terminate the algorithm when the change in the MSE is zero or very small, but the MSE has not yet reached the specified convergence value (-cv). This may occur when the learning has reached a "plateau" or "bench" and is no longer learning.
-b is an integer that specifies the border width, in pixels, encompassing the desired region of the image to be classified. This region is ignored during the classification process.
Of the four input images to this routine, all but the "cluster number" image must be of data storage type FLOAT. The "cluster number" image should be of data storage type INTEGER. The output "weight" image is written out as data storage type FLOAT. The output statistics file is stored as an ASCII file.
The statistics output file (-f) contains the following information:
MSE at the first iteration
MSE at each specified interval (optional)
Total Number of Iterations
Final MSE at termination of the algorithm
Convergence Parameter used (-cv)
Weight Update Parameter used (-meu)
Minimum Delta MSE value (-delta)
Border Width (-b)
Number of Response Nodes in the network
Number of Output Classes in the network
The number of receptive field response nodes in the first layer of the LRF is determined by the number of cluster centers in the "cluster center" image. The number of output classes, and hence the number of output nodes in the second (i.e. last) layer, is determined by the number of desired classes specified in the "supervised" classification phase of the clustering. This information is contained in the last band of the cluster center image. The number of weights in the network is determined by the number of receptive field response nodes and the number of output nodes. That is,
#Wts = (#rf_response_nodes * #output_nodes) + #output_nodes
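For example, with 20 receptive field response nodes and 5 output classes (hypothetical values chosen only to illustrate the formula), the network would contain (20 * 5) + 5 = 105 weights; the final term contributes one additional weight per output node.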
As an initial step, try running the algorithm with a small number of iterations (e.g. -n 500) to get a feel for how the MSE is behaving (i.e. decreasing rapidly, decreasing slowly, or increasing). Make sure you have the MSE display parameter set to a reasonable interval (e.g. -d 10) so that you can see how the MSE evolves. These values will be written to the statistics file (-f).
After you get an idea of how the MSE is behaving, set the convergence value (-cv) to a reasonable value. You may also try decreasing the weight update parameter (-meu) to learn at a slower rate; oftentimes a large weight update parameter will cause the learning to "oscillate" and never reach a small MSE. You may also want to set the minimum delta MSE parameter (-delta) to a small value to ensure that the algorithm terminates if the MSE levels off.
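For instance, a follow-up run incorporating these parameters might look like the following; the image names and the particular values chosen for -cv, -meu, and -delta are illustrative only and will depend on the behavior observed in the initial run.

    lrftrain -i1 feature_image.xv -i2 cluster_centers -i3 variances -i4 cluster_numbers \
             -o weight_image -f stats -d 10 -n 10000 -cv 0.01 -meu 0.1 -delta 0.0001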
This routine was written with the help of and ideas from Dr. Don Hush, University of New Mexico, Dept. of EECE.
lrftrain -i1 feature_image.xv -i2 cluster_centers -i3 variances -i4 cluster_numbers -o weight_image -f stats -d 10 -n 500

This example illustrates a good initial step at training on an image. The display interval is set to write out the MSE every 10 iterations. The number of iterations is set to a small value, 500, to ensure that the algorithm stops in a reasonable amount of time while you get a feel for how the MSE is behaving.