Learning Functions

Deleting all recurrent links in a partial recurrent network leaves a simple feedforward network. The context units then act as input units, so the total network input consists of two components. The first component is the pattern vector, which was the only input to the partial recurrent network. The second component is a state vector, which is supplied by the next-state function in every step. In this way the behavior of a partial recurrent network can be simulated with a simple feedforward network that receives the state not implicitly through recurrent links, but as an explicit part of the input vector. Backpropagation algorithms can therefore easily be modified for the training of partial recurrent networks in the following way:

1. Initialize the context units. In the following steps, all recurrent links are assumed not to exist, except in step 2(f).
2. For each pattern of the training sequence, execute the following steps:
   (a) input the pattern and propagate it forward through the network
   (b) calculate the error signals of the output units by comparing the computed output with the teaching output
   (c) backpropagate the error signals
   (d) calculate the weight changes
   (e) on-line training only: adapt the weights
   (f) calculate the new state of the context units according to their incoming links
3. Off-line training only: adapt the weights
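
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the JE_ learning functions: it assumes an Elman-style network in which the context units simply copy the hidden activations, tanh hidden units, linear output units, no bias weights, and a squared-error measure. The names train_epoch, apply_changes, W_h, W_o and lr are hypothetical and chosen for this example only.

```python
import math

def apply_changes(W, dW):
    """Add the accumulated weight changes to W and reset the accumulator."""
    for row, drow in zip(W, dW):
        for i in range(len(row)):
            row[i] += drow[i]
            drow[i] = 0.0

def train_epoch(sequence, W_h, W_o, lr=0.1, online=True):
    """One pass over a training sequence; returns the summed squared error.

    sequence: list of (pattern, target) pairs
    W_h: one row per hidden unit, weights over [pattern + context]
    W_o: one row per output unit, weights over the hidden units
    """
    n_hidden = len(W_h)
    dW_h = [[0.0] * len(row) for row in W_h]
    dW_o = [[0.0] * len(row) for row in W_o]
    context = [0.0] * n_hidden                 # initialize the context units
    sse = 0.0
    for pattern, target in sequence:
        # Forward propagation: the state is an explicit part of the input.
        x = list(pattern) + context
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W_h]
        y = [sum(w * v for w, v in zip(row, h)) for row in W_o]
        # Error signals of the output units (teaching output minus output).
        d_o = [t - o for t, o in zip(target, y)]
        sse += sum(d * d for d in d_o)
        # Backpropagation of the error signals to the hidden units.
        d_h = [(1.0 - h[j] ** 2) * sum(d_o[k] * W_o[k][j] for k in range(len(W_o)))
               for j in range(n_hidden)]
        # Weight changes; recurrent links are ignored, context counts as input.
        for k in range(len(W_o)):
            for j in range(n_hidden):
                dW_o[k][j] += lr * d_o[k] * h[j]
        for j in range(n_hidden):
            for i in range(len(x)):
                dW_h[j][i] += lr * d_h[j] * x[i]
        if online:                             # on-line: adapt after every pattern
            apply_changes(W_h, dW_h)
            apply_changes(W_o, dW_o)
        context = list(h)                      # new state of the context units
    if not online:                             # off-line: adapt once per sequence
        apply_changes(W_h, dW_h)
        apply_changes(W_o, dW_o)
    return sse
```

Note that the error is not propagated back through the context units to earlier time steps; within each step the network is treated as purely feedforward, which is what distinguishes this scheme from backpropagation through time.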

In this manner, the following learning functions have been adapted for the training of partial recurrent networks such as Jordan and Elman networks:

JE_BP: Standard Backpropagation for partial recurrent networks
JE_BPMomentum: Standard Backpropagation with momentum term for partial recurrent networks
JE_Quickprop: Quickprop for partial recurrent networks
JE_Rprop: Rprop for partial recurrent networks

The parameters for these learning functions are the same as for the regular feedforward versions of these algorithms (see section ) plus one special parameter.

To train a network with one of these functions, a method called teacher forcing can be used. Teacher forcing means that during the training phase the output units propagate the teaching output instead of their computed output to successor units (if there are any). The additional parameter enables or disables teacher forcing. If its value is less than or equal to 0.0, only the teaching output is used; if it is greater than or equal to 1.0, the real output is propagated. Values between 0.0 and 1.0 yield a weighted sum of the teaching output and the real output.
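
The effect of the teacher forcing parameter can be illustrated with a short Python sketch. The text only states that intermediate values give a weighted sum; the linear interpolation used here is an assumption, and the function name propagated_output and its arguments are hypothetical.

```python
def propagated_output(real, teaching, param):
    """Value the output units pass on to their successor units during training.

    param <= 0.0: only the teaching output is used.
    param >= 1.0: only the real (computed) output is used.
    In between: a weighted sum, assumed here to be a linear interpolation.
    """
    lam = min(1.0, max(0.0, param))   # clamp the parameter to [0.0, 1.0]
    return [lam * r + (1.0 - lam) * t for r, t in zip(real, teaching)]
```

In a Jordan network, for example, the result of such a function rather than the raw output would be what feeds the context units when teacher forcing is active.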

Niels.Mache@informatik.uni-stuttgart.de
Tue Nov 28 10:30:44 MET 1995