Architectures in the back-propagation category include standard networks, recurrent networks, feed-forward networks with multiple hidden slabs, and jump-connection networks. Back-propagation networks are known for their ability to generalize well on a wide variety of problems. They are a supervised type of network, i.e., trained with both inputs and outputs, and because of this good generalization they are used in a large number of working applications.
The first category of neural network architectures is the one where each layer is connected to the immediately previous layer (see Figure 11.17). Generally, three layers (input, hidden, and output) are sufficient for the majority of problems; a three-layer back-propagation network with standard connections is suitable for almost all of them. Depending on the problem characteristics, however, one, two, or three hidden layers can be used. Using more than five layers in total generally offers no benefit and should be avoided.
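As a minimal sketch of such a three-layer back-propagation network with standard connections, the following trains on the XOR problem. This is my own illustration, not the text's: the layer sizes, learning rate, training data, and iteration count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

# Supervised training patterns: inputs and their target outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)  # input -> hidden weights
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)  # hidden -> output weights

lr = 0.5
losses = []
for _ in range(5000):
    # Forward pass: each layer feeds only the immediately next layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Backward pass: propagate the output error back through the layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
```

After training, the mean squared error has dropped from its initial value; once trained, this network responds to a given input pattern with exactly the same output every time that pattern is presented.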
The next category of architecture is the recurrent network with dampened feedback from the input, hidden, or output layer. It holds the contents of one of these layers as it existed when the previous pattern was trained, so the network sees the knowledge it had about previous inputs. This extra slab is sometimes called the network's long-term memory: it remembers the input, output, or hidden layer that contains the features detected in the raw data of previous patterns. Recurrent neural networks are particularly suitable for the prediction of sequences, so they are excellent for time series data. A back-propagation network with standard connections, as just described, responds to a given input pattern with exactly the same output pattern every time that input is presented. A recurrent network may respond to the same input pattern differently at different times, depending on the patterns that were presented as inputs just previously. Thus, the sequence of the patterns is as important as the input pattern itself. Recurrent networks are trained like standard back-propagation networks, except that the patterns must always be presented in the same order. The difference in structure is that an extra slab in the input layer is connected to the hidden layer, just like the other input slab; this extra slab holds the contents of one of the layers (input, output, or hidden) as it existed when the previous pattern was trained.
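The recurrent idea above can be sketched as follows (my own illustration with assumed sizes and untrained random weights, not the book's code): an extra context slab holds the hidden layer's contents from the previous pattern, so the same input can produce different outputs depending on the sequence.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

n_in, n_hidden, n_out = 1, 3, 1
W_in  = rng.normal(size=(n_in, n_hidden))      # input slab -> hidden
W_ctx = rng.normal(size=(n_hidden, n_hidden))  # extra (context) slab -> hidden
W_out = rng.normal(size=(n_hidden, n_out))     # hidden -> output

context = np.zeros(n_hidden)  # the network's "long-term memory" slab

def step(x):
    global context
    # The context slab feeds the hidden layer just like the input slab does.
    h = sigmoid(x @ W_in + context @ W_ctx)
    context = h.copy()  # store this hidden state for the next pattern
    return sigmoid(h @ W_out)

y1 = step(np.array([0.5]))
y2 = step(np.array([0.5]))  # identical input, but the context has changed
```

Because the context slab differs between the two presentations, `y1` and `y2` differ even though the input pattern is the same.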
The third category is the feed-forward network with multiple hidden slabs. These architectures are very powerful in detecting different features of the input vectors when different activation functions are given to the hidden slabs, and they have been used in a number of engineering problems for modeling and prediction with very good results (see the later section, "ANN Applications in Solar Energy Systems"). Figure 11.19 shows such a feed-forward architecture with three hidden slabs. The information processing at each node is performed by combining all input numerical information from upstream nodes in a weighted average of the form
\[ p_i = \sum_j w_{ij}\, a(p_j) + b_i \]
where:
a(p_i) = activation for each node
b_i = a constant term referred to as the bias
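The weighted combination just given can be written out directly (a sketch; the variable names are mine):

```python
import numpy as np

def node_input(activations, weights, bias):
    # p_i = sum_j w_ij * a(p_j) + b_i: combine the activations of the
    # upstream nodes in a weighted sum and add the bias term.
    return float(np.dot(weights, activations) + bias)

# Three upstream activations, three weights, one bias (values are arbitrary).
p = node_input(np.array([0.2, 0.7, 0.1]), np.array([0.5, -0.3, 0.8]), bias=0.1)
# p = 0.1 + 0.1 - 0.21 + 0.08 = 0.07
```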
The final nodal output is computed via the activation function. This architecture has a different activation function in each slab. Referring to Figure 11.19, the input slab activation function is linear, i.e., \( a(p_i) = p_i \) (where \( p_i \) is the weighted average obtained by combining all input numerical information from upstream nodes), while the activations used in the other slabs are as follows.
Gaussian complement for slab 4:
\[ a(p_i) = 1 - e^{-p_i^2} \tag{11.111} \]
Logistic for the output slab:
\[ a(p_i) = \frac{1}{1 + e^{-p_i}} \tag{11.112} \]
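These slab activations can be written out as simple functions (a sketch; the function names are mine, and the Gaussian is included because it is recommended for one of the hidden slabs below):

```python
import numpy as np

def linear(p):
    # Input slab: a(p) = p (identity).
    return p

def gaussian(p):
    # a(p) = e^(-p^2); peaks at p = 0, so it responds to mid-range features.
    return np.exp(-p**2)

def gaussian_complement(p):
    # Eq. (11.111): a(p) = 1 - e^(-p^2); responds to features at the extremes.
    return 1.0 - np.exp(-p**2)

def logistic(p):
    # Eq. (11.112): a(p) = 1 / (1 + e^(-p)); output slab, squashes to (0, 1).
    return 1.0 / (1.0 + np.exp(-p))
```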
Different activation functions are applied to the hidden slabs to detect different features in a pattern processed through the network. The number of hidden neurons in each hidden slab may also be calculated with Eq. (11.107); however, more hidden neurons may be used to give the network more "degrees of freedom" and allow it to store more complex patterns. This is usually done when the input data are highly nonlinear. In this architecture it is recommended to use a Gaussian function on one hidden slab, to detect features in the mid-range of the data, and the Gaussian complement on another hidden slab, to detect features at the upper and lower extremes of the data. Combining the two feature sets in the output layer may lead to a better prediction.
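The combination of the two feature sets can be sketched as a forward pass through two hidden slabs that share the same input and feed a common logistic output slab. This is my own illustration: the slab sizes are assumed and the weights are untrained random values, not the book's model.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian(p):            return np.exp(-p**2)
def gaussian_complement(p): return 1.0 - np.exp(-p**2)
def logistic(p):            return 1.0 / (1.0 + np.exp(-p))

n_in, n_slab, n_out = 3, 4, 1
W_g  = rng.normal(size=(n_in, n_slab))       # input -> Gaussian slab
W_gc = rng.normal(size=(n_in, n_slab))       # input -> Gaussian-complement slab
W_o  = rng.normal(size=(2 * n_slab, n_out))  # both slabs -> output slab

def forward(x):
    mid   = gaussian(x @ W_g)               # features in the mid-range
    edges = gaussian_complement(x @ W_gc)   # features at the extremes
    # The output slab sees both feature sets side by side.
    return logistic(np.concatenate([mid, edges]) @ W_o)

y = forward(np.array([0.1, -0.4, 0.9]))
```

Because the output slab is logistic, the prediction always lies in the open interval (0, 1).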