According to Haykin (1994), a neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the human brain in two respects:

• Knowledge is acquired by the network through a learning process.

• Interneuron connection strengths, known as synaptic weights, are used to store the knowledge.

Artificial neural network (ANN) models may be used as an alternative method in engineering analysis and prediction. ANNs loosely mimic the learning process of the human brain. They operate like "black box" models, requiring no detailed information about the system. Instead, they learn the relationship between the input parameters and the controlled and uncontrolled variables by studying previously recorded data, much as a nonlinear regression would. Another advantage of ANNs is their ability to handle large, complex systems with many interrelated parameters. They tend simply to ignore excess input parameters of minimal significance and concentrate instead on the more important inputs.

A schematic diagram of a typical multilayer, feed-forward neural network architecture is shown in Figure 11.17. The network usually consists of an input layer, one or more hidden layers, and an output layer. In its simplest form, each single neuron is connected to the neurons of the previous layer through adaptable synaptic weights. Knowledge is usually stored as a set of connection weights (presumably corresponding to synapse efficacy in biological neural systems). Training is the process of modifying the connection weights in some orderly fashion, using a suitable learning method. The network uses a learning mode in which an input is presented to the network along with the desired output, and the weights are adjusted so that the network attempts to produce the desired output. After training, the weights contain meaningful information, whereas before training they are random and have no meaning.

Figure 11.18 illustrates how information is processed through a single node. The node receives the weighted activation of other nodes through its incoming connections. First, these are added up (summation). The result is then passed through an activation function; the outcome is the activation of the node. For each of the outgoing connections, this activation value is multiplied by the specific weight and transferred to the next node.
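The summation-and-activation step for a single node can be sketched in Python. This is a minimal illustration only; the logistic-sigmoid activation matches the one used later in the section, but the example input values and weights are assumptions, not values from the text:

```python
import math

def node_output(inputs, weights, bias=0.0):
    """Compute one node's activation: a weighted sum of the incoming
    activations (summation) passed through a logistic-sigmoid function."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias  # summation
    return 1.0 / (1.0 + math.exp(-s))                       # activation

# The resulting activation is then multiplied by each outgoing weight
# and transferred to the nodes of the next layer.
a = node_output([0.5, 0.2], [0.4, -0.7])
```

Because the sigmoid squashes the weighted sum into (0, 1), every node's activation stays bounded regardless of the size of its inputs.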

A training set is a group of matched input and output patterns used for training the network, usually by suitable adaptation of the synaptic weights. The outputs are the dependent variables that the network produces for the corresponding input. It is important that all the information the network needs to learn is supplied as a data set. When each pattern is read, the network uses the input data to produce an output, which is then compared to the training pattern, i.e., the correct or desired output. If there is a difference, the connection weights are altered (usually, but not always) in a direction that decreases the error. After the network has run through all the input patterns, if the error is still greater than the maximum desired tolerance, the ANN runs through all the input patterns again, repeatedly, until all the errors are within the required tolerance. When the training reaches a satisfactory level, the network holds the weights constant, and the trained network can be used to make decisions, identify patterns, or define associations in new input data sets not used to train it.
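The present-compare-adjust-repeat procedure described above can be sketched as a loop that cycles through the whole training set until the error falls within tolerance. This is a minimal illustration under assumed conditions: a single adjustable weight, a simple delta-rule adjustment, and an invented training set where the target is twice the input:

```python
# Training set: matched input/output patterns (here, target = 2 * input).
training_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.1            # connection weight: random/meaningless before training
rate = 0.05        # step size for each weight adjustment
tolerance = 1e-6   # maximum desired error per pass

for epoch in range(10000):
    total_error = 0.0
    for x, target in training_set:   # present each pattern in turn
        output = w * x               # network produces an output
        error = target - output      # compare with the desired output
        w += rate * error * x        # alter weight to decrease the error
        total_error += error ** 2
    if total_error <= tolerance:     # all errors within tolerance: stop
        break
# After training, the weight holds meaningful information (w ≈ 2 here).
```

Each full pass through the training set corresponds to one epoch, as defined later in the section.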

By learning, we mean that the system adapts (usually by changing suitable controllable parameters) in a specified manner so that parts of the system exhibit meaningful behavior, projected as output. The controllable parameters have different names, such as synaptic weights, synaptic efficacies, free parameters, and others.

The classical view of learning is well interpreted and documented in approximation theories. In these, learning may be interpreted as finding a suitable hypersurface that fits known input-output data points in such a manner that the mapping is acceptably accurate. Such a mapping is usually accomplished by employing simple nonlinear functions that are used to compose the required function (Poggio and Girosi, 1990).

A more general approach to learning is adopted by Haykin (1994), in which learning is a process by which the free parameters of a neural network are adapted through a continuing process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.

Generally, learning is achieved through any change in any characteristic of a network such that meaningful results are achieved, meaning that a desired objective is met with a satisfactory degree of success. Thus, learning could be achieved through synaptic weight modification, network structure modification, appropriate choice of activation functions, and in other ways.

The objective is usually quantified by a suitable criterion or cost function. It is usually a process of minimizing an error function or maximizing a benefit function. In this respect, learning resembles optimization. That is why a genetic algorithm, which is an optimum search technique (see Section 11.6.2), can also be employed to train artificial neural networks.

Several algorithms are commonly used to achieve the minimum error in the shortest time. There are also many alternative forms of neural networking systems and, indeed, many different ways in which they may be applied to a given problem. The choice of an appropriate paradigm and strategy depends very much on the type of problem to be solved.

The most popular learning algorithms are back-propagation and its variants (Barr and Feigenbaum, 1981; Werbos, 1974). The back-propagation (BP) algorithm is one of the most powerful learning algorithms in neural networks. Back-propagation training is a gradient descent algorithm. It tries to improve the performance of the neural network by reducing the total error, changing the weights along the direction of steepest descent of the error. The error is expressed by the root mean square (RMS) value, which can be calculated by

E = √[(1/p) Σ_p Σ_i (t_pi − o_pi)²]

where E is the RMS error, t_pi is the target (desired) output, and o_pi is the network output, summed over all patterns, p. An error of zero would indicate that all the output patterns computed by the ANN perfectly match the expected values and the network is well trained. In brief, back-propagation training is performed by initially assigning random values to the weight terms (w_ij)¹ in all nodes. Each time a training pattern is presented to the ANN, the activation for each node, a_pi, is computed. After the output of the layer is computed, the error term, δ_pi, for each node is computed backward through the network. This error term is the product of the error function, E, and the derivative of the activation function and, hence, is a measure of the change in the network output produced by an incremental change in the node weight values. For the output layer nodes and the case of the logistic-sigmoid activation, the error term is computed as

δ_pi = (t_pi − a_pi) a_pi(1 − a_pi) (11.104)

For a node in a hidden layer, the error term is computed from the error terms of the downstream layer:

δ_pi = a_pi(1 − a_pi) Σ_k δ_pk w_ki

¹The j subscript refers to a summation over all nodes in the previous layer of nodes, and the i subscript refers to the node position in the present layer.

In this expression, the k subscript indicates a summation over all nodes in the downstream layer (the layer in the direction of the output layer). The j subscript indicates the weight position in each node. Finally, the δ and a terms for each node are used to compute an incremental change to each weight term via

Δw_ij = ε δ_pi a_pj + m Δw_ij(old)

The term ε, referred to as the learning rate, determines the size of the weight adjustments during each training iteration. The term m is called the momentum factor. It is applied to the weight change used in the previous training iteration, Δw_ij(old). Both of these constant terms are specified at the start of the training cycle and determine the speed and stability of the network. One pass of training through all patterns of a training data set is called an epoch.
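The back-propagation rules above, including the momentum term, can be sketched for a tiny one-hidden-layer network. This is an illustrative sketch only; the network size (2 inputs, 2 hidden nodes, 1 output), the values chosen for ε and m, and the toy training patterns are all assumptions:

```python
import math
import random

random.seed(0)
sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))

# 2 inputs -> 2 hidden nodes -> 1 output node, logistic-sigmoid throughout.
w_hid = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(2)]
dw_hid_old = [[0.0, 0.0], [0.0, 0.0]]  # previous weight changes (momentum)
dw_out_old = [0.0, 0.0]
eps, m = 0.5, 0.9                      # learning rate and momentum factor

def train_pattern(x, t):
    """One back-propagation step for input pattern x with target t."""
    # Forward pass: summation and activation at each node.
    a_hid = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hid]
    a_out = sigmoid(sum(w * a for w, a in zip(w_out, a_hid)))
    # Output-layer error term: delta = (t - a) a (1 - a).
    d_out = (t - a_out) * a_out * (1.0 - a_out)
    # Hidden-layer error terms: delta_i = a_i (1 - a_i) sum_k delta_k w_ki
    # (here the downstream layer holds a single output node).
    d_hid = [a * (1.0 - a) * d_out * w_out[i] for i, a in enumerate(a_hid)]
    # Weight changes: dw = eps * delta * a_j + m * dw(old).
    for i in range(2):
        dw = eps * d_out * a_hid[i] + m * dw_out_old[i]
        w_out[i] += dw
        dw_out_old[i] = dw
        for j in range(2):
            dw = eps * d_hid[i] * x[j] + m * dw_hid_old[i][j]
            w_hid[i][j] += dw
            dw_hid_old[i][j] = dw
    return (t - a_out) ** 2

# One epoch over an assumed two-pattern training set; squared error per
# pattern is collected.
errors = [train_pattern(x, t) for x, t in [([0.0, 1.0], 1.0), ([1.0, 0.0], 0.0)]]
```

Note how the momentum term reuses the previous iteration's weight change, which smooths the descent and is the reason both ε and m must be fixed before training starts.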
