## Group method data handling neural network architecture

One type of neural network that is very suitable for modeling is the group method of data handling (GMDH) neural network. The group method of data handling technique was invented by A. G. Ivakhenko, from the Institute of Cybernetics,

Ukrainian Academy of Sciences (Ivakhenko, 1968, 1971), but enhanced by others (Farlow, 1984). This technique is also known as polynomial networks. Ivakhenko developed the GMDH technique to build more accurate predictive models of fish populations in rivers and oceans. The GMDH technique worked well for modeling fisheries and many other modeling applications (Hecht-Nielsen, 1991). The GMDH is a feature-based mapping network.

The GMDH technique works by building successive layers, with links that are simple polynomial terms. These polynomial terms are created by using linear and nonlinear regression. The initial layer is simply the input layer. The first layer created is made by computing regressions of the input variables, from which the best ones are chosen. The second layer is created by computing regressions of the values in the first layer, along with the input variables. Only the best, called survivors, are chosen by the algorithm. This process continues until the network stops getting better, according to a prespecified selection criterion. More details on the GMDH technique can be found in the book by Hecht-Nielsen (1991).

The resulting network can be represented as a complex polynomial description of the model in the form of a mathematical equation. The complexity of the resulting polynomial depends on the variability of the training data. In some respects, GMDH is very much like using regression analysis but far more powerful. The GMDH network can build very complex models while avoiding over-fitting problems. Additionally, an advantage of the GMDH technique is that it recognizes the best variables as it trains and, for problems with many variables, the ones with low contribution can be discarded.

The central idea behind the GMDH technique is that it is trying to build a function (called a polynomial model) that behaves as closely as possible to the way the predicted and actual values of the output would. For many end users, it may be more convenient to have a model that is able to make predictions using polynomial formulas that are widely understood than a normal neural network, which operates like a "black box" model. The most common approach to solving such models is to use regression analysis. The first step is to decide the type of polynomial that regression should find. For example, a good idea is to choose, as terms of the polynomial, powers of input variables along with their covariants and trivariants, such as lx1 , x2, x3, ■ ■ ■, x1 , x2 , x3 ," ■, x1 x2, x1 x3,''', Xn —1 Xn, x1x2 x3,- ■ } (11.114)

The next step is to construct a linear combination of all the polynomial terms with variable coefficients. The algorithm determines the values of these coefficients by minimizing the squared sum of differences between sample outputs and model predictions, over all samples.

The main problem when utilizing regression is how to choose the set of polynomial terms correctly. In addition, decisions need to be made on the degree of the polynomial. For example, decisions have to be made on how complex the terms should be or whether the model should evaluate terms such as x10, or maybe limit consideration to terms such as x4 and lower. The GMDH

technique works better than regression by answering these questions before trying all possible combinations.

The decision about the quality of each model must be made using some numeric criterion. The simplest criterion (a form of which is also used in linear regression analysis) is the sum, over all samples, of the squared differences between the actual output (ya) and the model's prediction (yp) divided by the sum of the squared actual output. This is called the normalized mean squared error (NMSE). In equation form,

However, if only the NMSE is used on real data, the NMSE value gets smaller and smaller as long as extra terms are added to the model. This is because the more complex the model, the more exact it is. This is always true if NMSE is used alone, which determines the quality of the model by evaluating the same information already used to build the model. This results in an "overcomplex" model or model overfit, which means the model does not generalize well because it pays too much attention to noise in the training data. This is similar to overtraining other neural networks.

To avoid this danger, a more powerful criterion is needed, based on information other than that which was used to build the evaluated model. There are several ways to define such criteria. For example, the squared sum of differences between the known output and model prediction over some other set of experimental data (a test set) may be used. Another way to avoid overfitting is to introduce a penalty for model complexity. This is called the predicted squared error criterion.

Theoretical considerations show that increasing model complexity should be stopped when the selection criterion reaches a minimum value. This minimum value is a measure of model reliability.

The method of searching for the best model based on testing all possible models is usually called the combinatorial GMDH algorithm. To reduce computation time, the number of polynomial terms used to build the models to be evaluated should be reduced. To do so, a one-stage procedure of model selection should be changed to a multilayer procedure. In this, the first two input variables are initially taken and combined into a simple set of polynomial terms. For example, if the first two input variables are x1 and x2, the set of polynomial terms would be {c, xb x2, x1 X x2}, where (c) represents the constant term. Subsequently, all possible models made from these terms are checked and the best is chosen; any one of the evaluated models is a candidate for survival.

Then, another pair of input variables is taken and the operation is repeated, resulting in another candidate for survival, with its own value of the evaluation criterion. By repeating the same procedure for each possible pair of n input variables, n(n - 1)/2 candidates for survival are generated, each with its own value of the evaluation criterion.

Subsequently, these values are compared, and several candidates for survival that give the best approximation of the output variable are chosen. Usually a predefined number of the best candidates are selected for survival and are stored in the first layer of the network and preserved for the next layer. The candidates selected are called survivors.

The layer of survivors is used for inputs in building the next layer in the network. The original network inputs used in the first layer may also be chosen as inputs to the new layer. Therefore, the next layer is built with polynomials of this broadened set of inputs. It should be noted that, since some inputs are already polynomials, the next layer may contain very complex polynomials.

The layer building of the GMDH procedure continues as long as the evaluation criteria continue to diminish. Each time a new layer is built the GMDH algorithm checks whether the new evaluation criterion is lower than the previous one and, if this is so, continues training; otherwise, it stops training.