## Regression Analyses


Regression tells us how a dependent variable (energy) is related to an independent one (production) by providing an equation for estimating energy consumption at a given production output. For most industries, the relationship between production and energy consumption is linear (Fig. 3.11), which means that the points in the graph can be approximated by a straight line and expressed by a linear equation (3.7), in general form as follows:

$$y = a x + b \qquad (3.7)$$

where a and b are constants that need to be calculated for each data set; x is the independent variable, which is production output in our case and will be denoted by 'P'; and y is the dependent variable, which is energy in our case and will be denoted by 'E'. The same equation can now be rewritten as follows:

$$E = a P + b$$

Such a linear equation will produce a regression line that 'fits' through the uneven scatter of data points. In principle, we can draw many lines through a single data scatter. It can even be done manually, a skill most of us have probably forgotten now that a computer is on everyone's desk.

Figure 3.11 Relationship between Production and Energy Consumption (horizontal axis: Production [t/m])

For a particular data set, if the constant values of 'a' and 'b' are calculated by the 'least squares method', the resulting line will go through the center of the data scatter and is therefore called a 'best-fit line'. This line has a particular property: the vertical distances of the data points above and below it cancel out, so their signed sum is equal to zero, and the sum of their squares is smaller than for any other straight line. Nowadays, spreadsheet software will offer a 'best-fit' approximation based on the least squares method, draw a line and give the equation as shown in Figure 3.11. For the presented set of data, the best-fit approximation is as follows:

The regression equation given by the least squares method is the best approximation we can obtain of the underlying energy/production relationship, assuming linearity over the relevant data range.
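To make the least squares calculation concrete, the following is a minimal sketch of how a and b could be computed by hand rather than by a spreadsheet. The function name `least_squares_line` and the data values are assumptions invented for illustration; they are not the data set behind Figure 3.11.

```python
def least_squares_line(production, energy):
    """Return (a, b) for the line E = a*P + b fitted by least squares."""
    n = len(production)
    if n != len(energy) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_p = sum(production) / n
    mean_e = sum(energy) / n
    # Slope: co-variation of P and E divided by the variation of P.
    s_pe = sum((p - mean_p) * (e - mean_e) for p, e in zip(production, energy))
    s_pp = sum((p - mean_p) ** 2 for p in production)
    if s_pp == 0:
        raise ValueError("production values must not all be identical")
    a = s_pe / s_pp
    # Intercept: the fitted line passes through the point of means.
    b = mean_e - a * mean_p
    return a, b


# Hypothetical monthly data, invented for illustration only.
production = [400.0, 520.0, 610.0, 700.0, 850.0]  # t/m
energy = [310.0, 365.0, 400.0, 450.0, 510.0]      # energy units per month

a, b = least_squares_line(production, energy)
residuals = [e - (a * p + b) for p, e in zip(production, energy)]
print(f"E = {a:.4f} * P + {b:.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

The printed sum of residuals should be zero up to rounding error, which is the cancellation property of the best-fit line described above.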

However, we should be aware that this is just a mechanical process of fitting a regression line to the data set, one that takes no account of the actual environment from which the energy/production data comes. Spreadsheet software will produce an equation for any data set entered into it. If the data is wrong, or linearity does not hold, the software will still generate an equation. It is our responsibility, as users, to ensure that the input data correctly reflects the energy/production relationship being analyzed. Of course, it is assumed that measurement errors are within pre-set acceptable ranges. If that is so, the spreadsheet software will produce values for the constants a and b, which in turn determine the regression equation and the best-fit line.
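The point that an equation comes out regardless of whether linearity holds can be illustrated with a small sketch. The data here are deliberately non-linear and entirely hypothetical, and looking at the signs of the residuals is only one possible check, suggested here as an assumption rather than a prescribed procedure.

```python
def fit_line(xs, ys):
    """Least squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return a, mean_y - a * mean_x


# Hypothetical data that is deliberately NOT linear: energy grows with the
# square of production. Invented for illustration only.
production = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
energy = [p ** 2 for p in production]

a, b = fit_line(production, energy)  # an equation comes out regardless
residuals = [e - (a * p + b) for p, e in zip(production, energy)]

# Residuals are positive at both ends and negative in the middle, a
# systematic pattern that a truly linear relationship would not show.
signs = ["+" if r > 0 else "-" if r < 0 else "0" for r in residuals]
print(f"E = {a:.1f} * P + ({b:.1f})")
print("residual signs:", " ".join(signs))
```

The fit succeeds without any warning, which is exactly why the judgment about whether a straight line is appropriate has to come from the user.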

If the best-fit line is extended to the y-axis (the dotted line in Fig. 3.11), it will intercept the y-axis at the value y = b. The slope of the line (a) that goes through the scatter must always be positive, which means that if production increases, energy consumption must also increase, and if production decreases, energy consumption must decrease as well.

Here, we should point out that regression analysis is valid only within the range of actual observations that quantify the relationship between energy and production. No relationship exhibits the same pattern at all activity levels, so linearity may apply only over the observed range of production outputs. Consequently, one should be careful when drawing conclusions based on an extrapolation outside the relevant data range.
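One way to respect this limitation in practice is to store the observed production range together with the regression constants and refuse to estimate outside it. The sketch below assumes hypothetical values for a, b and the observed range; the function name `predict_energy` is likewise invented for illustration.

```python
# Hypothetical regression constants and observed production range,
# invented for illustration only.
A = 0.45          # slope of E = a*P + b
B = 130.0         # intercept of E = a*P + b
P_MIN = 400.0     # lowest production output in the fitted data [t/m]
P_MAX = 850.0     # highest production output in the fitted data [t/m]


def predict_energy(p, a, b, p_min, p_max):
    """Estimate energy for production p, refusing to extrapolate."""
    if not (p_min <= p <= p_max):
        raise ValueError(
            f"production {p} is outside the observed range "
            f"[{p_min}, {p_max}]; the fitted line may not apply"
        )
    return a * p + b


# Inside the observed range the estimate is returned.
print(predict_energy(600.0, A, B, P_MIN, P_MAX))

# Outside the observed range the function refuses instead of extrapolating.
try:
    predict_energy(100.0, A, B, P_MIN, P_MAX)
except ValueError as err:
    print("refused:", err)
```

Raising an error is a deliberately strict choice; a softer variant could return the estimate together with a warning flag.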

At a very low production output (say, less than 30 % of the nominal value), linearity may be lost and replaced by some form of polynomial, power, logarithmic or exponential relationship. On the other hand, we can assume that the relationship will be linear in segments over a range of production output levels. Still, a very low production output is not a normal operating state, because plants are not designed and built to operate continuously at low capacity. Therefore, instead of trying to fit the data with some exponential regression line, we should first ask why the plant is running at such a low output at all.

If there are any material changes in the underlying energy/production relationship, e.g. expansion of production capacity or installation of more efficient machines, a sufficient amount of new energy/production data (recorded after the change) has to be collected and the regression equation must then be updated.

When handled with care, the regression equation proves to be quite a useful tool in the performance evaluation process. But if the regression analysis is conducted without giving any thought to its context, it can do more harm than good.