An Improved Resilient Propagation Algorithm by Introducing Deterministic Weight Modification and Magnified Gradient Function

School of Science and Technology 科技學院
Computing Programmes 電腦學系

An Improved Resilient Propagation Algorithm by Introducing Deterministic Weight Modification and Magnified Gradient Function

Alan FUNG


Programme	Bachelor of Computing with Honours in Computing
Supervisor	Dr. Vanessa Ng
Areas	Neural Networks, Algorithms
Year of Completion	2010

Objectives

The aim of project is to improve an existing training algorithm on feed-forward neural network. This algorithm will base on Resilient Propagation (RPROP) and try to improve the global convergence capability and speed up the convergence rate. The main approach is modifying RPROP based on the theories of Deterministic Weight Modification (DWM) and Backpropagation with Magnified Gradient Function (MGFPROP).

The following describes the objectives concerning the development of a novel training algorithm based on RPROP and augmented with DWM and MGF:

Design and Implement the algorithm using Java: DWM for improve the global convergence capability and MGF for speed up the convergence rate and capability.
Evaluate and compare the performance of the new algorithm with existing ones based on standard data sets.
Develop graphical tools for visualization and analysis of convergence rate and global convergence capability.

This report would introduce three new algorithms, which are RPROP with DWM, RPROP with MGF and the hybrid algorithm – RPROP with MGF & DWM. RPROP with DWM would modify the weight of RPROP in a deterministic way, while RPROP with MGF would magnify the gradient function of RPROP. The hybrid algorithm would be the integration of these 2 algorithms. Besides, the system with GUI would be implemented by using Java.

Background and Methodology

Nowadays, machine learning is the major development trend on computer technology, people all over the world use computers for helping them to solve problems. Neural network is the most widely used supervised machine learning technique. Algorithms like Backpropagation (BP), Quickprop and Resilient Propagation (RPROP) is now extensively applied in the training of multi-layer feed-forward neural networks. BP is the basic algorithm of multi-layer feed-forward neural networks but the training is too slow and the error is easily trapped in local minimum. Quickprop is a faster algorithm but fluctuation of error is great and the error is also easily trapped in local minimum. The following shows an architectural graph of multi-layer perceptron neural network:

RPROP is also a faster algorithm with good stability but sometimes the global convergence capability is low. RPROP was chosen to be modified because it has a fast learning speed and it is easy to implement. Besides, many scholars have brought up different varied algorithms that countered with the weaknesses of the above algorithms, like Deterministic Weight Modification (DWM) and Backpropagation with Magnified Gradient Function (MGFPROP). DWM is proposed to decrease the value of total error function whether the program is getting stick in local minimum. MGFPROP is another algorithm based on BP that counters with the convergence rate. These two theories can also be applied in RPROP and give better performance.

This project mentioned two approaches, the Magnified Gradient Function (MGF) and Deterministic Weight Modification (DWM), to augment RPROP. The purpose of MGF is to improve the learning speed by magnifying the gradient function of the algorithm and DWM is to modify the particular weight in a deterministic way. For fitting RPROP, the theory of DWM has been changed so that the particular weight can be modified more accurately.

The UCI benchmarking problems has been tested and the performance of the algorithm of RPROP with MGF have improved, while the algorithm of RPROP with DWM increase the percentage of converged case compared with the original RPROP and other learning algorithms. Moreover, the hybrid algorithm of the above two approaches makes a better performance than the above two algorithms mentioned in average. There is different improvement level on this hybrid algorithm in different benchmarking problems. The augment level and the performance comparisons of other RPROP variants are discussed. The implementation of the whole programs will use Java language based on the source code of RPROP given by supervisor which is written in C. Because an object-oriented language has a better structure design and is easy to build a user interface.

Deterministic Weight Modification (DWM) between output and hidden layer was introduced to modify weights in RPROP. DWM is a deterministic approach and it is proposed to modify the particular weight instead of modifying randomly. It is a reserved method of the forward pass and the aim is to find out the corresponding change of the weights. Since DWM is initially applied in BP, the theory needs to make some changes for fitting with RPROP in order to improve global convergence capability more efficiency. Another approach is to modify RPROP by adding Magnified Gradient Function (MGF). MGFPROP is an algorithm based on BP and has an improved convergence rate and a better global convergence capability by strengthen the gradient term of BP.

DWM and MGF can be integrated together to improve the performance of RPROP since they modified RPROP in different ways. DWM increase the global convergence capability by modifying particular weight, while MGF improve the performance by magnifying the local gradient of actual output. The following shows the algorithm structure of hybrid algorithm:

In this hybrid algorithm, MGF will be the main improvement tools for RPROP, because DWM will only be executed when RPROP with MGF trapped in local minimum. So the simulation result of this hybrid algorithm will similar to the result of RPROP with MGF. For evaluation, the algorithms will be tested by 6 problems including the character recognition problem, the 5-bit counting problem, the thyroid problem, the wine problem, the breast cancer problem and the iris problem. 4 of them are UCI benchmarking problem which are the real case problem and the other 2 which are the character recognition problem and the 5-bit counting problem is the small scale problem. Most of them are the difficult problem for BP, Quickprop, RPROP and even some RPROP variants. The performance of RPROP with DWM, RPROP with MGF and the hybrid algorithm will be illustrated by these problems. The characteristic and the network architecture of the problems will be described on the table below, where P, N, K and M represent the number of pattern, input, hidden, and output node, respectively. The problem descriptions is shown below:

Problem	Description	Network Architecture P-N-K-M	S
Character Recognition	Recognize the inputs from 8 x 8 matrix to the 26 lower case letter set [a, …, z].	26 – 64 – 13 – 26	2
5-bit Counting	Count the number of '1's from the 5 input units.	32 – 5 – 9 – 6	2
Wine	Classify the type of wines in the same region in Italy by the quantities of constituents.	178 – 13 – 15 – 3	5
Breast Cancer	Recognize the patients of breast cancer by the characteristics of the cell nuclei.	699 – 9- 14 – 1	5
Iris	Classify the type of iris plant.	150 – 4 – 14 – 3	3
Thyroid	Predict the state of the thyroid gland of patients.	215 – 5 – 6 – 3	3

Implementation

The whole program is written by Java because it has maturity architecture for designing Graphical User Interface (GUI) and drawing graphs. The interface has a list of parameters for let users to select and a button for execution. Users can choose which problems and the algorithms used in the training. The error level, maximum iteration and the number of weight sets are default as 0.001, 5000 and 100 which is the same as the evaluation part of this report. But users can change these parameters for other uses. Other parameters can also be modified inside the 'Setting' menu. The following shows the main user interface:

The value of S can be modified only when the algorithm chosen was RPROP with MGF or the hybrid algorithm. When the algorithm is selected to be others, S will become 1 and cannot be modified. An example is shown below:

The system can allow users to save the trained weight set, the flow of the system error and the training record. The function of save the trained weight set and the flow of the system error can be selected only if a specific weight set and specific hidden node is chosen. And the plot graph function can be selected only if a specific weight set and specific hidden node is chosen also.

In the graph plotting frame, the line curve of the flow of error will be shown in green and the position of error in each epoch will be shown in purple. The x-axis consists of the final epoch of the error and a certain number of epochs. The y-axis consists of the error level (default as 0.001), the system error in the first epoch and the final system error if fail to converged. A line of error level will be displayed in grey for specification. The following shows a screen capture for broken line graph with a converged case:

The following shows a screen capture for broken line graph with a un-converged case:

Evaluation

Concerned with the integration of RPROP with DWM, The methodology of this algorithm shows that the biggest reason of low global convergence capability in RPROP is the 'extreme error' occurs in some output nodes. Below shows the convergence behavior of RPROP with DWM and the original RPROP in the wine problem. Note that i represents the number of iterations and E(i) represents the system error at the ith iteration.

From the convergence behavior of RPROP with DWM, some properties have been shown. At the beginning, the system error of RPROP with DWM and the original RPROP are the same. When the algorithm judged the error cannot converge, DWM would be executed and the system error would be raise up. After that, the algorithm will execute RPROP and drop quickly from the spur. The algorithm may execute many time if it keeps finding an occurrence of 'extreme error'. In the above case, RPROP has trapped in local minimum and the error cannot converged, while DWM has been executed several times in RPROP with DWM and its error finally converged.

One of the properties of RPROP with DWM is the fluctuation of system error. It is a normal phenomenon and the stability of algorithm would not be affected. Because when every time the error rises up, a particular output neuron would be fixed up. The error will then drop down very soon because the properties of RPROP is the error drop very fast at the beginning. The following shows an example of such error:

Concerned with the integration of RPROP with MGF, the parameter S in the algorithm is the most important factor which can affect the performance of the algorithm. If S increased, the situation can be changed. The can be scale up greatly and the value of local gradient δpm(i) and δpk(i) can be more obvious. The following shows a graph which is the relationship between f'(x, S) and f(x) in different value of S:

Below shows the convergence behavior of RPROP with MGF and the original RPROP in the 5-bit counting problem. Note that i represents the number of iterations and E(i) represents the system error at the ith iteration.

The final approach is integrating RPROP, DWM and MGF and form a hybrid algorithm. Below shows the convergence behavior of the hybrid algorithm which is RPROP with MGF & DWM, RPROP with MGF and the original RPROP in the breast cancer problem. Note that i represents the number of iterations and E(i) represents the system error at the ith iteration.

The graph above is a classic convergence behavior for RPROP. The convergence behavior of RPROP with MGF and the hybrid algorithm are the same because DWM has not executed in this case. MGF keeps the characteristic of RPROP and performs a smooth curve of system error converge. It also shows that combining DWM would not lower the performance of the global convergence capability of both RPROP and RPROP with MGF.

Below shows the convergence behavior of the hybrid algorithm, RPROP with MGF and the original RPROP in the iris problem. Note that i represents the number of iterations and E(i) represents the system error at the ith iteration.

The convergence behavior of RPROP with MGF and the hybrid algorithm are different because DWM has executed in this case. The error has a little fluctuation in the curve of hybrid algorithm indicate that DWM has been executed. And the convergence rate can be improved in the hybrid algorithm. Below show the performance between the hybrid algorithm, RPROP with DWM, RPROP with MGF and other learning algorithms. 'Rate' means the average number of iteration to converge, which is inversely proportional to the convergence rate. '%' means the global convergence capability, and 'FAIL' means a learning algorithm cannot converge within 5000 iterations for all the 100 different runs.

Learning Algorithm	Character Recognition	5-bit Counting
	Rate %	Rate %
BP	1221 (89%)	FAIL (0%)
Quickprop	50 (100%)	528 (67%)
iRprop	37 (90%)	204 (20%)
RPROP	39 (86%)	209 (15%)
RPROP with DWM	42 (92%)	376 (91%)
RPROP with MGF	36 (92%)	199 (31%)
Hybrid Algorithm	36 (95%)	188 (98%)

Conclusion and Future Development

The project has investigated 3 new algorithms, which is RPROP with DWM, RPROP with MGF and the hybrid algorithm to improve the performance of RPROP in different approaches. RPROP with DWM was proposed to increase the global convergence capability of RPROP by modifying a corresponding weight. It finds out an extreme error in a corresponding output neuron and reduces the corresponding error by modifying a corresponding weight. The simulation result shows that RPROP with DWM can successfully improve the global convergence capability of RPROP in all 6 problems tested, and even higher than 90% in 3 of them. In the 5-bit counting problem, the differences of improvement between RPROP with DWM and the original RPROP are 76%. But the convergence rate of RPROP with DWM would lightly decrease. It due to the classification process of judging the case is converged or not.

RPROP with MGF was proposed to increase both the convergence rate and the global convergence capability of RPROP by magnifying the gradient function and without increasing the complexity of the algorithm. The simulation result shows that RPROP with MGF can successfully improve both the convergence rate and the global convergence capability of RPROP, especially in real case problems. In the breast cancer problem, the differences of improvement of convergence percentage between RPROP with MGF and the original RPROP are even 91%, and the percentage of 4 of problems tested are higher than 90%. But the disadvantage is S is needed to be reset when applying to different problems.

The hybrid algorithm is the integration of these 2 approaches. The convergence analysis of it shows that it carries the characteristic and the advantages of them and without any side effect. In the simulation result, the global convergence capability on all problems is further improved compare with RPROP with MGF, RPROP with DWM and the original RPROP, and the percentage of 5 of problems tested are higher than 90%. It also fixed the low convergence rate problem in RPROP with DWM and some of them are even higher than RPROP with MGF. Overall, it outperformed BP, Quickprop, RPROP and iRprop for most of the adaptive problems in terms of convergence rate and global convergence capability.

The algorithm of RPROP with DWM is not so perfect in this stage. Firstly, there are 95% un-converged cases are suffered in the 'extreme error' problem. The reason of the other 5% can be explored and a solution can be found out for them. Also, the maximum number of iteration may not enough in the evaluation especially in the iris problem. Increasing the maximum number of iteration runs may lead the differences of convergence rate of different algorithms can be more obvious.

Finally, the interface part may also be improved. The graph plotting function can be improved by allowing more than 1 curve existed on the graph. For example, the convergence behavior of different algorithms can be displayed on 1 graph by using this approach.