2006 International Joint Conference on Neural Networks Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-21, 2006

Using a Sensitivity Measure to Improve Training Accuracy and Convergence for Madalines

Yingfeng Wang and Xiaoqin Zeng

Yingfeng Wang is with the Department of Computer Science and Engineering, Hohai University, Nanjing 210098, China (email: [email protected]). Xiaoqin Zeng is with the Department of Computer Science and Engineering, Hohai University, Nanjing 210098, China (email: [email protected]).

Abstract—Madalines, with discrete input, output and activation function, are suitable for solving many inherently discrete problems, and are at the same time easier to implement and less computationally complex than their continuous counterparts. However, there is not yet an efficient training algorithm for Madalines; the most popular one to date is the MRII algorithm proposed by Winter and Widrow [1] [2]. In this paper, based on the MRII, we present a new algorithm to improve the training accuracy and convergence of Madalines. In our algorithm, a sensitivity measure replaces the confidence measure used in the MRII so as to better satisfy the minimal disturbance principle. Computer simulations were run to verify the effects of our training algorithm. The experimental verification shows that our algorithm has a higher success rate and faster convergence speed than the MRII.

I. INTRODUCTION

The most significant property of neural networks is their ability to learn. Hence, the design and implementation of a learning or training mechanism is a key issue for all kinds of neural networks. In this paper, we concentrate on the training algorithm of Madalines and aim at increasing its success rate and convergence speed. The Madaline is a discrete feedforward multilayer neural network with a supervised learning mechanism. Although Madalines have been superseded to a large extent by the more computationally powerful Multilayer Perceptrons (MLPs), i.e., continuous multilayer feedforward neural networks, they are still important because they are suitable for handling many inherently discrete tasks, such as signal processing and pattern recognition. Furthermore, their discrete nature facilitates hardware implementation, lowers fabrication cost, reduces computational complexity, and makes them simple to understand and interpret. It is well known that the back-propagation (BP) algorithm is the most mature learning algorithm for feedforward multilayer neural networks with continuous activation functions, and a Madaline can actually be regarded as a special case of the feedforward multilayer neural network. Unfortunately, the BP algorithm cannot be applied directly to Madalines because their hard-limiting activation function is not differentiable. In the literature, there are several studies on Madalines' learning.


Winter and Widrow [1] [2] first investigated the learning algorithm for Madalines and proposed the MRII algorithm. The MRII algorithm trains a Madaline by iteratively adapting the weights of its neurons, also called Adalines, from the first layer to the last (output) layer. In order to meet the requirement of the minimal disturbance principle presented in [3], the absolute value of the sum of the weighted input elements, referred to as confidence in [1] [2], is employed as a measure to determine the order in which the Adalines in a layer are adapted: adaptation starts with the Adaline having the least confidence. However, this algorithm is difficult to parallelize efficiently, and its application is hampered by its low success rate. Later on, Kim and Park [4] proposed the expand-and-truncate learning (ETL) algorithm, and based on it Yamamoto and Saito [5] further proposed the improved expand-and-truncate learning (IETL) algorithm. However, these two algorithms are only suitable for Madalines with one hidden layer. Recently, Zhu et al. [6] employed the conventional BP algorithm to train modified Madalines whose biases are all random variables with a smooth distribution, but the modification of the structure may limit the application scope of Madalines.

In our study, a new learning algorithm based on the MRII algorithm is developed. We introduce a sensitivity measure for Adalines, defined as the probability that an Adaline's output is inverted due to its weight variation, with respect to all possible input patterns. Our algorithm employs the sensitivity of Adalines, instead of their confidence, as the measure that determines the adaptation order of the Adalines in a layer; the weights of the Adaline with the least sensitivity value are considered for adaptation first. The advantage of our algorithm is that the sensitivity measure satisfies the minimal disturbance principle much better than the confidence measure and thus helps to increase the success rate and convergence speed of the learning algorithm. Experimental results demonstrate the usefulness of the sensitivity measure, although the algorithm still needs further refinement.

The rest of the paper is organized as follows. In the next section, the Madaline model and the sensitivity measure are briefly described. Then, a review and an update of the MRII algorithm are given in Section III. Verification experiments and the corresponding results are discussed in Section IV. Finally, Section V concludes the paper and discusses our future work on Madalines' learning.


II. THE MADALINE MODEL AND THE SENSITIVITY MEASURE

This section gives the preliminaries for the following discussion: the definitions and notation of the network model and of the sensitivity measure.

A. The Madaline model

Madalines, being feedforward multilayer neural networks with discrete input, output and activation function, consist of a set of Adalines that work together to establish an input-output mapping. The Adaline is the basic building block of the Madaline. With n binary inputs and one binary output, a single Adaline is capable of performing certain logic functions. Without loss of generality, we assume in this paper that each input element takes on a bipolar value of either +1 or -1 and is associated with an adjustable floating-point weight. The sum of the weighted input elements plus a bias is computed, producing a linear (analog) output, which is then fed to an activation function to yield a digital output. To be consistent with the bipolar inputs and outputs, the activation function is the commonly used symmetrical hard-limiting function:

f(x) = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}    (1)

A Madaline is a layered network of Adalines. Links only exist between Adalines of two adjacent layers; there is no link between Adalines in the same layer or in any two non-adjacent layers. All Adalines in a layer are fully linked to all Adalines in the immediately preceding layer and to all Adalines in the immediately succeeding layer. At each layer except the input layer, the inputs of each Adaline are the outputs of the Adalines in the previous layer.

In general, a Madaline can have L layers, and each layer l (1 \le l \le L) has n^l (n^l \ge 1) Adalines. The form n^0 - n^1 - ... - n^L is used to represent a Madaline with a given structural configuration, in which each n^l (0 \le l \le L) not only stands for a layer, from left to right including the input layer, but also indicates the number of Adalines in that layer. n^0 is an exception: it refers to the dimension of the input vectors. n^L refers to the output layer. Since the number of Adalines in layer l-1 is equal to the output dimension of that layer, which is in turn equal to the input dimension of layer l, the input dimension of layer l is n^{l-1}.

For Adaline i (1 \le i \le n^l) in layer l, the input vector is X^l = (x_1^l, ..., x_{n^{l-1}}^l)^T, the weight vector is W_i^l = (w_{i1}^l, ..., w_{i n^{l-1}}^l)^T, its bias is \theta_i^l, and the output is y_i^l = f(X^l W_i^l + \theta_i^l). For each layer l, all Adalines in that layer have the same input vector X^l. The weight set of the layer is W^l = \{W_1^l, ..., W_{n^l}^l\}, and the output vector of the layer is Y^l = (y_1^l, ..., y_{n^l}^l)^T. For the entire Madaline, the input vector is X^1 (or Y^0), the weight set is W = W^1 \cup ... \cup W^L, and the output is Y^L. Let W_i'^l = (w_{i1}'^l, ..., w_{i n^{l-1}}'^l)^T and \theta_i'^l be the corresponding varied weight vector and bias, respectively.
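To make the model concrete, the following Python sketch (not part of the original paper; the function and variable names are our own illustrative choices) computes the output of a single Adaline and of a layered Madaline with the hard-limiting activation of (1).

```python
import numpy as np

def hardlim(x):
    # Symmetrical hard-limiting activation of Eq. (1): +1 for x >= 0, -1 otherwise.
    return np.where(x >= 0.0, 1.0, -1.0)

def adaline_output(x, w, theta):
    # x: bipolar input vector (+1/-1), w: weight vector, theta: bias.
    return hardlim(np.dot(x, w) + theta)

def madaline_output(x, layers):
    # layers: list of (W, theta) pairs, one per layer; W has shape
    # (n_{l-1}, n_l), so each column holds one Adaline's weights.
    y = x
    for W, theta in layers:
        y = hardlim(y @ W + theta)
    return y
```

For example, a 16-5-3 Madaline (the architecture used for the emulator experiments in Section IV) would be represented by two (W, theta) pairs of shapes (16, 5)/(5,) and (5, 3)/(3,).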

B. The sensitivity measure of Adalines

Usually, the sensitivity of a neural network reflects the effects of variations of its parameters on its output. In the training of a Madaline, it is desirable to know how the adaptation of an Adaline's weights will cause the Adaline's output to invert and thus affect the Madaline's output. So, the sensitivity of Adalines to weight variation can be a useful measure for selecting the appropriate Adaline for adaptation during training. Taking all possible input patterns into consideration, we adopt the following definition for the sensitivity of an Adaline.

Definition: The sensitivity of an Adaline is defined as the probability of output inversions of the Adaline due to its weight variation with respect to all inputs, which is expressed as

s = N_{err} / N_{inp},    (2)

where N_{err} is the number of output inversions arising from the weight variation with respect to all input patterns, and N_{inp} is the number of all inputs. In order to compute the sensitivity, we assume that all n-dimensional inputs are uniformly distributed, so N_{inp} is equal to 2^n. One direct way of computing the sensitivity is to follow the Adaline's working process, one input at a time, and compute the outputs with both the given and the varied weights for all 2^n inputs. For each input, the cost of additions and multiplications is O(n), so the total time complexity of this brute-force approach is O(n 2^n). Obviously, the complexity will be very high when n is large.
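As an illustration of this brute-force computation (our own sketch, not the algorithm of [7]; all names are assumptions), the sensitivity of (2) can be computed by enumerating all 2^n bipolar inputs:

```python
from itertools import product
import numpy as np

def brute_force_sensitivity(w, theta, w_varied, theta_varied):
    # Enumerate all 2^n bipolar input patterns and count output inversions (Eq. (2)).
    n = len(w)
    n_err = 0
    for x in product((-1.0, 1.0), repeat=n):
        x = np.array(x)
        y_old = 1.0 if np.dot(x, w) + theta >= 0 else -1.0
        y_new = 1.0 if np.dot(x, w_varied) + theta_varied >= 0 else -1.0
        if y_old != y_new:
            n_err += 1
    return n_err / 2 ** n   # N_err / N_inp
```

The loop runs 2^n times with O(n) work per input, matching the O(n 2^n) cost stated above; the geometric algorithm of [7] avoids this full enumeration in most cases.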

In [7], we presented a heuristic algorithm with lower complexity for computing the sensitivity. In the derivation of the algorithm, a geometric model is first established, and then analytical geometry and tree techniques are employed to reduce the computational complexity as far as possible. From the geometric point of view, all vertexes of a hypercube can be employed as a mathematical model of an Adaline's input space, and the hyperplane

P: \sum_{j=1}^{n} x_j w_j + \theta = 0

can be regarded as a dividing plane that may, in general, divide the input space into three parts, namely the vertexes on P, the vertexes on one side of P, and the vertexes on the other side of P. According to (1), the outputs of the Adaline are always 1 for the input vertexes on P, the same for the input vertexes in the same part, and opposite for the input vertexes in the two opposite parts. Because of the weight variation, P is changed to

P': \sum_{j=1}^{n} x_j w'_j + \theta' = 0,

and this change may cause some input vertexes that previously belonged to one part under P's division to fall into another part, with the opposite output, under P''s division. Under this model, computing the sensitivity amounts to counting the number of such input vertexes, because that number is exactly N_{err}.

Based on the geometric model, it is clear that the complexity of computing the sensitivity is closely related to the relative positions of the two hyperplanes, i.e., to the given weights and their variations. Fortunately, the positional relationship among the hypercube and the two hyperplanes can easily be determined by means of analytical geometry. In some cases, for example when the hypercube lies entirely between the two hyperplanes without intersection, or when it is crossed by the two hyperplanes through each of its edges parallel to a given axis, the computational complexity of the sensitivity is only O(n log_2 n). In the other cases, the computational complexity depends on the intersections of the hypercube with the two hyperplanes: the fewer the intersections, the lower the complexity. In the algorithm, a tree technique is employed to keep the complexity between O(n log_2 n) and O(2^n). Obviously, the algorithm is more efficient in an average sense. For the details of the algorithm, please refer to [7]. In the following sections, we will use this algorithm to compute the sensitivity of Adalines.

III. THE TRAINING ALGORITHM IMPROVED FROM MRII

In this section, the MRII algorithm is first briefly reviewed, and then a new training algorithm, aiming at better satisfying the minimal disturbance principle, is derived from the MRII algorithm by replacing the confidence with the sensitivity.

A. Review of the MRII algorithm

In [1], Winter and Widrow presented the outline of the MRII algorithm. The algorithm tries to adapt weights whenever the Madaline fails to give the desired output for a given input sample. When output errors occur, the Adalines of the Madaline are adapted in order from the first layer to the output layer. Once a trial, which reverses one or more Adalines' outputs, is accepted, the algorithm goes back to the first layer, unless the trial happened in the first layer.

For a given layer, the trials are performed according to the minimal disturbance principle, which states that adaptations made to the network's weights in order to correct an erroneous response for a particular input sample should disturb the responses to other input samples as little as possible. The ultimate purpose of this principle is to disturb the mappings already established for other input samples as little as possible. Winter and Widrow used the absolute value of an Adaline's analog output, called its confidence, to measure the degree of disturbance. The MRII algorithm first sorts all Adalines in a layer by their confidence, then reverses the outputs of the Adalines one by one, from the Adaline with the least confidence to the one with the most confidence. If a trial does not decrease the number of output errors, the Adaline's previous output value is restored and the next trial begins. After all single-Adaline trials in the layer are exhausted, trials involving two Adalines at a time are performed. The sequence of pairwise trials is: the least and the second least confident Adalines, the second and the third least confident Adalines, the third and the fourth least confident Adalines, and so on. If a trial fails to reduce the output errors, the outputs of the Adalines involved in that trial are reset. After all pairwise trials have been performed, three-wise trials are carried out in a similar way. If a trial improves the output performance, the LMS algorithm is employed to update the weights of the Adalines involved in the trial.

In order to improve the algorithm's performance, Winter modified some details in [2]. Firstly, he allowed only half of the Adalines per layer to participate in the trials. Secondly, he required the confidence to be multiplied by a factor, named the gain, before sorting. In [2], the gain is defined as

gain = 1 + adaptation count / (5 N),    (3)

where N is the number of samples in the training data set and the adaptation count is the number of times an Adaline has been accepted for adaptation. Thirdly, the number of Adalines in a trial should be no more than three. In summary, the MRII algorithm can be described as follows:
1. Randomly select a sample from the training data set;
2. If the net responds correctly to the sample, go to step 4;
3. For layer l from 1 to L do:
   3.1. Sort the Adalines in layer l by confidence * gain (see the sketch after this listing);
   3.2. For pair number k from 1 to min{3, ⌈n^l/2⌉} do:
      3.2.1. Implement all possible k-wise trials among the ⌈n^l/2⌉ least confident Adalines;
      3.2.2. If a trial cannot reduce the output errors, do:
         Restore the outputs of the Adalines in this trial to their previous values;
      else do:
         If l > 1, employ the LMS algorithm to update the weights of the Adalines involved; go to step 3 (back to layer 1);
         else record this trial;
   3.3. If l = 1, employ the LMS algorithm to update the weights of the Adalines involved in all successful trials;
4. If the number of errors for all training samples meets the requirement, or the average number of training times for each sample exceeds a given number, stop; else go to step 1.
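As a minimal illustration of the ordering used in step 3.1 (our own sketch; the layer representation and the names are assumptions, not taken from [1] [2]), the confidence of each Adaline is its absolute analog output for the current sample, and the trial order is obtained by sorting confidence multiplied by the gain of (3):

```python
import numpy as np

def mrii_trial_order(x, W, theta, adaptation_counts, num_samples):
    # x: current layer input; W[:, i], theta[i]: weights and bias of Adaline i;
    # adaptation_counts[i]: how many times Adaline i has been adapted so far.
    analog = x @ W + theta                       # analog (linear) outputs
    confidence = np.abs(analog)                  # confidence of each Adaline
    gain = 1.0 + np.asarray(adaptation_counts, dtype=float) / (5.0 * num_samples)  # Eq. (3)
    # trial order: least confidence * gain first
    return np.argsort(confidence * gain)
```

Only the first ⌈n^l/2⌉ Adalines of this order take part in the trials.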

B. Using the sensitivity to update the MRII algorithm

After carefully analyzing the confidence, we find that it has some disadvantages as a criterion for deciding the trial order. Firstly, updating the weights of the Adaline with the least confidence does not guarantee the least modification of the network. The modification degree is

|\theta' - \theta| + \sum_{i=1}^{n} |w'_i - w_i|,

where w'_i and \theta' are the varied weight and bias. Since the modification degree has no direct connection with the confidence, the modification of the Adaline with the least confidence does not always lead to the least modification of the network. Further, in some cases, even the least modification may not produce the least change to the mappings established for other input samples. For example, if an Adaline's confidence for other input samples is also very small, their input-output mappings may be changed easily, even though the modification degree is low. Hence, the confidence for an input sample is not a suitable criterion for evaluating the degree of variation of the mappings already established by previous input samples. This inspires us to find another measure that directly evaluates the effect of the weight modification on an input-output mapping of a Madaline.

In our approach, the sensitivity measure is proposed to replace the confidence. According to the definition given in Section II, the sensitivity measure indicates the effect of the weight variation on the input-output mappings with respect to all possible input patterns. This means that if an Adaline has the least sensitivity, its outputs for all input patterns will be varied least by a given weight modification. Since an Adaline's weight adaptation should make its output reverse for a given input sample during training, the LMS algorithm is employed to compute the weight variation. Hence, for each Adaline and a given input sample, the variations of the weights and the bias can be obtained by running the LMS algorithm. With the obtained variations, we can compute the sensitivity of the Adalines in a layer by the algorithm given in [7], and then easily sort the Adalines in the layer according to their sensitivity values. Therefore, step 3.1 of the MRII algorithm can be replaced by sorting the Adalines in layer l by their sensitivity values multiplied by the gain value. In addition, because the sensitivity measure helps to minimize the effect of the weight adaptation for one input sample on the outputs for all other input samples, our training algorithm can select training samples in a fixed order instead of the random selection used in the MRII, which may lead to unstable training performance. All in all, the improved training algorithm can be presented as follows (a sketch of the per-layer ordering in steps 3.1-3.3 is given after the listing):
1. Select a sample in a fixed order from the training data set;
2. If the net responds correctly to the sample, go to step 4;
3. For layer l from 1 to L do:
   3.1. Obtain, by LMS, the weight variations that cause output inversion of the Adalines in layer l;
   3.2. Compute the sensitivity values of the Adalines in layer l by the algorithm given in [7];
   3.3. Sort the Adalines in layer l by sensitivity * gain;
   3.4. For pair number k from 1 to min{3, ⌈n^l/2⌉} do:
      3.4.1. Implement all possible k-wise trials among the ⌈n^l/2⌉ least sensitive Adalines;
      3.4.2. If a trial cannot reduce the output errors, do:
         Restore the outputs of the Adalines in this trial to their previous values;
      else do:
         If l > 1, employ the LMS algorithm to update the weights of the Adalines involved; go to step 3 (back to layer 1);
         else record this trial;
   3.5. If l = 1, employ the LMS algorithm to update the weights of the Adalines involved in all successful trials;
4. If the number of errors for all training samples meets the requirement, or the number of training times for each sample exceeds a given number, stop; else go to step 1.
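To make steps 3.1-3.3 concrete, the following sketch (our own illustration; the brute-force sensitivity stands in for the more efficient algorithm of [7], a single LMS step is taken as the assumed weight variation, and lr=0.01 mirrors the learning rate reported in Section IV) derives for each Adaline the LMS weight variation toward the reversed output and orders the Adalines by sensitivity multiplied by gain:

```python
from itertools import product
import numpy as np

def output_inversion_rate(w, theta, w_var, theta_var):
    # Sensitivity of Eq. (2), computed by brute force over all 2^n bipolar inputs
    # (a stand-in for the geometric algorithm of [7]).
    n = len(w)
    flips = sum(
        (np.dot(x, w) + theta >= 0) != (np.dot(x, w_var) + theta_var >= 0)
        for x in map(np.array, product((-1.0, 1.0), repeat=n))
    )
    return flips / 2 ** n

def sensitivity_trial_order(x, W, theta, adaptation_counts, num_samples, lr=0.01):
    # Steps 3.1-3.3: for each Adaline, derive the LMS (Widrow-Hoff) weight variation
    # that pushes its output toward the reversed value, compute its sensitivity, and
    # sort by sensitivity * gain (least first).
    keys = []
    for i in range(W.shape[1]):
        analog = np.dot(x, W[:, i]) + theta[i]
        target = -1.0 if analog >= 0 else 1.0          # reversed (desired) output
        delta = lr * (target - analog)                 # LMS correction
        s = output_inversion_rate(W[:, i], theta[i],
                                  W[:, i] + delta * x, theta[i] + delta)
        gain = 1.0 + adaptation_counts[i] / (5.0 * num_samples)   # Eq. (3)
        keys.append(s * gain)
    return np.argsort(keys)
```

Whether a single LMS step or repeated steps are used to achieve the actual output inversion is a detail the listing above leaves open; the sketch uses one step only for illustration.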

It is worth noticing that, in order to compute the sensitivity of each Adaline in a layer, our algorithm needs to obtain the weight variation that would result in the inversion of the corresponding Adaline's output. This can be done by applying the LMS algorithm to each Adaline in the layer. However, compared to the MRII algorithm, the computation of the Adalines' sensitivity is more expensive than the computation of their confidence. Hence, for each sample to which the network cannot give the correct response, the per-iteration complexity of our algorithm is higher than that of the MRII algorithm. But another important factor affecting the time complexity of the training process is the convergence speed, i.e., the number of iterations over the training samples, or the total number of adaptations of Adalines during training.

Since our sensitivity measure can more precisely locate the Adaline to adapt, our algorithm can quickly make most samples receive correct responses and performs fewer adaptations of Adalines during training. This is one advantage of our algorithm over the MRII algorithm. Another advantage is a higher training success rate than that of the MRII. These advantages are demonstrated by the experiments discussed in the next section.

IV. EXPERIMENTAL VERIFICATIONS

To verify the effects of our algorithm, a number of computer simulations were performed. We aimed at comparing, under identical conditions, the training accuracy and the convergence speed of our algorithm with those of the MRII algorithm. In [2], Winter employed an emulator problem to verify the effectiveness of the MRII algorithm, so the emulator problem was also used in our experiments. In the emulator problem, a given network with fixed, randomly generated weights acts as a reference network. This reference network generates an input-output mapping for the adaptive networks to learn. The adaptive networks and the reference network must, of course, have the same input and output dimensions. In our experiments, the architecture of the reference network is 16-3-3, and the random seed for generating its weights is 1000000. The architecture of the adaptive networks is fixed at 16-5-3. There are 650 samples selected from the reference network for the training data set.

Although the emulator problem permits researchers to design experiments freely, especially in the selection of the architecture of the reference network and of the samples, it is hard for other researchers to recreate the same experiments. Hence, the monk's problems from the UCI repository (http://www.ics.uci.edu/~mlearn/MLRepository.html) were also selected as experimental examples. Except for the Id attribute, which is unique for each sample, a monk's problem has seven attributes, including a class (output) attribute indicating two classes and six input attributes. Among the input attributes, four have three or four possible values and two have two possible values. Since Madalines' input elements are binary, we use two input elements to represent each attribute with three or four values and one input element to represent each two-valued attribute. So, the experimental networks for the monk's problems have a 10-dimensional input and a 1-dimensional output. There are three training data sets in the monk's problems. We used each of them to train three networks with different architectures: 10-3-1, 10-6-1 and 10-9-1. In the experiments, the random seed for generating the initial weights of the networks is 20, and the learning rate and the absolute value of the training goal for the LMS algorithm are set to 0.01 and 0.1, respectively.
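The following sketch illustrates the bipolar input encoding described above (our own illustration; the exact coding of a multi-valued attribute into two bipolar elements is not specified in the paper, so the mapping below is only one plausible choice):

```python
def encode_attribute(value, num_values):
    # Encode one monk's-problem attribute value (1..num_values) as bipolar elements:
    # two elements for attributes with 3 or 4 values, one element for 2-valued ones.
    if num_values == 2:
        return [1.0] if value == 1 else [-1.0]
    # two-bit binary code of (value - 1), mapped to {-1, +1}
    bits = (((value - 1) >> 1) & 1, (value - 1) & 1)
    return [1.0 if b else -1.0 for b in bits]

def encode_sample(attribute_values, attribute_sizes):
    # attribute_sizes for the monk's problems: (3, 3, 2, 3, 4, 2) -> 10 input elements
    x = []
    for v, k in zip(attribute_values, attribute_sizes):
        x.extend(encode_attribute(v, k))
    return x
```

With the monk's attribute sizes (3, 3, 2, 3, 4, 2), this yields 2+2+1+2+2+1 = 10 bipolar input elements, matching the 10-dimensional input stated above.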

During the training, three important quantities were collected for both our algorithm and the MRII algorithm: the success rate, the total adaptation number and the total training time. The success rate is the percentage of training samples for which the network gives correct outputs during training. The adaptation number is the number of weight-adaptation actions performed on Adalines. Since the MRII algorithm selects training samples randomly and ours selects them in a fixed order, we took the average number of training times for each sample as a measure in our experiments.

All experimental results are listed in eight tables, of which the first two are for the emulator problem and the other six are for the monk's problems. For each training data set and a network of given architecture, training is performed with both the MRII algorithm and our algorithm by feeding the training samples to the network one after another. In order to serve different purposes, we set two stop criteria for running the two algorithms. The first is that the average number of training times for each sample is no more than 1000, so as to compare the success rates of the two algorithms; the corresponding results are presented in Tables I, III, V and VII, and they demonstrate that our algorithm has a higher success rate in most cases. The second stop criterion is that the success rate exceeds 95% or the average number of training times for each sample reaches 10000, so that we can compare the adaptation numbers (convergence speed) of the two algorithms; the corresponding results, given in Tables II, IV, VI and VIII, demonstrate that our algorithm needs a smaller adaptation number (i.e., has a higher convergence speed) in most cases. It is noticeable that the total training time of our algorithm is longer than that of the MRII in most cases, while the adaptation number of Adalines of our algorithm is smaller than that of the MRII. These observations reveal that our algorithm spends more time computing the weight variations and the sensitivity, but less time locating and adapting Adalines. This clearly verifies what we expected: the sensitivity measure is better than the confidence measure at satisfying the minimal disturbance principle in Madaline training.

V. CONCLUSION

In this paper, a new algorithm based on the MRII algorithm is presented, which employs a sensitivity measure of Adalines to better realize the minimal disturbance principle. The experimental results indicate that the success rate and the convergence speed of our algorithm are mostly better than those of the MRII algorithm. Although the sensitivity measure can improve the training efficiency in terms of success rate and convergence speed, two critical factors for a learning algorithm, the experimental results also show that the total training time of our algorithm is mostly longer than that of the MRII algorithm. This is, as mentioned in the last section, due to the use of the LMS algorithm to determine the weight variations for every Adaline in a layer and to the heavy computation of the sensitivity. To overcome this shortcoming, we have been considering approximate ways to replace the current precise but time-consuming computation. Actually, we have already developed a statistical approximation [8] for computing the sensitivity of Adalines with lower time complexity. Our future work will try to find other appropriate ways of using the sensitivity to quickly and precisely locate the Adalines that need to be adapted, and to automatically and efficiently select most of the coefficients and parameters that are at present determined by empirical or even blind trials.

ACKNOWLEDGEMENT

This work was supported by the Provincial Natural Science Foundation of Jiangsu, China under Grant BK2004114 and the National Natural Science Foundation of China under Grant 60571048.

REFERENCES

[1] R. Winter and B. Widrow, "Madaline Rule II: A Training Algorithm for Neural Networks," IEEE International Conference on Neural Networks, vol. 1, pp. 401-408, 1988.
[2] R. Winter, "Madaline Rule II: A New Method for Training Networks of Adalines," PhD Dissertation, Stanford University, 1989.
[3] W. Ridgway III, "An Adaptive Logic System with Generalizing Properties," PhD Dissertation, Stanford University, 1962.
[4] J. Kim and S. Park, "The Geometrical Learning of Binary Neural Networks," IEEE Transactions on Neural Networks, vol. 6, no. 1, pp. 237-247, 1995.
[5] A. Yamamoto and T. Saito, "A Flexible Learning Algorithm for Binary Neural Networks," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E81-A, no. 9, pp. 1925-1930, 1998.
[6] H. Zhu, K. Eguchi, and T. Tabata, "A Training Algorithm for Multilayer Neural Networks of Hard-Limiting Units with Random Bias," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E83-A, no. 6, pp. 1040-1048, 2000.
[7] X. Zeng, Y. Wang, and K. Zhang, "Computation of Adalines' Sensitivity to Weight Perturbation," IEEE Transactions on Neural Networks, vol. 17, no. 2, pp. 515-518, 2006.
[8] Y. Wang, X. Zeng, D. Yeung, and Z. Peng, "Computation of Madalines' Sensitivity to Input and Weight Perturbations," Neural Computation, accepted for publication.

TABLE I
EXPERIMENTAL RESULTS FOR THE EMULATOR PROBLEM UNDER THE FIRST STOP CRITERION

Our algorithm
  Architecture                              16-5-3
  Training times for each sample            1000
  Success rate                              93.7%
  Adaptation number                         10089
  Total time (minute)                       30
The MRII algorithm
  Architecture                              16-5-3
  Average training times for each sample    1000
  Success rate                              73.4%
  Adaptation number                         317190
  Total time (second)                       19

TABLE II
EXPERIMENTAL RESULTS FOR THE EMULATOR PROBLEM UNDER THE SECOND STOP CRITERION

Our algorithm
  Architecture                              16-5-3
  Training times for each sample            11
  Success rate                              95.7%
  Adaptation number                         3096
  Total time (minute)                       11
The MRII algorithm
  Architecture                              16-5-3
  Average training times for each sample    1199
  Success rate                              95.5%
  Adaptation number                         377461
  Total time (second)                       22

TABLE III
EXPERIMENTAL RESULTS FOR TRAINING DATA SET 1 IN THE MONK'S PROBLEM UNDER THE FIRST STOP CRITERION

Our algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Training times for each sample       31       591      88
  Success rate                         100%     100%     100%
  Adaptation number                    776      5434     1037
  Total time (second)                  2        19       6
The MRII algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Average training times per sample    1000     1000     1000
  Success rate                         82.3%    97.6%    97.6%
  Adaptation number                    25069    3566     6146
  Total time (second)                  2        2        3

TABLE IV
EXPERIMENTAL RESULTS FOR TRAINING DATA SET 1 IN THE MONK'S PROBLEM UNDER THE SECOND STOP CRITERION

Our algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Training times for each pattern      28       81       10
  Success rate                         97.6%    95.2%    95.2%
  Adaptation number                    763      2139     430
  Total time (second)                  2        7        2
The MRII algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Average training times per pattern   10000    42       19
  Success rate                         80.6%    96.8%    97.6%
  Adaptation number                    236188   710      395
  Total time (second)                  14       1        1

TABLE V
EXPERIMENTAL RESULTS FOR TRAINING DATA SET 2 IN THE MONK'S PROBLEM UNDER THE FIRST STOP CRITERION

Our algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Training times for each pattern      16       1000     1000
  Success rate                         100%     97.6%    92.9%
  Adaptation number                    927      9820     20394
  Total time (second)                  2        32       95
The MRII algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Average training times per pattern   713      1000     1000
  Success rate                         100%     88.8%    88.8%
  Adaptation number                    26468    37027    34792
  Total time (second)                  3        3        4

TABLE VI
EXPERIMENTAL RESULTS FOR TRAINING DATA SET 2 IN THE MONK'S PROBLEM UNDER THE SECOND STOP CRITERION

Our algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Training times for each pattern      13       98       114
  Success rate                         95.9%    95.9%    95.3%
  Adaptation number                    911      4413     3537
  Total time (second)                  2        14       17
The MRII algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Average training times per pattern   707      10000    1140
  Success rate                         95.3%    87.6%    95.3%
  Adaptation number                    26448    289762   37658
  Total time (second)                  2        26       5

TABLE VII
EXPERIMENTAL RESULTS FOR TRAINING DATA SET 3 IN THE MONK'S PROBLEM UNDER THE FIRST STOP CRITERION

Our algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Training times for each pattern      1000     122      9
  Success rate                         96.7%    100%     100%
  Adaptation number                    7159     1542     183
  Total time (second)                  13       5        1
The MRII algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Average training times per pattern   1000     1000     1000
  Success rate                         94.3%    98.4%    98.4%
  Adaptation number                    7277     2015     3645
  Total time (second)                  2        2        3

TABLE VIII
EXPERIMENTAL RESULTS FOR TRAINING DATA SET 3 IN THE MONK'S PROBLEM UNDER THE SECOND STOP CRITERION

Our algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Training times for each pattern      76       28       4
  Success rate                         95.1%    95.1%    95.1%
  Adaptation number                    1666     501      157
  Total time (second)                  3        2        1
The MRII algorithm
  Architecture                         10-3-1   10-6-1   10-9-1
  Average training times per pattern   20       12       26
  Success rate                         95.1%    95.9%    95.9%
  Adaptation number                    308      178      505
  Total time (second)                  1        1        1
