IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 17, NO. 2, MARCH 2006

Computation of Adalines' Sensitivity to Weight Perturbation

Xiaoqin Zeng, Yingfeng Wang, and Kang Zhang

Abstract—In this paper, the sensitivity of Adalines to weight perturbation is discussed. According to the discrete feature of Adalines' input and output, the sensitivity is defined as the probability of an Adaline's erroneous outputs due to weight perturbation with respect to all possible inputs. By means of a hypercube model and an analytical geometry method, a heuristic algorithm is given to accurately compute the sensitivity. The accuracy of the algorithm is verified by computer simulations.

Index Terms—Adaline, Madaline, neural networks, sensitivity.

I. INTRODUCTION

The sensitivity of neural networks to their parameter perturbation, i.e., the effect of parameter perturbation on the networks' output, is obviously an important measure for evaluating neural networks' performance. In the literature, a number of studies of the sensitivity of neural networks have emerged, along with their applications; they vary in their target networks and approaches. This paper focuses on the study of Adalines' sensitivity and proposes a novel computational method.

Stevenson et al. [1] first systematically and theoretically investigated the sensitivity of Adalines. They used the surface of a hypersphere with radius $n^{1/2}$ as a mathematical model to approximately express the input space of Adalines with $n$-dimensional input. Based on such a geometrical model, they defined the sensitivity of an Adaline as the probability of erroneous output of the Adaline, and then derived the sensitivity as a function of the percentage perturbation in inputs and weights, under the assumption that the input and weight perturbations are small and the dimension of the input is sufficiently large. Unfortunately, since the discrete inputs of an Adaline generally do not span the whole hypersphere surface, the expression of the input space by the surface of a hypersphere is not exact. What the inputs actually span are the vertices of a hypercube that is inscribed in, and internally tangent to, the hypersphere. Hence, when the input dimension of Adalines is not sufficiently large, their results may show large deviations.

Another way, proposed by Piché [2], is a stochastic method, in which it is assumed that inputs and weights, as well as their perturbations, are all independently and identically distributed with mean zero. Under such a stochastic model, and on the condition that the perturbations are small enough, Piché derived an analytical expression for the sensitivity as the ratio of the variance of the output perturbation to the variance of the output. However, this way is only applicable to analyzing the behavior of an ensemble of Adalines, not an individual one, because the assumptions of the stochastic model are too strong.

The applications of the sensitivity of neural networks have appeared in many aspects of neural network research, such as improving error tolerance [3], deleting redundant inputs [4], pruning architectures [5], and so on.

Manuscript received May 27, 2004; revised March 26, 2005. This work was supported by the Provincial Natural Science Foundation of Jiangsu, China, under Grant BK2004114 and the National Natural Science Foundation of China under Grant 60571048. X. Zeng is with the Department of Computer Science and Engineering, Hohai University, Nanjing, Jiangsu 210098, China, and also with the State Key Laboratory for Novel Software Technology of Nanjing University, Nanjing, China (e-mail: [email protected]). Y. Wang is with the Department of Computer Science and Engineering, Hohai University, Nanjing, Jiangsu 210098, China (e-mail: [email protected]). K. Zhang is with the Department of Computer Science, University of Texas, Dallas, TX 75083 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TNN.2005.863418


Recently, we explored the sensitivity of multilayer perceptron (MLP) networks by using a hypercube model [6] to consider the sensitivity of perceptrons, and we successfully applied the sensitivity of perceptrons to prune the hidden neurons of MLPs [7]. In this paper, we discuss the computation of Adalines' sensitivity by adapting the hypercube model with a different analytic approach. In our research, we employ the vertices of a hypercube to represent Adalines' input space. Based on such a mathematical model, some formulas are derived and an algorithm for the computation of Adalines' sensitivity is designed. Our approach is different from the aforementioned ones, and it does offer certain advantages over them. For example, it is exact in expressing the input space of Adalines, so the sensitivity computation is accurate; furthermore, it does not demand that the dimension of the input be large enough, that the weight perturbation be very small, or that the bias be zero. However, it is worth noticing that the increase in accuracy may cause an increase in computational complexity. The contribution of this paper is the design of a heuristic algorithm that accurately computes the sensitivity with relatively low complexity.

II. THE ADALINE MODEL

An Adaline is a basic building block of Madalines; it in general has $n$ $(n > 1)$ binary input elements and one binary output. Each input element is associated with an adjustable real-valued weight. By computing the sum of the weighted input elements plus a bias to yield a linear output, and then feeding the linear output to an activation function to produce the output, an Adaline is capable of implementing certain logic functions. In this letter, the input vector of an Adaline is denoted as $X = (x_1, \ldots, x_n)^T$, where each input element takes on a value of either $+1$ or $-1$; the Adaline's weight vector is denoted as $W = (w_1, \ldots, w_n)^T$ and the bias is $\theta$. The output of the Adaline is expressed as $y = f(X \cdot W + \theta)$, where $f(x)$ is the following symmetrical hard-limit function:

$$f(x) = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0. \end{cases} \qquad (1)$$

With the introduction of perturbation, the perturbed input and weight vectors are respectively denoted as $X' = (x_1', \ldots, x_n')^T$ and $W' = (w_1', \ldots, w_n')^T$, and their corresponding perturbation vectors are $\Delta X = (\Delta x_1, \ldots, \Delta x_n)^T$ and $\Delta W = (\Delta w_1, \ldots, \Delta w_n)^T$.
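As a concrete illustration, the model can be sketched in a few lines of Python; the helper names (`hard_limit`, `adaline_output`) are ours, not the paper's.

```python
def hard_limit(x):
    """Symmetrical hard-limit activation f of (1): +1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1

def adaline_output(X, W, theta):
    """Adaline output y = f(X . W + theta) for a bipolar input vector X."""
    return hard_limit(sum(x * w for x, w in zip(X, W)) + theta)
```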

III. THE DEFINITION OF THE SENSITIVITY

How shall we define the sensitivity so as to correctly reflect the effect of parameter perturbation on Adalines' output? Mathematically, this can be done by establishing a relationship between the output deviation of Adalines and their parameter perturbation (i.e., treating sensitivity as a function of parameter perturbation), and then analyzing and computing that function to explore sensitivity features and set a sensitivity measure. The two critical parameters of an Adaline are its weights and inputs. Since input perturbation can be transformed into weight perturbation, for the sake of simplicity and without loss of generality, this paper focuses only on weight perturbation. Thus, the most direct and natural way to express the output deviation arising from weight perturbation is the difference between the deviated and nondeviated outputs

$$\Delta y = f(X \cdot (W + \Delta W) + \theta) - f(X \cdot W + \theta). \qquad (2)$$

It is obvious that (2) reflects the relationship between $\Delta y$ and $\Delta W$, and $\Delta y$ can easily be computed when $X$, $W$, $\Delta W$, and $\theta$ are all known. In real-life situations, a given Adaline has fixed incoming weights and bias, and weight perturbations can usually be estimated with domain knowledge, but an individual input would be meaningless for the computation of $\Delta y$, especially in measuring the Adaline's performance. It would be more desirable for the sensitivity, as a measure, to be a function, in an ensemble sense, of the overall inputs rather than of a specific one. Besides, the binary attribute of Adalines' output makes it unnecessary to compute the absolute magnitude of $\Delta y$; the number of erroneous outputs due to the perturbation with respect to all inputs is enough. With these considerations, we adopt the following definition for the sensitivity of Adalines.

Definition: The sensitivity of an Adaline is defined as the probability of erroneous output of the Adaline due to its weight perturbation with respect to all inputs, which is expressed as

$$s = \frac{N_{err}}{N_{inp}} \qquad (3)$$

where $N_{err}$ is the number of output errors arising from weight perturbations with respect to all input patterns, and $N_{inp}$ is the number of all inputs.
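Because the definition ranges over all $2^n$ inputs, it can be evaluated directly by enumeration, at the $O(n 2^n)$ cost discussed in Section IV. A minimal brute-force sketch of (2) and (3), reusing the `adaline_output` helper above (the function name is ours):

```python
from itertools import product

def sensitivity_bruteforce(W, dW, theta):
    """Sensitivity s = N_err / N_inp of (3): the fraction of all 2^n bipolar
    inputs whose output flips under the weight perturbation dW, per (2)."""
    n = len(W)
    W_pert = [w + d for w, d in zip(W, dW)]
    n_err = sum(
        1
        for X in product((1, -1), repeat=n)
        if adaline_output(X, W, theta) != adaline_output(X, W_pert, theta)
    )
    return n_err / 2 ** n
```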

Fig. 1. Illustration of the input space divided by $P$ and $P'$.

IV. THE COMPUTATION OF THE SENSITIVITY

In order to compute the sensitivity, we assume that all inputs of dimension $n$ are uniformly distributed, so $N_{inp}$ is equal to $2^n$. One direct way of computing the sensitivity is to follow the Adaline's working process, one input at a time, and compute the outputs with both unperturbed and perturbed weights for all $2^n$ inputs. For each input, the cost of additions and multiplications is $O(n)$, so the total computational complexity of this way is $O(n 2^n)$. Obviously, the complexity will be very high when $n$ is large. This paper presents a heuristic algorithm with less complexity to compute the sensitivity.

From a geometric point of view, all vertices of a hypercube can be employed as the mathematical model to express an Adaline's input space, and the hyperplane $P: \sum_{j=1}^{n} x_j w_j + \theta = 0$ can be regarded as a dividing plane that may in general divide the vertices into three parts: the vertices on $P$, the vertices on one side of $P$, and the vertices on the other side of $P$. According to (1), the outputs of the Adaline are always 1 for the inputs on $P$. They are the same for the inputs in the same part, and they are just opposite for the inputs in the two opposite parts. Because of the weight perturbation, $P$ is changed to $P': \sum_{j=1}^{n} x_j w_j' + \theta' = 0$, and this change may cause some inputs that previously belonged to one part under $P$'s division to fall into another part, with the opposite output, under $P'$'s division. What we want to do here is to find the number of those inputs, because it is exactly equal to $N_{err}$.

Generally, the $n$-dimensional space can be divided into four parts by $P$ and $P'$, as illustrated in Fig. 1. It can be derived that if $w_n'/w_n > 0$, the number of vertices in parts I and III is $N_{err}$; otherwise, the number of vertices in parts II and IV is $N_{err}$, which also equals the difference between the total number of vertices and the number of vertices in parts I and III. Thus, only the number of vertices in parts I and III needs to be considered. As a result, the division of $N_{err}$ by $2^n$ is the sensitivity of the Adaline.

A solution for computing the number of vertices in parts I and III is proposed below. It is known that the hypercube has $2^{n-1}$ edges parallel to a given coordinate axis, say the $x_n$-axis. Each line that is the extension of such an edge must have one and only one intersection point with $P$ and with $P'$ separately, under the assumption that $w_n \neq 0$ and $w_n' \neq 0$. The idea is to locate the intersection points on each parallel line, and then compare the coordinates of the two intersection points on that axis with $1$ and $-1$ to identify whether the two vertices on that edge are in part I or III. Each line parallel to the $x_n$-axis can be determined by assigning the $n-1$ coordinates $(x_1, \ldots, x_{n-1})$ with either $x_j = 1$ or $x_j = -1$. Thus, the $x_n$-coordinates of a given line's intersection points with respect to $P$ and $P'$ can be, respectively, calculated by

$$\hat{x}_n = -\left( \sum_{j=1}^{n-1} w_j x_j + \theta \right) \Big/ w_n = \sum_{j=1}^{n-1} r_j x_j + r_n \qquad (4)$$

$$\hat{x}_n' = -\left( \sum_{j=1}^{n-1} w_j' x_j + \theta' \right) \Big/ w_n' = \sum_{j=1}^{n-1} r_j' x_j + r_n' \qquad (5)$$

where $r_j = -w_j/w_n$ and $r_j' = -w_j'/w_n'$ $(1 \le j \le n-1)$, $r_n = -\theta/w_n$, and $r_n' = -\theta'/w_n'$. By comparing $\hat{x}_n$ and $\hat{x}_n'$ with $1$ and $-1$, it is easy to determine whether the two vertices on a given line are in parts I and III or not.
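In code, (4) and (5) for a single line are immediate; a sketch with our own naming, taking the last element of each weight vector as $w_n$ and assuming $w_n \neq 0$ and $w_n' \neq 0$:

```python
def intersection_coords(x_prefix, W, W_pert, theta, theta_pert):
    """x_n-coordinates (4) and (5) where the line through the fixed prefix
    (x_1, ..., x_{n-1}) of +/-1 values crosses P and P'."""
    xn_hat = -(sum(w * x for w, x in zip(W[:-1], x_prefix)) + theta) / W[-1]
    xn_hat_p = -(sum(w * x for w, x in zip(W_pert[:-1], x_prefix)) + theta_pert) / W_pert[-1]
    return xn_hat, xn_hat_p
```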

However, if we treat all of the $2^{n-1}$ parallel lines one by one, the computational complexity will not be decreased. In order to reduce the complexity, it is most desirable to treat as few parallel lines as possible. Obviously, the first attempt is to take all parallel lines into consideration at once, and this may result in the following four special cases.
1) Both hyperplanes are below the hypercube; in this case, there is no vertex in parts I and III, as shown in Fig. 2(a).
2) Both hyperplanes are above the hypercube; this case is equivalent to 1) and is shown in Fig. 2(b).
3) Both hyperplanes pass through the middle of the hypercube; this case also has no vertex in parts I and III, as shown in Fig. 2(c).
4) The whole hypercube is between the two hyperplanes; in this case, all vertices are in parts I and III, as shown in Fig. 2(d).

Fig. 2. Illustration of no vertex in parts I and III, or all vertices in parts I and III.

For identifying these four special cases, let $\hat{x}_n^{\max}$ be the maximum $x_n$-coordinate of the intersection points with respect to $P$, and $\hat{x}_n^{\min}$ be the minimum. By (4), they can be calculated as follows:

$$\hat{x}_n^{\max} = \max_{(x_1, \ldots, x_{n-1})} \left( \sum_{j=1}^{n-1} r_j x_j + r_n \right) = \sum_{j=1}^{n-1} |r_j| + r_n \qquad (6)$$

$$\hat{x}_n^{\min} = \min_{(x_1, \ldots, x_{n-1})} \left( \sum_{j=1}^{n-1} r_j x_j + r_n \right) = -\sum_{j=1}^{n-1} |r_j| + r_n. \qquad (7)$$

Because of the symmetry of the first terms (the sums) of (6) and (7), $\hat{x}_n^{\max}$ and $\hat{x}_n^{\min}$ can easily be converted into each other by

$$\hat{x}_n^{\max} = -\hat{x}_n^{\min} + 2 r_n. \qquad (8)$$


Thus, the summation does not need to be calculated twice. This relationship is also applicable to other pairs of intersection points, which are symmetrical to the center $(0, 0, \ldots, 0, r_n)$. As to $P'$, $\hat{x}_n'^{\max}$ and $\hat{x}_n'^{\min}$, the counterparts of $\hat{x}_n^{\max}$ and $\hat{x}_n^{\min}$, can be calculated in a similar way. With $\hat{x}_n^{\max}$, $\hat{x}_n^{\min}$, $\hat{x}_n'^{\max}$, and $\hat{x}_n'^{\min}$, the conditions for the four special cases can be expressed, respectively, as
1) $T_1 = (\hat{x}_n^{\max} < -1) \wedge (\hat{x}_n'^{\max} < -1)$;
2) $T_2 = (\hat{x}_n^{\min} > 1) \wedge (\hat{x}_n'^{\min} > 1)$;
3) $T_3 = (\hat{x}_n^{\max} < 1) \wedge (\hat{x}_n^{\min} > -1) \wedge (\hat{x}_n'^{\max} < 1) \wedge (\hat{x}_n'^{\min} > -1)$;
4) $T_4 = ((\hat{x}_n^{\max} < -1) \wedge (\hat{x}_n'^{\min} > 1)) \vee ((\hat{x}_n'^{\max} < -1) \wedge (\hat{x}_n^{\min} > 1))$.
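The extrema (6) and (7), the conversion (8), and the conditions $T_1$ through $T_4$ translate directly into code; a sketch under the same conventions as the snippets above:

```python
def extrema(W, theta):
    """x_n^max and x_n^min of (6) and (7) for the hyperplane given by (W, theta)."""
    r = [-w / W[-1] for w in W[:-1]]       # r_j = -w_j / w_n
    r_n = -theta / W[-1]                   # r_n = -theta / w_n
    s = sum(abs(rj) for rj in r)
    return s + r_n, -s + r_n               # the returned pair also satisfies (8)

def special_cases(W, W_pert, theta, theta_pert):
    """Truth values of T1..T4 detecting the four special cases."""
    xmax, xmin = extrema(W, theta)
    pmax, pmin = extrema(W_pert, theta_pert)
    T1 = xmax < -1 and pmax < -1           # both hyperplanes below the hypercube
    T2 = xmin > 1 and pmin > 1             # both hyperplanes above it
    T3 = xmax < 1 and xmin > -1 and pmax < 1 and pmin > -1    # both through the middle
    T4 = (xmax < -1 and pmin > 1) or (pmax < -1 and xmin > 1) # cube between them
    return T1, T2, T3, T4
```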

Besides the above four special cases, it is also unnecessary, in most cases, to treat each parallel line. For example, if the $x_n$-coordinates of the two intersection points of a line fall into $[-1, 1]$, this line [as the one shown in Fig. 2(c)] does not need to be taken into account. The problem is how to efficiently exclude this kind of line. Inversely, this is equivalent to finding all of those parallel lines whose intersection points have $x_n$-coordinates outside $[-1, 1]$. Following this idea, we now turn to those $x_n$-coordinates of intersection points on either $P$ or $P'$ that are either greater than $1$ or less than $-1$. For simplicity and without loss of generality, the following discussion concentrates on the $x_n$-coordinates of the intersection points that are located on $P$ and hold $\hat{x}_n > 1$, for the following two reasons.
1) Due to the equal status of $P$ and $P'$, the intersection points on them can be treated in the same way.
2) Due to the symmetrical attribute of (8), only one of the cases $\hat{x}_n > 1$ or $\hat{x}_n < -1$ needs to be considered.
However, such a qualifying $\hat{x}_n$ is not enough for determining whether a relevant vertex is contained in parts I and III; it further needs to refer to the corresponding $\hat{x}_n'$ of the same line; that is, only when $\hat{x}_n > 1$ and meanwhile $\hat{x}_n' < 1$ can the vertex be counted. Fortunately, if the original correspondence between $W$ and $W'$ is maintained, $\hat{x}_n'$ can easily be derived from $\hat{x}_n$ and vice versa. Now the problem is reduced to how we can efficiently find all those $\hat{x}_n$'s that hold $\hat{x}_n > 1$ among the possible $2^{n-1}$ $x_n$-coordinates.

In our approach, a tree technique is employed to model the $2^{n-1}$ $x_n$-coordinates into a partial descending order. For a given $n$, there exists a unique tree with $2^{n-1}$ nodes, in which each node represents an $\hat{x}_n$ related to one of the $2^{n-1}$ parallel lines and is marked by a series $b_1 b_2 \ldots b_{n-1}$ with either $b_j = 1$ or $b_j = 0$ $(1 \le j \le n-1)$, depending on whether $r_j x_j = |r_j|$ or $r_j x_j = -|r_j|$ appears in the summation of (4). Fig. 3 illustrates the tree for $n = 5$.

Fig. 3. Tree for $n = 5$.

This kind of tree has the following features.
1) The root is on level 1 and marked by $b_1 b_2 \ldots b_{n-1} = 11 \ldots 1$, so it represents $\hat{x}_n^{\max}$.
2) Each son node of a given node at $\langle h, i_h \rangle$, where $h$ $(1 \le h \le n)$ is the level number of the given node in the tree and $i_h$ $(1 \le i_h \le n-1)$ is the order number of the given node among its brother nodes, is yielded by changing only one $b_j$ $(\sum_{l=1}^{h} i_l \le j \le n-1)$ from 1 to 0 in the mark of its father node, where $i_l$ $(1 \le i_l \le n-1)$ is the order number of the node that appears at level $l$ in the path from the tree's root to the given node. The $\hat{x}_n$ of the father node is greater than or equal to that of its son node.
3) Among all son nodes of a given node at $\langle h, i_h \rangle$, following their production sequence from $\sum_{l=1}^{h} i_l$ to $n-1$, the first yielded one is the eldest brother in the leftmost position with order 1, and the last yielded one is the youngest brother in the rightmost position; namely, a younger brother is on the right of its elder brother, so the $\hat{x}_n$ of a younger brother is less than or equal to that of its elder brother under the assumption that $|r_j| \le |r_{j+1}|$ for all $j$ $(1 \le j \le n-2)$.

With such a tree model, the search for $\hat{x}_n$ can be done more efficiently by starting from the root, $\hat{x}_n^{\max}$, and then following the depth-first way, because once $\hat{x}_n > 1$ is found false at a node, its offspring nodes and its younger brother nodes, as well as their offspring nodes, can all be excluded. Although the tree's architecture is determined by the input dimension $n$, the search path in the tree is determined by the values of $r_j$ $(1 \le j \le n-1)$ and thereby by $W$. Concerning implementation aspects, no extra action is needed to organize the tree besides sorting the elements of $W$ into ascending order in terms of their absolute values, which guarantees the assumption of $|r_j| \le |r_{j+1}|$ $(1 \le j \le n-2)$ to be true. In addition, the depth-first search can be realized by starting from $b_1 b_2 \ldots b_{n-1} = 11 \ldots 1$ and then properly adjusting the $b_1 b_2 \ldots b_{n-1}$ of the current node to go to its eldest son node or to its father's next younger brother node, if they exist. With each $b_1 b_2 \ldots b_{n-1}$, the corresponding $\hat{x}_n$ can be calculated by (4). Further, since the $b_1 b_2 \ldots b_{n-1}$ of the current node is derived from the previous node in the search path by changing one or three $b_j$'s $(1 \le j \le n-1)$, the $\hat{x}_n$ of the current node can be obtained from the previous node by performing a few subtractions and additions rather than an $n$-grade summation. The adjustments from a current node with $b_1 b_2 \ldots b_{n-1}$ and $\hat{x}_n$ to its eldest son node and to its father's next younger brother node can be done in the following way.
1) To the eldest son: Let $j = \max(\{l \mid (1 \le l \le n-1) \wedge (b_l = 0)\} \cup \{0\})$; if $j \neq n-1$, it needs to set $b_{j+1} = 0$ and $\hat{x}_n = \hat{x}_n - 2|r_{j+1}|$; otherwise, no son node exists.
2) To the father's next younger brother: Let $k = \max(\{l \mid (1 \le l \le n-1) \wedge (b_l = 0)\} \cup \{0\})$ and $j = \max(\{l \mid (1 \le l \le k-1) \wedge (b_l = 0)\} \cup \{0\})$. If $(k \neq 0) \wedge (j \neq 0) \wedge (k = j+1)$, it needs to set $b_j = 1$ and $\hat{x}_n = \hat{x}_n + 2|r_j|$; if $(k \neq 0) \wedge (j \neq 0) \wedge (k > j+1)$, it needs to set $b_j = 1$, $b_{j+1} = 0$, $b_k = 1$, and $\hat{x}_n = \hat{x}_n + 2|r_j| - 2|r_{j+1}| + 2|r_k|$; otherwise, stop the search.
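A sketch of this pruned depth-first walk as a Python generator, following the two adjustment rules above with 0-based indices in place of the paper's 1-based ones (the naming is ours). It yields the mark and $\hat{x}_n$ of every node exceeding the threshold; the caller still has to test the corresponding $\hat{x}_n'$ as described earlier.

```python
def lines_exceeding(r, r_n, limit=1.0):
    """Depth-first walk over the marks b_1..b_{n-1}, starting at the root
    11...1 (which represents x_n^max, per (6)) and pruning: once x_n <= limit
    at a node, its offspring and younger brothers are skipped by jumping to
    the father's next younger brother.  Assumes |r[0]| <= |r[1]| <= ...,
    i.e., W already sorted by ascending absolute value."""
    m = len(r)
    b = [1] * m
    xn = sum(abs(rj) for rj in r) + r_n
    while True:
        if xn > limit:
            yield tuple(b), xn
            # Rule 1: go to the eldest son, if one exists.
            j = max((l for l in range(m) if b[l] == 0), default=-1)
            if j != m - 1:
                b[j + 1] = 0
                xn -= 2 * abs(r[j + 1])
                continue
        # Rule 2: go to the father's next younger brother, or stop.
        k = max((l for l in range(m) if b[l] == 0), default=-1)
        j = max((l for l in range(k) if b[l] == 0), default=-1)
        if k < 0 or j < 0:
            return
        if k == j + 1:
            b[j] = 1
            xn += 2 * abs(r[j])
        else:
            b[j], b[j + 1], b[k] = 1, 0, 1
            xn += 2 * abs(r[j]) - 2 * abs(r[j + 1]) + 2 * abs(r[k])
```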

To summarize the previous discussions, we arrive at the following heuristic algorithm for the computation of Adalines' sensitivity.
1) Initialization, such as $N_{err} = 0$, and so on.
2) Sort $W$ in ascending order in terms of the absolute values of its elements, and adjust the element order of $W'$ to keep the original correspondence with $W$.
3) Compute the $r_j$, the $r_j'$, $\hat{x}_n^{\max}$, $\hat{x}_n^{\min}$, $\hat{x}_n'^{\max}$, and $\hat{x}_n'^{\min}$.
4) Switch in terms of $T_1$, $T_2$, $T_3$, and $T_4$:
Case 1 ($T_1 \vee T_2 \vee T_3$ is true): $N_{err} = 0$; exit the switch. // No vertex is in parts I and III.
Case 2 ($T_4$ is true): $N_{err} = 2^n$; exit the switch. // All vertices are in parts I and III.
Case 3 ($T_1 \vee T_2 \vee T_3 \vee T_4$ is false): // Count the vertices in parts I and III.
4.1) Focus on $\hat{x}_n > 1$, starting from $\hat{x}_n^{\max}$. Loop for each node having $\hat{x}_n > 1$, searched in the depth-first way: derive the corresponding $\hat{x}_n'$ from $\hat{x}_n$; if $\hat{x}_n' < 1$, let $N_{err} = N_{err} + 1$; if moreover $\hat{x}_n' < -1$, let $N_{err} = N_{err} + 1$ again, since both vertices of that edge then fall into parts I and III. The symmetric case of $\hat{x}_n < -1$ is covered through (8).
4.2) Sort $W'$ in ascending order in terms of absolute value, and meanwhile adjust the element order of $W$ to keep the correspondence with $W'$.
4.3) Do as in 4.1) with respect to $\hat{x}_n' > 1$.
5) If $w_n'/w_n < 0$, let $N_{err} = 2^n - N_{err}$.
6) $s = N_{err}/2^n$ is the computed sensitivity of the Adaline.
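To make the control flow of steps 3) through 6) concrete, the following sketch reuses `special_cases` and `intersection_coords` from the earlier snippets. For clarity, Case 3 walks every edge line explicitly instead of using the pruned tree search, so this shows the decision structure of the algorithm rather than its full complexity saving:

```python
from itertools import product

def sensitivity_heuristic(W, W_pert, theta, theta_pert):
    """Sensitivity via the hyperplane/hypercube view, assuming w_n != 0 and
    w'_n != 0.  Counts, edge by edge, the vertexes whose output flips."""
    n = len(W)
    same_sign = (W[-1] > 0) == (W_pert[-1] > 0)      # sign of w'_n / w_n
    T1, T2, T3, T4 = special_cases(W, W_pert, theta, theta_pert)
    if T1 or T2 or T3:                               # Case 1: parts I, III empty
        n_err = 0 if same_sign else 2 ** n
    elif T4:                                         # Case 2: all vertexes in I, III
        n_err = 2 ** n if same_sign else 0
    else:                                            # Case 3: count edge by edge
        n_err = 0
        for prefix in product((1, -1), repeat=n - 1):
            xh, xhp = intersection_coords(prefix, W, W_pert, theta, theta_pert)
            for v in (1, -1):                        # the two vertexes on this edge
                # f = 1 at a vertex iff it lies on the positive side of a plane;
                # equality keeps the f = 1 side, matching x >= 0 in (1).
                pos = v >= xh if W[-1] > 0 else v <= xh
                pos_p = v >= xhp if W_pert[-1] > 0 else v <= xhp
                if pos != pos_p:                     # opposite parts: output error
                    n_err += 1
    return n_err / 2 ** n
```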

TABLE I. ASSOCIATED PARAMETERS AND EXPERIMENTAL RESULTS FOR THE SIX ADALINES.

Based on the above discussion, it is clear that the complexity of the algorithm is closely related to the given weights and their perturbations, i.e., to the relative positions of the two hyperplanes. If the conditions of Case 1 or Case 2 in the algorithm are satisfied, the complexity will be only $O(n \log_2 n)$, the cost of the sorting; otherwise, in Case 3, it will be between $O(n \log_2 n)$ and $O(2^n)$, depending on the depth-first search of the tree. In the search, the worst case is that the path contains all $2^{n-1}$ nodes, but this case is rare, especially when $\theta = 0$. Anyhow, the algorithm will be more efficient in an average sense.

Finally, there is a boundary case that cannot be ignored, in which a vertex of the hypercube lies on a hyperplane, so that an intersection point coincides with the vertex and has $\hat{x}_n = \pm 1$ or $\hat{x}_n' = \pm 1$. According to the condition $x \ge 0$ in (1), a vertex on the hyperplane must play the same role as the vertices above the hyperplane (when $w_n > 0$ or $w_n' > 0$) or as the vertices below the hyperplane (when $w_n < 0$ or $w_n' < 0$). Aiming at $\hat{x}_n = \pm 1$, we have the following two adjustment rules.
1) If $w_n > 0$, $(\hat{x}_n < 1)$ and $(\hat{x}_n < -1)$ should be respectively replaced by $(\hat{x}_n \le 1)$ and $(\hat{x}_n \le -1)$, while $(\hat{x}_n > 1)$ and $(\hat{x}_n > -1)$ are unchanged.
2) If $w_n < 0$, $(\hat{x}_n > 1)$ and $(\hat{x}_n > -1)$ should be respectively replaced by $(\hat{x}_n \ge 1)$ and $(\hat{x}_n \ge -1)$, while $(\hat{x}_n < 1)$ and $(\hat{x}_n < -1)$ are unchanged.
The rules are also suitable for $\hat{x}_n' = \pm 1$. This boundary case can be incorporated into the algorithm by adjusting the relevant logic expressions, such as $T_1$, $T_2$, $T_3$, $T_4$, etc.

V. EXPERIMENTAL VERIFICATION

To verify the theoretical results, a number of experiments have been conducted. Table I lists the experimental results for six Adalines along with

their relevant parameters. The input dimensions of the six Adalines are 5, 10, 15, 20, 25, and 30, respectively, and their weights are arbitrarily

given, ranging from all positive at first, to mixtures of positive and negative, and finally to all negative. In order to compare with Stevenson's approach, and because it requires the bias to be zero, the biases of the six Adalines are all set to zero. In the experiments, under the conditions that the elements of $\Delta W$ are all identical and the bias is zero for each Adaline, computer simulations that imitate the Adalines' working process, and computations according to the algorithm, are separately run to obtain the actual probability of erroneous outputs and the theoretical sensitivity for the six Adalines. The simulation results $p$ and the theoretical results $s$ given in Table I are completely equal, which verifies the correctness of our approach. Further, the corresponding theoretical sensitivities $s'$ based on Stevenson's approach for the six Adalines are also computed and listed in the last column of Table I. The comparison of the data in columns $p$, $s$, and $s'$ demonstrates that our approach is more accurate.
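This verification can be reproduced in miniature by cross-checking the two routines sketched above, `sensitivity_bruteforce` and `sensitivity_heuristic`, on random Adalines; the harness below is our own and does not use the paper's exact parameter values.

```python
import random

random.seed(0)
for n in (5, 8, 10):                    # keep n small: the brute force is O(n 2^n)
    W = [random.uniform(-1.0, 1.0) for _ in range(n)]
    dW = [0.1 * w for w in W]           # a 10% relative weight perturbation
    W_pert = [w + d for w, d in zip(W, dW)]
    p = sensitivity_bruteforce(W, dW, 0.0)           # simulated error probability
    s = sensitivity_heuristic(W, W_pert, 0.0, 0.0)   # geometric computation
    print(n, p, s)                      # expected: identical values in each row
```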

VI. CONCLUSION

In this paper, a quantified sensitivity of Adalines to weight perturbation is given. The sensitivity is the basis for the study of Madalines' sensitivity. In addition, the sensitivity is expected to serve as a measure for evaluating the importance of each Adaline in a Madaline, and thus to be helpful for training and pruning Madalines.

REFERENCES

[1] M. Stevenson, R. Winter, and B. Widrow, "Sensitivity of feedforward neural networks to weight errors," IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 71–80, Jan. 1990.
[2] S. W. Piché, "The selection of weight accuracies for Madalines," IEEE Trans. Neural Netw., vol. 6, no. 2, pp. 432–445, Mar. 1995.
[3] J. L. Bernier et al., "Improving tolerance of MLP by minimizing sensitivity to weight deviations," Neurocomput., vol. 31, pp. 87–103, 2000.
[4] J. M. Zurada, A. Malinowski, and S. Usui, "Perturbation method for deleting redundant inputs of perceptron networks," Neurocomput., vol. 14, pp. 177–193, 1997.
[5] A. P. Engelbrecht, "A new pruning heuristic based on variance analysis of sensitivity information," IEEE Trans. Neural Netw., vol. 12, no. 6, pp. 1386–1399, Nov. 2001.
[6] X. Zeng and D. S. Yeung, "A quantified sensitivity measure for multilayer perceptron to input perturbation," Neural Comput., vol. 15, no. 1, pp. 183–212, 2003.
[7] X. Zeng and D. S. Yeung, "Hidden neuron pruning of multilayer perceptrons using a quantified sensitivity measure," Neurocomput., to be published.

Associative Memory Design for 256 Gray-Level Images Using a Multilayer Neural Network

Giovanni Costantini, Daniele Casali, and Renzo Perfetti

Abstract—A design procedure is presented for neural associative memories storing gray-scale images. It is an evolution of a previous work based on the decomposition of an image with $2^L$ gray levels into $L$ binary patterns, stored in $L$ uncoupled neural networks. In this letter, an $L$-layer neural network is proposed with both intralayer and interlayer connections. The connections between different layers introduce interactions among all the neurons, increasing the recall performance with respect to the uncoupled case. In particular, the proposed network can store images with the commonly used number of 256 gray levels, instead of 16 as in the previous approach.

Index Terms—Associative memories, brain-state-in-a-box (BSB) neural networks, gray-scale images, multilayer architectures.

I. INTRODUCTION

The design of neural associative memories storing gray-scale images is a challenging problem investigated by few authors. Consider an image with $n$ pixels and $L$ gray levels. The first approach is based on neural networks with multivalued stable states, a model introduced in [1]. The activation function is a quantization nonlinearity with $L$ plateaus corresponding to the gray levels. The required number of neurons is $n$ and the number of interconnections is $n^2$. Some design methods have been proposed for networks with this type of nonlinearity, with interesting experimental results [2].

A second approach is based on complex-valued neural networks [3]–[6]. The neuron state can assume one of $L$ complex values, equally spaced on the unit circle. Each phase angle corresponds to a gray level. The number of neurons is $n$; the number of interconnections is $n^2$. For complex-valued neural networks, a generalized Hebb rule was proposed in [3], [4].

In [7], we proposed a third approach, where each pixel is represented by $L$ bits, $b_1, \ldots, b_L$, so the image can be decomposed into $L$ binary patterns with $n$ components. Each binary pattern can be stored in a binary associative memory. There are $L$ uncoupled networks, each with $n^2$ interconnections. The main advantage is that the $L$ subnetworks can be implemented via parallel hardware with considerable savings in time, both for learning and recall. However, this approach presents two drawbacks. First, the storage probability of a random set of images is the product of the storage probabilities in the subnetworks; hence, the capacity is considerably lower than that of each subnetwork. In the same way, the recall probability, starting from noisy versions of the stored images, is reduced with respect to the recall probability of each subnetwork. As the number of gray levels increases, both problems become worse, since the number of independent networks increases. As a consequence, the method suggested in [7] is applicable only up to 16 gray levels. To overcome this limitation, we present an evolution of our previous approach based on the introduction of connections between layers. Building interlayer connections introduces interactions

Manuscript received October 8, 2004; revised May 26, 2005. G. Costantini and D. Casali are with the Department of Electronic Engineering, University of Rome "Tor Vergata," Rome I-00100, Italy (e-mail: [email protected]; [email protected]). R. Perfetti is with the Department of Electronic Engineering, University of Perugia, Perugia I-06125, Italy (e-mail: [email protected]). Digital Object Identifier 10.1109/TNN.2005.863465
