ARTICLE IN PRESS Pattern Recognition
Contents lists available at ScienceDirect
Pattern Recognition journal homepage: www.elsevier.com/locate/pr
An improved time-adaptive self-organizing map for high-speed shape modeling
Mohammad Izadi, Reza Safabakhsh ∗
Computer Engineering Department, Amirkabir University of Technology, Tehran 15914, Iran
ARTICLE INFO
Article history: Received 21 February 2007; received in revised form 16 August 2008; accepted 30 October 2008
Keywords: Active contour model; Time-adaptive self-organizing map; TASOM; Adaptive speed parameter; Boundary curvature; Person tracking
ABSTRACT
In this paper, an improved active contour model based on the time-adaptive self-organizing map with a high convergence speed and low computational complexity is proposed. For this purpose, the active contour model based on the original time-adaptive self-organizing map is modified in two ways: adaptation of the speed parameter and reduction of the number of neurons. By adapting the speed parameter, the neuron motion speed is determined based on the distance of each neuron from the shape boundary which results in an increase in the speed of convergence of the contour. Using a smaller number of neurons, the computational complexity is reduced. To achieve this, the number of neurons used in the contour is determined based on the boundary curvature. The proposed model is studied and compared with the original time-adaptive self-organizing map. Both models are used in several experiments including a tracking application. Results reveal the higher speed and very good performance of the proposed model for real-time applications. © 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Shape representation is an important topic in machine vision and pattern recognition [1–7]. In recent years, region-based and edge-based contour models have been proposed for shape representation in images. Region-based representation is very sensitive to shape segmentation: when the segmentation is poor, a good representation of the shape is not possible. In addition, if the shape is not connected, it will not be modeled well. Another method for shape representation is based on selecting control points on the object contour. This method does not show severe problems, but it has high computational complexity. In contour-based methods, the shape of an object is extracted from the image locally, and when the shape changes, the shape is remodeled and the contour is modified quickly.
The active contour model was introduced by Kass et al. in 1988 [1]. This model consists of a continuously deformable curve, controlled by internal smoothness and elastic forces and by external image forces, that is used to locate features in an image. Many refinements of this approach have been proposed for better modeling. Cootes et al. [2] combined deformable shape descriptors with statistical modal analysis built from a training set of annotated images. Object shapes are represented by a subset of boundary points,
∗ Corresponding author. Tel.: +98 21 66959149; fax: +98 21 66419728. E-mail addresses: [email protected], [email protected], [email protected], [email protected] (R. Safabakhsh).
and a correspondence is established between these points from different images of the training set. The deformations are modeled using linear combinations of the eigenvectors of the variations from the mean shape and thus, defining the characteristic pattern of a shape class and allowing deformation reflecting the variations in the training set. A similar modal analysis scheme is proposed by Pentland and Sclaroff [8] which gives a set of linear deformations of the shape equivalent to the modes of vibration of the original shape. However, the modes are somewhat arbitrary and may not represent real variations occurring in a class of shapes. Staib and Duncan [9] used elliptic Fourier descriptors as modal parameters to represent open and closed boundaries. The Fourier coefficients are used to bias toward a range of shapes about a mean by using a Gaussian distribution on the parameters as a prior probability. A Bayesian approach is then used to obtain the maximum a posteriori estimate of the boundary. Chakraborty et al. [10] extended this approach to incorporate region homogeneity. Fourier descriptors are somewhat limited because they are not suitable for describing some shapes such as those including corners. Wang and Staib [11] also proposed a statistical point model using principal component analysis. Caselles et al. [12] proposed a geodesic active contour model based on a level set method. They proved that a particular case of the classical energy snake model is equivalent to finding a geodesic or minimal distance path in Riemannian space with a metric derived from the image content. Cohen et al. [13] proposed a global minimal path approach based on a fast marching method. Han et al. [14] used a heuristic method to find the minimal path.
0031-3203/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2008.10.034
Please cite this article as: M. Izadi, R. Safabakhsh, An improved time-adaptive self-organizing map for high-speed shape modeling, Pattern Recognition (2008), doi:10.1016/j.patcog.2008.10.034
In addition to the above methods, active contour models based on self-organizing map (SOM) have also been proposed. The SOM algorithm, which was first introduced by Kohonen [15] in 1982, transforms input vectors to a discrete map in an adaptive fashion. This discrete map is often constructed by a one-dimensional (1-D) array or a two-dimensional (2-D) grid of neurons usually called the lattice. The SOM-based active contour model was first introduced by Abrantes and Marques [16] in 1996. They represented a contour model by training a self-organizing network where neurons form a closed chain. After convergence, concatenation of the line segments between every two consecutive neurons produces the desired contour of the object. Here, the input data are usually the 2-D coordinates of the feature points of the object. These feature points are often the edges of the object, which are detected by some suitable edge detection algorithm. Some other modifications in SOM training are also necessary as explained in Ref. [16]. The disadvantage of using the batch SOM is that due to its simplicity, it works well only for simple objects. Venkatesh and Rishikesh [17] modified the SOM algorithm to represent a new SOM-based active contour model. Using a fixed number of neurons and a chain topology that can be closed or open, they proposed a new contour model. This algorithm requires the initial contour to be near the boundary, and the number of points (neurons) on the contour remains fixed. These points are the image pixels that intersect with the initial contour. Using this large number of points on the contour slows down the convergence of the algorithm. However, the proposed model converges to correct results in the presented experiments. A problem with implementing contour modeling through the basic SOM-based networks is that some parts of the initial contour might be closer to the feature points of the object than the other parts. 
Using the basic SOM training in such cases folds the contour, or some parts of the contour never approach the feature points of the object, especially when the object boundary includes concave segments. Non-uniform spread of points on the contour is another problem that occurs in contour modeling with a fixed number of points.
A modified SOM algorithm is the time-adaptive self-organizing map (TASOM) introduced by Shah-Hosseini and Safabakhsh in Refs. [18,19,21]. They also presented a TASOM-based active contour model in Ref. [20]. For this model, the TASOM network with a chain topology was used. They assume that the initial contour is given by a user through the selection of a number of control points inside or outside the object boundary. By applying the proposed algorithm to the initial contour, the contour converges to the object boundary. In addition, Khosravi and Safabakhsh proposed a modified TASOM to detect and track the human eye sclera [22].
In this paper, we propose a new TASOM-based active contour model. The speed of neuron movement in this model is adapted so that the network converges quickly. Furthermore, the number of neurons and the computational complexity are decreased by selecting the number of neurons on every segment of the contour in proportion to the segment curvature. In Section 2, the time-adaptive SOM networks are explained briefly and two new modifications to increase the speed of the active contour model are introduced. In Section 3, the new active contour model is presented and the modifications are analyzed. Section 4 compares the performance of the proposed algorithm with that of the original TASOM-based model given in Ref. [20] in several experiments. Finally, the conclusions are summarized in Section 5.
2. The TASOM and the proposed modifications
The TASOM algorithm is a modified version of the basic SOM algorithm introduced by Shah-Hosseini and Safabakhsh in
Refs. [18,19,21]. This network uses adaptive learning parameters which change based on the behavior of the input vectors or changes in the environment. In the basic SOM algorithm, the learning parameters (learning rate and neighborhood size) are initially at their highest values and decrease with time, so that the feature map stabilizes and learns the topographic map of the environment. As a result, the basic algorithm does not do well in unsteady environments, which require adaptation of the learning parameters. The TASOM network has an adaptive learning rate and a neighborhood function for each neuron, making it more flexible for approximating changing input distributions. In this network, the learning rate and neighborhood function of each neuron change based on the distance between the neuron and the input vectors. This feature increases the network convergence speed and makes the network adaptable in response to input changes. Accordingly, the TASOM network is appropriate for use in non-stationary environments. Using this ability, objects are tracked in an image sequence and very good results are reported [21].
The TASOM-based active contour model is formed by a network with a closed or open chain topology, and its speed of convergence is higher than that of the basic SOM. Using the neuron addition/deletion capability, the TASOM-based active contour model controls the number of neurons based on the boundary length and the required accuracy. This model does not show the shortcomings of the basic SOM active contour models, but it still has a few problems that must be settled.
Low convergence speed is an important problem of the TASOM-based active contour model in some applications. Using a fixed value for the speed parameters of the TASOM algorithm contributes to this problem. To make the initial contour settle on the object boundary faster, the speed parameter $\alpha_{\max}$ must be increased.
But, for some neurons, this may lead to non-convergence or to convergence to the inner edges of the shape. If, instead of a fixed value, $\alpha_{\max}$ takes a large value for the neurons that are far from the boundary and a small value for the neurons that are near the boundary, then the convergence speed will be increased and no neuron will settle on the inner edges. Adapting the speed parameter in this way makes the neurons converge to the boundary much faster. It is therefore preferable to change the speed parameter of each neuron based on its distance from the boundary. Accordingly, our new TASOM algorithm finds the minimum distance of each neuron from the boundary in each iteration and then sets the speed parameter to a fraction of that distance.
The above modification increases the speed of the TASOM-based active contour model, but a further increase in speed is still desirable. To decrease the computational complexity of the TASOM-based active contour model, we propose to decrease the number of neurons used in the model. The contour is formed by concatenating the line segments between every two neighboring neurons in the model. It is, therefore, preferable to decrease the number of neurons on the boundary segments that have a low curvature. For example, for covering the boundary of a rectangle by a TASOM-based contour, a contour with four neurons is sufficient. Hence, a uniform distribution of neurons on the boundary is unnecessary and is a weakness of the original model. To decrease the number of neurons, the neurons that approximately lie on a straight line are found in each iteration of the new algorithm, and then all of them, except for the first and the last neuron of the line, are removed from the contour. Consequently, the number of neurons that need training decreases considerably, which reduces the computational complexity of the algorithm.
Making the above modifications in the original TASOM-based active contour model yields a much faster active contour model which is expounded in the next section.
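As a concrete illustration of the adaptive learning rate described at the beginning of this section, the following Python sketch updates the learning rate and weight of every neuron on a chain for one input vector. It follows the form of the learning rate and weight updates given in Section 3 (Eqs. (6) and (7)) with f(z) = z/(1 + z). The function names (`f`, `chain_distance`, `train_step`) and the argument names are hypothetical, the neighborhood width is passed in as a fixed number, and the scaling value defaults to 1; it is a minimal sketch under these assumptions, not the authors' C++ implementation.

```python
import math

def f(z):
    """Normalization of the input-to-weight distance: f(0) = 0 and 0 <= f(z) < 1."""
    return z / (1.0 + z)

def chain_distance(j, i, n, closed=True):
    """Lattice distance between neurons j and i on a chain of n neurons."""
    d = abs(j - i)
    return min(d, n - d) if closed else d

def train_step(weights, rates, x, winner, width, alpha, s_f, scale=1.0, closed=True):
    """Update the learning rate and weight of every neuron for one input x.

    weights: list of 2-D tuples, rates: list of learning rates,
    width: neighborhood width of the winner (assumed given),
    alpha, s_f: constants of the learning rate update, scale: scaling value.
    """
    n = len(weights)
    new_weights, new_rates = [], []
    for j, (w, eta) in enumerate(zip(weights, rates)):
        # Learning rate follows the normalized distance between x and w.
        err = math.dist(x, w) / (s_f * scale)
        eta = (1.0 - alpha) * eta + alpha * f(err)
        # Gaussian neighborhood centered at the winning neuron.
        h = math.exp(-chain_distance(j, winner, n, closed) ** 2 / width ** 2)
        new_weights.append(tuple(wi + eta * h * (xi - wi) for wi, xi in zip(w, x)))
        new_rates.append(eta)
    return new_weights, new_rates
```

In this sketch a neuron that already coincides with the input sees its learning rate shrink, while a distant neuron keeps a learning rate close to 1, which is the behavior the text attributes to the TASOM.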
3. The new TASOM-based active contour algorithm
The two modifications discussed in the previous section are applied to the original TASOM-based active contour model to produce a faster model for real-time applications. In this section, the new algorithm is expounded and the two modifications are considered and analyzed.
3.1. The TASOM-based active contour model with new modifications
In the new TASOM-based active contour model, the object boundary is modeled by N control points $(p_1, p_2, \ldots, p_N)$ and is approximated by concatenating the line segments between any two neighboring control points, $p_j$ and $p_{j+1}$ (Fig. 1).
Fig. 1. Illustration of the proposed active contour model parameters: the control points (neurons) $(p_1, p_2, \ldots, p_N)$; the shape boundary; the distance between two consecutive neurons, which must be less than $\nu_h$ and more than $\nu_l$; the influence area of radius $r_{\min}$ of a neuron near the boundary (a neuron is considered to be near the boundary if its distance from it is less than $\delta_{xw}$); and the influence area of radius $r_{\max}$ of a neuron far from the boundary.
The proposed algorithm is summarized in the following steps:
(1) The constant parameters $\nu_l$ and $\nu_h$ control the lowest and highest allowed distances between any two neighboring neurons, respectively. Moreover, the closeness distance parameter $\delta_{xw}$ represents the desired closeness of neurons to the feature points of the image. The speed parameter $\alpha_{\min}$ is used for controlling the minimum movement speed of the control points toward the object boundary. The influence radius parameters $r_{\min}$ and $r_{\max}$ are used for setting the influence radius of each neuron. Fig. 1 illustrates the above parameters. The parameter $d_l$ is used for deciding whether three neighboring neurons are located on a straight line. These parameters are initialized based on the object size and the required speed and accuracy.
(2) The initial contour is represented by a number of control points $(p_1, p_2, \ldots, p_N)$.
(3) The set of feature points $X = \{x_1, x_2, \ldots, x_K\}$ of the image is extracted.
(4) A TASOM network with an open or closed chain topology is constructed. The number of neurons is set equal to the number of the control points on the initial contour. Then, the weights of the neurons, $w_j$, are initialized to the control points:
$$(w_1, w_2, \ldots, w_N) \leftarrow (p_1, p_2, \ldots, p_N) \qquad (1)$$
(5) The initial parameters of the TASOM network are appropriately chosen: the learning rate of each neuron j, $\eta_j(0)$, is initialized to a value close to 1. The scaling value sl(0) should be set to some positive value, preferably 1. The parameters $E_l(0)$ and $E2_l(0)$ may be initialized by some small random values.
(6) For all neurons j, set $cf(j) \leftarrow$ false, $\alpha_{\max}(j) \leftarrow +\infty$, and $ir(j) \leftarrow r_{\max}$. The neighboring neurons of each neuron j are included in the set $NH_j$. To have a closed topology, for every neuron j we set $NH_j = \{j-1, j+1\}$, where $NH_1 = \{N, 2\}$ and $NH_N = \{N-1, 1\}$. For an open topology, the set $NH_j$ is defined as before, but the two end sets are defined as $NH_1 = \{2\}$ and $NH_N = \{N-1\}$. Also, $(w_1^{old}, w_2^{old}, \ldots, w_N^{old}) \leftarrow (w_1, w_2, \ldots, w_N)$.
(7) For each feature point $x_k \in X$, the closeness flag cf(j) and the influence radius ir(j) of each neuron j having the weight vector $w_j$ are updated as
$$\text{if } d(x_k, w_j) \le \delta_{xw} \text{ then } cf(j) \leftarrow \text{true}, \quad ir(j) \leftarrow r_{\min} \qquad (2)$$
(8) The feature points $x_k \in X$ of the image are used to train the TASOM weights $w_j$:
(a) For the current input vector (feature point) $x_k \in X$, the winning neuron $i(x_k)$ is found:
$$i(x_k) = \arg\min_j \{ d(x_k, w_j(n)) \mid d(x_k, w_j(n)) \le ir(j) \}, \quad j = 1, 2, \ldots, N \qquad (3)$$
(b) For the winning neuron $i(x_k)$, the maximum speed parameter $\alpha_{\max}(i)$ is updated:
$$\alpha_{\max}(i) = \max\left(\min\left(\alpha_{\max}(i),\ \rho\,\|x_k - w_i^{old}\|\right),\ \delta_{xw}\right) \qquad (4)$$
where the parameter $\rho$ is a constant value between zero and one. If the selected value is close to 1, the speed of convergence will increase, and vice versa. However, if this parameter is too close to 1, the contour might not settle on the boundary well. If the number of neurons on the initial contour is very low, it is better to set $\alpha_{\max}$ equal to 1 for the first several epochs of training (in this paper, $\alpha_{\max}$ is set to 1 for the first three epochs and afterwards it is updated as above).
(c) The neighborhood width $\sigma_i(n)$ of the winning neuron $i(x_k)$ is updated by the following equation:
$$\sigma_i(n+1) = (1-\beta)\,\sigma_i(n) + \beta\, g\!\left(\frac{1}{s_g \cdot sl(n) \cdot |NH_i|} \sum_{j \in NH_i} \|w_i(n) - w_j(n)\|\right) \qquad (5)$$
where $|\cdot|$ denotes the cardinality of a set. The neighborhood widths of the other neurons do not change. The parameter $\beta$ is a constant value between zero and one, controlling the speed by which the neighborhood widths follow the normalized neighborhood errors. The parameter $s_g$ is a positive constant controlling the slope of the function g(·). The function g(·) is used for normalization of the weight distances and is a scalar function such that $dg(z)/dz \ge 0$ for z > 0, and for 1-D lattices of N neurons g(0) = 0 and $0 \le g(z) < N$. The function used for g(z) in this paper is $g(z) = (N-1)\,z/(1+z)$.
(d) The learning rate parameters $\eta_j(n)$ of all neurons j are updated by
$$\eta_j(n+1) = (1-\alpha)\,\eta_j(n) + \alpha\, f\!\left(\frac{1}{s_f \cdot sl(n)} \|x_k - w_j(n)\|\right) \quad \text{for all } j \qquad (6)$$
where the parameter $\alpha$ is a constant value between zero and one, controlling the speed by which the learning rates follow the normalized learning rate errors. The parameter $s_f$ is a positive constant controlling the slope of the function f(·). The function f(·) is used for normalization of the distance between the weight and input vectors, and is a monotonically increasing scalar function such that for each positive z we have $0 \le f(z) < 1$ and f(0) = 0. The function used for f(z) here is $f(z) = z/(1+z)$.
(e) The weights of all neurons are updated as
$$w_j(n+1) = w_j(n) + \eta_j(n+1)\, h_{j,i}(n+1)\,(x_k(n) - w_j(n)) \quad \text{for all } j \qquad (7)$$
where index i denotes the winning neuron and index j stands for the neurons of the network. The Gaussian neighborhood function $h_{j,i} = \exp(-d_{j,i}^2/\sigma_i^2(n))$ is centered at the winning neuron i. The distance between neurons j and i, denoted by $d_{j,i}$, is calculated in the lattice space. For the closed topology on the lattice, we have $d_{j,i} = \min(|j-i|,\ N - |j-i|)$. For the open topology, we have $d_{j,i} = |j-i|$.
(f) The scaling value sl(n) is updated so that it incrementally estimates the standard deviation or the diameter of the input distribution using the equation
$$sl(n+1) = \sqrt{\sum_l s_l(n+1)} \qquad (8)$$
where $s_l(n+1) = (E2_l(n+1) - E_l(n+1)^2)^+$, $E2_l(n+1) = E2_l(n) + \alpha_s\,(x_{k,l}^2(n) - E2_l(n))$, $E_l(n+1) = E_l(n) + \beta_s\,(x_{k,l}(n) - E_l(n))$, $(z)^+ = \max(z, 0)$, and l = 1, 2. The parameters $\alpha_s$ and $\beta_s$ are constant values between zero and one, and $E_l(n)$ and $E2_l(n)$ incrementally approximate $E(x_{k,l})$ and $E(x_{k,l}^2)$, respectively, with more emphasis on the recent input data. E(·) represents the statistical expectation. The scaling value sl(n) is used to make the TASOM network invariant to scaling transformations.
(9) Contour updating: after one epoch of training the TASOM with the feature points, the control points whose neurons have won more than once are updated and the others remain unchanged:
$$p_j = \begin{cases} p_j + \alpha_{\min}\,\dfrac{w_j - p_j}{\|w_j - p_j\|} & \text{if } cf(j) \text{ is true} \\[2ex] p_j + \alpha_{\max}(j)\,\dfrac{w_j - p_j}{\|w_j - p_j\|} & \text{otherwise} \end{cases} \qquad (9)$$
(10) The weights of the neurons are set to the control points, $w_j \leftarrow p_j$.
(11) Any neuron not used during an epoch of training on X is identified and deleted from the TASOM network. However, neuron deletion should be avoided for the first several epochs, especially when the initial contour has very few control points.
(12) After every $m_{line}$ epochs of training ($m_{line}$ is a positive integer), all neurons lying on a straight line are deleted except for the first and the last neuron, as follows:
(a) $j \leftarrow 1$, $j_1 \leftarrow 1$, and $S \leftarrow \emptyset$.
(b) For each neuron $k \in S \cup \{j+1\}$, if the distance of neuron k from the line connecting neurons $j_1$ and j+2 is less than $d_l$, i.e. (written for k = j+1),
$$\frac{\left|(w_{j_1,1} - w_{j+2,1})(w_{j+1,2} - w_{j_1,2}) - (w_{j_1,2} - w_{j+2,2})(w_{j+1,1} - w_{j_1,1})\right|}{\sqrt{(w_{j_1,1} - w_{j+2,1})^2 + (w_{j_1,2} - w_{j+2,2})^2}} < d_l \qquad (10)$$
then $S \leftarrow S \cup \{j+1\}$; otherwise, $j_1 \leftarrow j+1$, all neurons belonging to S are deleted, and $S \leftarrow \emptyset$.
(c) $j \leftarrow j+1$.
(d) Go to (12b) if $j \le N-1$; else go to step 13. For the first several epochs, it is preferred not to execute this step.
(13) If the distance of each neuron from any feature point is less than $\delta_{xw}$ or the number of epochs is greater than the value of Itr (given by the user), then $p_j \leftarrow w_j$, the final contour is obtained by concatenating the line segments between every two consecutive control points, and the algorithm is finished.
(14) Between any two neighboring neurons j and j+1, a new neuron k is inserted if $d(w_j, w_{j+1}) > \nu_h$, such that:
$$w_k = \frac{w_j + w_{j+1}}{2}, \qquad \eta_k = \frac{\eta_j + \eta_{j+1}}{2}, \qquad \sigma_k = \frac{\sigma_j + \sigma_{j+1}}{2} \qquad (11)$$
(15) Any two neighboring neurons j and j+1 are replaced by a new neuron k if $d(w_j, w_{j+1}) < \nu_l$. The weight vector and learning parameters of neuron k are obtained as specified in Eq. (11).
(16) For all neurons j, $p_j \leftarrow w_j$. Then, training is continued by going to step 6.
3.2. Adaptive speed parameter $\alpha_{\max}$
One of the basic problems of the TASOM-based active contour model in Ref. [20] is that its speed parameter $\alpha_{\max}$ is fixed. If $\alpha_{\max}$ is small, the speed of convergence decreases. If it is large, no neuron may fall within the distance $\delta_{xw}$ from the boundary, and the algorithm will not converge or the neurons will converge to inside edges. In order to make the algorithm converge, the parameter $\delta_{xw}$ must be increased to enlarge this distance. The neurons must then move over the enlarged distance slowly (with the speed $\alpha_{\min}$), or the contour cannot fit the boundary well. So, if $\alpha_{\max}$ is large, the accuracy of the contour fitting decreases, and if it is small, the speed of convergence decreases.
To increase both the speed of convergence and the accuracy, it is better to use an adaptive speed parameter. In this case, the value of $\alpha_{\max}$ is set for every neuron based on its distance from the boundary: for a large distance $\alpha_{\max}$ must be large, and for a small distance it must be small. In the new algorithm, this is performed in step (8b): for each neuron, the nearest feature point to the neuron is found and $\alpha_{\max}$ is set to the distance of that feature point from the neuron multiplied by a constant $\rho$. If the value selected for $\rho$ is close to 1, the speed of convergence will increase, but the neuron movements will not be smooth and the neuron distribution on the boundary will be poor, especially when the boundary has many deep depressions (as shown in the experiments) and there are only a few initial control points. The closer $\rho$ is to zero, the lower the speed of convergence, but the smoother the neuron movements and the better the spread of the neurons on the boundary.
3.3. Neuron reduction in proportion to the boundary curvature
In step 12 of the new algorithm, after every $m_{line}$ epochs of training ($m_{line}$ is a positive integer), all neurons which lie on a straight line with maximum error $d_l$, except for the first and the last neurons, are deleted. The larger $d_l$ is, the less sensitive the algorithm is to the boundary curvatures. In the new algorithm, at the neuron addition step, a neuron is inserted between every two neighboring neurons which are far from each other; the three neurons then lie on a straight line. During the next epoch, the inserted neuron moves away from its two former neighbors toward the boundary. If its distance from the line connecting the two former neighboring neurons is less than $d_l$, it will be deleted. Thus, the final contour may not cover some boundary curvatures. This distance is the distance of the inserted neuron from the boundary. The parameter $\alpha_{\max}$ is calculated in step (8b). Accordingly, it is desired that every inserted neuron moves to the boundary and is not deleted unless the inserted neuron and its two neighboring neurons move to the boundary in the same direction. To retain the inserted neuron and allow it to move toward the boundary, it is preferable to execute the deletion step only after every $m_{line}$ epochs. The larger the value of $m_{line}$, the more opportunity the inserted neuron has to move away from its two neighboring neurons. If $m_{line}$ is set equal to 1, the maximum distance of the inserted neuron from the line connecting its two neighboring neurons is equal to $\alpha_{\max}$. But the neuron often does not move in the direction perpendicular to the connecting line. Assuming a uniform distribution for the motion of the neuron, the mean displacement in the direction perpendicular to the connecting line is approximately equal to $\alpha_{\max}/2$. So if the value of $d_l$ is larger than approximately $\alpha_{\max}/2$, the inserted neuron will be deleted; otherwise it will not be deleted and will move toward the boundary.
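The neuron addition step referred to above (step 14), together with the merging rule of step 15, can be sketched as follows. This is a simplified single pass over an open chain that handles only the weight vectors; the learning parameters would be averaged in the same way as the weights in Eq. (11). The name `resample_chain` and the arguments `nu_l` and `nu_h` are hypothetical, and doing both rules in one pass is an assumption of this sketch.

```python
import math

def resample_chain(weights, nu_l, nu_h):
    """Insert a midpoint neuron where neighbors are farther apart than nu_h,
    and merge neighbors that are closer than nu_l into their midpoint."""
    out = [weights[0]]
    for w in weights[1:]:
        prev = out[-1]
        d = math.dist(prev, w)
        mid = tuple((a + b) / 2 for a, b in zip(prev, w))
        if d > nu_h:
            out.extend([mid, w])   # step 14: new neuron between the two
        elif d < nu_l:
            out[-1] = mid          # step 15: the pair is replaced by one neuron
        else:
            out.append(w)
    return out
```

Right after an insertion the new neuron lies exactly on the segment between its two neighbors, which is why the straight-line deletion of step 12 would remove it again unless it first moves more than $d_l$ away from that segment.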
In each epoch, the value of $\alpha_{\max}$ for every neuron is bounded by $\delta_{xw} \le \alpha_{\max} \le \rho\, d_{\min}$, where $d_{\min}$ is the distance of the neuron from the nearest feature point. As a result, the parameter $d_l$ should not be larger than an upper bound, because otherwise some inserted neurons will be deleted. This upper bound depends on the parameters $\delta_{xw}$ and $\rho$. The closer $\rho$ is to 1, the larger the upper bound will be; and the closer $\rho$ is to zero, the closer the upper bound will be to the approximate value of $\delta_{xw}/2$ (assuming a uniform distribution for the motion of the neuron). Certainly, if the distribution of the motion vector is normal, the upper bound will be larger than $\delta_{xw}/2$. If $d_l$ is larger than the upper bound, there is another way to resolve the problem of neuron deletion: we can allow the inserted neuron to move away by increasing the value of $m_{line}$. In this way, the upper bound becomes approximately $m_{line}$ times larger.
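The argument above can be condensed into a rough inequality. The block below is only a restatement under the text's uniform-motion assumption, with $\alpha_{\max}$ denoting the maximum speed parameter of step (8b); the numerical case uses the closeness distance of 1 that is selected in the experiments.

```latex
% An inserted neuron survives the deletion step if, approximately,
\[
  d_l \;\lesssim\; m_{line}\,\frac{\alpha_{\max}}{2},
  \qquad
  \delta_{xw} \le \alpha_{\max} \le \rho\, d_{\min}.
\]
% Worst case (neuron close to the boundary, or \rho \to 0): \alpha_{\max} = \delta_{xw}, so
\[
  d_l \;\lesssim\; m_{line}\,\frac{\delta_{xw}}{2},
  \qquad\text{e.g. for } \delta_{xw} = 1:\quad
  \begin{cases}
    m_{line} = 1: & d_l \lesssim 0.5,\\
    m_{line} = 2: & d_l \lesssim 1.
  \end{cases}
\]
```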
Using step 12, the neurons settle on the boundary in proportion to the boundary curvatures. Also, the number of neurons trained during the training phase is decreased compared to the original algorithm, in which step 12 is absent. As a result, the computational complexity of the training phase is decreased. In the literature, there are other curve simplification methods, such as the quadratic-error-based simplification method [23], which are widely used and show a good performance. Incorporating such methods in our algorithm may increase the performance and speed of the proposed method.
4. Experimental results
Three experiments were carried out to show the efficiency and advantages of the proposed algorithm. The first experiment considers the effect of the parameter $\rho$ on the new algorithm; the convergence speeds of the new and original algorithms are compared. The second experiment studies the effect of the parameters $d_l$ and $m_{line}$ on the new algorithm; both algorithms are applied to three shapes and the computational complexities are compared. The last experiment employs the new and original algorithms for tracking a moving person. The algorithms were implemented in C++ and run on a computer with a 2.4 GHz Celeron CPU and 512 MB RAM.
4.1. The effect of parameter $\rho$ on the new TASOM algorithm
To consider the effect of the parameters $\alpha_{\max}$ and $\rho$ on the speed and accuracy of the new and original algorithms, both algorithms are applied to the star image (Fig. 2) with different values for these parameters. For both algorithms, the following parameters are selected:
$\nu_l = 1$, $\nu_h = 3$, $s_f = 10000$, $\delta_{xw} = 1$, $s_g = 20$, $\alpha_{\min} = 0.01$, $\alpha = 0.1$, $\beta = 0.1$, $r_{\min} = 3$, $\alpha_s = 0.01$, $\beta_s = 0.01$
In this example, step 12 of the new algorithm is skipped. As shown in Fig. 2(a), four initial control points are selected to form the initial contour. The contour model presented in Ref. [20] is employed to find the star boundary with $\alpha_{\max}$ and $\delta_{xw}$ equal to 1. The final contour is obtained after 64 epochs of training the neurons, which took 14.131 s (Figs. 2(c), (d)). To increase the speed of convergence, $\alpha_{\max}$ is set to 5. But, due to the small value of $\delta_{xw}$ (equal to 1), the neurons do not converge to the boundary, as shown in Figs. 2(e), (f). To make the neurons converge at a higher speed, $\delta_{xw}$ is set to 2.5 and $\alpha_{\max}$ to 5. Here, the neurons converged after 16 epochs, which took 1.983 s; but as shown in Fig. 2(h), the final contour does not fit the boundary well. To fit the boundary exactly, the neurons must move to the boundary with the speed parameter $\alpha_{\min}$, which requires more than 200 epochs of training (over 50 s).
The new model is applied to the star image with $\rho$ equal to 0.9 and the same initial contour as before. As shown in Fig. 2(i), the final contour is drawn inside the star and the neurons are disordered. By increasing the number of initial points to 25, this fault is eliminated (Fig. 2(j)) and the final contour settles on the boundary after 9 epochs, which took 0.951 s. Next, the new contour model is applied to the star for $\rho$ equal to 0.7, 0.5, and 0.3 with the same initial contour as before (4 points). The final contour is obtained after 11 (1.552 s), 17 (1.823 s), and 26 (3.395 s) epochs, respectively, as shown in Figs. 2(k), (l), and (m). Increasing the value of $\rho$ makes the algorithm converge in less time, whereas with smaller values of $\rho$ the neurons move to the boundary more smoothly and surely. Based on the experiments, the appropriate value for $\rho$ is between 0.3 and 0.5. In general, the proposed
Fig. 2. (a) The star image, (b) the star boundary, (c) the neurons' movement towards the boundary using the model proposed in Ref. [20] with $\alpha_{\max} = 1$, (d) the resulting contour using the model proposed in Ref. [20] with $\alpha_{\max} = 1$, (e) the neurons' movement towards the boundary using the model proposed in Ref. [20] with $\alpha_{\max} = 5$, (f) the neurons have not converged, (g) the neurons' movement towards the boundary using the model proposed in Ref. [20] with $\delta_{xw} = 2.5$, $\alpha_{\max} = 5$, (h) the resulting contour using the model proposed in Ref. [20] with $\delta_{xw} = 2.5$, $\alpha_{\max} = 5$; it does not fit the boundary well, (i) the neurons' movement towards the boundary using the new contour model with $\rho = 0.9$; they do not settle on the boundary well, (j) the neurons' movement towards the boundary using the new contour model with $\rho = 0.9$; because of the increased number of initial neurons, the neurons have settled on the boundary well. The neurons' movement towards the boundary using the new contour model with (k) $\rho = 0.7$, (l) $\rho = 0.5$, and (m) $\rho = 0.3$.
algorithm takes much less time than the original algorithm (less than 15%).
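As a quick check of the timing comparison, the run times reported above for the star image can be expressed as fractions of the 14.131 s taken by the model of Ref. [20]. The following Python sketch only re-uses numbers quoted in this section; the variable and function names are illustrative and are not part of the original algorithm.

```python
# Timings reported in Section 4.1 for the star image (seconds).
ORIGINAL_TIME_S = 14.131  # model of Ref. [20], 64 epochs

# (setting, epochs, seconds) for the new model; labels are descriptive only.
new_model_runs = [
    ("p = 0.9, 25 initial points", 9, 0.951),
    ("p = 0.7, 4 initial points", 11, 1.552),
    ("p = 0.5, 4 initial points", 17, 1.823),
    ("p = 0.3, 4 initial points", 26, 3.395),
]


def fraction_of_original(seconds, baseline=ORIGINAL_TIME_S):
    """Return a run time as a fraction of the baseline run time."""
    return seconds / baseline


fractions = {label: fraction_of_original(t) for label, _, t in new_model_runs}
for label, frac in fractions.items():
    print(f"{label}: {100 * frac:.1f}% of the original time")
```

Under this reading, the settings with p of 0.5 or more stay below 15% of the original run time, whereas p = 0.3 takes roughly a quarter of it.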
4.2. The effect of parameters dl and mline on the new TASOM algorithm
To examine the effect of parameter dl on the speed and accuracy of the proposed algorithm, the following values are selected for the parameters:
l = 1, h = 3, rmax = 300, s = 0.01, xw = 1, sf = 10000, s = 0.01, min = 0.01, sg = 20, = 0.5, rmin = 3, = 0.1, mline = 1, = 0.1, Itr = 200
Fig. 3. Boundary with depression and protrusion of size r.
By selecting a value for dl, all concave and convex segments of the boundary with a size less than dl will be ignored (Fig. 3). So, if the parameter dl is large, the accuracy of the boundary approximation will decrease and the final contour may not follow the boundary curvatures. As shown in Fig. 4(a), the head shape is modeled with an initial contour made of four neurons and with dl set equal to 2.
Fig. 4. (a) The head image, (b) the head boundary, (c) the resulting contour for dl = 2, (d) the resulting contour for dl = 0.3, and (e) the resulting contour for dl = 0.5.
Fig. 5. Resulting contours for: (a) dl = 2, p = 0.1, (b) dl = 0.5, p = 0.1, (c) dl = 2, p = 0.7, (d) dl = 2, p = 0.1, xw = 4, (e) dl = 1, p = 0.1, xw = 1, mline = 1, and (f) dl = 1, p = 0.1, xw = 1, mline = 2.
The resulting contour is shown in Fig. 4(c). When dl is set to a large value, some of the boundary curvatures are ignored and the boundary is not covered well; in return, only a few neurons are used for the final contour (black squares). If dl is smaller, the final contour resides more accurately on the boundary and the boundary curvatures are better covered. As shown in Fig. 4(d), for dl equal to 0.3, the boundary is covered well. Although the number of neurons on the final contour is greater than in the previous case, it is still not large. It is preferable to choose dl so that the final contour covers the boundary adequately with a minimum number of neurons. For example, as shown in Fig. 4(e), it is better to set dl equal to 0.5: the final contour is sufficiently accurate and uses fewer neurons than for dl equal to 0.3.
As illustrated in Fig. 5(a), the new algorithm is applied to the head image with the previous values for the parameters; p and dl are set equal to 0.1 and 2, respectively. Because p is close to zero and dl is much larger than xw/2 (that is, 0.5), some of the boundary curvatures are not covered. If dl is set equal to 0.5 (close to the value of xw/2), as shown in Fig. 5(b), the final contour covers the boundary well and the number of neurons that make up the final contour is approximately minimal. Alternatively, if dl remains 2 and p is changed to 0.7 (which is closer to 1), as shown in Fig. 5(c), the problem of neuron deletion is resolved. Increasing the value of xw resolves the problem as well. As illustrated in Fig. 5(d), the final contour is obtained with p, dl, and xw equal to 0.1, 2, and 4, respectively. Although this contour covers the boundary curvatures, it does not approach the boundary closely because the value of xw is large.
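The role of dl can be illustrated with a small sketch. It assumes, purely for illustration, that a contour point is dropped when its perpendicular distance from the chord joining its two neighbours is less than dl. This is a simplified stand-in and not the paper's step 12, and the names `distance_to_chord` and `prune_contour` are hypothetical.

```python
import math


def distance_to_chord(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    chord = math.hypot(bx - ax, by - ay)
    if chord == 0.0:
        return math.hypot(px - ax, py - ay)
    # |cross product| divided by the chord length
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / chord


def prune_contour(points, d_l):
    """Drop points of a closed contour that lie within d_l of the chord
    joining their current neighbours (illustrative assumption only)."""
    kept = list(points)
    changed = True
    while changed and len(kept) > 3:
        changed = False
        for i in range(len(kept)):
            prev_pt = kept[i - 1]
            next_pt = kept[(i + 1) % len(kept)]
            if distance_to_chord(kept[i], prev_pt, next_pt) < d_l:
                del kept[i]
                changed = True
                break  # neighbours changed, so rescan from the start
    return kept


# A square with one collinear point and one small protrusion of height 0.4.
square = [(0, 0), (5, 0), (10, 0), (10, 10), (5, 10.4), (0, 10)]
coarse = prune_contour(square, d_l=2.0)
fine = prune_contour(square, d_l=0.3)
print(len(coarse), "points kept for d_l = 2.0")
print(len(fine), "points kept for d_l = 0.3")
```

With d_l = 2 the protrusion of height 0.4 is discarded along with the collinear point, while with d_l = 0.3 the protrusion is retained, which mirrors the trade-off between accuracy and neuron count discussed above.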
Experiments show that it is better to identify the parameters to which dl is related and then to determine dl based on these parameters and the characteristics of the problem. If dl is larger than the upper bound, there is another way to resolve the neuron deletion problem: the inserted neuron can be allowed to move farther away by increasing the value of mline. In this way, the upper bound becomes approximately equal to mline. The final contour for p, dl, xw, and mline equal to 0.1, 1, 1, and 1, respectively, is shown in Fig. 5(e). The value of dl is large; therefore, some boundary curvatures are not covered by the final contour. But if mline is set equal to 2, the problem is resolved and the boundary curvatures are covered, as shown in Fig. 5(f). Based on experiments, in order to get an accurate result, we suggest that xw be set equal to 1. In addition, it is preferable to set dl equal to 0.5 to cover all curvatures on the shape boundary. With dl and xw set as above, an accurate contour will often be obtained.
Using step 12, the neurons reside on the boundary in proportion to the boundary curvatures. In addition, the number of neurons trained during the training phase is decreased (in comparison to the algorithm without step 12). As a result, the computational complexity of the training phase is decreased. To evaluate the computational complexity of the algorithm, the algorithm is applied to three shapes with and without step 12 included. The following quantities are used to compare the two cases: the total number of neurons trained during the training phase, the number of epochs required for convergence, and the convergence time. The results for the two cases are shown in Table 1. In all cases, the previous parameter values are used and the values of p, dl, and mline are set to 0.5, 0.5, and 1, respectively. Table 1 shows the strong positive impact of using step 12 in the algorithm. For all three shapes, the number of trained neurons is lower when step 12 is used, and consequently the convergence time is lower as well. Using step 12 can require one more epoch of training (coin and ellipse), but the convergence time is reduced, to less than half for the head and the ellipse. As shown in Fig. 6, although the boundary of the coin is very rough, the resulting contour is smoother when step 12 is included in the algorithm; otherwise, the contours obtained with and without step 12 are very similar.
To compare the proposed and the original algorithms, the original algorithm is also applied to the three shapes using the parameters of the first experiment. As shown in Table 1, the number of epochs and the number of trained neurons are considerably higher than those for the proposed algorithm with and without step 12. In this experiment, the proposed algorithm with step 12 is roughly 5 times faster than the original algorithm for the coin and roughly 18 to 20 times faster for the head and the ellipse.
Table 1 Comparison of the efficiency of using step 12 versus not using it and the original algorithm
Algorithm                                   Image     Number of trained neurons   Convergence speed (epochs)   Time (s)
The proposed algorithm using step 12        Coin      1502                        16                           0.902
The proposed algorithm using step 12        Head      366                         14                           0.371
The proposed algorithm using step 12        Ellipse   234                         13                           0.274
The proposed algorithm not using step 12    Coin      1877                        15                           1.071
The proposed algorithm not using step 12    Head      1584                        15                           1.152
The proposed algorithm not using step 12    Ellipse   736                         12                           0.601
The original algorithm                      Coin      8799                        36                           4.673
The original algorithm                      Head      13077                       64                           6.860
The original algorithm                      Ellipse   11679                       59                           5.508
Fig. 6. The resulting contours with step 12 (a, b, c), and without step 12 (d, e, f).
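The speed-up figures implied by Table 1 can be recomputed directly from the reported convergence times. The short Python sketch below only transcribes the table values; the dictionary layout and the helper `time_ratio` are illustrative choices, not part of the paper.

```python
# (trained neurons, epochs, time in seconds), transcribed from Table 1.
TABLE_1 = {
    "with step 12": {
        "Coin": (1502, 16, 0.902),
        "Head": (366, 14, 0.371),
        "Ellipse": (234, 13, 0.274),
    },
    "without step 12": {
        "Coin": (1877, 15, 1.071),
        "Head": (1584, 15, 1.152),
        "Ellipse": (736, 12, 0.601),
    },
    "original": {
        "Coin": (8799, 36, 4.673),
        "Head": (13077, 64, 6.860),
        "Ellipse": (11679, 59, 5.508),
    },
}


def time_ratio(slower, faster, image):
    """Convergence time of `slower` divided by that of `faster` for one image."""
    return TABLE_1[slower][image][2] / TABLE_1[faster][image][2]


for image in ("Coin", "Head", "Ellipse"):
    vs_original = time_ratio("original", "with step 12", image)
    vs_no_step = time_ratio("without step 12", "with step 12", image)
    print(f"{image}: {vs_original:.1f}x vs. original, "
          f"{vs_no_step:.1f}x vs. no step 12")
```

The ratios show that the gain depends strongly on the shape: it is smallest for the coin, whose rough boundary leaves fewer neurons to remove, and largest for the ellipse.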
4.3. Tracking a moving person
The proposed model is used in this section to track a person in an image sequence. To track the person, a contour is placed around the person and is updated as the person moves. In each frame, the region of the moving person is extracted using a background subtraction method and enhanced using morphological operations (Figs. 7(a)–(c)). The contour obtained for the previous frame is then transferred to the current frame (Fig. 7(e)). Using an iterative algorithm, this contour is redrawn outside the extracted region and near its boundary. The resulting contour (Fig. 7(f)) is used as the initial contour for the current frame. The edges of the extracted region are found using the Sobel operator, and the edge pixels are used as the feature points in the training phase. For the first frame, the initial contour must be given by the user. To evaluate the efficiency of the new model, the model proposed in Ref. [20] and the new model are used to track a person using the following parameter values:
l = 1, h = 3, sf = 10000, s = 0.01, xw = 1, sg = 20, s = 0.01, min = 0.01, = 0.1, = 0.1, rmin = 3
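The per-frame preprocessing described at the start of this section (background subtraction followed by Sobel edge extraction) can be sketched in a few lines of plain Python. This is a minimal illustration on a toy grey-level image: the threshold value, the function names, and the omission of the morphological enhancement step are all assumptions made for brevity, not details taken from the paper.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def foreground_mask(frame, background, threshold=30):
    """Binary mask: 1 where the frame differs from the background by more
    than `threshold` grey levels (threshold is an assumed value)."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def sobel_edge_points(mask):
    """Return (row, col) positions whose Sobel gradient on the mask is non-zero.
    Border pixels are skipped because the 3x3 window does not fit there."""
    height, width = len(mask), len(mask[0])
    points = []
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = sum(SOBEL_X[i][j] * mask[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * mask[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            if gx != 0 or gy != 0:
                points.append((r, c))
    return points


# Toy example: a bright 3x3 blob on an empty 7x7 background.
background = [[0] * 7 for _ in range(7)]
frame = [row[:] for row in background]
for r in range(2, 5):
    for c in range(2, 5):
        frame[r][c] = 200

mask = foreground_mask(frame, background)
feature_points = sobel_edge_points(mask)
print(len(feature_points), "feature points")
```

The returned edge pixels play the role of the feature points used in the training phase; the interior of the blob produces no gradient and is therefore excluded.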
In both models, since there are many initial control points (the initial points for each frame are reduced until the distance between every two consecutive control points becomes 10 pixels), step 11 is started from the first epoch of training. For the new model, Itr, dl, and mline are set equal to 50, 0.25, and 1, respectively. For all neurons, the parameter max is set equal to 1 for the first three epochs. First, the model proposed in Ref. [20] with max equal to 1 is applied to the feature points. As shown in Fig. 8(a), the neurons move to the boundary slowly and converge after 27 epochs, which took 3.119 s (Fig. 8(b)). If the value of max is increased, the contour will not reside on the boundary well and some neurons will not converge. Then, the new contour model with p equal to 0.3 is applied to the feature points. As shown in Fig. 8(c), the neurons move to the boundary much faster and converge after 19 epochs of training, which took 2.453 s. As shown in Fig. 8(d), the neurons spread along the boundary in proportion to its curvature. Because there are numerous initial control points, the value of p can be increased. The new contour model is therefore also applied with p equal to 0.5 and 0.7. The neurons' movement in both cases is shown in Figs. 8(e) and (f). The neurons converge after 10 and 8 epochs (0.859 and 0.614 s), respectively. The resulting contours are shown in Figs. 8(g) and (h). The results of this experiment are summarized in Table 2. Based on this table, the total number of epochs, the total number of trained neurons, and the convergence time for the new contour model are all lower than those of the original model. Also, the contour resulting from the original model consists of more neurons, many of which are not necessary. According to Table 2, the proposed algorithm is about four times faster than the original algorithm (3.119 s versus 0.859 s for p = 0.5). For the new contour model, the larger the value of p, the lower the total number of epochs. It should be noted that if the value of p is very close to one, the neurons' movement will be rough and some neurons may not converge.
Fig. 7. Contour-based tracking: (a) the previous frame and its extracted contour, (b) current frame, (c) the extracted region of the moving person using a background subtraction method, (d) the boundary of the extracted region, (e) the contour transferred from previous frame to the current frame, and (f) the transferred contour is drawn outside the region of the moving person.
Fig. 8. (a) The neurons' movement towards the boundary using the model proposed in Ref. [20] with max = 1, (b) the resulting contour after 27 epochs for the model proposed in Ref. [20] with max = 1, (c) the neurons' movement towards the boundary using the new model with max = 0.3, (d) the resulting contour after 19 epochs, (e) the neurons' movement towards the boundary using the new model with max = 0.5, (f) the obtained contour after 10 epochs, (g) the neurons' movement towards the boundary using the new model with max = 0.7, and (h) the resulting contour after 8 epochs.
Table 2 Comparison of the speed of the proposed and the original algorithms
Model                              Number of epochs   Number of trained neurons   Number of final contour neurons   Time (s)
The original contour model         27                 6943                        271                               3.119
The proposed model with p = 0.3    19                 5012                        187                               2.453
The proposed model with p = 0.5    10                 2015                        183                               0.859
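The reduction of the initial control points to a spacing of 10 pixels, mentioned in the tracking setup, can be sketched as a greedy thinning of a dense closed contour. The paper does not spell out the exact procedure, so the rule below (keep a point only when it is at least `spacing` pixels from the last kept point) is an assumption, and `thin_contour` is a hypothetical name.

```python
import math


def thin_contour(points, spacing=10.0):
    """Greedily keep points at least `spacing` away from the last kept point.
    The contour is treated as closed, so the final point is dropped if it
    crowds the first one."""
    if not points:
        return []
    kept = [points[0]]
    for pt in points[1:]:
        if math.dist(pt, kept[-1]) >= spacing:
            kept.append(pt)
    if len(kept) > 1 and math.dist(kept[-1], kept[0]) < spacing:
        kept.pop()
    return kept


# Dense closed contour: the perimeter of a 40 x 40 square sampled every pixel.
dense = ([(x, 0) for x in range(40)]
         + [(40, y) for y in range(40)]
         + [(40 - x, 40) for x in range(40)]
         + [(0, 40 - y) for y in range(40)])

control_points = thin_contour(dense, spacing=10.0)
print(len(dense), "points reduced to", len(control_points))
```

Starting each frame from such a thinned contour keeps the number of neurons to train small while still leaving enough control points for step 11 to be applied from the first epoch.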
5. Conclusions and future work
This paper proposes improvements to the TASOM-based active contour model for representing object boundaries in images. The motion speed of each neuron in the proposed model is made adaptive based on the neuron's distance from the boundary: if the distance is large, the neuron speed increases, and vice versa. This feature increases the convergence speed of the neurons. In addition, the number of neurons required to form the contour is made proportional to the boundary curvatures, leading to a decrease in the total number of trainable neurons and thus in the computational complexity. By increasing the convergence speed and decreasing the total number of trainable neurons, the computational cost of the new algorithm is greatly reduced, making the algorithm useful for real-time applications.
The proposed model and the active contour model presented in Ref. [20] were compared in several experiments. The first experiment compared the neuron motion speeds in the two models. The results indicate the greater convergence speed of the new algorithm. In the second experiment, the flexibility of the new model in covering the boundary curvatures was considered by applying both models to three different shapes. The results showed that the number of trainable neurons in the original algorithm is approximately 2.5 times that of the proposed algorithm. To evaluate the new algorithm in real-world applications, the models were employed to track a moving person and the results were compared. These results indicated that the proposed active contour model is fast enough to be utilized in real-time applications. In general, the proposed model is much faster than the original model, but some aspects require further attention. The proposed algorithm takes a larger number of epochs than the original algorithm on concave segments of boundaries due to the removal of the neurons lying on straight lines.
If the concave segments on the boundary are detected and the neuron removal step is not applied to neurons close to such segments, then the speed of the algorithm will further increase. We will consider this feature in our future work.
References
[1] M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models, International Journal of Computer Vision 1 (4) (1988) 321–331.
[2] T.F. Cootes, C.J. Taylor, D.H. Cooper, J. Graham, Active shape models—their training and application, Computer Vision and Image Understanding 61 (1) (1995) 38–59.
[3] L.H. Staib, J.S. Duncan, Boundary finding with parametrically deformable models, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (11) (1992) 1061–1075.
[4] Y. Wang, L.H. Staib, Boundary finding with correspondence using statistical shape models, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 1998, pp. 338–345.
[5] C. Han, W.S. Kerwin, T.S. Hatsukami, J.N. Hwang, C. Yuan, Detecting objects in image sequences using rule-based control in an active contour model, IEEE Transactions on Biomedical Engineering 50 (6) (2003).
[6] A. Neumann, Graphical Gaussian shape models and their application to image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (3) (2003).
[7] H. Shah-Hosseini, R. Safabakhsh, TASOM: a new time adaptive self-organizing map, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 33 (2) (2003) 271–282.
[8] A.P. Pentland, S. Sclaroff, Closed-form solutions for physically based shape modeling and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (7) (1991) 715–729.
[9] L.H. Staib, J.S. Duncan, Boundary finding with parametrically deformable models, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (11) (1992) 1061–1075.
[10] A. Chakraborty, L. Staib, J. Duncan, An integrated approach to boundary finding in medical images, in: Proceedings of the IEEE Workshop Biomedical Image Analysis, 1994, pp. 13–22.
[11] Y. Wang, L. Staib, Boundary finding with correspondence using statistical shape models, in: IEEE Conference Computer Vision and Pattern Recognition, 1998, pp. 338–345.
[12] V. Caselles, R. Kimmel, G. Sapiro, Geodesic active contours, International Journal of Computer Vision 22 (1) (1997) 61–79.
[13] L.D. Cohen, R. Kimmel, Global minimum for active contour models: a minimal path approach, International Journal of Computer Vision 24 (1) (1997) 57–78.
[14] C. Han, T.S. Hatsukami, J.N. Hwang, C. Yuan, A fast minimal path active contour model, IEEE Transactions on Image Processing 10 (2001) 865–873.
[15] T. Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics 43 (1982) 59–69.
[16] A.J. Abrantes, J.S. Marques, A class of constrained clustering algorithms for object boundary extraction, IEEE Transactions on Image Processing 5 (11) (1996) 1507–1521.
[17] Y.N. Venkatesh, N. Rishikesh, Self-organizing neural networks based on spatial isomorphism for active contour modeling, Pattern Recognition 33 (7) (2000) 1239–1250.
[18] H. Shah-Hosseini, R. Safabakhsh, A learning rule modification in the self-organizing feature map algorithm, in: Proceedings of the Fourth International CSI Computer Conference, Tehran, Iran, 1999, pp. 1–9.
[19] H. Shah-Hosseini, R. Safabakhsh, TASOM: the time adaptive self-organizing map, in: Proceedings of the IEEE International Conference Information Technology: Coding and Computing, 2000, pp. 422–427.
[20] H. Shah-Hosseini, R. Safabakhsh, A TASOM-based algorithm for active contour modeling, Pattern Recognition Letters 24 (2003) 1361–1373.
[21] H. Shah-Hosseini, R. Safabakhsh, TASOM: a new time adaptive self-organizing map, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 33 (2) (2003).
[22] M.H. Khosravi, R. Safabakhsh, Human eye sclera detection and tracking using a modified time-adaptive self-organizing map, Pattern Recognition, 2008, submitted for publication.
[23] M. Garland, Y. Zhou, Quadric-based simplification in any dimension, ACM Transactions on Graphics 24 (2) (2005) 209–239.
About the Author—MOHAMMAD IZADI was born in 1980 in Isfahan, Iran. He received his B.S. degree in computer engineering from the University of Isfahan in 2003 and his M.S. degree in computer engineering from Amirkabir University of Technology in Tehran, Iran in 2006. His research interests include computer vision and neural networks.
About the Author—REZA SAFABAKHSH was born in Isfahan, Iran, in 1953. He received the B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 1976, and the M.S. and Ph.D. degrees in electrical engineering from the University of Tennessee, Knoxville, in 1979 and 1986, respectively. He worked at the University of Tennessee, Knoxville, TN, from 1977 to 1983, at Walters State College, Morristown, TN, from 1983 to 1986, and at the Center of Excellence in Information Systems, Nashville, TN, from 1986 to 1988. Since 1988, he has been with the Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, where he is currently a Professor and the director of the Computational Vision/Intelligence Laboratory. His current research interests include a variety of topics in neural networks, computer vision, and multiagent systems. Dr. Safabakhsh is a member of the IEEE and several honor societies, including Phi Kappa Phi and Eta Kappa Nu. He is the Founder and a member of the Board of Executives of the Computer Society of Iran, and was the President of this society for the first four years.