Int. J. Signal and Imaging Systems Engineering, Vol. 3, No. 1, 2010
A modified training scheme for SOFM to cluster multispectral images

T.N. Nagabhushan* and D.S. Vinod
Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysore, India
E-mail: [email protected]
E-mail: [email protected]
*Corresponding author

Abstract: In this paper, we propose modifications to Kohonen’s Self-Organising Feature Map (SOFM) to achieve faster convergence, specifically with respect to multispectral images. First, the raw image is pre-processed using a data reduction technique to obtain a reduced data set, and then the Condensed Nearest Neighbour (CNN) rule is applied to yield a standard subset of samples. The samples in the standard subset are used to find the Best Matching Unit (BMU), and the samples in the reduced data set are used to update the BMU and its neighbouring neurons. The modified SOFM is tested on a synthetic image data set and on the Harangi 1991 and 1992 image data sets. Results are compared with those of the conventional SOFM.

Keywords: SOFM; self-organising feature map; CNN; condensed nearest neighbour; reduced data set; standard subset; BMU; best matching unit; multispectral image clustering.

Reference to this paper should be made as follows: Nagabhushan, T.N. and Vinod, D.S. (2010) ‘A modified training scheme for SOFM to cluster multispectral images’, Int. J. Signal and Imaging Systems Engineering, Vol. 3, No. 1, pp.31–39.

Biographical notes: T.N. Nagabhushan received his BE Degree in Electrical Engineering from the University of Mysore and his Master’s in Electrical Engineering from the Indian Institute of Science. He obtained his PhD Degree from the Indian Institute of Science in the area of constructive learning RBF networks. He is the chairman of Information Science and Engineering at Sri Jayachamarajendra College of Engineering, Mysore, India. His research interests include machine learning algorithms and applications.

D.S. Vinod received his Bachelor’s Degree in Electronics and Communications Engineering and Master’s Degree in Computer Engineering from the University of Mysore, India. He did his PhD at Visvesvaraya Technological University, with research work on Multispectral Image Analysis. He is currently working as Faculty at the Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysore, India. His research interests include image processing, neural networks and algorithms.
1 Introduction
Clustering of multispectral image data is a complex process that requires consideration of many factors, such as the size of the image, feature dimension, noise, overlapping clusters, number of clusters, unequal cluster density and unequal cluster size (Tran et al., 2005). A number of clustering algorithms have been applied to multispectral image data sets. Lu and Weng (2007) address the problem of clustering remotely sensed images. They discuss the choice of clustering algorithm, selection of suitable training samples, feature extraction, pre-processing, post-processing and cluster accuracy assessment. The advantage of neural networks over conventional clustering algorithms for remotely sensed data is suggested by Roli et al. (1996). Srinivas et al. (2008) applied SOFM to remotely sensed data in the analysis of regional floods.

Kohonen’s SOFM is one of the most widely used unsupervised neural networks. It partitions the input space into clusters based on similarity among data samples (Kohonen, 1982a, 1982b). The conventional SOFM algorithm needs long training cycles when large image data sets are fed to it, because it updates the BMU and its neighbours with every presentation of a data pattern. Multispectral images are multiband images acquired in different spectral bands by a geostationary satellite, and they occupy a large amount of memory. They are often rich in redundancy. Hence, clustering with conventional SOFM requires more memory space besides longer training cycles.

To improve the learning characteristics of conventional SOFM, we propose novel modifications to the learning algorithm. The modifications are introduced in the computation of the BMU and in the updating of its neighbours in the input space. The raw image is subjected to a data reduction process to obtain a reduced data set. The CNN rule is then applied to the reduced data set to generate a standard subset of samples. The BMU is determined using the standard subset of samples, while the BMU and its topological neighbours are updated using the reduced data samples. The modified procedure is discussed in the subsequent sections.

Copyright © 2010 Inderscience Enterprises Ltd.
2 Unsupervised learning with SOFM
Kohonen’s (1982a, 1982b) SOFM performs unsupervised learning on the input data samples and creates clusters. Many variants of SOFM have appeared in the literature to deal with uniform and non-uniform data distributions (Kohonen, 1987, 1993; Kohonen et al., 1990). SOFM transforms input patterns of arbitrary dimension into a one- or two-dimensional discrete map in a topologically ordered manner. Each neuron is fully connected to every component of the input pattern. The SOFM algorithm proceeds by initialising the synaptic weights in the network. This is done by randomly selecting the required number of patterns from the input. Once the network is initialised, the winner, or BMU, is found for each input pattern. The winner neuron and its topological neighbours are then updated. The procedure is repeated for a finite number of iterations.
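The conventional training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the map is taken to be one-dimensional, the neighbourhood function is assumed to be Gaussian, and the learning rate and neighbourhood width are assumed to decay exponentially with time constants `t_learn` and `t_neigh`. The name `train_sofm` and the floor `sigma_min` are hypothetical and introduced only for this example.

```python
import math
import random


def train_sofm(samples, n_neurons, n_iter, eta0=0.9, sigma0=0.9,
               t_learn=500.0, t_neigh=5.0, sigma_min=1e-3, seed=0):
    """Train a one-dimensional SOFM on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    # Initialise the weights by randomly selecting patterns from the input.
    weights = [list(w) for w in rng.sample(samples, n_neurons)]
    for t in range(n_iter):
        x = samples[rng.randrange(len(samples))]
        # Winner (BMU): the neuron whose weight vector is closest to x.
        bmu = min(range(n_neurons), key=lambda j: math.dist(weights[j], x))
        # Assumed exponential decay of learning rate and neighbourhood width.
        eta = eta0 * math.exp(-t / t_learn)
        sigma = max(sigma0 * math.exp(-t / t_neigh), sigma_min)
        for j in range(n_neurons):
            # Assumed Gaussian neighbourhood measured on the map index.
            h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
            for k in range(len(x)):
                weights[j][k] += eta * h * (x[k] - weights[j][k])
    return weights


data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
trained = train_sofm(data, n_neurons=2, n_iter=200)
```

Every presentation of a pattern triggers a full BMU search and a neighbourhood update, which is why the cost grows with the number of pixels in the image.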
3 Proposed modifications
Since conventional SOFM has limitations on large data sets with variable redundancy, specifically multispectral images, we propose to introduce the following modifications to the original SOFM algorithm:

• Generate reduced data from the original image data. The reduced data set is used to update the BMU and its neighbours. We have proposed modifications to Gowda’s (1984) data reduction technique to cover single band images as well as multiband images. The proposed modification achieves data reduction without dimensionality reduction.

• A standard subset of samples is further obtained from the reduced data set. This standard subset of samples is used to determine the BMU.
The above-mentioned process requires offline computations on the image data. Two efficient procedures are used to obtain the reduced data set and the standard subset, respectively.
3.1 Generation of reduced data set

Since multispectral images often have a high degree of redundancy, using all data samples to train SOFM renders it inefficient. Therefore, it is essential to pre-process the image data to generate a reduced data set. Many data reduction procedures have been proposed in the literature. They include the works of Wehrens et al. (2004), Gowda (1984), Jagannathan et al. (1996), Tate (1994), Zhu and Po (1996), Vasilyev (1997), Chitroub et al. (2001), Mielikainen and Kaarna (2002) and De Backer et al. (1998). Most of them perform dimensionality reduction besides data reduction (De Backer et al., 1998; Gowda, 1984).

One of the most significant works on data reduction of multispectral images is due to Gowda (1984), who proposed a scheme of multi-stage isodata clustering that applies dimensionality reduction and data reduction to multispectral images. Here, the d features are transformed to a three-dimensional primary colour space, namely blue, green and red coordinates. The three-dimensional data is reduced by the application of storage bin arrays. This is followed by a multi-stage isodata technique incorporating a novel seed point picking method to obtain the desired number of clusters. The disadvantage of this algorithm is that it cannot be applied to cluster single band images, or to analyse any one selected band of a multispectral image data set.

It is generally desirable to retain all the features without dimensionality reduction and to obtain a subset of data that leads to efficient clustering. Efficient clustering has two characteristics: fast convergence and low storage requirements. Hence, we have proposed modifications to Gowda’s (1984) method so that it retains all features, generates only a reduced data set, and works with both multiband and single band images.

To reduce multiband image data with d spectral bands, d storage bin matrices of size nb × nb are required. The samples (pixel values) are stored in the corresponding storage bins depending on their feature values. The bins are updated as and when the samples are stored. The updated features in the bins form the reduced data set. The procedure to obtain the reduced data set is presented here.

Procedure to obtain reduced data set
Table 2  Reduced data set obtained in non-empty bins

Non-empty bin  Samples in bin       Normalised feature values
1              –                    –
2              –                    –
3              1, 5                 1.42647, 0.985294, 0.352941, 0
4              2                    1.38235, 0.823529, 0.352941, 0
5              3, 4, 6, 8, 9, 10    1.62255, 0.911766, 0.916666, 0.25
6              7                    2, 0.882353, 1.32353, 0.352941
7              –                    –
8              –                    –
9              –                    –
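The bin-averaging step behind Table 2 can be checked with a short sketch. It takes the bin membership exactly as listed in Table 2 (the rule that assigns a sample to a bin is not re-derived here) and uses the raw samples of Table 1. The normalisation, which maps the global minimum and maximum of the data onto the range 0 to nb − 1, is an inference from the tabulated values rather than something stated in the text. The names `reduce_bins` and `normalise` are hypothetical.

```python
# Raw samples of Table 1 (four features each), keyed by sample number.
samples = {
    1: (5.1, 3.5, 1.4, 0.2), 2: (4.9, 3.0, 1.4, 0.2),
    3: (4.7, 3.2, 1.3, 0.2), 4: (4.6, 3.1, 1.5, 0.2),
    5: (5.0, 3.6, 1.4, 0.2), 6: (5.4, 3.9, 1.7, 0.4),
    7: (7.0, 3.2, 4.7, 1.4), 8: (6.4, 3.2, 4.5, 1.5),
    9: (6.9, 3.1, 4.9, 1.5), 10: (6.3, 3.3, 6.0, 2.5),
}
# Bin membership as reported in Table 2 (taken as given).
bins = {3: [1, 5], 4: [2], 5: [3, 4, 6, 8, 9, 10], 6: [7]}
NB = 3  # storage bin array of size 3 x 3


def normalise(value, lo, hi, nb=NB):
    """Assumed scaling of a raw feature value onto the range [0, nb - 1]."""
    return (value - lo) / (hi - lo) * (nb - 1)


def reduce_bins(samples, bins, nb=NB):
    """Return one averaged, normalised feature vector per non-empty bin."""
    lo = min(min(v) for v in samples.values())
    hi = max(max(v) for v in samples.values())
    reduced = {}
    for b, members in bins.items():
        d = len(samples[members[0]])
        mean = [sum(samples[m][k] for m in members) / len(members)
                for k in range(d)]
        reduced[b] = [normalise(v, lo, hi, nb) for v in mean]
    return reduced


reduced = reduce_bins(samples, bins)
for b in sorted(reduced):
    print(b, [round(v, 6) for v in reduced[b]])
```

Under these assumptions the averaged values agree with the normalised feature values listed in Table 2 to the printed precision, and all four features are retained for every bin.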
We have introduced modifications to Gowda’s (1984) procedure and proposed an approach that keeps the features of the image intact and removes the redundancy using storage bin arrays. The reduced data set is obtained without sacrificing the dimensionality of the images.

Illustration

Consider a data set with 10 samples, shown in Table 1, where each sample has four features. With a bin size of 3 × 3 and a threshold of 0.1, the algorithm generates the samples shown in Table 2. It can be seen that only four bins are non-empty and that all the samples fall into the bins with reference numbers 3, 4, 5 and 6. Bins 4 and 6 are assigned the single samples 2 and 7, respectively. Hence, these bins contain only the normalised feature values of the respective samples as the updated data. Table 2 shows the normalised average value of the features in the respective bins. Thus, the algorithm yields reduced data while preserving all features.

Table 1  Data set

Sample  Feature values
1       5.1, 3.5, 1.4, 0.2
2       4.9, 3.0, 1.4, 0.2
3       4.7, 3.2, 1.3, 0.2
4       4.6, 3.1, 1.5, 0.2
5       5.0, 3.6, 1.4, 0.2
6       5.4, 3.9, 1.7, 0.4
7       7.0, 3.2, 4.7, 1.4
8       6.4, 3.2, 4.5, 1.5
9       6.9, 3.1, 4.9, 1.5
10      6.3, 3.3, 6.0, 2.5

3.2 Generation of standard subset

In conventional SOFM training, a BMU is determined for every presented pattern. In the proposed approach, we determine a standard subset of samples from the reduced data set using the CNN rule. This greatly influences the speed of convergence. The concept of a standard subset of samples was introduced by Subba Reddy et al. (1996), and a CNN rule based on the concept of mutual nearest neighbourhood was given by Gowda and Krishna (1979). The CNN decision rule iteratively produces a consistent subset from the reduced data set. When this subset is used as a stored reference set for the nearest neighbour decision rule, it correctly clusters all the samples of the reduced data set used in training the network. These concepts are used to narrow down the samples for determining the BMU. The procedure to determine the standard subset of samples is given here.

Procedure to obtain standard subset
Illustration

Consider a data set with 10 samples, shown in Table 1, where each sample has four features. The result of applying the data reduction technique is shown in Table 2. To obtain the standard subset of samples for this reduced data set, CNN is applied with threshold value Tnear = 1. This results in two representative samples in the standard subset, as shown in Table 3. Sample 1 in the standard subset represents non-empty bins 3 and 4; similarly, sample 2 represents non-empty bins 5 and 6.

Table 3  An example to obtain standard subset of samples

Sample in standard subset  Feature values              Samples in reduced data set
1                          1.426, 1.0147, 0.352, 0     1.42647, 0.985294, 0.352941, 0
                                                       1.38235, 0.823529, 0.352941, 0
2                          1.620, 0.923, 0.914, 0.25   1.62255, 0.911766, 0.916666, 0.25
                                                       2, 0.882353, 1.32353, 0.352941
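The idea of condensing the reduced data set to a few representatives can be illustrated with a simplified distance-threshold scheme. This is a swapped-in technique for illustration only and is not the CNN procedure used in the paper: a reduced sample is stored as a representative only if no stored sample lies within Tnear of it, and every reduced sample is then assigned to its nearest representative. The function names `condense` and `represent` are hypothetical.

```python
import math


def condense(reduced, t_near):
    """Indices of stored samples: keep a sample only if every stored sample
    is farther than t_near from it (assumed threshold rule)."""
    subset = []
    for i, x in enumerate(reduced):
        if all(math.dist(x, reduced[j]) > t_near for j in subset):
            subset.append(i)
    return subset


def represent(reduced, subset):
    """For each reduced sample, the index of its nearest stored sample."""
    return [min(subset, key=lambda j: math.dist(x, reduced[j]))
            for x in reduced]


# Reduced samples of Table 2, in the order of bins 3, 4, 5 and 6.
reduced = [
    (1.42647, 0.985294, 0.352941, 0.0),
    (1.38235, 0.823529, 0.352941, 0.0),
    (1.62255, 0.911766, 0.916666, 0.25),
    (2.0, 0.882353, 1.32353, 0.352941),
]
subset = condense(reduced, t_near=1.0)
owner = represent(reduced, subset)
print(subset, owner)
```

With Tnear = 1 this simplified rule also yields two representatives, and the nearest-neighbour assignment groups bins 3 and 4 together and bins 5 and 6 together, as in Table 3. The representatives here are existing reduced samples, whereas Table 3 lists slightly different feature values, so the sketch should not be read as a reproduction of the original procedure.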
3.3 Farthest neighbour initialisation of neurons

To initialise the neurons, we have adopted the concept of farthest neighbour (Gowda, 1984) rather than choosing the weight vectors at random from the input space. This is done to ensure that all the samples in the input space are covered and identified in a proper sequence. The procedure to initialise the grid is presented here.

Procedure farthest neighbour
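One common way to realise a farthest neighbour initialisation is a farthest-first traversal, sketched below. The details are assumptions rather than the procedure of the paper: the first seed is taken to be the sample farthest from the mean of the data, and each further seed is the sample whose nearest already-chosen seed is farthest away. The function name `farthest_neighbour_init` is hypothetical, and the sketch expects `n_neurons` to be no larger than the number of samples.

```python
import math


def farthest_neighbour_init(samples, n_neurons):
    """Pick n_neurons initial weight vectors that are spread over the data."""
    d = len(samples[0])
    mean = [sum(s[k] for s in samples) / len(samples) for k in range(d)]
    # Assumed starting point: the sample farthest from the data mean.
    chosen = [max(range(len(samples)),
                  key=lambda i: math.dist(samples[i], mean))]
    while len(chosen) < n_neurons:
        # Next seed: the sample whose nearest chosen seed is farthest away.
        candidates = [i for i in range(len(samples)) if i not in chosen]
        nxt = max(candidates,
                  key=lambda i: min(math.dist(samples[i], samples[j])
                                    for j in chosen))
        chosen.append(nxt)
    return [list(samples[i]) for i in chosen]


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (5.0, 0.0)]
seeds = farthest_neighbour_init(points, 3)
print(seeds)
```

Compared with random selection, seeds chosen this way cannot all fall inside the same dense region of the input space.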
4 Modified training procedure for SOFM

The reduced data set Ired of dimension d with nred samples and the standard subset Isub of dimension d with ncon samples, both obtained during the pre-processing stage, are presented to the SOFM. The initial weight vectors are selected by the farthest neighbour method from the reduced image data set. The algorithm is described here.

Algorithm
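One possible reading of the modified scheme is sketched below. It is an interpretation under stated assumptions, not the authors' algorithm: each reduced sample is represented by its nearest sample in the standard subset, the BMU is searched using that representative, and the BMU and its neighbours are then moved towards the reduced sample itself. The one-dimensional map, Gaussian neighbourhood, exponential decay schedules and the names `train_modified_sofm` and `sigma_min` are all hypothetical choices made for this example.

```python
import math


def train_modified_sofm(reduced, subset, weights, n_iter, eta0=0.9,
                        sigma0=0.9, t_learn=500.0, t_neigh=5.0,
                        sigma_min=1e-3):
    """Update a 1-D map with reduced samples; search the BMU via the subset."""
    weights = [list(w) for w in weights]  # work on a copy of the initial map
    # Representative of each reduced sample: nearest standard-subset sample.
    rep_of = [min(subset, key=lambda s: math.dist(s, x)) for x in reduced]
    for t in range(n_iter):
        i = t % len(reduced)
        x, rep = reduced[i], rep_of[i]
        # The BMU is determined from the representative, not from x itself.
        bmu = min(range(len(weights)),
                  key=lambda j: math.dist(weights[j], rep))
        eta = eta0 * math.exp(-t / t_learn)
        sigma = max(sigma0 * math.exp(-t / t_neigh), sigma_min)
        for j, w in enumerate(weights):
            h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
            # The BMU and its neighbours move towards the reduced sample.
            for k in range(len(x)):
                w[k] += eta * h * (x[k] - w[k])
    return weights


one_step = train_modified_sofm([(1.0, 1.0)], [(1.0, 1.0)], [(0.0, 0.0)],
                               n_iter=1, eta0=0.5)
print(one_step)
```

Because the loop runs over the reduced data set rather than over every pixel, the number of distinct patterns seen during training drops from tens of thousands to a few hundred at most for the images considered here.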
5 Experiments
To determine the number of clusters and check the quality of the clusters, a cluster validity index (Jain and Dubes, 1988) has been used. To measure the performance of the clustering algorithm, we have used the Davies Bouldin Index (DBI) (Davies and Bouldin, 1979) and the Cluster Tendency Index (CTI) (Sudhanva and Chidananda Gowda, 1992). For good clusters, a minimum value of DBI and a maximum value of CTI are preferred. We have tested the modified training scheme on the following data sets:

• Four band synthetic image
• Harangi 1991 image
• Harangi 1992 image.
All the experimental simulations have been carried out on a personal computer with an Intel P-4 3.3 GHz processor and 1 GB RAM.
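For reference, a commonly used form of the Davies Bouldin Index can be computed as in the sketch below. It assumes Euclidean distances, with the scatter of a cluster taken as the mean distance of its members to the centroid; the exact variant used in the experiments is not stated, so this is an illustration only. CTI is not sketched because its definition is not given in the text. The function name `davies_bouldin` is hypothetical.

```python
import math


def davies_bouldin(clusters):
    """clusters: list of clusters, each a non-empty list of feature vectors."""
    cents = [[sum(p[k] for p in c) / len(c) for k in range(len(c[0]))]
             for c in clusters]
    # Scatter: mean distance of the members of a cluster to its centroid.
    scat = [sum(math.dist(p, m) for p in c) / len(c)
            for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scat[i] + scat[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


dbi = davies_bouldin([[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]])
print(dbi)
```

Compact, well-separated clusters give small scatter-to-separation ratios, which is why lower values of the index are preferred.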
5.1 Experiment on synthetic image

This image has four spectral bands and its size is 256 × 256. The image has three clusters and the features are derived from the Iris flower data set. The samples from the three classes of the Iris flower data set are randomly picked and placed to form different clusters of the image. To obtain the reduced set of samples, we first apply the data reduction technique to the image data set. This yields nine samples in the reduced data set. In the next stage, the CNN rule is applied to the reduced set of samples. This results in five samples in the standard subset, which are representative of the samples present in the reduced data set for threshold Tnear = 0.39. The reduced data set and the standard subset of samples are presented to the modified SOFM. The simulation parameters are shown in Table 4. The results obtained are shown in Figure 1(c) and (d) and Table 5. The clusters obtained for the conventional SOFM are shown in Figure 1(a) and (b). The cluster composition for both conventional SOFM and modified SOFM is shown in Table 6.

Table 4  Simulation parameters for synthetic image

Number  Parameter                                      Value
1       Number of samples                              65536
2       Number of features                             4
3       Size of storage bin array                      5 × 5
4       Threshold for data reduction Tdr               0.1
5       Number of iterations                           1500
6       Number of weight vectors                       3
7       Threshold to obtain subset of samples Tnear    0.39
8       Initial learning constant η(0)                 0.9
9       Initial neighbourhood σ(0)                     0.9
10      Time constant for neighbourhood Tne            5.0
11      Time constant for learning rate Tle            500

Figure 1  Clusters produced by SOFM and modified SOFM on synthetic image: (a) clusters produced by SOFM; (b) pie chart for SOFM; (c) clusters produced by modified SOFM; (d) pie chart for modified SOFM (see online version for colours)

Table 5  Result of synthetic image

Number  Parameter                                      Value
1       Samples in the reduced data set                9
2       Samples in the standard subset                 5
3       Number of clusters                             3
4       Time taken by conventional SOFM                137.32 seconds
5       Time taken for data reduction technique        0.71 second
6       Time taken by the modified training scheme     0.04 second
7       Total time taken by the proposed SOFM          0.75 second

Table 6  Clusters obtained by conventional SOFM and modified SOFM on synthetic image

           Conventional SOFM                 Modified SOFM
Cluster    Number of samples  Percentage     Number of samples  Percentage
Cluster 1  43323              66.1056        43323              66.1057
Cluster 2  13945              21.2783        11295              17.2348
Cluster 3  8268               12.6159        10918              16.6595
The results are compared with ground truth information in Table 7. The clusters produced by conventional SOFM have an accuracy of 100%, 95.92% and 71.51% for clusters 1, 2 and 3, respectively. The overall accuracy of the clusters obtained is 94.55%, whereas modified SOFM produced clusters with an overall accuracy of 100%. The time taken to obtain the reduced data set is 0.71 s and the training time taken by modified SOFM is 0.04 s. The total execution time taken by modified SOFM is therefore 0.75 s, whereas the conventional SOFM took 137.32 s. The DBI and CTI values for conventional SOFM are 0.271626 and 34.1612, respectively. The DBI and CTI values for modified SOFM are 0.560911 and 43.9303, respectively.
Table 7  Comparison of clusters obtained by SOFM and modified SOFM on synthetic image

           Conventional SOFM                                        Modified SOFM
Cluster    Samples matched  Samples mismatched  % error    Samples matched  Samples mismatched  % error
Cluster 1  43323            0                   0          43323            0                   0
Cluster 2  10834            461                 4.0815     11295            0                   0
Cluster 3  7807             3111                28.4942    10918            0                   0
Total      61964            3572                5.4504     65536            0                   0
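The error percentages in Table 7 follow directly from the matched and mismatched counts. The short sketch below recomputes them for the conventional SOFM, assuming that the percentage error of a cluster is the number of mismatched samples divided by the total number of samples of that cluster in the ground truth. The helper name `percent_error` is hypothetical.

```python
# Conventional SOFM counts from Table 7.
matched = {"Cluster 1": 43323, "Cluster 2": 10834, "Cluster 3": 7807}
mismatched = {"Cluster 1": 0, "Cluster 2": 461, "Cluster 3": 3111}


def percent_error(hit, miss):
    """Share of mismatched samples, in percent."""
    return 100.0 * miss / (hit + miss)


errors = {c: percent_error(matched[c], mismatched[c]) for c in matched}
overall_error = percent_error(sum(matched.values()), sum(mismatched.values()))
overall_accuracy = 100.0 - overall_error

for c, e in errors.items():
    print(c, round(e, 4), round(100.0 - e, 2))
print("Total", round(overall_error, 4), round(overall_accuracy, 2))
```

The accuracy of each cluster is 100 minus its percentage error, and the overall figures use the column totals.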
5.2 Harangi 1991 image

A registered satellite image acquired on 3 February 1991, covering Harangi reservoir and its adjacent area in Coorg district of Karnataka state in India, has been considered. The image is acquired from the LISS-II B1 sensor of the IRS 1A satellite. The size of the image is 260 × 260 and it has four spectral bands. Initially, the data reduction procedure is applied, yielding a reduced data set of 270 samples. A standard subset containing 25 samples is then derived using the modified CNN rule. The simulation parameters are shown in Table 8.

Table 8  Simulation parameters for Harangi 1991 image

Number  Parameter                                      Value
1       Number of samples                              67600
2       Number of features                             4
3       Size of storage bin array                      25 × 25
4       Threshold for data reduction Tdr               2.0
5       Number of iterations                           3600
6       Number of weight vectors                       7
7       Threshold to obtain subset of samples Tnear    3
8       Initial learning constant η(0)                 0.9
9       Initial neighbourhood σ(0)                     0.9
10      Time constant for neighbourhood Tne            500
11      Time constant for learning rate Tle            500

The results obtained are shown in Table 9. The clusters produced by modified SOFM are shown in Figure 2(c) and (d), and the clusters produced by conventional SOFM are shown in Figure 2(a) and (b). The cluster composition for both is shown in Table 10.

Figure 2  Clusters produced by SOFM and modified SOFM on Harangi 1991 image: (a) clusters produced by SOFM; (b) pie chart for SOFM; (c) clusters produced by modified SOFM; (d) pie chart for modified SOFM (see online version for colours)

Table 9  Result of Harangi 1991 image

Number  Parameter                                      Value
1       Samples in the reduced data set                270
2       Samples in the standard subset                 25
3       Number of clusters                             7
4       Time taken by conventional SOFM                945.77 seconds
5       Time taken for data reduction technique        0.68 second
6       Time taken by the modified training scheme     3.3 seconds
7       Total time taken by the proposed SOFM          3.98 seconds

Table 10  Cluster composition of Harangi 1991 image produced by SOFM and modified SOFM

           Conventional SOFM        Modified SOFM
Cluster    Samples  Percentage      Samples  Percentage
Cluster 1  7        0.0104          22654    33.51
Cluster 2  11097    16.4157         8539     12.6317
Cluster 3  18580    27.4852         223      0.3299
Cluster 4  11893    17.5932         5        0.0074
Cluster 5  4921     7.2796          5873     8.6879
Cluster 6  13627    20.1583         1735     2.5666
Cluster 7  7475     11.0577         28571    42.26
The four major ground covers generated by modified SOFM are water, forest, coffee and crop, as shown in Table 11. The water body showing Harangi reservoir is conspicuous among the ground covers. Water, forest and crop are represented by clusters 2, 7 and 5, respectively. Coffee is represented by clusters 1 and 6. Clusters 3 and 4 are insignificant and cannot be considered as major ground covers.

The four major ground covers generated by conventional SOFM are also water, forest, coffee and crop, as shown in Table 11. Water and crop are represented by clusters 7 and 5, respectively. Forest is represented by clusters 4 and 6, and coffee is represented by clusters 2 and 3. Cluster 1 is not significant and hence cannot be identified as a major ground cover.

From Table 11, the percentage of ground covers for conventional SOFM is 11.05%, 37.75%, 43.9% and 7.28% for water, forest, coffee and crop, respectively. In the case of modified SOFM, the percentage of ground covers for water, forest, coffee and crop is 12.63%, 42.26%, 36.07% and 8.68%, respectively. Only a small variation in the number of samples per ground cover is observed.

The execution time of conventional SOFM is 945.77 s. The time taken to obtain the reduced data set is 0.68 s and the training time taken by modified SOFM is 3.3 s, so the total execution time of modified SOFM is 3.98 s. Hence, the execution of modified SOFM is much faster than that of conventional SOFM. The DBI and CTI values for the modified SOFM are 0.96427 and 90.7235, respectively. The DBI and CTI values for conventional SOFM are 0.81755 and 101.363, respectively.
Table 11  Major ground covers for Harangi 1991 image generated by SOFM and modified SOFM

              Conventional SOFM                 Modified SOFM
Ground cover  Cluster  Samples  Percentage      Cluster  Samples  Percentage
Water         7        7475     11.0576         2        8539     12.6317
Forest        4, 6     25520    37.7514         7        28571    42.26
Coffee        2, 3     29677    43.9008         1, 6     24389    36.07
Crop          5        4921     7.2795          5        5873     8.6879
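The ground cover figures are obtained by summing the sample counts of the clusters assigned to each cover. The sketch below does this for the modified SOFM, using the per-cluster counts of Table 10 and the cluster-to-cover assignment of Table 11. The function name `aggregate` is hypothetical.

```python
# Modified SOFM cluster sizes for the Harangi 1991 image (Table 10).
modified_counts = {1: 22654, 2: 8539, 3: 223, 4: 5, 5: 5873, 6: 1735, 7: 28571}
# Cluster-to-ground-cover assignment for modified SOFM (Table 11).
cover_clusters = {"Water": [2], "Forest": [7], "Coffee": [1, 6], "Crop": [5]}


def aggregate(counts, cover_clusters):
    """Return {cover: (samples, percentage of all samples)}."""
    total = sum(counts.values())
    out = {}
    for cover, ids in cover_clusters.items():
        n = sum(counts[i] for i in ids)
        out[cover] = (n, 100.0 * n / total)
    return out


covers = aggregate(modified_counts, cover_clusters)
for name, (n, pct) in covers.items():
    print(name, n, round(pct, 4))
```

The percentages are taken over all 67600 pixels, so they do not add up to 100 when insignificant clusters such as 3 and 4 are left out of the major ground covers.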
5.3 Experiment on Harangi 1992 image

A registered satellite image acquired on 27 December 1992, covering Harangi reservoir and its adjacent area in Coorg district of Karnataka state in India, has been considered. The image is acquired from the LISS-II B1 sensor of the IRS 1B satellite. The size of the image is 260 × 260 and it has four bands. The data reduction yielded 294 samples in the reduced data set and the CNN rule yielded 25 samples in the standard subset. Table 12 lists the simulation parameters for this data set. Figure 3(a) and (c) depict the clusters generated by SOFM and modified SOFM, respectively. The results obtained are shown in Table 13. Conventional SOFM was tried for 7 clusters with 3600 iterations, whereas modified SOFM produced 10 clusters in 5100 iterations. The cluster compositions are shown in Table 14.

The modified SOFM generated four major ground covers, namely forest, water, coffee and crop. Forest is represented by clusters 3, 6 and 9. Water and coffee are represented by clusters 4 and 2, respectively, and crop by clusters 5 and 7 (Table 15). The water body showing Harangi reservoir is prominent among the clusters obtained by both methods. Other clusters of insignificant size are not considered for identification of major ground covers.

The four major ground covers generated by conventional SOFM are water, forest, coffee and crop, as depicted in Table 15. Water and crop correspond to clusters 1 and 2, respectively. Coffee is represented by clusters 4 and 5. Forest is represented by clusters 3 and 7. As the number of samples in cluster 6 is small, it cannot be considered as a major ground cover.

From Table 15, the percentage of ground covers for modified SOFM is 13.5%, 25.75%, 49.97% and 10.06% for water, forest, coffee and crop, respectively. For conventional SOFM, the percentage of ground covers for water, forest, coffee and crop is 12.55%, 23.81%, 50.44% and 13.18%, respectively, as shown in Table 15. Only a small variation in the number of samples per ground cover is observed.

The time taken to obtain the reduced data set is 0.65 s and the training time taken by modified SOFM is 6.3 s, so the total execution time of modified SOFM is 6.95 s. The execution time of conventional SOFM is 924.23 s; hence, modified SOFM is much faster than conventional SOFM. The DBI and CTI values obtained for the modified SOFM are 0.797923 and 280.181, respectively. For the conventional SOFM, the values are 0.814746 and 114.939, respectively.

Table 12  Simulation parameters for Harangi 1992 image

Number  Parameter                                      Value
1       Number of samples                              67600
2       Number of features                             4
3       Size of storage bin array                      25 × 25
4       Threshold for data reduction Tdr               2.0
5       Number of iterations                           3600 or 5100
6       Number of weight vectors                       7 or 10
7       Threshold to obtain subset of samples Tnear    3
8       Initial learning constant η(0)                 0.9
9       Initial neighbourhood σ(0)                     0.9
10      Time constant for neighbourhood Tne            1000
11      Time constant for learning rate Tle            1000
Figure 3  Clusters produced by SOFM and modified SOFM on Harangi 1992 image: (a) clusters produced by SOFM; (b) pie chart for SOFM; (c) clusters produced by modified SOFM; (d) pie chart for modified SOFM (see online version for colours)

Table 13  Result of Harangi 1992 image

Number  Parameter                                      Value
1       Samples in the reduced data set                294
2       Samples in the standard subset                 25
3       Number of clusters                             10
4       Time taken by conventional SOFM                923.23 seconds
5       Time taken for data reduction technique        0.65 second
6       Time taken by the modified training scheme     6.3 seconds
7       Total time taken by the proposed SOFM          6.95 seconds

Table 14  Cluster composition for Harangi 1992 image produced by SOFM and modified SOFM

            Conventional SOFM        Modified SOFM
Cluster     Samples  Percentage      Samples  Percentage
Cluster 1   8487     12.5547         4        0.0059
Cluster 2   8914     13.1864         33785    49.97
Cluster 3   12426    18.3817         11490    16.9970
Cluster 4   19194    28.3935         9131     13.5074
Cluster 5   11939    17.6612         1968     2.9112
Cluster 6   2966     4.3876          2744     4.0592
Cluster 7   3674     5.4349          4834     7.15
Cluster 8   –        –               453      0.6701
Cluster 9   –        –               3178     4.7012
Cluster 10  –        –               13       0.0192

Table 15  Major ground covers generated by SOFM and modified SOFM for Harangi 1992 image

              Conventional SOFM                 Modified SOFM
Ground cover  Cluster  Samples  Percentage      Cluster  Samples  Percentage
Water         1        8487     12.5547         4        9131     13.5074
Forest        3, 7     16100    23.8165         3, 6, 9  17412    25.7573
Coffee        4, 5, 6  34099    50.44           2        33785    49.97
Crop          2        8914     13.18           5, 7     6802     10.06
Comparing the results for Harangi 1991 and Harangi 1992, a small rise in the percentage of water in 1992 can be observed. A rise in the percentage of coffee plantation is also seen, together with a drastic reduction in the percentage of forest area. From this, it can be inferred that the cultivation of coffee and other crops has led to deforestation in the period between the two acquisitions.
6 Conclusion
In the conventional training scheme, the data set is directly applied to train SOFM. This presents a serious limitation in terms of time and memory requirements when clustering multispectral image data. From experiments, it is found that a large memory is required to store clusters of images in conventional SOFM.

We have proposed a modified training scheme for SOFM to overcome the limitations of conventional SOFM. The modifications train SOFM using farthest neighbour initialisation of weight vectors, followed by BMU selection based on the CNN rule, with the BMU and its neighbouring neurons updated using the reduced data set samples. In the modified SOFM, the original image data is drastically reduced using the data reduction technique. Hence, much less memory is required to obtain clusters of images. The time required to obtain clusters using modified SOFM is also significantly less than that of conventional SOFM. The experiments have demonstrated that the modifications introduced to train SOFM result in faster convergence while maintaining improved accuracy among different clusters.
References

Chitroub, S., Houacine, A. and Sansal, B. (2001) ‘Principal component analysis of multispectral images using neural network’, ACS/IEEE International Conference on Computer Systems and Applications, Beirut, Lebanon, pp.89–95.

Davies, D.L. and Bouldin, D.W. (1979) ‘A cluster separation measure’, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-1, No. 2, pp.224–227.

De Backer, S., Naud, A. and Scheunders, P. (1998) ‘Non-linear dimensionality reduction techniques for unsupervised feature extraction’, Pattern Recognition Letters, Vol. 19, pp.711–720.

Gowda, K.C. (1984) ‘A feature reduction and unsupervised classification algorithm for multispectral data’, Pattern Recognition, Vol. 17, No. 6, pp.667–676.

Gowda, K.C. and Krishna, G. (1979) ‘The condensed nearest neighbor rule using the concept of mutual nearest neighborhood’, IEEE Transactions on Information Theory, Vol. IT-25, No. 4, pp.488–490.

Jagannathan, S., Nagabhushan, P., Gowda, K.C. and Rajangam, R.K. (1996) ‘A number theory based image coding’, Australian and New Zealand Conference on Intelligent Information Systems, Adelaide, Australia, pp.147–150.

Jain, A.K. and Dubes, R.C. (1988) Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, New Jersey.

Kohonen, T. (1982a) ‘Analysis of a simple self-organizing process’, Biological Cybernetics, Vol. 44, pp.135–140.

Kohonen, T. (1982b) ‘Self-organized formation of topologically correct feature maps’, Biological Cybernetics, Vol. 43, pp.59–69.

Kohonen, T. (1987) ‘Adaptive, associative and self-organizing functions in neural computing’, Applied Optics, Vol. 26, No. 3, pp.4910–4918.

Kohonen, T.K. (1993) ‘Things you haven’t heard about the self-organizing map’, Proceeding of IEEE International Conference on Neural Networks, San Francisco, California, pp.1147–1156.

Kohonen, T.K., Kangas, J.A. and Laaksonen, T. (1990) ‘Variants of self-organizing maps’, IEEE Transactions on Neural Networks, Vol. 1, No. 1, pp.93–99.

Lu, D. and Weng, Q. (2007) ‘A survey of image classification methods and techniques for improving classification performance’, International Journal of Remote Sensing, Vol. 28, No. 5, pp.823–870.

Mielikainen, J. and Kaarna, A. (2002) ‘Improved back end for integer PCA and wavelet transforms for lossless compression of multispectral images’, Proceedings 16th International Conference on Pattern Recognition, Vol. 2, Quebec, Canada, pp.257–260.

Roli, F., Serpico, S.B. and Vernazza, G. (1996) Fuzzy Logic and Neural Network Handbook, McGraw-Hill, pp.1501–1528.

Srinivas, V.V., Shivam Rao, T., Rao, R.A. and Govindaraju, S. (2008) ‘Regional flood frequency analysis by combining self-organizing feature map and fuzzy clustering’, Journal of Hydrology, Vol. 348, pp.148–156.

Subba Reddy, N.V., Nagabhushan, P. and Gowda, K.C. (1996) ‘A neural network based expert system model for conflict resolution’, Australian New Zealand Conference on Intelligent System, Adelaide, Australia, pp.229–232.

Sudhanva, D. and Chidananda Gowda, K. (1992) ‘Dimensionality reduction using geometric projections: a new technique’, Pattern Recognition, Vol. 25, No. 8, pp.809–817.

Tate, S.R. (1994) ‘Band ordering in lossless compression of multispectral images’, Data Compression Conference, DCC ’94, Proceedings, pp.311–320.

Tran, T.N., Wehrens, R. and Buydens, L.M.C. (2005) ‘Clustering multispectral images: a tutorial’, Journal of Chemometrics and Intelligent Laboratory Systems, Vol. 77, pp.3–17.

Vasilyev, S.V. (1997) ‘An optimal data loss compression technique for remote surface multiwave mapping’, Astronomical Data Analysis Software and Systems VII ASP Conference Series, Vol. 145, pp.133–136.

Wehrens, R., Buydens, L.M.C., Fraley, C. and Raftery, A.E. (2004) ‘Model-based clustering for image segmentation and large datasets via sampling’, Journal of Classification, Vol. 21, No. 2, pp.231–253.

Zhu, C. and Po, L.M. (1996) ‘Partial distortion sensitive competitive learning algorithm for optimal codebook design’, Electronics Letters, Vol. 32, No. 19, pp.1757–1758.