1 Department of Plant Biology, University of Barcelona, Avda. Diagonal 645, Barcelona, ES-08028, Spain; 2 Department of Statistics, University of Barcelona, Avda. Diagonal 645, Barcelona, ES-08028, Spain; * Corresponding author; E-mail: [email protected]

Abstract
Questions: Is it possible to develop an expert system to provide reliable automatic identifications of plant communities at the precision level of phytosociological associations? How can unreliable expert-based knowledge be discarded before applying supervised classification methods?
Material: We used 3677 relevés from Catalonia (Spain), belonging to eight orders of terrestrial vegetation. These relevés were classified by experts into 222 low-level units (associations or subassociations).
Methods: We reproduced low-level expert-defined vegetation units as independent fuzzy clusters using the Possibilistic C-means algorithm. Relevés detected as transitional between vegetation types were excluded in order to maximize the number of units numerically reproduced. Cluster centroids were then considered static and used to perform supervised classifications of vegetation data. Finally, we evaluated the classifier’s ability to correctly identify the unit of both typical (i.e. training) and transitional relevés.
Results: Only 166 of the 222 original units (75%) could be numerically reproduced. Almost all the unrecognized units were subassociations. Among the original relevés, 61% were deemed transitional or untypical. Typical relevés were correctly identified 95% of the time, whereas the efficiency of the classifier on transitional data was only 64%. However, if the classifier’s second choice was also considered, the rate of correct classification for transitional relevés was 80%.
Conclusions: Our approach stresses the transitional nature of relevé data coming from vegetation databases. Relevé selection is justified in order to adequately represent the vegetation concepts …

… 25% of the expert-defined vegetation units and 61% of the relevés can be considered transitional according to our cluster-building criteria.
Performance of the vegetation classifier
The two non-reproduced associations accounted for 27 relevés. The remaining 3650 relevés belonged to associations represented in the classifier, so they were used to assess its performance. Detailed tables of the sensitivity and positive predictive power for each association are given in App. 1. Table 3 shows the rates of correct identification computed for the eight datasets independently and altogether. The overall rate of correct association identification for the typical relevés was very high: 95% of relevés were classified into the correct association by the classifier’s first choice, and 99% when its first and second choices were both taken into account (Table 3). This high rate of success is not surprising, since the relevés of this set were, by definition, those closest to the cluster centroids. In contrast, the classifier identified the correct association for only 64% of the relevés of the transitional set. Nevertheless, given the transitional nature of these relevés, the percentage of correct identification using the first and second choices may be a more realistic measure of performance. Over all phytosociological orders, this latter percentage was 79.5%. Identification of beech forests (Fagetalia sylvaticae) was the least successful (66%) and that of Quercus ilex forests and related communities (Quercetalia ilicis) the most successful (89.3%). When considering both typical and transitional relevés, the estimated overall efficiency of the classifier was 76.3% correct identification on the first choice, and 86.9% when the classifier’s second choice was also considered.
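The overall percentages in the last sentence are consistent with a relevé-weighted average of the typical and transitional rates (1420 and 2230 relevés, respectively). The Python sketch below reproduces that arithmetic and adds an exact (Clopper-Pearson) binomial interval as one possible reading of the "binomial" confidence limits of Table 3. The function names are hypothetical, and the choice of the Clopper-Pearson method is our assumption; it is not guaranteed to reproduce the limits printed in Table 3.

```python
import math


def pooled_rate(rates_and_sizes):
    """Relevé-weighted mean of per-set percentages of correct identification."""
    total = sum(n for _, n in rates_and_sizes)
    return sum(rate * n for rate, n in rates_and_sizes) / total


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    return sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )


def _solve(f, target, increasing, iters=60):
    """Bisection for f(p) = target on (0, 1), given that f is monotone in p."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        above = f(mid) > target if increasing else f(mid) < target
        if above:  # the root lies to the left of mid
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def clopper_pearson(k, n, level=0.95):
    """Exact binomial confidence limits for k correct identifications out of n."""
    half = (1.0 - level) / 2
    # Lower limit: P(X >= k) = half, and P(X >= k) increases with p
    lower = 0.0 if k == 0 else _solve(lambda p: 1.0 - binom_cdf(k - 1, n, p), half, True)
    # Upper limit: P(X <= k) = half, and P(X <= k) decreases with p
    upper = 1.0 if k == n else _solve(lambda p: binom_cdf(k, n, p), half, False)
    return lower, upper


first = pooled_rate([(95.4, 1420), (64.1, 2230)])
first_second = pooled_rate([(98.6, 1420), (79.5, 2230)])
print(round(first, 1), round(first_second, 1))  # 76.3 86.9
```

Because the transitional set is larger than the typical one, the pooled figures sit closer to the transitional rates.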
Discussion
Reproduction of traditional classifications
Previous attempts to reproduce traditional vegetation classifications usually forced all expert-defined units into the classifier (e.g. van Tongeren 1986, Hill 1989, van Tongeren et al. 2008). In the case of Kocí et al. (2003), the use of the Cocktail algorithm (Bruelheide 2000) made it possible to exclude poorly differentiated units, but their approach was still essentially expert-based (Chytrý 2007). Going a step further, here we stress the need to validate traditional vegetation units using an unsupervised clustering method. Although we tried to maximize the number of vegetation types that could be numerically reproduced, 25% of the original low-level units could not be reproduced. Subassociations turned out to be more difficult to reproduce because many of them are traditionally defined as a subclass of an association that shows a tendency towards an ecologically neighbouring association (in other words, they are transitional).
Moreover, in previous approaches relevé identification was usually performed using assignment rules that differed from those originally used to classify the training data (e.g. Kocí et al. 2003, Tichý 2005, van Tongeren et al. 2008). We preferred to use only the resemblance in species abundance values, as a simple common criterion for both unsupervised and supervised classification. Using overall species composition rather than Cocktail’s species groups has the advantage of allowing the reproduction of units lacking differential species (i.e. ‘basal’ or ‘central’ communities). However, the classifier is not expected to provide accurate results for such units because of their high variability and large proportion of transitional relevés.
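The assignment step can be sketched in Python as follows. This is a minimal illustration, not the authors’ implementation: it assumes squared Euclidean distances on the (possibly transformed) species abundance values, a fuzziness exponent m = 2, and hypothetical function and variable names. Each relevé receives FCM memberships to the static centroids following eq. (1) in Table 2, and the first and second choices are the two units with the highest membership.

```python
import numpy as np


def assign_releves(releves, centroids, m=2.0):
    """FCM memberships of relevés to static centroids, plus 1st and 2nd choices.

    releves:   (n, p) array of species abundance values, one row per relevé
    centroids: (c, p) array of fixed cluster centroids (c >= 2)
    Returns u with shape (c, n) and two length-n arrays of unit indices.
    """
    releves = np.asarray(releves, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Squared Euclidean distance e_ij^2 between centroid i and relevé j
    e2 = ((centroids[:, None, :] - releves[None, :, :]) ** 2).sum(axis=2)
    e2 = np.maximum(e2, 1e-12)  # guard against a relevé lying exactly on a centroid
    # u_ij = 1 / sum_l (e_ij^2 / e_lj^2)^(1/(m-1)), i.e. eq. (1) written with squared distances
    ratio = (e2[:, None, :] / e2[None, :, :]) ** (1.0 / (m - 1.0))
    u = 1.0 / ratio.sum(axis=1)
    # Memberships decrease with distance, so ranking them ranks centroids by proximity
    order = np.argsort(-u, axis=0)
    return u, order[0], order[1]


centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
releves = [[1.0, 0.0], [6.0, 0.0]]
u, first, second = assign_releves(releves, centroids)
print(first.tolist(), second.tolist())  # [0, 1] [1, 0]
```

Since first-choice assignment under eq. (1) amounts to nearest-centroid assignment, the membership values mainly indicate how clear-cut an identification is; a small gap between the first and second memberships could flag a likely transitional relevé.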
Performance of the vegetation classifier
Whereas inconsistency in the original classification methods can be avoided by applying numerical clustering, it reappears when evaluating the efficiency of the classifier, because the reference classification is expert-based. That is, imprecision in the original assignments may affect the percentages of successful identification. In addition, relevés belonging to transitional subassociations were more difficult to classify correctly than relevés belonging to reproduced vegetation units (even if both were represented at the association level). This occurred because the classifier lacked centroids to represent these units, and hence their relevés were assigned to one of the neighbouring units. The high number of unrecognized subassociations in Fagetalia beech forests (see Table 1) may account for the classifier’s poor results on this data set (Table 3). There are other possible sources of low supervised classification efficiency, arising from inconsistencies in the sampling methods used by different authors. Otýpková & Chytrý (2006) showed that smaller plots tend to produce less stable ordinations in data sets of low beta diversity. In terms of classification, their findings imply that relevés from small plots may be easily misclassified because of their higher variability in both species presence and abundance. The same reasoning may be applied to the inconsistent recording of cryptogams.
Sampling and the appropriate representation of vegetation types
We carefully selected the relevés included in the training set; this is certainly a critical point in our approach and must be justified. Statistically speaking, such relevé selection is still a subjective decision that completely biases sampling and precludes any inference on the validity of groups. Hence, one cannot expect the resulting classification to accurately reflect the real patterns of vegetation. Moreover, Cerná & Chytrý (2005) found that selecting plots with diagnostic species as the training set resulted in lower efficiency of neural network classifiers than using a randomly selected training set. Nevertheless, vegetation scientists nowadays generally agree that vegetation is mainly continuous in nature. Therefore, as long as an optimal vegetation sampling theory is lacking, statistical inference on clustering results will remain a delicate subject (e.g. Rolecek et al. 2007). Meanwhile, vegetation classification should not aim at discovering true vegetation types, but should provide a knowledge basis for applied ecological studies. With this in mind, we considered it more important to keep the vegetation concept to be reproduced very clear. We set a specific point in the multivariate space (i.e. the cluster centroid) as the representative of the expert-defined unit. Excluding transitional relevés from the centroid definition helped to keep it as an ideal type. Ensuring that the nomenclatural type relevé (if available) shows a high membership in the unit would be a way to allow the syntaxon name to be used for the fuzzy cluster.
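A minimal Python sketch of the two ideas in this paragraph, under stated assumptions: the centroid is computed as the membership-weighted mean used in the PCM centroid update (Krishnapuram & Keller 1993), so that relevés with low typicality (such as transitional ones) pull it only weakly, and the nomenclatural type relevé is checked against a hypothetical typicality threshold of 0.5 using eq. (2) in Table 2. The function names, the scale parameter eta and the threshold are illustrative assumptions, not values from this study.

```python
import numpy as np


def weighted_centroid(releves, typicality, m=2.0):
    """Centroid v = sum_j u_j^m x_j / sum_j u_j^m (PCM-style centroid update)."""
    X = np.asarray(releves, dtype=float)
    w = np.asarray(typicality, dtype=float) ** m
    return (w[:, None] * X).sum(axis=0) / w.sum()


def pcm_typicality(x, centroid, eta, m=2.0):
    """Eq. (2) in Table 2 for one relevé: u = 1 / (1 + (e^2 / eta)^(1/(m-1)))."""
    e2 = float(((np.asarray(x, dtype=float) - centroid) ** 2).sum())
    return 1.0 / (1.0 + (e2 / eta) ** (1.0 / (m - 1.0)))


# Two typical relevés and one transitional relevé far from them (toy data)
releves = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
v = weighted_centroid(releves, typicality=[1.0, 1.0, 0.1])
type_releve = [1.0, 0.5]  # hypothetical nomenclatural type relevé
TYPE_THRESHOLD = 0.5      # hypothetical minimum typicality for keeping the syntaxon name
print(v, pcm_typicality(type_releve, v, eta=1.0) >= TYPE_THRESHOLD)
```

With a typicality of 0.1 and m = 2, the transitional relevé receives a weight of only 0.01, so the centroid stays close to the two typical relevés.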
Limitations of the numerical cluster model
Note that our numerical cluster model assumes roughly spherical clusters, both when building PCM clusters and when executing the FCM classifier. One of Dale’s (1995) criticisms of FCM was its inability to cope with non-spherical cluster shapes. It is, however, possible to allow hyperellipsoidal clusters in FCM and PCM algorithms by taking into account the cluster variance-covariance matrix (Krishnapuram & Keller 1993). Another limitation of our approach is that the FCM membership function works better with clusters of similar size. The PCM typicality function may be used instead, but at the expense of obtaining values that cannot be interpreted as probabilities.
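One way to relax the spherical assumption, sketched below in Python under our own assumptions, is to replace the squared Euclidean distance e_ij^2 with a squared Mahalanobis distance based on a cluster’s variance-covariance matrix, so that distances are shorter along the cluster’s main axis of variation. The function name is hypothetical, and the exact covariance-based norm in the algorithms cited above may differ (for instance in how the matrix is scaled).

```python
import numpy as np


def squared_mahalanobis(X, centroid, cov):
    """(x - v)^T S^{-1} (x - v) for each row x of X, with S the cluster covariance."""
    diff = np.atleast_2d(np.asarray(X, dtype=float)) - np.asarray(centroid, dtype=float)
    # Solve S z = (x - v) instead of inverting S explicitly; z has shape (p, n)
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff.T)
    return np.einsum("ij,ji->i", diff, z)


# An elongated cluster: variance 4 along the first axis, 1 along the second
cov = np.diag([4.0, 1.0])
points = [[2.0, 0.0], [0.0, 2.0]]  # same Euclidean distance from the centroid
print(squared_mahalanobis(points, [0.0, 0.0], cov))  # [1. 4.]
```

With an identity covariance matrix the function reduces to the squared Euclidean distance, i.e. the spherical case assumed in this study.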
Final remarks and future work
In our opinion, vegetation scientists should decide whether they would prefer: (1) a vegetation classifier designed as an interface to communicate expert vegetation knowledge to non-experts; or (2) a computer program like the former, but which could also promote the revision of the expert knowledge itself. In the first case, the program would simply run supervised classification methods on knowledge assumed to be true. In contrast, in the second case the system would allow expert knowledge to be questioned, and could even change the expert’s point of view. We believe this second model is more flexible and promising. We implemented our proposals in a set of related computer programs called Araucaria (see App. 2 and http://biodiver.bio.ub.es/vegana/araucaria). One of them allows experts to feed the classifier with new plot data and see how the current set of PCM clusters “reacts” to this new information.
Regarding future developments, we strongly believe that a comparison of vegetation classification methodologies is necessary, not only in terms of efficiency but also aiming at a unification of traditional and numerical approaches. Since vegetation classifications are regionally restricted, studying solutions for biogeographical issues (e.g. vicariant units) would be another interesting research topic. Nevertheless, large-scale vegetation expert systems (say, valid for the whole of Europe) will certainly be difficult to develop.
Acknowledgements
We would like to thank Lubomír Tichý and an anonymous reviewer for their very useful comments on a previous version of this manuscript. This study was supported by a Ph.D. grant awarded by the “Comissionat per a Universitats i Recerca” (1999SGR00059) of the “Departament d’Universitats, Recerca i Societat de la Informació de la Generalitat de Catalunya” (2001 FI 00269), and by a research project from the Spanish “Ministerio de Educación y Ciencia” (CGL2006-13421-C04-01/BOS).
References
Bezdek, J. C. 1981. Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York.
Bolòs, O. de & Vigo, J. 1984. Flora dels Països Catalans. Vol. 1. Ed. Barcino, Barcelona.
Bolòs, O. de, Vigo, J., Masalles, R. M. & Ninot, J. M. 1990. Flora Manual dels Països Catalans. 2nd ed. Pòrtic, Barcelona.
Braun-Blanquet, J. 1964. Pflanzensoziologie: Grundzüge der Vegetationskunde. Springer.
Bruelheide, H. 2000. A new measure of fidelity and its application to defining species groups. Journal of Vegetation Science 11: 167-178.
Cerná, L. & Chytrý, M. 2005. Supervised classification of plant communities with artificial neural networks. Journal of Vegetation Science 16: 407-414.
Chytrý, M. (ed.) 2007. Vegetation of the Czech Republic. 1. Grassland and Heathland
Dale, M. B. 1988. Some fuzzy approaches to phytosociology. Ideals and instances. Folia Geobotanica et Phytotaxonomica 23: 239-274.
Dale, M. B. 1995. Evaluating classification strategies. Journal of Vegetation Science 6: 437-440.
Davé, R. N. & Krishnapuram, R. 1997. Robust clustering methods: a unified view. IEEE Transactions on Fuzzy Systems 5: 270-293.
De Cáceres, M., Oliva, F. & Font, X. 2006. On relational possibilistic clustering. Pattern Recognition 39: 2010-2024.
Devillers, P., Devillers-Terschuren, J. & Ledant, J.-P. 1991. CORINE biotopes manual. Habitats of the European Community. A method to identify and describe consistently sites of major importance for nature conservation. Data specifications - Part 2. Office for Official Publications of the European Communities, Luxembourg.
Ejrnæs, R., Bruun, H. H., Aude, E. & Buchwald, E. 2004. Developing a classifier for the Habitats Directive grassland types in Denmark using species lists for prediction. Applied Vegetation Science 7: 71-80.
Escudero, A. & Pajarón, S. 1994. Numerical syntaxonomy of the Asplenietalia petrarchae in the Iberian Peninsula. Journal of Vegetation Science 5: 205-214.
Font, X. 1993. Estudis geobotànics sobre els prats xeròfils de l’estatge montà dels Pirineus. Institut d’Estudis Catalans, Barcelona.
Font, X. 2008. Mòdul Flora i Vegetació. Banc de Dades de Biodiversitat de Catalunya. Generalitat de Catalunya i Universitat de Barcelona. http://biodiver.bio.ub.es/biocat/homepage.html
Hill, M. O. 1989. Computerized matching of relevés and association tables, with an application to the British National Vegetation Classification. Vegetatio 83: 187-194.
Hill, M. O. 1996. TABLEFIT version 1.0, for identification of vegetation types. Institute of Terrestrial Ecology, Huntingdon, UK.
Jennings, M. 2003. Guidelines for Describing Associations and Alliances of the US National Vegetation Classification. Ecological Society of America.
Knollová, I., Chytrý, M., Tichý, L. & Hájek, O. 2005. Stratified resampling of phytosociological databases: some strategies for obtaining more representative data sets for classification studies. Journal of Vegetation Science 16: 479-486.
Kocí, M., Chytrý, M. & Tichý, L. 2003. Formalized reproduction of an expert-based phytosociological classification: a case study of subalpine tall-forb vegetation. Journal of Vegetation Science 14: 601-610.
Krishnapuram, R. & Keller, J. M. 1993. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems 1: 98-110.
Krishnapuram, R. & Keller, J. M. 1996. The possibilistic c-means algorithm: insights and recommendations. IEEE Transactions on Fuzzy Systems 4: 385-393.
Legendre, P. & Gallagher, E. D. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280.
Legendre, P. & Legendre, L. 1998. Numerical ecology. 2nd English ed. Elsevier.
Marsili-Libelli, S. 1989. Fuzzy clustering of ecological data. Coenoses 4: 95-106.
Moraczewski, I. R. 1993. Fuzzy logic for phytosociology: 1. Syntaxa as vague concepts. Vegetatio 106: 1-11.
Mucina, L. 1997. Classification of vegetation: past, present and future. Journal of Vegetation Science 8: 751-760.
Mucina, L. & van der Maarel, E. 1989. Twenty years of numerical syntaxonomy. Vegetatio 81: 1-15.
Noble, I. R. 1987. The role of expert systems in vegetation science. Vegetatio 69: 115-121.
Orlóci, L. 1967. An agglomerative method for classification of plant communities. Journal of Ecology 55: 193-206.
Otýpková, Z. & Chytrý, M. 2006. Effects of plot size on the ordination of vegetation samples. Journal of Vegetation Science 17: 465-472.
Podani, J. 1990. Comparison of fuzzy classifications. Coenoses 5: 17-21.
Pot, R. 1997. SYNDIAT, SYNtaxonomical DIAgnostics Tool, a computer program based on the deductive method of community identification. Acta Botanica Neerlandica 46: 230.
Rao, C. R. 1995. A review of canonical coordinates and an alternative to correspondence analysis using Hellinger distance. Qüestiió (Quaderns d'Estadística i Investigació Operativa) 19: 23-63.
Rodwell, J. S., Pignatti, S., Mucina, L. & Schaminée, J. H. J. 1995. European Vegetation Survey: update on progress. Journal of Vegetation Science 6: 759-762.
Rolecek, J., Chytrý, M., Hájek, M., Lvoncik, S. & Tichý, L. 2007. Sampling in large-scale vegetation studies: do not sacrifice ecological thinking to statistical puritanism. Folia Geobotanica 42: 199-208.
Tichý, L. 2002. JUICE, software for vegetation classification. Journal of Vegetation Science 13: 451-453.
Tichý, L. 2005. New similarity indices for the assignment of relevés to the vegetation units of an existing phytosociological classification. Plant Ecology 179: 67-72.
van der Maarel, E. 1979. Transformation of cover-abundance values in phytosociology and its effects on community similarity. Vegetatio 39: 97-114.
van Tongeren, O. 1986. FLEXCLUS, an interactive program for classification and tabulation of ecological data. Acta Botanica Neerlandica 35: 137-142.
van Tongeren, O., Gremmen, N. & Hennekens, S. M. 2008. Assignment of relevés to predefined classes by supervised clustering of plant communities using a new composite index. Journal of Vegetation Science 19: 525-536.
Willner, W. 2006. The association concept revisited. Phytocoenologia 36: 67-76.
Table 1. The eight phytosociological orders studied and results of the numerical reproduction of their low-level classification.

Phytosociological order | Short description | Original units | Original relevés | Reproduced units | Non-reproduced units | Training (typical) rel.
Brometalia erecti | mesophytic or slightly xerophytic pastures | 30 | 531 | 26 | 4 | 231
Origanetalia vulgaris | herb communities growing on forest fringes | 12 | 133 | 10 | 2 | 67
Galio-Alliarietalia | megaforb sciophilous communities | 13 | 124 | 12 | 1 | 71
Prunetalia spinosae | shrub communities growing on deciduous forest fringes | 18 | 353 | 16 | 2 | 161
Populetalia albae | riverine meso-macroforests growing on wet fluvisols with a high water table | 17 | 199 | 10 | 7 | 107
Quercetalia ilicis | Mediterranean woodlands, scrublands and maquis | 31 | 753 | 25 | 6 | 254
Quercetalia pubescentis | sub-Mediterranean deciduous oak woodlands | 41 | 651 | 30 | 11 | 243
Fagetalia sylvaticae | beech forests | 60 | 933 | 37 | 23 | 286
Total | | 222 | 3677 | 166 | 56 | 1420
Table 2. Main mathematical characteristics of the Fuzzy C-means (FCM) and Possibilistic C-means (PCM) clustering algorithms.

 | FCM | PCM
Fuzzy membership definition | $\sum_{i=1}^{c} u_{ij} = 1$ for all objects $j = 1, \dots, n$ | $\sum_{i=1}^{c} u_{ij} > 0$ for all objects $j = 1, \dots, n$
Optimisation function | $J_{FCM} = \sum_{i=1}^{c} \sum_{j=1}^{n} (u_{ij})^{m} e_{ij}^{2}$ | $J_{PCM} = \sum_{i=1}^{c} \sum_{j=1}^{n} (u_{ij})^{m} e_{ij}^{2} + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} (1 - u_{ij})^{m}$
Membership function | $u_{ij} = 1 \Big/ \sum_{l=1}^{c} \left( e_{ij} / e_{lj} \right)^{2/(m-1)}$ (1) | $u_{ij} = 1 \Big/ \left( 1 + \left( e_{ij}^{2} / \eta_i \right)^{1/(m-1)} \right)$ (2)
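The two membership functions in Table 2 can be written directly in Python with NumPy. This minimal sketch assumes that the squared distances e_ij^2 are already available as a c × n array (clusters by objects) and uses hypothetical function names; it shows that the FCM memberships of each object sum to 1, whereas PCM typicalities need not, and that an object at squared distance η_i from centroid i has typicality 0.5.

```python
import numpy as np


def fcm_membership(e2, m=2.0):
    """Eq. (1): u_ij = 1 / sum_l (e_ij / e_lj)^(2/(m-1)), from squared distances e2 (c x n)."""
    e2 = np.maximum(np.asarray(e2, dtype=float), 1e-12)  # avoid division by zero
    # (e_ij / e_lj)^(2/(m-1)) equals (e_ij^2 / e_lj^2)^(1/(m-1))
    ratio = (e2[:, None, :] / e2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)


def pcm_typicality(e2, eta, m=2.0):
    """Eq. (2): u_ij = 1 / (1 + (e_ij^2 / eta_i)^(1/(m-1))), one eta per cluster."""
    e2 = np.asarray(e2, dtype=float)
    eta = np.asarray(eta, dtype=float)[:, None]
    return 1.0 / (1.0 + (e2 / eta) ** (1.0 / (m - 1.0)))


# Two clusters (rows) and two objects (columns)
e2 = [[1.0, 4.0],
      [4.0, 1.0]]
u_fcm = fcm_membership(e2)
u_pcm = pcm_typicality(e2, eta=[1.0, 1.0])
print(u_fcm.sum(axis=0), u_pcm.sum(axis=0))  # [1. 1.] [0.7 0.7]
```

The PCM values depend only on each object’s distance to its own centroid, which is why they are suited to building independent clusters but do not sum to 1.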
Table 3. Classification efficiency of the numerical classifier at the association level. Column blocks list the efficiency on the typical and transitional relevé sets, as well as the overall efficiency for the represented associations. Ass.: number of represented associations; Rel.: number of relevés. Each efficiency cell gives % (L-U), where % is the percentage of relevés correctly classified and L/U are the lower/upper 95% confidence limits following the binomial distribution.

Phytosociological order | Ass. | Typical Rel. | Typical 1st choice | Typical 1st/2nd choice | Transitional Rel. | Transitional 1st choice | Transitional 1st/2nd choice | Represented Rel. | Represented 1st choice | Represented 1st/2nd choice
Brometalia erecti | 20 | 231 | 97.4 (94.4-99.0) | 99.1 (96.9-99.9) | 285 | 68.8 (63.5-74.6) | 85.6 (81.1-89.6) | 516 | 81.6 (78.4-85.2) | 91.7 (89.1-94.0)
Origanetalia vulgaris | 10 | 67 | 92.5 (83.4-97.5) | 100.0 (94.6-100.0) | 66 | 39.4 (27.6-52.2) | 78.8 (67.0-87.9) | 133 | 66.2 (57.5-74.1) | 89.5 (83.0-94.1)
Galio-Alliarietalia | 11 | 71 | 94.4 (86.2-98.4) | 97.2 (90.2-99.7) | 53 | 56.6 (42.3-70.2) | 73.6 (59.7-84.7) | 124 | 78.2 (69.9-85.1) | 87.1 (79.9-92.4)
Prunetalia spinosae | 9 | 161 | 96.3 (92.1-98.6) | 98.8 (95.6-99.8) | 192 | 72.9 (66.0-79.1) | 85.9 (80.2-90.5) | 353 | 83.6 (79.3-87.3) | 91.8 (88.4-94.4)
Populetalia albae | 7 | 107 | 92.5 (85.8-96.7) | 94.4 (88.2-97.9) | 92 | 64.1 (53.5-73.9) | 82.6 (73.3-89.7) | 199 | 79.4 (73.1-84.8) | 88.9 (83.7-92.9)
Quercetalia ilicis | 13 | 254 | 99.2 (97.2-99.9) | 99.2 (97.2-99.9) | 487 | 80.9 (79.4-86.7) | 89.3 (88.2-93.8) | 741 | 87.2 (86.7-91.5) | 92.7 (92.2-95.9)
Quercetalia pubescentis | 10 | 243 | 90.5 (86.1-93.9) | 98.8 (96.4-99.7) | 408 | 65.7 (60.9-70.3) | 82.1 (78.0-85.7) | 651 | 75.0 (71.4-78.2) | 88.3 (85.6-90.7)
Fagetalia sylvaticae | 22 | 286 | 96.2 (93.2-98.1) | 99.0 (97.0-99.8) | 647 | 49.0 (45.1-52.9) | 66.0 (62.2-69.6) | 933 | 63.5 (60.3-66.5) | 76.1 (73.2-78.8)
Total | 102 | 1420 | 95.4 (94.2-96.4) | 98.6 (97.8-99.1) | 2230 | 64.1 (62.1-66.2) | 79.5 (77.9-81.3) | 3650 | 76.3 (75.1-77.9) | 86.9 (86.0-88.2)
Fig. 1: Example of clustering results of FCM and PCM on relevés belonging to three grassland associations of Brometalia erecti. (a) Classical multidimensional scaling coordinates from Bray-Curtis distances, with the original vegetation units labelled using different symbols (filled circles: …