Constructing Dynamic Frames of Discernment in Cases of Large Number of Classes

Yousri Kessentini¹, Thomas Burger², and Thierry Paquet¹

¹ Université de Rouen, Laboratoire LITIS EA 4108, site du Madrillet, St Etienne du Rouvray, France
{yousri.kessentini,thierry.paquet}@univ-rouen.fr
² Université Européenne de Bretagne, Université de Bretagne-Sud, CNRS, Lab-STICC, F-56017 Vannes cedex, France
[email protected]

Abstract. The Dempster-Shafer theory (DST) is particularly interesting to deal with imprecise information. However, it is known for its high computational cost, as dealing with a frame of discernment Ω involves the manipulation of up to 2^|Ω| elements. Hence, classification problems where the number of classes is too large cannot be considered. In this paper, we propose to take advantage of a context of ensemble classification to construct a frame of discernment where only a subset of classes is considered. We apply this method to script recognition problems, which by nature involve a tremendous number of classes.

Keywords: Dempster-Shafer theory, Dynamic frames of discernment, Data fusion.

1 Introduction

The Dempster-Shafer theory (DST) [1, 2] is a particularly interesting theory to deal with imprecise, conflicting or partial sources of information. The counterpart of this efficiency is its high computational complexity. One of the main reasons for this complexity is related to the state space (or frame of discernment): When the actual value ω0 taken by a variable W is only known to belong to a set Ω = {ω1, ..., ω|Ω|} of possible values, the distributions encoding some knowledge on W are defined over up to 2^|Ω| focal elements (the elements of the power set of Ω, noted P(Ω)), leading to an exponential number of elements to deal with. To balance that, several methods exist. The most natural idea is to try to reduce the number of such focal elements, by forcing some mass assignments to 0, so that the remaining ones display a particular structure which is supposed to be relevant with respect to the knowledge encoded (for instance, Bayesian [3], consonant [4], and k-additive mass functions [5]). Another natural idea is to reduce the size of Ω with various low-cost processing steps, so that the refined modeling of the DST is devoted only to the most interesting possible values for W. In [6], mass functions are defined directly on Ω rather than on P(Ω), but this is only possible if Ω is fitted with a partially ordered structure. Finally, in [7], the authors propose to consider coarsened frames, to reduce the computational cost of the subsequent Dempster's rule. Conversely, many works consider a problem dual of ours, i.e. constructing an exhaustive frame thanks to multiple pieces of evidence based on partial frames, such as in [8] or [9].

In this paper, we consider classification problems (i.e. the variable W is a class variable) where the number of classes involved (i.e. |Ω|) is very large, such as in handwriting word recognition, where a dictionary may contain up to 100,000 words. To face the corresponding computational issue, we propose to reduce the size of Ω. To do so, we take advantage of a context of ensemble classification to construct a frame of discernment where only a subset of classes is considered. These classes are dynamically selected according to the diversity of the classifiers involved. We propose and compare different strategies to build such a dynamic frame. We show that the proposed strategies considerably reduce the complexity of a DST approach to ensemble classification, while providing a statistically significant improvement of the classification performances with respect to classical probabilistic combination methods.

The paper is structured as follows: In Section 2, we recall the basics of handwriting word recognition, as well as some results on ensemble classification in the context of the DST. In Section 3, we present four different strategies to define dynamic frames of discernment. Finally, we compare them in Section 4 on Latin and Arabic handwriting datasets, and we discuss the results.

2 Handwriting Word Recognition

2.1 Background

One of the most popular techniques for automatic handwriting recognition is to use generative classifiers based on Hidden Markov Models (HMM) [10]. For each word ωi of a lexicon Ωlex = {ω1, ..., ωV} of V words, an HMM λi, i ≤ V, is defined, so that λi best fits a training set made of several different instances of words (these instances are called example words). Practically, this training phase is conducted by using the Viterbi EM or the Baum-Welch algorithm [10]. Then, when a new unknown word ω is considered (a test word from a testing set), the likelihoods P(ω|λi), ∀i ≤ V, are approximated by the likelihoods provided by the Viterbi decoding algorithm (noted L(ωi), ∀i), and ω is recognized as the ωj for which L(ωj) ≥ L(ωi), ∀i ≤ V. Generally, in the evaluation step, the classifier does not only provide the "best" class, but an ordered list of the TOP N best classes. Then, for each value of n ≤ N, a recognition rate can be computed as the percentage of words for which the ground truth class is proposed in the first n elements of the TOP N list. This complete set-up is called an HMM classifier.

In order to improve recognition accuracy, it is classical to define several HMM classifiers, each working on different features (the likelihood of the q-th classifier for ωi is then noted Lq(ωi)), and to combine them [11,12,13,14]. It has been established in [15] that using a set of three classifiers, working respectively on the upper contour of the pen mark, the lower contour, and the ink density, provides accurate results both on Latin and Arabic datasets. There are several ways to combine these classifiers. The most classical way is to consider the product of the Q likelihoods for each class. It corresponds to the assumptions that the features used by the classifiers are independent, and that the product of the likelihoods¹ is the likelihood of the resulting ensemble classification. In the sequel, we refer to this method as the reference probabilistic method (RPM). On the other hand, we have demonstrated in [16,17] the superiority of several evidential combination methods over several classical probabilistic combination strategies, including the RPM (which appears to be the best non-evidential method).
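To make the TOP N evaluation protocol described above concrete, here is a minimal Python sketch (the function name and the toy data are ours, not the paper's):

    def top_n_rates(ranked_lists, ground_truth, n_max):
        """TOP n recognition rate for n = 1..n_max: the percentage of
        test words whose ground truth class appears in the first n
        entries of the ordered list returned by the classifier."""
        rates = []
        for n in range(1, n_max + 1):
            hits = sum(1 for ranked, truth in zip(ranked_lists, ground_truth)
                       if truth in ranked[:n])
            rates.append(100.0 * hits / len(ground_truth))
        return rates

    # Toy example: 3 test words, lists ranked by decreasing Viterbi likelihood.
    ranked = [["cat", "car", "cap"], ["dog", "dot", "don"], ["sun", "son", "sin"]]
    truth = ["car", "dog", "sin"]
    print(top_n_rates(ranked, truth, 3))   # [33.33..., 66.66..., 100.0]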

2.2 DST Combination of HMM Classifiers

We assume that the reader is familiar with the basic elements of the Dempster-Shafer theory. Unfamiliar readers should refer to [1, 2], where the following notions are presented: power set, mass function, focal element, vacuous/categorical/consonant/Bayesian mass functions, conjunctive (or Dempster's rule of) combination, pignistic transform and discounting. The combination of several probabilistic classifiers in the DST is a widely studied topic [11, 18, 19, 20, 21, 22, 23, 24, 25], which we have already reviewed in previous works of ours [17, 26]. Here, to combine the results of several HMM classifiers, we use the following procedure, inspired from previous works of ours [17]: First, for each of the Q classifiers, we normalize the likelihoods so that they sum up to one over the whole set of classes. Second, a mass function is derived from each of the Q classifiers. Third, the accuracy rates of the classifiers (derived from a cross-validation procedure) are used to weight each mass function according to the reliability of each classifier. Fourth, the Q mass functions are combined together. Finally, a probabilistic transform is applied, and the so-derived probability values are sorted decreasingly to provide the TOP N list.

Concerning the first and second steps, several methods may be used. We have compared several of them in [16, 17], and finally, we consider the use of a sigmoid function for the normalization, and the use of the inverse pignistic transform [4] for the conversion into a mass function. The inverse pignistic transform converts an initial probability distribution p into a consonant mass assignment. The resulting consonant mass assignment, denoted by p̂, is built as follows: The elements of Ω are ranked by decreasing probabilities such that p(ω1) ≥ ... ≥ p(ω|Ω|), and we have

    p̂({ω1, ω2, ..., ω|Ω|}) = p̂(Ω) = |Ω| × p(ω|Ω|)
    p̂({ω1, ω2, ..., ωi}) = i × [p(ωi) − p(ωi+1)]    ∀ i < |Ω|        (1)
    p̂(·) = 0 otherwise.

¹ These likelihoods are possibly weighted by the TOP 1 accuracy rate of each classifier, if the information is available after a cross-validation procedure.
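As an illustration, here is a minimal Python sketch of the inverse pignistic transform of Eq. (1); it turns a probability distribution into a consonant mass function whose focal elements are nested sets of the best-ranked classes (function and variable names are ours, not the paper's):

    def inverse_pignistic(p):
        """p: dict mapping each class to its probability (summing to 1).
        Returns a dict mapping frozensets of classes to their mass."""
        ranked = sorted(p, key=p.get, reverse=True)   # p(w1) >= ... >= p(wK)
        K = len(ranked)
        mass = {}
        for i in range(1, K):                         # i * [p(wi) - p(wi+1)]
            m = i * (p[ranked[i - 1]] - p[ranked[i]])
            if m > 0:
                mass[frozenset(ranked[:i])] = m
        mass[frozenset(ranked)] = K * p[ranked[-1]]   # |Omega| * p(w_|Omega|)
        return mass

    print(inverse_pignistic({"a": 0.5, "b": 0.3, "c": 0.2}))
    # {'a'}: 0.2, {'a','b'}: 0.2, {'a','b','c'}: 0.6
    # (its pignistic transform gives the original p back)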


The reason for this choice is manifold: First, it corresponds to the best trade-off between computational complexity and performance. Second, it has no parameter to tune. Third, it provides a consonant mass function, which is interesting for computational as well as epistemological reasons. As a matter of fact, the result of a classifier is an ordered list, the natural representation of which in the DST is a consonant mass function [26]. Then, the probability distribution provided by each classifier can be seen as the pignistic transform of a particular consonant mass function, which is recovered via the inverse pignistic transform.

Concerning the third step, it is either possible to use all the TOP N accuracy rates, ∀N ≤ |Ωlex|, in a manner similar to that of [17] (which generalizes the method of [11]), or to simply use the TOP 1 accuracy rates, by the application of a classical discounting. In spite of involving less information, we have chosen the second option. The reason is that exactly the same information (only the TOP 1 accuracy rates) can be used to weight the classifiers in the RPM (by multiplying each probability value given by a classifier by its TOP 1 accuracy rate). On the other hand, the method described in [17] (involving all the TOP N accuracy rates) has no counterpart in the RPM. Hence, by choosing the second option, we guarantee that the probabilistic and DST-based methods remain comparable.

In the fourth step, we consider by default a conjunctive combination. The reasons for such a default choice are those detailed in [27]. It is also possible to proceed differently, as detailed in [28], where different combinations are considered depending on the pairwise conflict among the classifiers. Despite its real interest from a performance point of view, the conditions required to make a choice among the different combinations are not adapted to handwriting recognition problems, and the framework proposed in [27] better corresponds to our situation. Finally, the probability transform we use is the pignistic transform, which sounds natural to remain coherent with the processing of step 2. Hence, if m̃ denotes the pignistic transform of a mass function m, and if p̂q denotes the consonant mass function recovered from the probability distribution pq provided by the q-th classifier, we consider the following classification procedure:

    ω* = arg max_i m̃(ω_i)    where    m = ∩_{q=1}^{Q} [p̂_q],

∩ denoting the conjunctive combination.
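The following sketch assembles steps three to five under the same assumptions (the names are ours, and the inverse_pignistic helper is the one sketched above; this is an illustration of the pipeline, not the authors' implementation):

    def discount(mass, alpha, frame):
        """Classical discounting: keep a fraction alpha of each mass and
        transfer the remainder to the whole frame Omega."""
        out = {fs: alpha * m for fs, m in mass.items()}
        omega = frozenset(frame)
        out[omega] = out.get(omega, 0.0) + (1.0 - alpha)
        return out

    def conjunctive(m1, m2):
        """Unnormalized conjunctive rule: the product mass of each pair of
        focal elements is transferred to their intersection (the mass of
        the empty set measures the conflict)."""
        out = {}
        for a, ma in m1.items():
            for b, mb in m2.items():
                out[a & b] = out.get(a & b, 0.0) + ma * mb
        return out

    def pignistic(mass):
        """BetP: share the mass of each non-empty focal element equally
        among its members, renormalizing by the non-conflicting mass."""
        total = sum(m for fs, m in mass.items() if fs)
        bet = {}
        for fs, m in mass.items():
            for w in fs:
                bet[w] = bet.get(w, 0.0) + m / (len(fs) * total)
        return bet

    # Two classifiers on a 3-class frame, TOP 1 accuracies 0.8 and 0.6:
    frame = {"a", "b", "c"}
    m1 = discount(inverse_pignistic({"a": 0.5, "b": 0.3, "c": 0.2}), 0.8, frame)
    m2 = discount(inverse_pignistic({"a": 0.4, "b": 0.4, "c": 0.2}), 0.6, frame)
    bet = pignistic(conjunctive(m1, m2))
    print(max(bet, key=bet.get))   # TOP 1 decision of the ensemble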

2.3 Computational Issues

In handwritten word recognition, the set of classes is very large with respect to the cardinality of the state space in classical DST problems (up to 100,000 words). When dealing with a lexicon of V words, the mass functions involved are defined on up to 2^V focal elements. Moreover, the conjunctive combination of two mass functions involves up to 2^(2V) multiplications and 2^V additions. Thus, the computational cost is exponential with respect to the size of the lexicon, and 100,000 words are not directly tractable.

To remain efficient, even for large vocabularies, it is mandatory either to reduce the complexity, or to reduce the size of the lexicon involved. To do so, as noted in the previous section, consonant mass functions (with only V focal elements) may be considered. In addition, it is also possible to reduce the size of the lexicon by eliminating all the word classes which are obviously not adapted to the test word under consideration. Hence, we consider only the few word classes among which a mistake is possible because of the difficulty of discrimination. Consequently, instead of working on Ωlex = {ω1, ..., ωV}, we use another frame ΩW, defined according to each particular test word W we aim at classifying. That is why we say that such a frame is dynamically defined. This strategy is rather intuitive and simple. On the other hand, to our knowledge, no comparison of the different strategies which can be used to define such frames has been published. This is achieved in this paper, and the next section presents several strategies that will be compared in the sequel.
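To give an order of magnitude of the figures above, a short computation suffices (caret notation as in the text):

    # For a lexicon of V words, a general mass function has up to 2^V focal
    # elements, and one conjunctive combination costs up to 2^(2V)
    # multiplications, whereas a consonant mass function has only V focal
    # elements.
    for V in (10, 15, 20, 100):
        print(f"V={V:>3}: 2^V = {2**V:.2e} focal elements, "
              f"2^(2V) = {2**(2*V):.2e} multiplications")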

3 Dynamic Frames of Discernment

In this section, we describe several strategies to derive a frame ΩW of reduced size from Ωlex, the latter being too large. Note that, depending on the test word W, the number of classes among which the discrimination remains difficult may vary; hence, naturally, the size of ΩW may vary accordingly. On the other hand, it is possible to force the frames to be of the same cardinality whatever the test word (by truncating or extending them with useless classes). Hence, there are two options: a fixed or a variable cardinality of the frames ΩW, ∀W. In this paper, we force the cardinality to be fixed. It does not improve the results, but it provides a more standard basis for the comparison of the different strategies, as a poor strategy to select the classes building ΩW will not be balanced by looser constraints on the acceptance/rejection of the classes.

Let us consider Q classifiers. Each classifier q provides an ordered list lq = {ω1^q, ..., ωN^q} of the TOP N best classes and their corresponding likelihoods, noted L(ωi^q), ∀i ≤ N. The different strategies described below take as input the lists lq, ∀q ≤ Q, and construct a dynamic frame of controlled size M ≤ N ≤ |Ωlex|.

3.1 Strategy 1: Intersection

Here, the frame ΩW is made of all the words which are common to the output lists lq, ∀q ≤ Q. Obviously, |ΩW| depends on the lists: If the Q classifiers globally concur, their respective lists are similar and an important proportion of their N words is likely to be found in their intersection. On the contrary, if the Q classifiers mostly disagree, very few words belong to the intersection of the lists. Here, we expect |ΩW| to remain constant. Hence, the lists of all the Q classifiers are considered for increasing values of N, until the intersection of the lists is made of exactly M words (draws are sorted out randomly). As the algorithm on which the classifiers are based requires computing the probability values of all the Ωlex words anyway, the lists with N = |Ωlex| are directly available, and this iterative strategy has no extra cost from a computational point of view. Intuitively, the motivation for this strategy is to use the intersection scheme to reduce the number of potential classes for the test word. The idea behind it is that the conjunctive combination is also based on intersections of sets, so that, by discarding all the empty intersections before its computation, unnecessary computations are suppressed while the important ones are kept.
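A minimal sketch of this strategy, under our naming conventions (lists contains the Q full rankings, best class first):

    import random

    def intersection_frame(lists, M):
        """Strategy 1: grow N until the TOP-N lists share at least M
        words, then randomly keep M of them (the random draw for ties)."""
        for N in range(1, len(lists[0]) + 1):
            common = set(lists[0][:N]).intersection(*(l[:N] for l in lists[1:]))
            if len(common) >= M:
                common = list(common)
                random.shuffle(common)     # draws are sorted out randomly
                return set(common[:M])
        return set(lists[0])               # unreachable for full rankings

    lists = [["w1", "w2", "w3", "w4"], ["w2", "w1", "w4", "w3"],
             ["w1", "w4", "w2", "w3"]]
    print(intersection_frame(lists, 2))    # {'w1', 'w2'}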

3.2 Strategy 2: Union

This strategy is analogous to the previous one, except that the frame of discernment is made of the union of the lists rather than their intersection. Contrary to the previous strategy, if the Q classifiers globally concur, their respective lists are similar and very few words belong to the union of the lists, whereas if the Q classifiers mostly disagree, an important proportion of their N words is found in the union. Hence, we adjust the value of N to control the size of the frame; in practice, a frame size between 15 and 20 is used. The idea motivating this strategy is the following: If a single classifier fails and gives too low a rank to the real class, the other classifiers cannot balance the mistake when the intersection strategy is considered. Then, the union may be preferable.
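The corresponding sketch only swaps the set operation (same assumptions and naming conventions as for Strategy 1):

    import random

    def union_frame(lists, M):
        """Strategy 2: grow N until the union of the TOP-N lists reaches
        M words, again with a random draw if it overshoots M."""
        for N in range(1, len(lists[0]) + 1):
            union = set().union(*(l[:N] for l in lists))
            if len(union) >= M:
                union = list(union)
                random.shuffle(union)
                return set(union[:M])
        return set(lists[0])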

3.3 Strategy 3: Borda Count

A major problem with the two previous strategies is that the rank within each list is not involved in the creation of ΩW. Hence, we propose to use a Borda Count procedure: Each class receives a number of votes corresponding to the sum of its ranks in the Q lists. Then, the M word classes with the smallest number of votes are selected to compose ΩW. From a computational cost point of view, this pre-processing is rather light, as it involves Q × |Ωlex| additions and a call to a sorting function. In practice, a frame size of 15 classes is used.
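A sketch of the Borda Count selection (rank 0 being the best class; names are ours):

    def borda_frame(lists, M):
        """Strategy 3: each word scores the sum of its ranks over the Q
        lists; the M smallest totals compose the frame."""
        votes = {}
        for l in lists:
            for rank, w in enumerate(l):   # Q x |Omega_lex| additions
                votes[w] = votes.get(w, 0) + rank
        return set(sorted(votes, key=votes.get)[:M])

    lists = [["w1", "w2", "w3"], ["w2", "w1", "w3"], ["w1", "w3", "w2"]]
    print(borda_frame(lists, 2))           # {'w1', 'w2'}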

3.4 Strategy 4: Probabilistic Pre-processing

In spite of its lack of accuracy, the RPM is interesting: First, it is rather cheap from a computational point of view. Second, whatever the value δ, it is possible to achieve an accuracy of (100 − δ)% at TOP N if N is large enough. Trivially, if N = |Ωlex|, then the TOP N accuracy rate is 100%, but most of the time, an accuracy rate of 100% is achieved for smaller values of N. Then, the idea is simply to select for ΩW the M best classes according to the RPM (among which the real class is likely to be), i.e. the M classes for which the product of the likelihoods is the greatest, and to use the DST ensemble classification as a tool to refine the decision. Thus, the RPM is used to discard all the classes but M (even if this corresponds to a great number of classes, this is the simple step involving less computation), and afterwards, the DST-based method is used to discard the remaining M − 1 classes (this discrimination being more complex, more computational resources are allotted to it), so that a single class (hopefully, the right one) remains. In practice, a frame size of 15 classes is used.

The four proposed strategies build a dynamic frame with a fixed size in order to reduce the computational cost. In practice, a frame size of at most 20 classes is used: this represents a good compromise between complexity and performance. Note that it is practically impossible to deal with a frame of 100 classes, due to the high computational cost.
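As an illustration of the pre-selection step of Strategy 4, the sketch below scores each word by its RPM product of likelihoods, computed in the log domain to avoid underflow (names are ours):

    import math

    def rpm_frame(likelihoods, M):
        """Strategy 4: keep the M classes with the largest product of
        likelihoods over the Q classifiers; likelihoods[q][w] is the
        normalized likelihood of word w under classifier q (each value
        may additionally be weighted by the TOP 1 accuracy rate)."""
        words = likelihoods[0].keys()
        score = {w: sum(math.log(l[w]) for l in likelihoods) for w in words}
        return set(sorted(score, key=score.get, reverse=True)[:M])

    likelihoods = [{"w1": 0.5, "w2": 0.3, "w3": 0.2},
                   {"w1": 0.4, "w2": 0.5, "w3": 0.1}]
    print(rpm_frame(likelihoods, 2))       # {'w1', 'w2'}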

4 Experiments and Results

4.1 Datasets and HMM Classifiers

Experiments have been conducted on two publicly available databases: the IFN/ENIT benchmark database of Arabic words and the RIMES database of Latin words. The IFN/ENIT database [29] contains a total of 32,492 handwritten words (Arabic script) corresponding to 946 Tunisian town/village names written by 411 different writers. Four different sets (a, b, c, d) are predefined in the database for training and one set (e) for testing. The RIMES database [30] is composed of isolated handwritten word snippets extracted from handwritten letters (Latin script). In our experiments, 36,000 snippets of words are used to train the different HMM classifiers and 3,000 words are used in the test. The dictionary is composed of 1,612 words. Even if these numbers of words are rather small with respect to a real-size lexicon (up to 100,000 words), they are far too large for a direct DST approach, as a frame of discernment of more than 20 classes is not tractable from a computational point of view.

Three classifiers are defined, each working on a different feature set: upper contour, lower contour and density, as described in [15] (see Fig. 1). The TOP 1 and TOP 2 accuracy rates of each of these classifiers are derived from a 10-fold cross-validation on the training sets; they are given in Table 1. This table clearly shows that the two datasets are of heterogeneous difficulty. Moreover, the lower contour is always the least informative feature. Practically, in these experiments, we only use the TOP 1 accuracy rate to weight the different classifiers during their combination, either in the RPM or in the DST-based methods. More precisely, the DST-based method is derived according to the four strategies described above: Intersection (S1), Union (S2), Borda Count (S3) and Probabilistic Pre-Processing, or PPP for short (S4). Practically, we consider for each word a dynamic frame made of 15 words, in order to have a large enough set while keeping a reasonable computational complexity.

Fig. 1. DST combination of HMM classifiers

Table 1. Individual performances of the HMM classifiers

                            IFN/ENIT            RIMES
                          Top 1   Top 2     Top 1   Top 2
    HMM 1: Upper contour  73.60   79.77     54.10   66.40
    HMM 2: Lower contour  65.90   74.03     38.93   51.57
    HMM 3: Density        72.97   79.73     53.23   65.83

4.2 Results and Discussion

Table 2. Accuracy rates of the various strategies on the two datasets

                         IFN/ENIT            RIMES
                       Top 1   Top 2     Top 1   Top 2
    S1: Intersection   80.30   83.90     65.50   74.90
    S2: Union          82.00   86.53     68.30   79.80
    S3: Borda Count    81.17   85.73     68.67   80.13
    S4: PPP            81.83   86.53     69.47   80.23
    RPM                80.07   83.23     64.80   73.10

Table 2 displays the results provided by the four strategies as well as those of the RPM. First of all, it appears that the worst results are given by the RPM and that the DST-based methods are always more efficient. Second, S1 provides the poorest results among the DST-based methods, with results rather similar to those of the RPM, whereas all the other strategies provide similar results which are far better than those of the RPM. To us, the most appealing aspect of DST-based methods with respect to probabilistic ones lies in the possibility for the various sources of information to remain imprecise. In other words, with them, it is possible that the first choice of a classifier is not the correct one. Hence, the intersection strategy, by preventing any mistake of a classifier from being balanced by the output of the other classifiers, prevents the combination of the sources from behaving according to DST principles. It therefore seems understandable to us that S1 behaves similarly to the RPM rather than to the other DST-based strategies. Thus, the various methods/strategies can be divided into two groups: Group 1 is made of S2, S3 and S4, and group 2 is made of S1 and the RPM.

More precisely, among group 1, it can be seen that (1) on RIMES, S4 is the best one and S3 is slightly better than S2, and (2) on IFN/ENIT, S2 is slightly better than S4, which is in turn better than S3. This would lead us to promote the fourth strategy, based on a probabilistic pre-processing. Nonetheless, this assertion should be motivated. Thus, the next point is to check whether the pairwise differences in the accuracy rates are significant or not. If a difference is significant, it means that the first method is clearly better than the second one. On the contrary, if the difference is not statistically significant, then the difference of performance is too small to decide the superiority of one method over the other (as the results would be slightly different with other training/testing sets). A significance test is a particular type of statistical hypothesis test. In our case, the null hypothesis is the equivalence of the methods. Practically, we use McNemar's test [31], which is a χ² test adapted to the comparison of proportions.

Table 3. The p-values of McNemar's test for all the pairwise comparisons on the IFN/ENIT dataset

                          S2            S3            S4            RPM
    S1: Intersection  6.6 × 10^-7     0.0124       4.0 × 10^-5     0.5050
    S2: Union              .         5.6 × 10^-3     0.6400       7.2 × 10^-8
    S3: Borda Count        .             .           0.0336       2.8 × 10^-3
    S4: PPP                .             .              .         6.5 × 10^-6

Table 4. The p-values of McNemar's test for all the pairwise comparisons on the RIMES dataset

                          S2            S3            S4            RPM
    S1: Intersection  6.0 × 10^-10  8.8 × 10^-2   7.1 × 10^-14     0.2175
    S2: Union              .          0.4363      1.3 × 10^-2    7.6 × 10^-10
    S3: Borda Count        .             .        9.1 × 10^-2    6.6 × 10^-13
    S4: PPP                .             .              .        4.8 × 10^-16

In Tables 3 and 4, we consider all the pairwise comparisons between two methods, and for each, we compute the p-value, i.e. the probability that the null hypothesis is true. The smaller the p-value, the more likely the difference of accuracy is to be significant. First of all, the p-values confirm our qualitative interpretation of the accuracy rates: On the two datasets, S1 and the RPM behave similarly, as the probabilities that the differences between the proportions are not significant are rather high (50.50% on IFN/ENIT, and 21.75% on RIMES). Moreover, let us put the methods in decreasing order of performance (S2, S4, S3, S1, RPM for IFN/ENIT and S4, S3, S2, S1, RPM for RIMES), and let us consider the p-values associated with the comparisons of two consecutive methods according to these orders (0.6400, 0.0336, 0.0124, 0.5050 for IFN/ENIT and 9.1 × 10^-2, 0.4663, 6.0 × 10^-10, 0.2175 for RIMES). In this setting, we compare each strategy to the one or two closest other strategies. It can be noted that, whatever the dataset, the smallest p-value (i.e. the most significant difference) corresponds to the comparison of the worst strategy of group 1 (S2, S3 and S4) and the best strategy of group 2 (S1 and the RPM), which stresses the relevance of these two groups.

Amongst group 1, the strategies are not sorted in the same order on the two datasets with respect to the accuracy rates, and the p-values are rather high, indicating that these methods are roughly equivalent. Nonetheless, S4 appears to be slightly more efficient, and, from our experiments, the latter strategy (Probabilistic Pre-Processing) should be preferred, even if this choice relies on rather weak assumptions, as the similarity of the different methods involved requires further experiments for a strong statistical discrimination. Finally, let us point out that the p-value associated with the comparison of S4 and the RPM is so small that it is almost immaterial. Hence, it proves that, for the kind of data involved, the choice of the combination method is no longer questionable, as the DST-based method is definitely more efficient.
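To make the significance test concrete, here is a minimal sketch computed from the discordant pairs, using SciPy's χ² distribution; the continuity-corrected statistic shown is one common variant of McNemar's test and is our choice, not necessarily the one used in the original experiments:

    from scipy.stats import chi2

    def mcnemar_p(only_a, only_b):
        """only_a / only_b: numbers of test words correctly classified by
        exactly one of the two compared methods (the discordant pairs).
        Returns the p-value of the chi-square statistic (1 d.o.f.),
        with continuity correction."""
        stat = (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)
        return chi2.sf(stat, df=1)

    print(mcnemar_p(60, 30))   # approx. 0.002: a significant difference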

5 Conclusion

In this article, we have considered a problem of classifier combination in the framework of the Dempster-Shafer theory. More precisely, we have considered problems where the set of classes is too large to be considered as a frame of discernment, such as in handwriting word recognition, where a lexicon may contain up to 100,000 words. Thus, we propose to select, for each test word, a reduced number of words from the lexicon (those among which the discrimination is the most difficult) to build the frame. This frame is dedicated to a particular test word, which led us to call it dynamic. Then, we propose several procedures to select the words of the lexicon to build the dynamic frame. We compare them on two different datasets corresponding to Latin and Arabic words, containing 1,612 and 946 word classes respectively. From our results, the DST-based method provides significantly more accurate results than the reference probabilistic method, in spite of the approximation due to the use of a dynamic frame which does not contain all the words. Thus, our method provides more accurate results while keeping the computational complexity under control. Moreover, among the various strategies to build this dynamic frame, the most efficient one corresponds to the selection of the M words which are ranked best according to the reference probabilistic method. As a conclusion, probabilistic ensemble classification seems to be an interesting pre-processing for a DST-based ensemble classification on a dynamic frame, when the number of classes in the problem is too great. Future work will include an exhaustive comparison of the various means to take into account information from cross-validation (using various types of discounting, or the method from [17]), and the use of multiple hypothesis testing to make a choice amongst the various strategies described in this article.

References

1. Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)
2. Smets, P., Kennes, R.: The transferable belief model. Artificial Intelligence 66(2), 191–234 (1994)
3. Voorbraak, F.: A computationally efficient approximation of Dempster-Shafer theory. International Journal on Man-Machine Studies 30, 525–536 (1989)
4. Dubois, D., Prade, H., Smets, P.: New semantics for quantitative possibility theory. In: Benferhat, S., Besnard, P. (eds.) ECSQARU 2001. LNCS (LNAI), vol. 2143, pp. 410–421. Springer, Heidelberg (2001)
5. Grabisch, M.: K-order additive discrete fuzzy measures and their representation. Fuzzy Sets and Systems 92, 167–189 (1997)
6. Masson, M.-H., Denoeux, T.: Belief functions and cluster ensembles. In: Sossai, C., Chemello, G. (eds.) ECSQARU 2009. LNCS, vol. 5590, pp. 323–334. Springer, Heidelberg (2009)
7. Denoeux, T., Yaghlane, A.B.: Approximating the combination of belief functions using the fast Moebius transform in a coarsened frame. International Journal of Approximate Reasoning 31(1-2), 77–101 (2002)
8. Janez, F., Appriou, A.: Theory of evidence and non-exhaustive frames of discernment: Plausibilities correction methods. International Journal of Approximate Reasoning 18(1-2), 1–19 (1998)
9. Schubert, J.: Constructing and reasoning about alternative frames of discernment. In: Proceedings of the Workshop on the Theory of Belief Functions (2010)
10. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 257–286 (1989)
11. Xu, L., Krzyzak, A., Suen, C.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst., Man, Cybern. (3) (1992)
12. Kim, J.H., Kim, K.K., Nadal, C.P., Suen, C.Y.: A methodology of combining HMM and MLP classifiers for cursive word recognition. In: International Conference on Pattern Recognition, vol. 2, pp. 319–322 (2000)
13. Prevost, L., Michel-Sendis, C., Moises, A., Oudot, L., Milgram, M.: Combining model-based and discriminative classifiers: application to handwritten character recognition. In: International Conference on Document Analysis and Recognition, vol. 1, pp. 31–35 (2003)
14. Arica, N., Yarman-Vural, F.T.: An overview of character recognition focused on off-line handwriting. IEEE Trans. Systems, Man and Cybernetics, Part C: Applications and Reviews (2), 216–232 (2001)
15. Kessentini, Y., Paquet, T., Hamadou, A.B.: Off-line handwritten word recognition using multi-stream hidden Markov models. Pattern Recognition Letters 30(1), 60–70 (2010)
16. Kessentini, Y., Paquet, T., Burger, T.: Comparaison des méthodes probabilistes et évidentielles de fusion de classifieurs pour la reconnaissance de mots manuscrits. In: CIFED (2010)
17. Kessentini, Y., Burger, T., Paquet, T.: Evidential ensemble HMM classifier for handwriting recognition. In: Proceedings of IPMU, vol. 6178, pp. 445–454 (2010)
18. Al-Ani, A., Deriche, M.: A new technique for combining multiple classifiers using the Dempster-Shafer theory of evidence. Journal of Artificial Intelligence Research 17(1), 333–361 (2002)
19. Altınçay, H.: A Dempster-Shafer theoretic framework for boosting based ensemble design. Pattern Analysis & Applications 8(3), 287–302 (2005)
20. Burger, T., Aran, O., Caplier, A.: Modeling hesitation and conflict: A belief-based approach for multi-class problems. In: Fourth International Conference on Machine Learning and Applications, pp. 95–100 (2006)
21. Mercier, D., Cron, G., Denoeux, T., Masson, M.-H.: Fusion de décisions postales dans le cadre du modèle des croyances transférables. Traitement du Signal 24(2), 133–151 (2007)
22. Valente, F., Hermansky, H.: Combination of acoustic classifiers based on Dempster-Shafer theory of evidence. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, pp. 1129–1132 (2007)
23. Burger, T., Aran, O., Urankar, A., Akarun, L., Caplier, A.: A Dempster-Shafer theory based combination of classifiers for hand gesture recognition. In: Computer Vision and Computer Graphics - Theory and Applications. CCIS (2008)
24. Bi, Y., Guan, J., Bell, D.A.: The combination of multiple classifiers using an evidential reasoning approach. Artificial Intelligence 172(15), 1731–1751 (2008)
25. Aran, O., Burger, T., Caplier, A., Akarun, L.: A belief-based sequential fusion approach for fusing manual and non-manual signs. Pattern Recognition 42(5), 812–822 (2009)
26. Burger, T., Kessentini, Y., Paquet, T.: A tutorial on using Dempster-Shafer theory to combine probabilistic classifiers - application to hand gesture and handwriting recognition. Submitted to Journal of Zhejiang University, Elsevier (2011)
27. Haenni, R.: Are alternatives to Dempster's rule of combination alternatives? Int. J. Information Fusion 3, 237–241 (2002)
28. Quost, B., Masson, M., Denoeux, T.: Classifier fusion in the Dempster-Shafer framework using optimized t-norm based combination rules. International Journal of Approximate Reasoning (2010)
29. Pechwitz, M., Maddouri, S., Maergner, V., Ellouze, N., Amiri, H.: IFN/ENIT database of handwritten Arabic words. In: Colloque International Francophone sur l'Ecrit et le Document, pp. 129–136 (2002)
30. Grosicki, E., Carre, M., Brodin, J., Geoffrois, E.: Results of the RIMES evaluation campaign for handwritten mail processing. In: International Conference on Document Analysis and Recognition, pp. 941–945 (2009)
31. McNemar, Q.: Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12(2), 153–157 (1947)
