April 1, 2007
13:11
World Scientific Review Volume - 9in x 6in
Chapter 1 A new hybrid fusion method for diagnostic systems
A. Zemirline, L. Lecornu and B. Solaiman
ITI Department, ENST Bretagne, 29285 Brest, France

In this work, we present a new fusion method based on fuzzy set theory. The method combines the data and knowledge bases of several diagnostic systems. It is a hybrid fusion: it merges both the case bases (data) and the knowledge bases of Case-Based Reasoning (CBR) diagnostic systems. The fusion method relies on a measure of the distortion between the diagnostic systems (between their case and knowledge bases). This distortion measure is expressed as confidence degrees associated with each parameter that constitutes the case and knowledge bases. These confidence degrees are then integrated into the diagnostic procedure in order to improve its performance.
1.1. Introduction

Nowadays, many institutions and organizations combine homogeneous data coming from different systems and/or produced at different times. Medical institutions face this situation in particular: new data must be stored regularly and used both to extract new information and to update older versions of the data. Three main types of fusion method can be distinguished according to the conceptual level of information [1]: data fusion, decision fusion and model fusion.

• Data fusion operates on the first conceptual level of information. It combines raw data coming from several sources, or various primitive levels extracted from a single source, in order to obtain less noisy data.
• Decision fusion operates on the solutions that problem models produce for a specific data set. Several data sources and several types of processing can each provide a decision for the same problem. Thus, when several sensors observe the same scene, or when several independent approaches can each provide a solution, decision fusion compares the solutions suggested by the various systems in order either to choose the most realistic one or to combine these decisions into a more reliable one.
• Model fusion combines data processing and artificial intelligence. A model characterizes and represents, in a more or less complex way, the knowledge that makes up an advanced system. Model fusion either builds a new knowledge model or adopts a compromise between the preceding ones.

Some fusion procedures combine the data bases of systems [2], and others combine their decisions [3]. However, no fusion procedure exists that combines the knowledge bases and/or data bases of several systems in order to obtain a more powerful system. In this work, we have several diagnostic systems based on case-based reasoning, and we want to combine them to obtain a better diagnostic system. The objective of our fusion method is not limited to enriching the case base of the diagnostic system with more cases; the method also allows the diagnostic system to produce more accurate and relevant results when recognizing new cases. This is made possible by taking into account the distortion measure between the diagnostic systems (between their case and knowledge bases). The fusion type that we propose in this article is a hybrid fusion that combines two fusion types: data fusion and knowledge fusion. It merges the case and knowledge bases of several diagnostic systems into a single case base and a single knowledge base.
In this work, we tackle a recurring problem: integrating new medical case bases into a diagnostic system. As an example, we take a diagnostic system applied to medical bases. These bases are homogeneous and contain descriptions of endoscopic lesions; however, each of them has its own features for describing the lesions. This article is organized as follows: in the second section, we describe the diagnostic system to which our fusion method is applied.
In the third section, we describe our fusion method. In the fourth section, we analyze the results obtained with our method. We conclude in the fifth section.

1.2. Diagnostic system

The diagnostic system is based on case-based reasoning (CBR) and, in addition to its case base, is made up of two knowledge bases (Fig. 1.1): the internal knowledge base and the external knowledge base. Using the internal knowledge base, the diagnostic system predicts the classes corresponding to a new case and retrieves the most similar cases [4]. This knowledge base is deduced from the case base. The external knowledge base contains the information describing the significance of the parameters in the definition of the different classes that the diagnostic system should recognize.

Fig. 1.1. Diagnostic system with the internal and external knowledge bases (blocks: Case Base, Internal knowledge, External knowledge).
1.2.1. The internal knowledge base

This knowledge base is designed from the case base alone; no external data or information is introduced in its definition. The diagnostic system relies on the internal knowledge base to classify cases and to measure the similarity between two cases. This knowledge base is made up of one membership function for each class. The procedure for setting up the internal knowledge base comprises three steps: the first step groups the cases of the diagnostic case base according to their class, generating one group of cases per class. The second step measures the frequency of the parameters that constitute the cases of each group (i.e. of each class). In the last step, the membership functions are generated for each class. These membership functions assign to a parameter a degree of membership to the specific class. The membership functions of a class are built with the histogram of normalized frequencies method [5]:

• $B = \{X_1, \dots, X_n\}$ is a set of n cases.
• $\Omega = \{\omega_1, \dots, \omega_m\}$ is a set of m parameters that constitute the cases of B.
• $C = \{c_1, \dots, c_p\}$ is a set of p class labels that also constitute the cases of B.
• A case $X \in B$ is a (k+1)-tuple represented as $X = \{x_1, \dots, x_k, c\}$, where $x_i \in \{x_{i1}, \dots, x_{il}\}$ or $x_i = \emptyset$, $x_{ij} \in \Omega$ and $c \in C$.
• A query case $X'$ is a k-tuple represented as $X' = \{x_1, \dots, x_k\}$, where $x_i \in \{x_{i1}, \dots, x_{il}\}$ or $x_i = \emptyset$, and $x_{ij} \in \Omega$.
• $f_{c_i}$ is the function that gives the normalized frequency of $\omega_j$ for the class label $c_i$:

$$f_{c_i}(\omega_j) = \frac{g_{c_i}(\omega_j)}{\max_{\omega_k \in \Omega} g_{c_i}(\omega_k)}, \qquad g_{c_i}(\omega_j) = \frac{|\{X \in B \mid c_i \in X,\ \omega_j \in X\}|}{|\{X \in B \mid c_i \in X\}|}$$
Using this membership function, we can calculate the membership degree of a case to a specific class with a compromise operator such as the geometric mean:

$$\mathrm{Deg}_{c_i}(X') = \frac{\sqrt[|X'|]{\prod_{\omega_j \in X'} \mu_{c_i}(\omega_j)}}{\sum_{c_k \in C} \sqrt[|X'|]{\prod_{\omega_j \in X'} \mu_{c_k}(\omega_j)}} \tag{1.1}$$

where

$$\mu_{c_i} : \Omega \to [0, 1], \qquad \omega_j \mapsto \frac{f_{c_i}(\omega_j)}{\sum_{c_k \in C} f_{c_k}(\omega_j)}$$
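The construction of the internal knowledge base and Eq. (1.1) can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a case is a pair (set of parameters, class label) and that every class has at least one case; the function names and the toy lesion parameters are hypothetical and are not taken from the diagnostic system.

```python
import math


def normalized_frequencies(cases, params, classes):
    """f_c(w) = g_c(w) / max_k g_c(w_k), where g_c(w) is the fraction of
    class-c cases that contain parameter w (histogram of normalized frequencies)."""
    f = {}
    for c in classes:
        members = [p for p, label in cases if label == c]  # assumed non-empty
        g = {w: sum(w in p for p in members) / len(members) for w in params}
        top = max(g.values())
        f[c] = {w: (g[w] / top if top > 0 else 0.0) for w in params}
    return f


def class_memberships(f, params, classes):
    """mu_c(w) = f_c(w) / sum over all classes c_k of f_ck(w)."""
    mu = {c: {} for c in classes}
    for w in params:
        total = sum(f[c][w] for c in classes)
        for c in classes:
            mu[c][w] = f[c][w] / total if total > 0 else 0.0
    return mu


def degree(query, mu, classes):
    """Eq. (1.1): geometric mean of mu_c over the query parameters,
    normalized so that the degrees over all classes sum to 1."""
    n = len(query)
    geo = {c: math.prod(mu[c][w] for w in query) ** (1.0 / n) for c in classes}
    total = sum(geo.values())
    return {c: (geo[c] / total if total > 0 else 0.0) for c in classes}


# Toy example with hypothetical parameters and classes.
params = ["ulcer", "bleeding", "polyp"]
classes = ["A", "B"]
cases = [({"ulcer", "bleeding"}, "A"), ({"ulcer"}, "A"),
         ({"polyp"}, "B"), ({"polyp", "bleeding"}, "B")]
f = normalized_frequencies(cases, params, classes)
mu = class_memberships(f, params, classes)
deg = degree({"ulcer", "bleeding"}, mu, classes)
```

In this toy example, class B gets degree 0 because "ulcer" never occurs in its cases: with a geometric mean, a single zero membership is enough to cancel a class.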
These membership functions are also used to measure the similarity between two cases, a core step of diagnostic systems based on case-based reasoning. The similarity measure gives the similarity degree between the query case $X'$ and a case $X$. It takes into account the parameters the two cases have in common and their membership degrees to the class $c_i$ of $X$ ($c_i \in X$):

$$\mathrm{sim}_{c_i}(X', X) = \frac{\sum_{\omega_k \in X' \cap X} \mu_{c_i}(\omega_k)}{\sum_{\omega_k \in X' \cup X} \mu_{c_i}(\omega_k)} \tag{1.2}$$
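Eq. (1.2) can be sketched in the same hedged way. The query and the stored case are assumed to be sets of parameter names, and `mu_c` is a hypothetical dictionary holding $\mu_{c_i}$ for the class of the stored case.

```python
def similarity(query, case, mu_c):
    """Eq. (1.2): summed class membership of the shared parameters,
    divided by the summed membership over all parameters of either case."""
    shared = sum(mu_c.get(w, 0.0) for w in query & case)
    union = sum(mu_c.get(w, 0.0) for w in query | case)
    return shared / union if union > 0 else 0.0


# Hypothetical memberships mu_{c_i}(w) for the class c_i of the stored case.
mu_A = {"ulcer": 1.0, "bleeding": 0.5, "polyp": 0.0}
s = similarity({"ulcer", "bleeding"}, {"ulcer", "polyp"}, mu_A)  # 1.0 / 1.5
```

Parameters that are weakly associated with the class (here "polyp") barely affect the ratio, whereas a shared parameter typical of the class dominates it.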
1.2.2. The external knowledge base

The external knowledge base is generated from the case base of the diagnostic system. It is designed to be the fusion interface of a diagnostic system and is used to estimate the divergence between the different sources of diagnostic systems. The external knowledge base assigns a linguistic uncertainty value to each parameter that constitutes the case base, in order to define the importance of that parameter in the characterization of the different classes of the case base. A linguistic characterization is, in general, less specific than a numerical one, but it is more meaningful.

The procedure for setting up the external knowledge base is made up of three steps: the first step groups the parameters according to their class, generating one group of parameters per class. The second step measures the frequency of the parameters that constitute the cases of each group (i.e. of each class). In the last step, for each class, the membership functions of several frequency terms are generated. These membership functions assign to each parameter a degree of membership to each frequency term; in this way, each parameter receives a frequency term for each class. To define these membership functions, we use a linguistic variable, characterized by a quintuple $(x, T(x), U, G, M)$ [6]:

• $x$ is the name of the linguistic variable. In our application, $x$ is the linguistic variable frequency.
• $T(x)$ is the set of linguistic terms of $x$; here the frequency is expressed with the set {Never, Exceptional, Rare, Usual, Frequent, Very Frequent, Always}.
• $U$ is the universe of discourse: $U = \{f_{c_i}(\omega_j) \mid \omega_j \in \Omega\}$.

The terms of $T(x)$ are characterized by fuzzy subsets defined by the following membership functions [7]:

• $K$ is the set of centroids of the fuzzy sets obtained by applying the fuzzy c-means (FCM) algorithm [8] to $U$, such that $K = \{0, \dots, C_{i-1}, C_i, C_{i+1}, \dots, 1\}$.
• $\mu_{c_i,\alpha}$ is defined by $\mu_{c_i,\alpha}(\omega_j) = \nu_{c_i,\alpha}(f_{c_i}(\omega_j))$.
• $\nu_{c_i,\alpha}$ is the membership function of the linguistic term $\alpha$, whose centroid is $C_i$, with $C_{i-1}$ and $C_{i+1}$ the centroids of the neighbouring terms. It is built from the set of parameter frequencies for the class label $c_i$:

$$\nu_{c_i,\alpha}\big(f_{c_i}(\omega_j)\big) = \begin{cases} 1 - \dfrac{C_i - f_{c_i}(\omega_j)}{C_i - C_{i-1}} & \text{if } C_{i-1} < f_{c_i}(\omega_j) \le C_i,\\[4pt] 1 - \dfrac{f_{c_i}(\omega_j) - C_i}{C_{i+1} - C_i} & \text{if } C_i \le f_{c_i}(\omega_j) < C_{i+1},\\[4pt] 0 & \text{otherwise.} \end{cases}$$

1.3. Fusion method

Our fusion method compares, across the diagnostic systems, the importance of each parameter in the characterization of the classes. A new knowledge base is then produced by merging the external knowledge bases of the diagnostic systems. This new knowledge base, called the confidence knowledge base, results from the disparity measure between the various external knowledge bases: it assigns a confidence degree to each parameter. The diagnostic system then combines this information with the information extracted from the internal knowledge base. This procedure is represented in Fig. 1.2.
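The triangular membership functions $\nu$ of the external knowledge base (§1.2.2) can be sketched as follows. The centroid values are hypothetical (in the method they come from fuzzy c-means applied to $U$), at least two sorted centroids are assumed, and the rising branch uses $C_i - f$ so that the membership grows linearly from 0 at $C_{i-1}$ to 1 at $C_i$.

```python
TERMS = ["Never", "Exceptional", "Rare", "Usual",
         "Frequent", "Very Frequent", "Always"]


def nu(freq, centroids, a):
    """Triangular membership of a normalized frequency in term a,
    peaking at centroids[a] and vanishing at the neighbouring centroids."""
    c = centroids[a]
    if a > 0 and centroids[a - 1] < freq <= c:
        return 1 - (c - freq) / (c - centroids[a - 1])
    if a < len(centroids) - 1 and c <= freq < centroids[a + 1]:
        return 1 - (freq - c) / (centroids[a + 1] - c)
    return 0.0


def linguistic_term(freq, centroids, terms=TERMS):
    """Term to which the frequency has the greatest membership degree."""
    best = max(range(len(centroids)), key=lambda a: nu(freq, centroids, a))
    return terms[best]


# Hypothetical sorted centroids, one per term of T(x).
K = [0.0, 0.1, 0.25, 0.45, 0.65, 0.85, 1.0]
```

With these centroids, a normalized frequency of 0.55 belongs to "Usual" and "Frequent" with degree 0.5 each, while 0.9 is labelled "Very Frequent".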
Fig. 1.2. Hybrid fusion integrated in a diagnostic system (diagnostic systems A and B, each with a case base CB and internal and external knowledge bases, are combined by data fusion and knowledge fusion into diagnostic system A&B, with case base CB A & B, an internal knowledge base and a confidence knowledge base).
Table 1.1. Appearance frequency of the parameter ω in the representation of a class c for Sources A and B.

Parameter | Source A        | Source B
ω         | "very frequent" | "exceptional"
Before presenting the fusion method, we illustrate an example of divergence. Consider a parameter ω contained in source A (diagnostic system A). It has a high frequency of appearance in the representation of class c, which is expressed by a strong degree of membership to the "very frequent" subset for this class (Table 1.1). In source B (diagnostic system B), the same parameter has a very low frequency of appearance in the representation of the same class, which is expressed by a strong degree of membership to the "exceptional" subset for class c (Table 1.1). There is thus a great disparity in the role this parameter plays in the definition of the same class in two distinct sources, and the parameter can be regarded as ambiguous in the description of class c.

To measure the distortion of p bases, we define a global operator measuring the conflict between p sources. This operator applies to the p External Knowledge Bases (EKB) deduced from the p sources and to another knowledge base deduced from the coupling of these p external knowledge bases. For simplicity, the coupled external knowledge base is called F, and the p external knowledge bases are henceforth called the p knowledge bases. The operator works as follows: for a parameter t and a class c, it retrieves from F the uncertainty term α to which t has the greatest degree of membership for class c. This linguistic term α is taken as the reference for measuring the disparity of the p knowledge bases. Then, for each of the p knowledge bases, the operator calculates the membership degree of t to α, and we keep the lowest of these p membership degrees. In other words, the goal is to find, among the p knowledge bases, the one whose representation level of parameter t for class c is farthest from the average of the p knowledge bases.
We then deduce the degree of reliability (degree of confidence) of each parameter after the fusion of the p knowledge bases. The whole procedure thus amounts to measuring the conflict between p sources (the p external knowledge bases of the diagnostic systems). This measure is inspired by the method proposed by Dubois and Prade [9]. In this work, a conflict is defined as the distance between the classification of a parameter in the new external knowledge base F (the external knowledge base deduced from the p external knowledge bases) and its classification in the p external knowledge bases. In the following sections, we present how the new external knowledge base is set up (§1.3.1), how the referent linguistic term is selected (§1.3.2), the conflict operator (§1.3.3) and, finally, the confidence measure (§1.3.4).

1.3.1. Definition of the new external knowledge base deduced from the coupling of the p sources

• $\mathcal{B} = \{B_1, \dots, B_p\}$ is a set of p case bases.
• $B_i = \{X^i_1, \dots, X^i_{n_i}\}$ is a case base of $n_i$ cases.
• $\Omega = \{\omega_1, \dots, \omega_m\}$ is a set of m parameters that constitute the cases of each $B_i \in \mathcal{B}$.
• $C = \{c_1, \dots, c_q\}$ is a set of q class labels that also constitute the cases of each $B_i \in \mathcal{B}$.
• A case $X^i_j \in B_i$ is a (k+1)-tuple represented as $X^i_j = \{x_1, \dots, x_k, c\}$, where $x_i \in \{x_{i1}, \dots, x_{il}\}$ or $x_i = \emptyset$, $x_{ij} \in \Omega$ and $c \in C$.
• $f_{c_i}$ is the function that gives the normalized frequency of $\omega_j$ for the class label $c_i$ over all the case bases of $\mathcal{B}$:

$$f_{c_i}(\omega_j) = \frac{g_{c_i}(\omega_j)}{\max_{\omega_k \in \Omega} g_{c_i}(\omega_k)}, \qquad g_{c_i}(\omega_j) = \sum_{l=1}^{p} \frac{|\{X \in B_l \mid c_i \in X,\ \omega_j \in X\}|}{|\{X \in B_l \mid c_i \in X\}|}$$
The membership functions of this new external knowledge base have the same definition as in §1.2.2. Some notations are nevertheless modified to integrate the notion of multiple sources:

• $\nu^l_{c_i,\alpha}$ is the membership function of the linguistic term α for class $c_i$ in knowledge base l (l = 1, …, p).
• $\mu^l_{c_i,\alpha}$ is defined by $\mu^l_{c_i,\alpha}(\omega_j) = \nu^l_{c_i,\alpha}(f_{c_i}(\omega_j))$.
• F denotes the knowledge base deduced from the p knowledge bases; its membership functions are written $\mu^F_{c_i,\alpha}$.
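A possible sketch of the frequencies of the coupled base F, which add the per-base relative frequencies over the p case bases before normalizing by the maximum. Cases are again hypothetical (parameter set, label) pairs; skipping a base in which class $c_i$ has no case is an assumption, since the formula above is undefined there.

```python
def pooled_frequencies(bases, params, classes):
    """f_c(w) for the coupled base F: g_c(w) adds, over the p case bases,
    the fraction of class-c cases containing w; f_c divides by the max over w."""
    f = {}
    for c in classes:
        g = dict.fromkeys(params, 0.0)
        for base in bases:
            members = [p for p, label in base if label == c]
            if not members:  # assumption: a base without class c contributes nothing
                continue
            for w in params:
                g[w] += sum(w in p for p in members) / len(members)
        top = max(g.values())
        f[c] = {w: (g[w] / top if top > 0 else 0.0) for w in params}
    return f


# Two hypothetical sources whose per-base frequencies differ for class A.
base_1 = [({"ulcer"}, "A"), ({"ulcer", "bleeding"}, "A")]
base_2 = [({"bleeding"}, "A"), ({"ulcer", "bleeding"}, "A")]
f_F = pooled_frequencies([base_1, base_2], ["ulcer", "bleeding", "polyp"], ["A"])
```

Here both "ulcer" and "bleeding" reach f = 1 in F, although each has a normalized frequency of only 0.5 in one of the two sources; this kind of difference is what the conflict operator is meant to expose.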
1.3.2. Selecting the referent linguistic term

As defined previously, the linguistic term α of base F is used as the reference for measuring the disparity between the p bases. The frequency term α is selected as the referent for parameter $\omega_j$ in class $c_i$ if $\mu^F_{c_i,\alpha}(\omega_j)$ is the greatest membership value among the terms of $T(x)$:

$$\mu^F_{c_i,\alpha}(\omega_j) = \max_{t \in T(x)} \mu^F_{c_i,t}(\omega_j)$$
1.3.3. Operator of the conflict measure

Over the p bases, the conflict operator seeks the lowest of the membership degrees of $\omega_j$ to the referent linguistic term α for class $c_i$. It is based on a T-norm I:

$$h^\alpha_{c_i}(\omega_j) = I\big(\mu^1_{c_i,\alpha}(\omega_j), \dots, \mu^p_{c_i,\alpha}(\omega_j)\big) \tag{1.3}$$
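The referent selection of §1.3.2 and the operator (1.3) can be sketched as below, taking the minimum as the T-norm I since the text describes the operator as keeping the lowest membership degree; another T-norm, such as Python's `math.prod`, could be passed instead. Each list holds the memberships of one parameter to the seven terms of T(x), indexed by position; the numbers are hypothetical and mimic Table 1.1.

```python
def referent_term(mu_F):
    """Sec. 1.3.2: index of the term with the greatest membership in F."""
    return max(range(len(mu_F)), key=lambda t: mu_F[t])


def conflict(mu_bases, alpha, tnorm=min):
    """Eq. (1.3): T-norm of the p memberships to the referent term alpha."""
    return tnorm(m[alpha] for m in mu_bases)


# Memberships to [Never, Exceptional, Rare, Usual, Frequent, Very Frequent, Always].
mu_F = [0.0, 0.0, 0.0, 0.0, 0.2, 0.8, 0.0]
mu_A = [0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.1]  # source A: "very frequent"
mu_B = [0.0, 0.7, 0.3, 0.0, 0.0, 0.0, 0.0]  # source B: "exceptional"
alpha = referent_term(mu_F)                  # 5, i.e. "Very Frequent"
h = conflict([mu_A, mu_B], alpha)            # min(0.9, 0.0) = 0.0
```

As in Table 1.1, source B gives no membership at all to the referent term, so the operator returns 0 for this parameter and class.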
This operator is strict: if one of the p bases gives $\omega_j$ a membership degree of zero to the referent linguistic term α, the value of the conflict operator is also zero, even if that base gives $\omega_j$ a nonzero membership degree to a linguistic term adjacent to α. In some cases a less strict operator is needed, so we modified the conflict operator to integrate a tolerance parameter that can be set according to our needs:

$$h^\alpha_{c_i}(\omega_j) = I\Big(\max_{\substack{\alpha-d \le k \le \alpha+d \\ k \in [0, |T(x)|-1]}} \mu^1_{c_i,k}(\omega_j),\ \dots,\ \max_{\substack{\alpha-d \le k \le \alpha+d \\ k \in [0, |T(x)|-1]}} \mu^p_{c_i,k}(\omega_j)\Big)$$
• d is the tolerance index. It allows the conflict operator to take into account the membership of $\omega_j$ not only to the referent linguistic term α but also to the d linguistic terms on either side of it. d is an integer in $[0, |T(x)| - 1]$.
• α − d denotes the linguistic term located d positions to the left of the referent linguistic term α. Example: if α is the term "Frequent" and T(x) = {Never, Exceptional, Rare, Usual, Frequent, Very Frequent, Always}, then α − 2 denotes the term "Rare".
• α + d denotes the linguistic term located d positions to the right of the referent linguistic term α.
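The tolerant variant can be sketched by widening each base's membership to the window [α − d, α + d], clipped to the valid term positions, before applying the T-norm (again the minimum, as an assumption). With d = 0 it reduces to operator (1.3). The membership lists are hypothetical.

```python
def tolerant_conflict(mu_bases, alpha, d, tnorm=min):
    """Conflict operator with tolerance index d: each base contributes its
    largest membership among the terms alpha-d .. alpha+d (clipped to T(x))."""
    lo = max(0, alpha - d)
    hi = min(len(mu_bases[0]) - 1, alpha + d)
    return tnorm(max(m[lo:hi + 1]) for m in mu_bases)


# Memberships to [Never, Exceptional, Rare, Usual, Frequent, Very Frequent, Always].
mu_A = [0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.1]  # mostly "Very Frequent"
mu_C = [0.0, 0.0, 0.0, 0.0, 0.7, 0.3, 0.0]  # mostly "Frequent", a neighbour
strict = tolerant_conflict([mu_A, mu_C], alpha=5, d=0)    # min(0.9, 0.3) = 0.3
tolerant = tolerant_conflict([mu_A, mu_C], alpha=5, d=1)  # min(0.9, 0.7) = 0.7
```

A source that places the parameter in a term adjacent to the referent is therefore no longer penalized as if it disagreed completely.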
1.3.4. Confidence measure

We define a confidence function for all the parameters in the new base:

$$\mu_{conf,c_i}(\omega_j) = \begin{cases} 1 & \text{if } h^\alpha_{c_i}(\omega_j) \in [0, \varepsilon],\\[4pt] \dfrac{1 - h^\alpha_{c_i}(\omega_j)}{1 - \varepsilon} & \text{if } h^\alpha_{c_i}(\omega_j) \in [\varepsilon, 1[, \end{cases}$$

where $\varepsilon \in \,]0, 1[$ is a threshold used to estimate the confidence of a parameter from its conflict value. $\mu_{conf,c_i}(\omega_j)$ considers the agreement measure: a parameter is completely reliable if the agreement degree is higher than a certain threshold, i.e. if its appearance frequencies in the various bases belong to the same linguistic term as, or to a term close to, the referent linguistic term obtained from F. Integrating this confidence index into the calculation of the membership degree makes it possible to take into account mainly the parameters whose confidence index is higher than a certain value:

$$\mathrm{Deg}_{c_i}(X') = \frac{\sqrt[2|X'|]{\prod_{\omega_j \in X'} \mu_{conf,c_i}(\omega_j)\,\mu_{c_i}(\omega_j)}}{\sum_{c_k \in C} \sqrt[2|X'|]{\prod_{\omega_j \in X'} \mu_{conf,c_k}(\omega_j)\,\mu_{c_k}(\omega_j)}} \tag{1.4}$$
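Eq. (1.4) can be sketched as follows. The confidence values $\mu_{conf}$ are taken as given inputs (hypothetical numbers here), so the sketch does not depend on how the confidence function is parametrized. Each product $\mu_{conf} \cdot \mu$ contributes two factors to the geometric mean, hence the $2|X'|$-th root.

```python
import math


def degree_with_confidence(query, mu, conf, classes):
    """Eq. (1.4): normalized geometric mean of mu_conf,c(w) * mu_c(w),
    taken over the 2|X'| factors of the query parameters."""
    n = len(query)
    geo = {c: math.prod(conf[c][w] * mu[c][w] for w in query) ** (1.0 / (2 * n))
           for c in classes}
    total = sum(geo.values())
    return {c: (geo[c] / total if total > 0 else 0.0) for c in classes}


classes = ["A", "B"]
mu = {"A": {"bleeding": 0.5}, "B": {"bleeding": 0.5}}    # hypothetical mu_c
conf = {"A": {"bleeding": 0.5}, "B": {"bleeding": 1.0}}  # hypothetical confidences
deg = degree_with_confidence({"bleeding"}, mu, conf, classes)
```

With equal class memberships, the class for which the parameter is more reliable across the sources (here B) receives the larger degree, about 0.59 against 0.41.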
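Integrating the confidence index into the calculation of the similarity between cases gives:

$$\mathrm{sim}_{c_i}(X', X) = \frac{\sum_{\omega_k \in X' \cap X} \sqrt{\mu_{conf,c_i}(\omega_k)\,\mu_{c_i}(\omega_k)}}{\sum_{\omega_k \in X' \cup X} \sqrt{\mu_{conf,c_i}(\omega_k)\,\mu_{c_i}(\omega_k)}} \tag{1.5}$$

1.4. Evaluation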
In this section, we evaluate our knowledge fusion approach on 7 databases from an endoscopic image analysis system [10], a decision-support system for the diagnosis of endoscopic lesions. These bases are made up of descriptions of endoscopic images in symbolic terms defined by the Minimal Standard Terminology of the SEGE (European Society of Gastro-Enterology). A case in a base is the description (a set of parameters) of an endoscopic lesion. Over all the bases, there are 206 parameters and 89 types of lesions (i.e. 89 class labels). Fig. 1.3 gives the results of three types of fusion method applied to a diagnostic system:
• Data fusion method: a simple grouping of the data from the distinct sources.
• Decision fusion method: combines the results of the diagnostic system applied to the various sources and keeps the result with the greatest reliability.
• Hybrid method: our method, applied to a diagnostic system that takes into account the confidence degrees calculated by our fusion method.

Fig. 1.3. Prediction rates of data fusion, decision fusion and hybrid fusion for case bases of increasing size (prediction rate in %, from 0 to 90, versus number of cases, from 1000 to 7000).
To assess the diagnostic systems built with the various fusion methods, we added 1000 cases to the starting base at each stage. Our method gives the best prediction rates. At the beginning, the three methods give almost the same rates, but immediately after the first fusion the difference between them increases. All prediction rates increase, and our method gives the highest one. The progression of the prediction rates slows down at the sixth stage, whereas our method remains unaffected.

1.5. Conclusion

In this article, we propose a new fusion method for case-based reasoning diagnostic systems. This method combines two fusion types: knowledge base fusion and data base fusion. Data base fusion merges the case bases of the several diagnostic systems.
Knowledge base fusion uses the knowledge bases of the diagnostic systems to measure the conflict between their case bases, and this conflict measure is then integrated into the processes of the new diagnostic system. In our experiments, the proposed method outperformed both the decision fusion method and the data fusion method, and the performance gap increased with the number of cases. Our fusion method could be extended to measure the distortion of distributed homogeneous systems before they are merged, so that only the systems with the lowest distortion are brought together. Finally, the originality of our method is that it can be applied to any case base.
References

1. B. Solaiman and B. Dasarathy. Information fusion concepts: from information elements definition to the application of fusion approaches. In SPIE Proceedings Series, International Society for Optical Engineering, vol. 385, pp. 205–212 (2001).
2. G. Saporta. Data fusion and data grafting. In International Meeting on Nonlinear Methods and Data Mining, vol. 38 (2000).
3. D. Dubois, M. Grabisch, H. Prade and P. Smets. Using the transferable belief model and a qualitative possibility theory approach on an illustrative example: the assessment of the value of a candidate. International Journal of Intelligent Systems, 16(11), 1245–1272 (November 2001).
4. A. Zemirline, L. Lecornu and B. Solaiman. Data mining system applied in endoscopic image base. In Proceedings of ICTTA'06: International Conference on Information and Communication Technologies: from Theory to Applications (2006).
5. S. Chauvin. Évaluation des théories de la décision appliquées à la fusion de capteurs en image satellitaire. PhD thesis, Université de Nantes (1995).
6. L. Zadeh. The concept of a linguistic variable and its application to approximate reasoning II. Information Sciences (Part 2), 8(4), 301–357 (1975).
7. L. Zadeh. Fuzzy sets. Information and Control, 8, 338–353 (1965).
8. J. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms (Advanced Applications in Pattern Recognition). Springer (July 1981). URL http://www.amazon.co.uk/exec/obidos/ASIN/0306406713/citeulike-21.
9. D. Dubois and H. Prade. Combination of fuzzy information in the framework of possibility theory. In Data Fusion in Robotics and Machine Intelligence, pp. 481–505 (1992).
10. J. Cauvin, C. L. Guillou, B. Solaiman, M. Robaszkiewicz, P. L. Beux and C. Roux. Computer-assisted diagnosis system in digestive endoscopy. IEEE Transactions on Information Technology in Biomedicine, 7, 256–262 (2003).