Applied Soft Computing 10 (2010) 481–489


Knowledge-based parameter identification of TSK fuzzy models

Ashutosh Tewari *, Mirna-Urquidi Macdonald

Department of Engineering Science and Mechanics, The Pennsylvania State University, 202 EES Building, University Park, PA 16802, United States

* Corresponding author. E-mail addresses: [email protected] (A. Tewari), [email protected] (M.-U. Macdonald).
1568-4946/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2009.08.034

Article history: Received 7 February 2008; received in revised form 16 May 2009; accepted 13 August 2009; available online 3 September 2009.

Abstract

Linear/1st order Takagi–Sugeno–Kang (TSK) fuzzy models are widely used to identify static nonlinear systems from a set of input–output pairs. The synergetic integration of TSK fuzzy models with artificial neural networks (ANN) has led to the emergence of hybrid neuro-fuzzy models that can have excellent adaptability and interpretability at the same time. One drawback of these hybrid models is that they tend to inherit more of the black-box character of ANNs than the transparency of fuzzy systems. If the quality of the training data is questionable, the result may be a fuzzy model with poor interpretability. In an attempt to remediate this problem, we propose a parameter identification technique for TSK models that relies on a-priori available qualitative domain knowledge. The technique is devised for rule-centered TSK models, in which the consequent polynomial can be interpreted as the 1st order Taylor series approximation of the underlying nonlinear function that is being modeled. The resulting neuro-fuzzy model is named the a-priori knowledge-based fuzzy model (APKFM). We show that, besides being reasonably accurate, APKFM has excellent interpretability and extrapolation capability. The effectiveness of APKFM is demonstrated using two examples of static systems. In the first example, a toy nonlinear function is chosen for approximation by an APKFM. In the second example, a real-world problem pertaining to the maintenance cost estimation of electricity distribution networks is addressed.
© 2009 Elsevier B.V. All rights reserved.

Keywords: Neuro-fuzzy systems; TSK fuzzy model; Parameter identification; Nonlinear system identification; Knowledge incorporation; Hybrid intelligent systems

Nomenclature

X                n x 1 input vector, [x1 x2 ... xn]^T
B                set containing the indices of all the rules present in the rule-base
R                total number of rules in the rule-base of a fuzzy model (size of set B)
F                subset of B containing the indices of the favorable rules
J                total number of favorable rules (size of set F)
C^{B(r)}         center of the rth rule, [c_1^{B(r)} c_2^{B(r)} ... c_n^{B(r)}]^T
m_o^r            constant term of the local linear model of the rth rule
m_i^r            coefficient of the ith linear term of the local linear model of the rth rule
m_i^{r-}/m_i^{r+}  the two possible values of m_i^r, depending on the location of the input with respect to the rule center

Greek letters

Φ_r^j            Gaussian function corresponding to the jth favorable rule
a^j/σ_i^j        parameters of the Gaussian function Φ_r^j defined at the jth favorable rule
γ_r^j            degree of closeness of the rth rule center to the center of the jth favorable rule
λ_r              normalized firing strength of the rth rule of the rule-base

1. Introduction

Adaptive fuzzy models can be successfully used to approximate nonlinear systems wherein the underlying physical processes are too complex to admit a physical model [1–3]. In a fuzzy model, the nonlinear input-to-output mapping is achieved by quantifying a domain expert's knowledge in the form of a parametric model. In terms of knowledge quantification, two conceptually different fuzzy models exist, namely the Mamdani model and the Takagi–Sugeno–Kang (TSK) model [4]. Over the years, researchers and practitioners have preferred the TSK fuzzy model because its output has an explicit functional expression, which makes the identification of TSK models from historical process data far less computationally intensive than for Mamdani models. The identification of a TSK model involves finding optimal values of two disparate parameter sets, viz. (1) the premise parameters and (2) the consequent parameters. In fuzzy modeling, the space spanned by the input variables is typically partitioned into several overlapping multivariate fuzzy sets. The premise parameters define the characteristics of these fuzzy sets, such as their shapes and spreads. The values of these parameters can either be assigned by a domain expert or obtained from historical data using some learning algorithm [5–8].


On the other hand, the consequent parameters consist of the coefficients of the local linear models associated with each fuzzy set. It is generally difficult for an expert to quantify the consequent parameters based on domain knowledge alone. Therefore, the estimation of these parameters is usually data-driven, and the linear least squares estimation (LSE) technique is commonly used for this purpose [6]. A data-driven fuzzy model can also be interpreted as a hybrid model resulting from the merger of artificial neural networks (ANN) with fuzzy logic. In the hybrid model framework, a TSK model can be viewed as a feed-forward neural network with two hidden layers. Each multivariate fuzzy set assumes the role of a neuron in the first hidden layer; hence, the parameters associated with them can be regarded as the first hidden layer weights. Similarly, the consequent parameters constitute the weights of the second hidden layer. All the weights are initialized based on the domain knowledge and subsequently fine-tuned using some network learning algorithm. Although the integration of ANN with fuzzy logic has proven to be quite useful, there is also a noticeable downside to such hybrid models. Since the parameter learning becomes entirely data-driven, it imposes stringent requirements on the quality of the training dataset. If the training dataset is insufficiently sized or has a low signal-to-noise ratio, the trained model behaves erratically under never-seen input conditions and becomes uninterpretable [9–11]. In other words, the trained fuzzy model loses its consistency with the domain knowledge and behaves like a black-box. This shortcoming led to the development of various parameter estimation techniques aiming to safeguard the interpretability of hybrid fuzzy models. Yen et al. [12] proposed a combination of global and local identification of the consequent parameters in order to obtain a good balance between accuracy and interpretability. Fiordaliso [13,14] proposed an approach for interpretability preservation by subjecting the consequent parameters to certain constraints. In another approach, Bikdash [15] proposed the use of spline-based fuzzy membership functions and rule-centered linear models that can be treated as 1st order Taylor series expansions about the rule centers. This method improved both the interpretability and the estimation capability of the fuzzy rules. Abonyi et al. [16,17] and Tien and van Straten [18] proposed techniques wherein a-priori knowledge is incorporated in TSK fuzzy models by introducing a set of parameter constraints/penalty functions.

In this paper, a technique is proposed that integrates qualitative domain knowledge with the parameter estimation step of TSK models. Quite often a domain expert is capable of identifying a few regions of the input space where the system output is expected to be higher than in others. The core idea is to incorporate this type of qualitative information into the learning process of TSK models in an algorithmic framework. The primary motivation is to make the data-driven parameter estimation step robust to low-quality training datasets. In other words, the proposed technique enables a fuzzy model to remain consistent with the domain knowledge even when the training dataset is insufficiently sized and/or has a low signal-to-noise ratio. The proposed fuzzy model will be referred to as the a-priori knowledge-based fuzzy model (APKFM) in the rest of the paper. The APKFMs are built on the framework of rule-centered TSK fuzzy models [15,19]. The advantage of rule-centered fuzzy models, in terms of improved model interpretability, is discussed in Section 2. Section 3 outlines the mathematical formulation of APKFM and describes an efficient scheme to learn its parameters from historical data.

The performance of APKFM is evaluated in Section 4 with two examples of static nonlinear function approximation.

2. Rule-centered TSK fuzzy models

In a conventional TSK fuzzy model, the underlying nonlinear function is approximated by several local linear models that are uniquely defined for every rule in the rule-base. Every rule is characterized by a multivariate fuzzy set that defines a validity region where the corresponding local linear model is applicable. Consider a multiple-input-single-output (MISO) TSK model with a total of R rules. The rth rule can be represented as

IF x_1 is A_1^r AND x_2 is A_2^r ... AND x_n is A_n^r
THEN \hat{f}(X) = k_o^r + k_1^r x_1 + k_2^r x_2 + \cdots + k_n^r x_n    (1)

where k_o^r, k_1^r, etc. are the coefficients of the local linear model defined for the rth rule. X = [x_1 x_2 ... x_n]^T represents an input vector with n dimensions. A_i^r represents the linguistic label (medium, small, large, etc.) assigned to the input variable x_i. Lastly, \hat{f}(X) is the linear approximation of the unknown nonlinear function f(X) that is being identified using the TSK model. Alternatively, the rth rule can also be written in the rule-centered form shown below:

IF x_1 is A_1^r AND x_2 is A_2^r ... AND x_n is A_n^r
THEN \hat{f}(X) = m_o^r + m_1^r (x_1 - c_1^{B(r)}) + m_2^r (x_2 - c_2^{B(r)}) + \cdots + m_n^r (x_n - c_n^{B(r)})    (2)

where C^{B(r)} = [c_1^{B(r)} c_2^{B(r)} ... c_n^{B(r)}]^T is a vector representing the rth rule center, having the same dimension as the input vector X. B is a set that contains the indices of all the R rules, i.e. B = {1, 2, ..., R}; thus, its rth element B(r) = r. The center of a rule is simply the geometric center of the corresponding multivariate fuzzy set. The coefficients k_i^r of the local linear model in (1) have a straightforward relationship with the coefficients m_i^r of the rule-centered linear model in (2), i.e.

k_i^r = m_i^r,    k_o^r = m_o^r - \sum_{i=1}^{n} m_i^r c_i^{B(r)},    \forall i \in \{1, 2, \ldots, n\}    (3)
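To make the relationship in Eq. (3) concrete, the short sketch below converts the rule-centered coefficients m of one rule into the conventional TSK coefficients k and checks that both forms give the same consequent value; the numeric values are made-up illustrative numbers, not taken from the paper.

import numpy as np

# Illustrative (made-up) rule-centered consequent of one rule:
# f_hat(X) = m0 + sum_i m_i * (x_i - c_i)
m0 = 1.5                                # constant term m_o^r
m = np.array([0.8, -0.3])               # linear coefficients m_i^r
c = np.array([0.5, 1.0])                # rule center C^{B(r)}

# Eq. (3): k_i = m_i,  k_0 = m_0 - sum_i m_i * c_i
k = m.copy()
k0 = m0 - np.dot(m, c)

x = np.array([0.9, 1.4])                # an arbitrary input
rule_centered = m0 + np.dot(m, x - c)   # consequent in the form of Eq. (2)
conventional = k0 + np.dot(k, x)        # consequent in the form of Eq. (1)
assert np.isclose(rule_centered, conventional)
print(rule_centered, conventional)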

The advantage of rule-centered linear models is that their coefficients m_i^r can be interpreted as the coefficients of the 1st order Taylor series approximation of the underlying function about the rule center C^{B(r)}, i.e.

\hat{f}(X) = f(C^{B(r)}) + \left.\frac{df}{dx_1}\right|_{X=C^{B(r)}} (x_1 - c_1^{B(r)}) + \left.\frac{df}{dx_2}\right|_{X=C^{B(r)}} (x_2 - c_2^{B(r)}) + \cdots + \left.\frac{df}{dx_n}\right|_{X=C^{B(r)}} (x_n - c_n^{B(r)})    (4)

If we compare the rule consequent in (2) with the Taylor series expansion in (4), we arrive at the following equalities:

m_o^r = f(C^{B(r)}),    m_i^r = \left.\frac{df}{dx_i}\right|_{X=C^{B(r)}},    \forall i \in \{1, 2, \ldots, n\}    (5)

Therefore, if we have some a-priori knowledge about the values of the underlying function f(X) and its gradient (df/dx_i) at the various rule centers, a technique can be devised that uses this knowledge during the estimation of the consequent parameters. The prerequisites for such a technique are twofold: (1) it should be able to utilize the type of qualitative knowledge that a domain expert usually possesses, and (2) it should be easy to implement without being computationally intensive. Section 3 presents the formulation of APKFM and highlights how it satisfies the above-mentioned criteria.


3. A-priori knowledge-based fuzzy model (APKFM)

In this section the mathematical formulation of the 0th and 1st order APKFM is presented. In Section 3.1, we show how a 0th order APKFM can be constructed using an expert's qualitative domain knowledge. This idea is extended to the 1st order APKFM in Section 3.2. Finally, in Section 3.3 the parameter identification scheme of APKFM is discussed. It should be noted that the issue of structure identification of APKFM is not addressed in this paper. The input space is assumed to be grid partitioned into R rules.

3.1. 0th order APKFM

The qualitative knowledge that is intended to be used is assumed to be available in the form of certain known regions of the input space where the model output is expected to be higher. Quite often it is possible to identify such favorable regions of the input space based on the domain knowledge (refer to the example in Section 4.2). Since the rule-base is obtained by grid-partitioning the input space, the rules associated with the favorable regions can be easily identified. Such rules are termed favorable rules. Let there be J favorable rules out of a total of R rules. We can define a set F (F ⊆ B) which contains the indices of all the favorable rules. Let us define a Gaussian basis function Φ (with parameters a and σ) at the center of each favorable rule, i.e.

\Phi_r^j = a^j \exp\!\left( -\sum_{i=1}^{n} \left( \frac{c_i^{B(r)} - c_i^{F(j)}}{\sigma_i^j} \right)^2 \right),    \forall j \in F    (6)

These Gaussian functions are used to mimic the available knowledge about the favorable rules. If a point in the input space moves from a rule j ∈ F to any other rule r ∉ F, then the model output should decrease, according to the definition of a favorable rule. Clearly, this type of model behavior is captured by Eq. (6); hence the choice of a Gaussian to represent the domain knowledge is justified. Since there can be several favorable rules (depending on the size of the set F), the model output at the rth rule can be represented as the weighted geometric mean of the individual Gaussians, i.e.

m_o^r = \prod_{j=1}^{J} (\Phi_r^j)^{\gamma_r^j}    (7)

where m_o^r is the singleton/consequent parameter of the rth rule of the 0th order APKFM. The symbol γ_r^j represents the weight that signifies the degree of closeness of the rth rule center to the center of the jth favorable rule. This weight can be computed using the following formula:

\gamma_r^j = \frac{1}{\sum_{i=1}^{J} (D_{rj}/D_{ri})}    (8)

where D_{rj} is the Euclidean distance from the rth rule center to the jth favorable rule center. Clearly, as the rth rule center approaches the jth rule center, γ_r^j approaches 1. In short, the idea is to define the singleton value m_o^r of the rth rule as a mixture of the Gaussian representations of the favorable rules. Such a definition captures the domain knowledge in the sense that, as a point in the input space moves away from any of the favorable rules, the system output should decrease. The Gaussians have two sets of parameters, a^j and σ_i^j, that characterize these basis functions. For example, σ_i^j governs how fast the model output decays as we move away from the jth favorable rule along the ith dimension.

For a given input X, the output y of the 0th order APKFM can be obtained as follows. The input X is fed to the APKFM to obtain the normalized firing strength λ_r of each rule. These firing strengths are simply the normalized degrees of membership of X in the multivariate fuzzy sets defined by each rule. Thereafter, the output of the 0th order APKFM can be computed using the following equation:

y = \prod_{r=1}^{R} (m_o^r)^{\lambda_r}    (9)

There is a stark difference between Eq. (9) and the conventional way of computing the output of a TSK fuzzy model. In TSK models, the output is computed by taking the weighted arithmetic mean of the singletons of the fired rules, i.e. y = \sum_{r=1}^{R} \lambda_r m_o^r. On the contrary, the output defined in Eq. (9) is the weighted geometric mean of the rule consequents m_o^r. The rationale for defining the output in this way is that it renders the model output intrinsically linear in the parameters a^j and σ_i^j. This idea will become clearer in Section 3.3, where we discuss the estimation of these parameters. Eqs. (6), (7) and (9) can be combined to obtain the following expression for the output of the 0th order APKFM:

y = \prod_{r=1}^{R} \left( \prod_{j=1}^{J} \left( a^j e^{-\sum_{i=1}^{n} ((c_i^{B(r)} - c_i^{F(j)})/\sigma_i^j)^2} \right)^{\gamma_r^j} \right)^{\lambda_r}    (10)
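As an illustration of Eqs. (6)–(10), the following sketch computes the output of a 0th order APKFM for a 3 x 3 grid-partitioned input space of the kind used later in Section 4.1. The rule centers, favorable-rule indices, firing strengths and Gaussian parameters are invented placeholders, and the firing-strength computation itself (which depends on the premise membership functions) is assumed to be given.

import numpy as np

# Invented 3x3 grid of rule centers on (0, 3] x (0, 3] (9 rules, 2 inputs).
grid = np.array([0.5, 1.5, 2.5])
centers = np.array([[gx, gy] for gy in grid for gx in grid])   # shape (R, n)
R, n = centers.shape

# Assume the two favorable rules are (x1 low, x2 high) and (x1 high, x2 low).
fav = [6, 2]                            # indices into `centers` (placeholder choice)
J = len(fav)

# Placeholder Gaussian parameters a^j (amplitudes) and sigma_i^j (spreads).
a = np.array([25.0, 30.0])              # one per favorable rule
sigma = np.full((J, n), 1.2)            # one per favorable rule and dimension

def gamma(r):
    """Eq. (8): closeness weights of rule r to each favorable rule."""
    d = np.array([np.linalg.norm(centers[r] - centers[f]) for f in fav])
    d = np.maximum(d, 1e-12)            # avoid division by zero when r is favorable
    return np.array([1.0 / np.sum(d[j] / d) for j in range(J)])

def singleton(r):
    """Eqs. (6) and (7): m_o^r as a weighted geometric mean of the Gaussians."""
    phi = a * np.exp(-np.sum(((centers[r] - centers[fav]) / sigma) ** 2, axis=1))
    return np.prod(phi ** gamma(r))

m_o = np.array([singleton(r) for r in range(R)])

# Eq. (9): weighted geometric mean of the singletons; lambda_r are assumed
# normalized firing strengths obtained from the (not shown) premise layer.
lam = np.random.dirichlet(np.ones(R))   # placeholder firing strengths, sum to 1
y = np.prod(m_o ** lam)
print("singletons:", np.round(m_o, 2), " output:", round(y, 3))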

It is interesting to note that when prior domain knowledge is not available, the 0th order APKFM reduces to a conventional 0th order TSK model. Consider the case when we have no information about the existence of favorable input regions. This scenario can be treated as a special case wherein all rules become favorable, i.e. F = B; hence, Eq. (7) reduces to:

m_o^r = \Phi_r^r,    because    \gamma_r^j = \begin{cases} 1 & \text{if } j = r \\ 0 & \text{if } j \neq r \end{cases}

By Eq. (6) we have \Phi_r^r = a^r. Thus m_o^r = a^r, which is equivalent to a conventional singleton/0th order TSK model with the a^r being the singletons. Therefore, the 0th order TSK model can be considered the limiting case of the 0th order APKFM in the event of unavailability of any domain knowledge. In the next section, the formulation of the 1st order APKFM is presented.

3.2. Linear/1st order APKFM

In a 1st order TSK model, the consequent of a rule has n linear parameters (k_1^r, k_2^r, ..., k_n^r), as shown in Eq. (1). However, in the way we define a 1st order APKFM, the linear parameters of a rule-centered consequent, m_i^r, can be twice as many. Depending on the number of nearest neighbors of a rule, m_i^r can assume two values, m_i^{r-} and m_i^{r+}. For illustration, refer to Fig. 1, which shows the input space of a system with two inputs, x1 and x2. The input space is partitioned into 9 grid cells with their centers located at the black dots. Rule number 5 (the centermost rule) has four nearest neighbors; consequently there are four linear parameters, m_1^{5-}, m_1^{5+}, m_2^{5-}, m_2^{5+}, associated with it. The position of the input decides whether the linear parameter m_i^5 assumes the value m_i^{5-} or m_i^{5+}. This point is illustrated in Fig. 1, which shows two possible locations of an input. When the input corresponds to location 'a', m_1^5 takes the value of m_1^{5+} and m_2^5 that of m_2^{5+}. On the other hand, if the input is at location 'b', then m_1^5 assumes the value of m_1^{5-} and m_2^5 that of m_2^{5+}. It can be verified visually from Fig. 1 that there are 24 linear parameters in this system. For a generalized system with n dimensions, the total number of linear parameters can be obtained from the following equation:

\sum_{i=0}^{n} \binom{n}{i} (n + i)\, 2^{n-i}    (11)

The above equation is derived by considering the input space as an n-dimensional hypercube, wherein each dimension is partitioned using three univariate fuzzy sets, resulting in an input space partitioned into 3^n grid cells.

Fig. 1. A two-dimensional input space with 9 rules and 24 local linear models. The black dots signify the locations of rule centers.

Following the above definition of the 1st order APKFM, for a given input X, the output y can be computed as

y = \sum_{r=1}^{R} \lambda_r \left\{ m_o^r + \sum_{i=1}^{n} \left( a_i^r m_i^{r+} + (1 - a_i^r) m_i^{r-} \right) (x_i - c_i^{B(r)}) \right\}    (12)

where a_i^r assumes a value of either 0 or 1 according to the following condition:

a_i^r = \begin{cases} 1 & \text{if } x_i - c_i^{B(r)} > 0 \\ 0 & \text{if } x_i - c_i^{B(r)} < 0 \end{cases}    \forall r \in B,\ \forall i \in \{1, 2, \ldots, n\}    (13)

Apparently a 1st order APKFM has a higher complexity, in terms of the number of parameters, than a 1st order TSK model. Therefore, in the event of training with a relatively small historical dataset, one would expect the APKFM to perform poorly compared to a TSK model. However, it is shown in Section 3.3 that we can come up with reasonable prior estimates of the parameters m_i^{r-}/m_i^{r+} that can be easily incorporated as hard constraints during the estimation process. These constraints reduce the search space and ensure that the estimated parameters are compliant with the domain knowledge.
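A minimal sketch of Eqs. (11)–(13) is given below: it counts the linear parameters for a given input dimension n (24 for the two-input example of Fig. 1) and evaluates the first-order consequent with the side-dependent coefficients m_i^{r-}/m_i^{r+}. All numeric values are invented placeholders.

import numpy as np
from math import comb

def num_linear_params(n):
    """Eq. (11): total number of linear parameters for an n-dimensional,
    3-sets-per-input grid partition."""
    return sum(comb(n, i) * (n + i) * 2 ** (n - i) for i in range(n + 1))

print(num_linear_params(2))   # -> 24, as counted visually in Fig. 1

# Eqs. (12)/(13) for a single toy configuration (placeholder numbers).
centers = np.array([[0.5, 0.5], [1.5, 0.5]])      # two rule centers
m_o   = np.array([0.3, 0.6])                      # singletons from the 0th order model
m_neg = np.array([[0.2, 0.1], [0.4, -0.2]])       # m_i^{r-}
m_pos = np.array([[0.5, -0.3], [0.1, 0.3]])       # m_i^{r+}
lam   = np.array([0.7, 0.3])                      # normalized firing strengths (assumed given)

x = np.array([0.8, 0.4])
side = (x - centers > 0).astype(float)            # Eq. (13): a_i^r = 1 on the positive side
slopes = side * m_pos + (1.0 - side) * m_neg
y = np.sum(lam * (m_o + np.sum(slopes * (x - centers), axis=1)))
print(round(y, 4))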

3.3. Parameter identification of APKFM

The proposed parameter identification technique is three-way staggered, in the sense that the parameters are grouped into three classes and estimated sequentially, one class after another. In the first step, the singleton values m_o^r of the 0th order APKFM are estimated. In step two, the linear parameters m_i^{r+}/m_i^{r-} of the 1st order APKFM are estimated based on the knowledge of the m_o^r values obtained in the first step. The third step identifies the premise parameters, while keeping the consequent parameters fixed at the values obtained in the first two steps. Detailed discussions of these steps are provided below.

Step 1: Identification of the 0th order APKFM: In this step, the consequent parameters of the 0th order APKFM are identified. Careful examination of the output of the 0th order APKFM (Eq. (10)) reveals that the weighted geometric mean is used in two places. Taking the logarithm of both sides of this equation results in the following equation:

\log(y) = \sum_{r=1}^{R} \lambda_r \sum_{j=1}^{J} \gamma_r^j \left\{ \log(a^j) - \sum_{i=1}^{n} \left( \frac{c_i^{B(r)} - c_i^{F(j)}}{\sigma_i^j} \right)^2 \right\}    (14)

Rearranging Eq. (14) gives the expression in the following equation:

\sum_{j=1}^{J} \left( \sum_{r=1}^{R} \lambda_r \gamma_r^j \right) \log(a^j) - \sum_{i=1}^{n} \sum_{j=1}^{J} \left\{ \sum_{r=1}^{R} \lambda_r \gamma_r^j (c_i^{B(r)} - c_i^{F(j)})^2 \right\} \frac{1}{(\sigma_i^j)^2} = \log(y)    (15)

Clearly, Eq. (15) is linear in the unknown parameters log(a^j) (J in number) and 1/(σ_i^j)^2 (n·J in number). Hence, the total number of unknown consequent parameters of the 0th order APKFM is equal to J(n + 1). If M training data-points are available then, using Eq. (15), we can set up M simultaneous linear equations that can be represented as AY = b, where A is the M x J(n + 1) coefficient matrix, Y is the J(n + 1) x 1 parameter vector and b is the M x 1 target output vector. Since the ranges of the parameters can be taken as a^j ∈ [0, 1] and σ_i^j ∈ (0, ∞), log(a^j) and 1/(σ_i^j)^2 can be bounded as follows:

-\infty < \log(a^j) \le 0,    0 < 1/(\sigma_i^j)^2 < \infty    (16)

We can easily obtain the optimal solution of this linear system with bound constraints using quadratic programming. The optimization problem can be posed as:

\arg\min_{Y} \; Y^T (A^T A) Y - (2 b^T A) Y    (17)

If the matrix A^T A is nonsingular, then a unique solution exists for the above objective function. A variety of methods are available to perform quadratic minimization with bound constraints [20]. Once we have the optimum values of a^j and σ_i^j, we can use Eqs. (6) and (7) to determine the m_o^r values of all the R rules.
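The sketch below illustrates the flavor of Step 1 under the bounds of Eq. (16): it builds the linear system AY = b of Eq. (15) from precomputed firing strengths and closeness weights and solves the bound-constrained least-squares problem with scipy's lsq_linear, used here as a stand-in for the quadratic-programming solvers cited in [20]. All arrays are randomly generated placeholders rather than real training data.

import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
M, R, J, n = 60, 9, 2, 2                 # data-points, rules, favorable rules, inputs

# Assumed precomputed quantities (placeholders): firing strengths lambda_{m,r},
# closeness weights gamma_{r,j}, squared center distances d2_{r,j,i}, and targets y.
lam   = rng.dirichlet(np.ones(R), size=M)              # (M, R)
gamma = rng.dirichlet(np.ones(J), size=R)              # (R, J)
d2    = rng.uniform(0.0, 4.0, size=(R, J, n))          # (c_i^{B(r)} - c_i^{F(j)})^2
y     = rng.uniform(1.0, 30.0, size=M)

# Eq. (15): columns for log(a^j), then for 1/(sigma_i^j)^2; right-hand side is log(y).
A_log_a = lam @ gamma                                  # (M, J): sum_r lam_r * gamma_r^j
A_sigma = -np.einsum('mr,rj,rji->mji', lam, gamma, d2).reshape(M, J * n)
A = np.hstack([A_log_a, A_sigma])                      # (M, J*(n+1))
b = np.log(y)

# Eq. (16): -inf < log(a^j) <= 0 and 0 < 1/(sigma_i^j)^2 < inf
# (a tiny positive number stands in for the open lower bound).
lower = np.concatenate([np.full(J, -np.inf), np.full(J * n, 1e-12)])
upper = np.concatenate([np.zeros(J),         np.full(J * n,  np.inf)])

sol = lsq_linear(A, b, bounds=(lower, upper))
a_opt = np.exp(sol.x[:J])
sigma_opt = 1.0 / np.sqrt(sol.x[J:].reshape(J, n))
print("a:", np.round(a_opt, 3))
print("sigma:", np.round(sigma_opt, 3))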

Step 2: Identification of the 1st order APKFM: In this step we identify the linear parameters m_i^{r+}/m_i^{r-}. For a given input X, the output of the 1st order APKFM is given by Eq. (12). Since the m_o^r values have already been determined in Step 1, Eq. (12) can be rearranged in the following manner:

\sum_{i=1}^{n} \sum_{r=1}^{R} \lambda_r \left( a_i^r m_i^{r+} + (1 - a_i^r) m_i^{r-} \right) (x_i - c_i^{B(r)}) = y - \sum_{r=1}^{R} \lambda_r m_o^r    (18)

Eq. (18) is linear in the parameters m_i^{r+}/m_i^{r-}, thus resulting in a system of M simultaneous linear equations from M training data-points. Since the linear parameters m_i^r can be interpreted as 1st order derivatives (refer to Eq. (5)), the bounds on the values of m_i^{r+}/m_i^{r-} can be estimated by determining the slope of the line joining the rth rule center and its neighboring rule centers along the ith dimension. This is illustrated in Fig. 2, which represents the same two-dimensional system as shown in Fig. 1. The vertical axis plots the m_o^r values (obtained in Step 1) of all 9 rules. The idea is to use the information present in Fig. 2 to estimate the bounds on the values of m_1^{5-}, m_1^{5+}, etc. It can be verified by observing Fig. 2 that m_1^{5-}, m_1^{5+} and m_2^{5-} will assume positive values, while m_2^{5+} will be a negative number. Thus, 0 would be the lower bound for m_1^{5-}, m_1^{5+} and m_2^{5-}, and the upper bound for m_2^{5+}. The other bound would be the same as the slope of the line joining the 5th rule center and the corresponding neighboring rule center. Thus, the upper bound of m_1^{5-} would be (0.28 - 0.55)/(0.25 - 0.50) = 1.08. Similarly, the bounds on the other slope values can be found as

m_1^{5+} \in [0,\ 0.52],    m_2^{5-} \in [0,\ 0.88],    m_2^{5+} \in [-1.36,\ 0]

Fig. 2. Illustration of the various local models about a rule center. The number of local models associated with a rule is equal to the number of its nearest neighbors.
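The following sketch reproduces this kind of bound computation for rule 5 of the two-dimensional example. The singletons below are only an assumed reconstruction: the values quoted in the text are read off Fig. 2, so the neighboring singletons here are back-calculated to reproduce the quoted intervals, and the grid coordinates are a guess.

import numpy as np

# Assumed reconstruction of Fig. 2: rule centers of the 3x3 grid along each axis
# and the singletons m_o^r from Step 1 (only rule 5 and its four neighbors matter).
axis = np.array([0.25, 0.50, 0.75])
m_o = {5: 0.55, 4: 0.28, 6: 0.68, 2: 0.33, 8: 0.21}   # rule index -> singleton

def slope_bounds(rule, neighbour, dim):
    """Interval for m_i^{r-} or m_i^{r+}: between 0 and the slope of the line
    joining the rule center and the neighbouring rule center along dimension dim."""
    # center of a rule in the 3x3 grid, assuming x1's label varies fastest
    center = lambda r: np.array([axis[(r - 1) % 3], axis[(r - 1) // 3]])
    slope = (m_o[neighbour] - m_o[rule]) / (center(neighbour)[dim] - center(rule)[dim])
    return tuple(sorted((0.0, slope)))

print("m1_5-:", slope_bounds(5, 4, 0))   # ~ (0, 1.08)
print("m1_5+:", slope_bounds(5, 6, 0))   # ~ (0, 0.52)
print("m2_5-:", slope_bounds(5, 2, 1))   # ~ (0, 0.88)
print("m2_5+:", slope_bounds(5, 8, 1))   # ~ (-1.36, 0)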

These constraints significantly reduce the search space of the optimization problem. The linear system generated using Eq. (18), along with the above-mentioned bounds, can be solved using quadratic programming to obtain the optimal values of m_i^{r+}/m_i^{r-}.

Step 3: Identification of the premise parameters: This step involves the determination of the optimal premise parameters while keeping the consequent parameters fixed. There is a plethora of available methods [6,11,21–23], based on classical or evolutionary optimization techniques, that can be used to implement this step. In this analysis, a simple gradient-descent based backpropagation algorithm is used, mainly because of its ease of implementation. However, a variety of faster options, such as conjugate gradient, Quasi-Newton and Levenberg–Marquardt (LM), can be used in lieu of the gradient-descent algorithm. The LM algorithm could be the fastest option to update the premise parameters, provided the fuzzy model is moderately sized (up to a few hundred premise parameters).

Steps 1–3 constitute one epoch, and several epochs may be needed to obtain the desired result. In our analysis, the performance of the model was evaluated on a validation dataset after the completion of each epoch. The validation and training datasets were disjoint. The training process was terminated as soon as the validation error reached its minimum. This simple and commonly used scheme, known as early stopping [24], prevents over-fitting of the model on the training data.

4. Function approximation using APKFM

In this section, we highlight the distinct advantages of APKFM in terms of its function approximation capability. APKFM's performance is benchmarked against ANFIS (adaptive network-based fuzzy inference system) [6]. ANFIS is a very popular and commonly used tool for fuzzy model identification and is built on the framework of the conventional TSK model, which justifies its choice for benchmarking. To compare the performance, two different criteria are taken into consideration, namely (1) prediction accuracy and (2) interpretability. Prediction accuracy measures how well a trained model performs on a never-seen/test dataset. It can be quantified using the root mean squared error (RMSE). A model with a low RMSE on the test dataset is considered to have good prediction accuracy. On the other hand, a model's interpretability is a measure of the consistency of a trained model with respect to the domain knowledge. Unfortunately, there is no standard method of measuring interpretability. However, a common way to evaluate it is to observe the output surface generated by a trained model and look for regions with inconsistencies.

4.1. 2D nonlinear function

The two-dimensional function given by Eq. (19) is chosen as the first example for this comparative study. The function output, f(x1, x2), was corrupted by adding a Gaussian noise, ε, with zero mean and some nonzero variance.

f(x_1, x_2) = (2x_2 - 0.75x_1^2)^2 + 0.5(1 + x_1)^2 + \varepsilon    (19)

4.1.1. Construction of fuzzy models

The domain of each input was partitioned into 3 univariate fuzzy sets with linguistic labels low, medium and high. Thereafter, a grid-partitioned rule-base was generated, resulting in a total of 3^2 = 9 rules. The structures of APKFM and ANFIS were quite similar, and the differences are outlined in Table 1. The formulation of APKFM also requires knowledge of the favorable rules, which are identified based on an expert's domain knowledge. Let us assume that the following domain information was available to us prior to the formulation of APKFM:

System output is high WHEN x1 is high AND x2 is low, OR vice versa.

This qualitative information enabled us to identify two locally favorable rules (J = 2) with the following linguistic representations (a short sketch of how such descriptions map to rule indices in the 3 x 3 rule grid follows the list):

1: x1 is low AND x2 is high.

2: x1 is high AND x2 is low.
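A small sketch of mapping these two linguistic descriptions to rule indices in the grid-partitioned rule-base is given below; the ordering convention used to enumerate the 3 x 3 grid is an assumption, since the paper does not specify one.

from itertools import product

labels = ["low", "medium", "high"]          # 3 univariate fuzzy sets per input

# Enumerate the 9 rules of the grid partition; x2's label varies fastest (assumed).
rules = list(product(labels, repeat=2))     # [(x1_label, x2_label), ...]

favorable = [("low", "high"), ("high", "low")]
fav_indices = [rules.index(f) + 1 for f in favorable]   # 1-based rule numbers
print(fav_indices)                          # e.g. [3, 7] under this ordering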

4.1.2. Training of fuzzy models

The input–output data needed for training the fuzzy models were generated from Eq. (19) by choosing the range of both inputs as (0, 3]. The model output was obtained by uniformly sampling data-points from these input ranges. Additionally, the output was corrupted by adding Gaussian noise. Noise with a high variance yielded training data with a poor signal-to-noise ratio. Another variable that had a profound effect on the quality of the training data was the number of input–output pairs. Clearly, a training dataset with fewer pairs contains less information and is therefore of poorer quality. The idea was to generate different training datasets with varying noise levels (σL) and numbers of data-points (M) and use them to compare the performances of the competing modeling techniques (a sketch of this data-generation scheme is given after Table 1). Five different standard deviation values were chosen, viz. 0, 0.067R, 0.133R, 0.2R and 0.267R, where R represents the range of the model output (roughly 0–30 in this example). For each noise level we created datasets of three different sizes, i.e. with 60, 100 and 140 data-points. In this way, training datasets with 15 possible combinations of noise level and size were generated.

Table 1
Properties of the 1st order TSK model (ANFIS) and APKFM used for approximating the two-dimensional nonlinear function.

                                   1st order ANFIS    1st order APKFM
Number of fuzzy sets per input     3                  3
Membership function type           Triangular         Triangular
Number of rules                    9                  9
Number of premise parameters       18                 18
Number of consequent parameters    27                 33
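A sketch of the data-generation scheme described in Section 4.1.2 is shown below, assuming the reconstructed form of Eq. (19) and uniform sampling of the inputs over (0, 3]; the output-range constant used to scale the noise levels is taken as 30, as stated in the text.

import numpy as np

rng = np.random.default_rng(1)

def f(x1, x2):
    """Toy nonlinear target of Eq. (19) (noise added separately below)."""
    return (2 * x2 - 0.75 * x1 ** 2) ** 2 + 0.5 * (1 + x1) ** 2

R_out = 30.0                                   # approximate range of the model output
noise_levels = np.array([0.0, 0.067, 0.133, 0.2, 0.267]) * R_out
sizes = [60, 100, 140]

datasets = {}
for sL in noise_levels:
    for M in sizes:
        x = rng.uniform(0.0, 3.0, size=(M, 2))          # uniform samples of (x1, x2)
        y = f(x[:, 0], x[:, 1]) + rng.normal(0.0, sL, size=M)
        datasets[(round(float(sL), 2), M)] = (x, y)      # 15 training sets in total

print(len(datasets))                           # -> 15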


Fig. 3. Plots of mean RMSE vs. noise level for trained ANFIS (left) and APKFM (right). The competing fuzzy models were trained with datasets of 3 different sizes, viz. 60, 100 and 140 data-points. The error bars represent the 95% CI of the RMSE values.

4.1.3. Comparison of accuracy

Fig. 3 shows the plots of root mean square error (RMSE) vs. the noise level (σL) for ANFIS (left) and APKFM (right), respectively. The plots were generated for three different training dataset sizes (M), viz. 60, 100 and 140. The validation datasets used were one third the size of the training datasets. To evaluate the performance, the RMSE values were computed on never-seen and noise-free test datasets of size 100. To generate the error bars, the experiment was repeated several times (40) for every combination of σL and M; the error bars represent the 95% confidence interval (a sketch of this evaluation protocol follows the list below). Some key observations are:

1. The superior robustness of APKFM compared to ANFIS is clearly evident. ANFIS was very sensitive to the noisy training data, which is evident from the sudden rise of RMSE with the increase in the noise level. This sensitivity was, understandably, more pronounced when the training dataset size was smaller. In contrast, APKFM turned out to be robust towards the noisy data. Even when the training dataset size was small, the RMSE value increased only gradually with the noise level.

2. The error bars, indicating the uncertainty associated with the ANFIS prediction, were smaller when M was higher and σL was lower. As soon as this situation reversed, the uncertainty in the ANFIS prediction increased sharply. On the other hand, the error-bar size did not increase significantly in the case of APKFM, indicating the lower uncertainty in its prediction.
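The evaluation protocol behind Fig. 3 can be summarised by the sketch below: for every combination of noise level and training-set size, the experiment is repeated 40 times, the RMSE is computed on a noise-free test set, and a normal-approximation 95% confidence interval is reported. The train_model argument is a placeholder for fitting either ANFIS or APKFM; it is not part of the paper's code.

import numpy as np

def evaluate(train_model, make_training_set, test_x, test_y, repeats=40):
    """Mean RMSE and 95% CI over repeated trainings on freshly sampled data."""
    rmses = []
    for _ in range(repeats):
        train_x, train_y = make_training_set()      # fresh noisy training data
        model = train_model(train_x, train_y)       # placeholder fitting routine
        pred = model(test_x)                        # the fitted model is assumed callable
        rmses.append(np.sqrt(np.mean((pred - test_y) ** 2)))
    rmses = np.array(rmses)
    half_width = 1.96 * rmses.std(ddof=1) / np.sqrt(repeats)
    return rmses.mean(), (rmses.mean() - half_width, rmses.mean() + half_width)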

Fig. 4. The output surfaces generated by ANFIS (left column) and APKFM (right column) when trained with datasets of increasing noise levels. The dataset size is fixed at 100 and the σL values were chosen to be 0, 0.133R and 0.267R. R represents the range of the output, which is 30 in this example.


4.1.4. Comparison of interpretability

To the best of the authors' knowledge, there is no standard measure of the interpretability of a model; hence, it is compared here by observing the output surface of the final model. In Fig. 4, the output surfaces of the trained ANFIS (left column) and the trained APKFM (right column) are shown. The top surfaces correspond to the case when the training data is noise-free. The subsequent plots correspond to models trained with datasets with higher noise levels, viz. σL = 0.133R and σL = 0.267R. The size of the training datasets was fixed at 120. When the training dataset was noise-free, ANFIS generated an output surface with an excellent resemblance to the actual surface. However, as the data became noisier, some imperfections began to emerge on the output surface (see the encircled regions). These imperfections were clearly a manifestation of over-fitting to the noisy training datasets, which made the model behavior inconsistent with the domain knowledge. On the contrary, APKFM did an excellent job of keeping the interpretability of the model intact even at high noise levels. The output surface was smooth and remained consistent with the domain knowledge over the entire input space.

4.2. Maintenance cost estimation model

To further corroborate the advantage of APKFM in a real-world scenario, we consider a practical problem requiring estimation of the maintenance cost of the electricity distribution networks in Spanish towns. Over the years in Spain, the maintenance cost was estimated based on the exact measurement of the total length of the distribution lines. This method provided excellent estimates but suffered from the requirement of extensive resources. Therefore, an indirect method was needed that could provide reasonable estimates without being prohibitively expensive and time consuming. An approach proposed by Cordon et al. [25] proved to be quite effective and highly economical in this regard. In this approach, the maintenance cost was estimated from four easily measurable characteristics of a town, given in Table 2. The idea was to develop a data-driven model that takes these four characteristics as inputs and computes the maintenance cost as the output. In [25] this problem is modeled by different techniques, viz. linear regression, 2nd order polynomial regression, neural networks, a Mamdani fuzzy model, a TSK fuzzy model, etc. Out of these techniques, it is shown that the TSK model provided the best prediction accuracy. In this analysis, the same dataset was used as in the above-mentioned study, and the performances of ANFIS and APKFM are compared.

4.2.1. Construction of fuzzy models

The ANFIS and APKFM were built in a similar manner as in the previous example of nonlinear function approximation. Each input domain was divided into 3 triangular univariate fuzzy sets with linguistic labels low, medium and high. Therefore, the input space is partitioned into 3^4 = 81 grid cells, resulting in 81 rules in the rule-base. The APKFM's structure was similar to that of the TSK model except for the modified rule consequents. For the formulation of APKFM, we needed some qualitative information that could help us identify the favorable rules.

Table 2
Description of the different inputs considered in the maintenance cost estimation model.

Symbol   Meaning
x1       Total length of all the streets in the town
x2       Total area of the town
x3       Total area occupied by buildings
x4       Total energy supply to the town
y        Maintenance cost of the medium voltage line


Let that qualitative information be in the form of the following statement: "The model output increases as the inputs x1, x2, x3 and x4 increase." The validity of this statement can easily be verified by looking at the description of the model inputs given in Table 2. Clearly, the maintenance cost of the distribution network is expected to rise if any of these input values increases. This qualitative information enabled the identification of the most favorable region of the input space, having the following linguistic representation:

x1 is high AND x2 is high AND x3 is high AND x4 is high

This region of the input space coincides with the validity region of rule number 81. Once the favorable rule is identified, the methodology presented in Section 3 can be followed to construct the APKFM.

4.2.2. Training of fuzzy models

The total data, consisting of 1042 input–output pairs, were randomly split into a training dataset (742 pairs), a test dataset (150 pairs) and a validation dataset (150 pairs). The premise and consequent parameters were updated sequentially in one epoch, as discussed in Section 3.3. The validation dataset was used to monitor the progress of training.

4.2.3. Comparison of accuracy and interpretability

The prediction accuracies are compared in Fig. 5, which plots the target output vs. the predicted output for the two models. Clearly, both techniques yielded models with excellent accuracy. The coefficient of determination, R2, for the ANFIS prediction (0.995) was slightly better than that of the APKFM prediction (0.983). However, the main strength of APKFM proved to be its superior interpretability over the entire input space. This is illustrated in Fig. 6, which compares the output surfaces generated by the trained models. The inputs x1 and x4 were fixed at values of 0.75 and 0.5, respectively, while the surfaces were generated by spanning the remaining two inputs over the entire input space. One thing that stands out in Fig. 6 is that APKFM produced a surface that was much smoother than that of ANFIS. This is a desirable result because most real-world processes show smooth transitions from one state to another, and the output seldom changes abruptly with a change in the input conditions. ANFIS resulted in an output surface that was not consistent with the domain knowledge in some regions of the input space. If we observe the encircled regions in Fig. 6, we see that the output decreases with an increase in the input x3. According to the domain knowledge, the model output should never decrease with the input x3, because that would suggest that fewer resources are needed to maintain a town that has a higher percentage of its area occupied by buildings. This inconsistency was not shown by APKFM, as evident from its output surface.

One question worth investigating here is why ANFIS was unable to capture the domain knowledge despite showing good prediction accuracy. The answer lies in the distribution of the training data-points in the input space. Fig. 7 shows the spread of the training data-points in the input space for the current problem. Clearly, there were certain regions of the input space where the distribution is sparse, as shown by the encircled regions in Fig. 7. If we carefully examine Fig. 6, we observe that the inconsistent regions of the output surface correspond to these sparsely populated regions of the input space. As a consequence, ANFIS performed poorly because it did not have sufficient data-points to learn from in these regions. This observation points to the poor extrapolation capability of ANFIS, which has also been highlighted by several other authors [26–28].


Fig. 5. Plots of predicted output vs. actual output for ANFIS (left) and APKFM (right), respectively. ANFIS performs marginally better than APKFM in terms of prediction accuracy, as evident from the R2 values of the fitted linear models.

Fig. 6. Output surfaces generated by different models after training with 742 data samples. The inputs x1 and x4 are fixed at a value of 0.75 and 0.5, respectively, while x2 and x3 were varied to generate these surfaces. The encircled regions signify that the output is not consistent with the domain knowledge.

Fig. 7. The distribution of the training data-points in the input space. The encircled regions have a sparse distribution of points. These regions correspond to those input subspaces where the model output may become inconsistent with the domain knowledge.

On the contrary, APKFM showed relatively better extrapolation capability and was able to maintain the consistency of the model even in regions with few data-points. This ability of APKFM is a direct consequence of the incorporation of domain knowledge in its parameter values.

5. Conclusion

Hybridization of fuzzy logic with neural networks provides a powerful tool for system identification and is commonly referred to as neuro-fuzzy hybridization in the field of artificial intelligence.


A neuro-fuzzy system provides a platform to combine the human-like reasoning style of fuzzy logic with the learning capability of neural networks. Despite being quite successful, the main challenge with neuro-fuzzy systems remains balancing the trade-off between accuracy and interpretability. Usually one of these conflicting properties prevails in a model. For instance, when the expert knowledge is insufficient and the model parameters are to be learned from historical process data, a neuro-fuzzy model may turn into a black-box model with very little interpretability. This problem is more prevalent in a widely used class of neuro-fuzzy models called Takagi–Sugeno–Kang (TSK) fuzzy models. In this paper, a-priori knowledge-based fuzzy models (APKFM) are proposed with the main objective of incorporating qualitative domain knowledge in the consequent parameter values of TSK models. The primary goal is to construct fuzzy models with a good combination of accuracy and interpretability without being overly biased towards either of these properties. The core idea is a two-step identification of the consequent parameters: (1) estimation of the constant term m_o^r of the consequent polynomial using a mixture of Gaussian basis functions, and (2) constrained optimization of the remaining linear parameters m_i^r of the consequent polynomial, wherein the constraints are obtained from the first step. The domain knowledge is incorporated in a fuzzy model in an algorithmic framework without significantly increasing the computational requirement of the overall learning process. The advantage of APKFM is illustrated using two examples: (1) approximation of a toy nonlinear function and (2) a real-world problem of maintenance cost estimation of electricity distribution networks. The strength of APKFM lies in its superior interpretability without significantly impacting the model's prediction accuracy. It was shown that the output of the trained APKFM extrapolates very well, even in those regions of the input space where the training data-points were sparsely distributed.

Acknowledgements

The authors are grateful for the support received during 2 years from the National Science Foundation (contract BES-0423882). We are also thankful to Dr. Judith Todd (Head of the Department of Engineering Science and Mechanics) for the 2-year support as a department teaching assistant for one of the authors.

References

[1] J.M. Mendel, Uncertain Rule-based Fuzzy Logic Systems: Introduction and New Directions, Prentice Hall PTR, Upper Saddle River, NJ, 2001.
[2] O. Nelles, Nonlinear System Identification, Springer, Berlin, 2002.
[3] T. Chen, M.-J.J. Wang, Fuzzy set approach for yield learning modeling in wafer manufacturing, IEEE Transactions on Semiconductor Manufacturing 12 (1999) 252–258.


[4] T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Transactions on Systems, Man and Cybernetics SMC-15 (1985) 116–132.
[5] H. Nomura, I. Hayashi, N. Wakami, A learning method of fuzzy inference rules by descent method, IEEE International Conference on Fuzzy Systems (1992) 203–210.
[6] J.-S.R. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics 23 (1993) 665–685.
[7] F.J. Moreno-Velo, I. Baturone, R. Senhadji, S. Sanchez-Solano, Tuning complex fuzzy systems by supervised learning algorithms, in: The 12th IEEE International Conference on Fuzzy Systems, vol. 1, 2003, 226–231.
[8] J.J. Buckley, Y. Hayashi, Fuzzy neural networks: a survey (invited review), Fuzzy Sets and Systems 66 (1994) 1–13.
[9] M. Setnes, R. Babuska, H.B. Verbruggen, Rule-based modeling: precision and transparency, IEEE Transactions on Systems, Man & Cybernetics Part C: Applications and Reviews 28 (1998) 165–167.
[10] A. Riid, E. Rustern, Transparent fuzzy systems in modeling and control, Interpretability Issues in Fuzzy Modeling (2003) 452–476.
[11] P. Lindskog, Methods, algorithms and tools for system identification based on prior knowledge, Ph.D. thesis, Department of Electrical Engineering, Linkoping University, 1996.
[12] J. Yen, L. Wang, W. Gillespie, Global-local learning algorithm for identifying Takagi–Sugeno–Kang fuzzy models, in: IEEE International Conference on Fuzzy Systems, 1998, 967–972.
[13] A. Fiordaliso, Analysis improvement of Takagi–Sugeno fuzzy rules using convexity constraints, in: Proceedings of the International Conference on Tools with Artificial Intelligence, 1998, pp. 232–235.
[14] A. Fiordaliso, Constrained Takagi–Sugeno fuzzy system that allows for better interpretation and analysis, Fuzzy Sets and Systems 118 (2000) 307–318.
[15] M. Bikdash, Highly interpretable form of Sugeno inference systems, IEEE Transactions on Fuzzy Systems 7 (1999) 686–696.
[16] J. Abonyi, R. Babuska, M. Setnes, H.B. Verbruggen, F. Szeifert, Constrained parameter estimation in fuzzy modeling, in: IEEE International Conference on Fuzzy Systems, 1999, 951–956.
[17] J. Abonyi, R. Babuska, H.B. Verbruggen, F. Szeifert, Incorporating prior knowledge in fuzzy model identification, International Journal of Systems Science 31 (2000) 657–667.
[18] B.-T. Tien, G. van Straten, Incorporation of qualitative information into T–S fuzzy model, in: Annual Conference of the North American Fuzzy Information Processing Society—NAFIPS, 1997, 148–153.
[19] L.J. Herrera, H. Pomares, I. Rojas, O. Valenzuela, A. Prieto, TaSe, a Taylor series-based fuzzy system model that combines interpretability and accuracy, Fuzzy Sets and Systems 153 (2005) 403–427.
[20] M.S. Bazaraa, H.D. Sherali, C.M. Shetty, Nonlinear Programming: Theory and Algorithms, Wiley-Interscience, 2006.
[21] J. Botzheim, E. Lughofer, E.P. Klement, L.T. Koczy, T.D. Gedeon, Separated antecedent and consequent learning for Takagi–Sugeno fuzzy systems, in: IEEE International Conference on Fuzzy Systems, 1999, 2263–2269.
[22] M. Mannle, Parameter optimization for Takagi–Sugeno fuzzy models—lessons learnt, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2001, pp. 111–116.
[23] J.-S.R. Jang, E. Mizutani, Levenberg–Marquardt method for ANFIS learning, in: Annual Conference of the North American Fuzzy Information Processing Society—NAFIPS, 1996, 87–91.
[24] T. Zhang, B. Yu, Boosting with early stopping: convergence and consistency, Annals of Statistics 33 (2003) 1538–1579.
[25] O. Cordon, F. Herrera, L. Sanchez, Solving electrical distribution problems using hybrid evolutionary data analysis techniques, Applied Intelligence 10 (1999) 5–24.
[26] J.S.R. Jang, Input selection for ANFIS learning, in: IEEE International Conference on Fuzzy Systems, 1996, 1493–1499.
[27] P.-Q. Li, X.-R. Li, Interpolation and extrapolation ability of fuzzy neural network load modeling, Gaodianya Jishu/High Voltage Engineering 34 (2008) 1155–1160.
[28] K. Kosanovich, A. Gurumoorthy, E. Sinzinger, M. Piovoso, Improving the extrapolation capability of neural networks, in: IEEE International Symposium on Intelligent Control—Proceedings, 1996, pp. 390–395.
