A Discriminative Method For Semi-Automated Tumorous Tissues Segmentation of MR Brain Images

Yangqiu Song, Changshui Zhang, Jianguo Lee and Fei Wang
State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing, China, 100084
{songyq99,lijg01,feiwang03}@mail.tsinghua.edu.cn, [email protected]

Abstract

This paper introduces a discriminative method for semi-automated segmentation of tumorous tissues. The segmentation is difficult because of the large size of 3D MR brain images and the blurry boundaries of the pathological tissues. We propose a non-parametric Bayesian Gaussian process for the semi-supervised setting. This discriminative method uses the labeled data and a subset of the unlabeled data sampled from the 2D/3D images to classify the remainder, which is an inductive problem. We base the prior of the traditional Gaussian process on graph regularization and develop a new conditional probability named the Extended Bernoulli Model to realize the induction. A fast algorithm to speed up the training phase is also implemented. Experimental results show that our approach produces satisfactory segmentations that correspond well to results manually labeled by experts.

1. Introduction

MR (Magnetic Resonance) image segmentation, which can assist clinical diagnosis and the evaluation of therapy results, is a hot topic in medical image analysis [11]. Computer-assisted brain tumor segmentation is one of its interesting and hard problems, for mainly two reasons. First, automatic measurement of the volume and variation of tumorous tissues is not easy. Cerebral tumorous tissues in MR brain images vary in size, shape and location, and are usually accompanied by edema. Other tissues, such as hemorrhage, necrosis and cystic components, may also appear in the tumorous region [16]. Thus, the boundaries of the tumorous tissues may be rather blurry. Second, 3D MR images contain a great many voxels (e.g. 256 × 256 × 124), so segmentation incurs high computational complexity and large memory requirements.

The second problem could be addressed by applying 2D methods slice by slice, since even experts segment the tissues slice by slice; however, this loses some geometric information. Many methods have been proposed to segment MR images [11]. Here, the segmentation task is regarded as a recognition problem. In general, one could use supervised classification or unsupervised clustering. Supervised methods may produce good results, but they require scrupulous labeling work by doctors or experts, which is time consuming and costly. Unsupervised methods can segment automatically, but it is often difficult to obtain good results fully automatically. Thus the idea of combining the two was brought forward, namely semi-supervised learning [22]. One class of semi-supervised methods is based on graph or manifold analysis [1, 18, 20]; the advantage of these methods is that they can reach satisfactory accuracy with only a few labeled examples and many unlabeled ones. Some approaches have applied graph-based methods to image segmentation and MR image analysis. However, most of them are unsupervised [4] or transductive [6] (i.e. they work only on the observed labeled and unlabeled data), so both must segment slice by slice because of the large amount of data and the computational complexity. When the tumorous tissues are very small in the image, unsupervised methods will probably attempt to segment the normal tissues instead, such as gray matter, white matter and the cerebrospinal fluid. Besides, transductive methods need labeled data for every slice. An intuitive way to avoid these problems is to label only some data in one image and let the computer classify all the others.

In this paper, we propose a method based on graph analysis and interpret it in a discriminative Bayesian framework. It is an inductive method that uses the labeled data in one image and a subset of the unlabeled data to classify the remainder. It can segment 3D data by sampling the unlabeled data from the 3D volume rather than by segmenting 2D images sequentially. We base the prior of the traditional Gaussian process [14] on graph regularization and develop a new conditional probability named the Extended Bernoulli Model (EBM) to realize the induction, and we use a fast algorithm to handle the large number of training data. Section 2 reviews some related work. Section 3 goes into the technical details of our semi-supervised Gaussian process induction. Experimental results are given in Section 4. Finally, we discuss our method and conclude in Section 5.

2. Model Comparison

First, we introduce some notation. We denote each input data point by a feature vector x_i (i = 1, 2, ..., N), and X_N = {x_i}_{i=1}^N is the observed data set, including both labeled and unlabeled data. The label of point x_i is denoted t_i, and the label vector of all training data is t_N = (t_1, t_2, ..., t_N)^T. Most algorithms in the Bayesian framework focus on the joint distribution P(x_i, t_i). For algorithms that do not need to predict for unobserved data, there are mainly two ways to model this probability:

$$\text{a)}\quad P(X_N, t_N) = P(X_N | t_N)\, P(t_N), \qquad \text{b)}\quad P(X_N, t_N) = P(t_N | X_N)\, P(X_N) \qquad (1)$$

These two mechanisms relate mainly to the following models.

2.1. Descriptive Model

The first equation in (1) can be a generative model or a descriptive model [19]. Here, we discuss the descriptive model since it is more related to our work. A descriptive model is constructed from statistical descriptions of natural images, such as intensity, texture, textons and 2D curves [19]. The Markov Random Field (MRF) model [5], active contour and deformable models [10, 17], and the level set method [9] can be regarded as descriptive models. MRF is a typical descriptive model: it first models the joint probability P(t_N) as a Gibbs distribution and then formulates the probability P(X_N | t_N) as a Gaussian or another distribution. The graphical model of MRF is shown in Fig. 1(a). Boykov et al. [2] developed a semi-supervised graph cuts method that separates an object of interest from the background interactively. Their approach has a clear cost function that can be explained as MAP-MRF estimation. Other interactive methods based on graph cut, such as [8, 12], are all related to the MRF model; their main differences lie in the interaction manner and the prior form.

2.2. Discriminative Model

Descriptive models are efficient for segmenting images. However, when the boundary of the object is blurry, discriminative models may be better. The essence of discriminative models is that they classify each point based on its features. While descriptive models define P(t_N) and P(X_N | t_N) in the Bayesian framework (1), a discriminative method first models the probability P(t_N | X_N) and then models P(X_N) as a regularization term. Note that, in discriminative models, we call P(X_N) the prior and P(t_N | X_N) the likelihood. Supervised methods such as the SVM (Support Vector Machine) [16] and transductive graph-based methods [6] are all discriminative.

Instead of using the direct process x_i → t_i, we use a latent variable y_i and model the process x_i → y_i → t_i, where y_i = y(x_i) is a function of x_i and y_N = (y_1, y_2, ..., y_N)^T is the latent variable vector of the input data. We can then impose a noisy functional relationship on the process y_i → t_i. We model the latent variables as a Gaussian random field (GRF) [20] and use P(y_N) as a data-dependent prior instead of P(X_N). This is a Gaussian process, since for any finite selection of points P(y_N) is Gaussian [14]. The graphical model of our discriminative approach is shown in Fig. 1(b). The following section presents the details of the Gaussian process.

Figure 1. Graphical models: (a) MRF, (b) GP.

3. Method: Semi-Supervised Induction

In the MR segmentation problem, each voxel x_i is modeled by six features: three spatial features and the three intensities of the T1, T2 and PD weighted images. As mentioned in Section 1, we want to decide whether a voxel that is not in the observed training set is tumorous or normal. This is equivalent to estimating the label t_{N+1} of a new point x_{N+1}. To compute P(t_{N+1} | t_N) = ∫ P(t_{N+1} | y_{N+1}) P(y_{N+1} | t_N) dy_{N+1}, we only need to calculate the posterior:

$$P(y_{N+1} | t_N) = \int P(y_{N+1} | y_N)\, P(y_N | t_N)\, dy_N = \frac{1}{P(t_N)} \int P(y_{N+1} | y_N)\, P(t_N | y_N)\, P(y_N)\, dy_N \qquad (2)$$

However, this high-dimensional integral is not easy to calculate. One way to solve it was presented in the original Gaussian process classification paper [14]: since the posterior (2) can be approximated as a Gaussian by the Laplace approximation [14], we first approximate P(y_N | t_N) as Gaussian and estimate y_N, then use the estimate in place of the integral. Taking the mode of the posterior P(y_N | t_N) = P(y_N, t_N)/P(t_N) as the estimate of y_N amounts to minimizing its negative logarithm, so we define:

$$\Psi(y_N) = -\log P(t_N | y_N) - \log P(y_N) \qquad (3)$$

P(t_N) is omitted because it is a constant unrelated to y_N. P(y_N) is the prior, which is based on graph regularization, and P(t_N | y_N) is the new conditional probability, the Extended Bernoulli Model (EBM), which naturally represents a margin. The Laplace approximation [14] is used to estimate y_N under a fixed margin; we then relabel the training set according to the estimated y_N and train new Gaussian processes until no labels flip. To calculate y_{N+1}, we define the function:

$$\Psi(y_N, y_{N+1}) = -\log P(t_{N+1} | y_{N+1}) - \log P(y_{N+1}) \qquad (4)$$

which is minimized only with respect to y_{N+1}. In the following, we first present the training phase, including the new prior P(y_N) and the new conditional probability P(t_N | y_N), and then give the induction formulation.

3.1. The Prior

In a Gaussian process, P(y_N) is the prior. We use a prior based on graph (manifold) regularization, which is also used by other graph-based methods [1, 18, 20]:

$$P(y_N) = \frac{1}{Z} \exp\left\{ -\frac{1}{2}\, y_N^T K^{-1} y_N \right\} \qquad (5)$$

where:

$$K^{-1} = \nabla\nabla_{y_N}\left(-\log P(y_N)\right) = \Delta \qquad (6)$$

Δ = I − S is the normalized graph Laplacian of spectral graph theory [3], and the prior P(y_N) defines a Gaussian random field (GRF) [20] on the graph. Here S = D^{-1/2} W D^{-1/2}, (W)_{ij} = W_{ij} and D = diag(D_{11}, ..., D_{NN}) with D_{ii} = Σ_j W_{ij}, where W_{ij} is the weight function associated with the edges of the graph, satisfying W_{ij} > 0 and W_{ij} = W_{ji}; it can be viewed as a symmetric similarity measure between x_i and x_j. The covariance matrix K is the inverse of the normalized graph Laplacian, so the covariance between any two points depends on all the other training data, labeled and unlabeled alike [21]. In contrast, most traditional Gaussian processes adopt covariances based only on "local" distance information [21].
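To make the prior concrete, the following Python sketch builds the normalized graph Laplacian Δ = I − S from a symmetric weight matrix and uses it as the inverse covariance of equations (5)-(6). This is an illustrative sketch under our own naming, not the authors' code; the small ridge term added to keep Δ invertible (the graph Laplacian itself is singular) is our assumption.

```python
import numpy as np

def normalized_laplacian(W):
    """Delta = I - D^{-1/2} W D^{-1/2} for a symmetric weight matrix W."""
    d = W.sum(axis=1)                        # degrees D_ii = sum_j W_ij
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    return np.eye(W.shape[0]) - S            # Delta = I - S

def prior_inverse_covariance(W, eps=1e-6):
    """K^{-1} = Delta (eq. 6); eps*I keeps it positive definite,
    since the Laplacian itself is singular (our assumption)."""
    return normalized_laplacian(W) + eps * np.eye(W.shape[0])
```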

3.2. The Likelihood

In this section, we present a new noise function for the process y → t, which we name the Extended Bernoulli Model (EBM):

$$P(t | y) = \frac{1-\lambda}{1+\exp(-ty)}\, I_{\{t \neq 0\}} + \lambda\, I_{\{t = 0\}} \qquad (7)$$

where I is the indicator function: I_{t_i=0} = 1 if t_i = 0, and 0 otherwise. In the traditional two-class case, t_i ∈ {−1, 1}. Here, for the semi-supervised problem, we attempt to extend the labels of the labeled points to the unlabeled ones, whose labels are initially set to zero. Thus, if t_i = 0, the probability is P(t_i = 0 | y) ≡ λ. The factor λ makes P(t_i | y_i) a proper probability with respect to t_i, i.e. P(t_i = 1 | y_i) + P(t_i = −1 | y_i) + P(t_i = 0 | y_i) ≡ 1. As Fig. 2 shows, this model can be considered a degenerate Ordered Category Model [7] in which the variance of the probability P(t_i = 0 | y_i) is infinite. We regard the range where P(t_i = 0 | y) is larger than both P(t_i = 1 | y) and P(t_i = −1 | y) as the margin. Inside the margin, the difference between P(t_i = 1 | y) and P(t_i = −1 | y) is smaller than outside. Thus, the margin of the EBM represents the uncertainty of a point belonging to a class, and the parameter λ controls this margin.

Figure 2. Illustration of the EBM (Extended Bernoulli Model): (a) Ordered Category Model; (b) λ = 0.4, µ_y = 0; (c) λ = 0.4, µ_y = 1.5; (d) λ = 1/3, µ_y = 1.5.

Moreover, Fig. 2(b)-(d) show how the label and the margin affect the training and prediction phases. In a Gaussian process, we can assume each latent variable is also conditionally Gaussian:

$$P(y_i | y_{N \setminus \{i\}}, X_N) = \mathcal{N}(\mu_i, \sigma_i) \qquad (8)$$

where µ_i and σ_i are related to the input space (see the graphical model in Fig. 1(b)). By the Bayesian formulation, P(t_i | y_i) P(y_i) = P(y_i | t_i) P(t_i). The probabilities P(t_i) and P(t_i | y_i) are both constant for t_i = 0, so the posterior P(y_i | t_i = 0) depends only on the prior P(y_i).

On the other hand, the mean and variance of P(y_i | t_i = 1) and P(y_i | t_i = −1) are affected by the likelihood. If µ_i is near zero (Fig. 2(b)), the posterior is fully determined by the label t_i: when t_i = ±1, the estimated y_i is positive or negative accordingly, and when t_i = 0, maximizing the posterior yields a zero estimate. This is why we choose a graph regularization based prior. As mentioned previously, every covariance between two points under this prior is related to all the training data. Thus, even if there are very few labeled data in the training set, µ_i (for unlabeled x_i) is affected by the labeled data more than it would be under a traditional prior, and can be non-zero (Fig. 2(c)), leading to a non-zero estimate of y_i. Furthermore, comparing Fig. 2(c) and Fig. 2(d) shows that the margin does not affect the estimation of the latent variables: the estimated y_i is the same regardless of which margin model is imposed on the process y → t. However, a point whose latent variable y_i falls inside the margin receives label zero and remains unlabeled, so it does not contribute to the prediction function (see the prediction phase) and the classification boundary changes. The EBM thus models the unlabeled data, and it can remove the effect of outliers by choosing a proper margin; if the margin is too large, however, it may also discard the geometric information of the unlabeled data. Finally, taking derivatives of −log P(t_N | y_N) with respect to y_N, we denote the gradient and the Hessian as:

$$\alpha = \nabla_{y_N}\left(-\log P(t_N | y_N)\right) = \left(-\frac{t_i}{1+\exp(t_i y_i)}\right)_i, \qquad \Pi = \nabla\nabla_{y_N}\left(-\log P(t_N | y_N)\right) = \mathrm{diag}\left(\frac{t_i^2 \exp(t_i y_i)}{(1+\exp(t_i y_i))^2}\right) \qquad (9)$$

Note that the parameter λ does not affect the gradient or the Hessian. In the training phase, the elements of the gradient vector α and the Hessian matrix Π corresponding to unlabeled points are zero.
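A minimal sketch of the EBM noise model (7) and its derivatives (9), assuming labels t_i ∈ {−1, 0, +1} with 0 marking unlabeled points; the function names are ours, not the paper's.

```python
import numpy as np

def ebm_likelihood(t, y, lam=0.4):
    """P(t|y) of eq. (7): sigmoid part for t = +-1, constant lam for t = 0."""
    p = (1.0 - lam) / (1.0 + np.exp(-t * y))
    return np.where(t == 0, lam, p)

def ebm_grad_hess(t, y):
    """Gradient alpha and Hessian diagonal Pi of -log P(t_N|y_N), eq. (9).
    Entries for unlabeled points (t_i = 0) vanish, as noted in the text."""
    e = np.exp(t * y)
    alpha = -t / (1.0 + e)
    pi = (t ** 2) * e / (1.0 + e) ** 2
    return alpha, pi
```

Note that, exactly as stated above, λ cancels out of the derivatives: unlabeled points contribute zero entries to both α and Π.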

3.3. Training Phase: The Laplace Approximation

Now we can apply the Laplace approximation to the posterior probability with a fixed margin. Differentiating equation (3) with respect to y_N gives:

$$\nabla\Psi = \alpha + K^{-1} y_N, \qquad \nabla\nabla\Psi = \Pi + K^{-1} \qquad (10)$$

To find the minimum of Ψ in equation (3), the Newton-Raphson iteration [14] is adopted:

$$y_N^{\text{new}} = y_N - (\nabla\nabla\Psi)^{-1}\, \nabla\Psi \qquad (11)$$

Since ∇∇Ψ is always positive definite, (3) is a convex problem. When the iteration converges to an optimum ŷ_N, ∇Ψ is the zero vector. The posterior probability P(y_N | t_N) can then be approximated as a Gaussian centered at the estimate ŷ_N, with covariance (∇∇Ψ)^{-1}. After the Laplace approximation, the latent variable vector ŷ_N depends on all the training data. We relabel the training set using the fixed margin and use the newly labeled data to train another Gaussian process, repeating the procedure until no labels flip in the training set.
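The training phase can then be sketched as the Newton-Raphson loop of equations (10)-(11). Here `ebm_grad_hess` and `prior_inverse_covariance` are the hypothetical helpers from the earlier sketches, and the convergence tolerance is our choice.

```python
import numpy as np

def laplace_train(K_inv, t, max_iter=50, tol=1e-6):
    """Find the mode y_hat of P(y_N | t_N) by Newton-Raphson (eq. 11)."""
    y = np.zeros_like(t, dtype=float)
    for _ in range(max_iter):
        alpha, pi = ebm_grad_hess(t, y)      # eq. (9)
        grad = alpha + K_inv @ y             # gradient of Psi, eq. (10)
        hess = np.diag(pi) + K_inv           # Hessian of Psi, eq. (10)
        step = np.linalg.solve(hess, grad)
        y -= step                            # Newton update, eq. (11)
        if np.max(np.abs(step)) < tol:
            break
    return y
```

The outer loop of this section — relabel the points whose estimated ŷ_i falls outside the margin and retrain until no label flips — simply wraps around this function.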

3.4. Prediction Phase: Induction

In the prediction phase, we minimize equation (4) to compute the latent function value y_{N+1} of a new point x_{N+1}. The objective function (4) can be rewritten as:

$$\Psi(y_N, y_{N+1}) = \sum_{i=1}^{N+1} \log\left(1 + \exp(-t_i y_i)\right) + \frac{1}{2}\, y_{N+1}^T K_{N+1}^{-1} y_{N+1} + \frac{1}{2} \log |K_{N+1}| \qquad (12)$$

The new point x_{N+1} is unlabeled, and its label is initially set to zero. Since the optimum ŷ_N has been estimated in the training phase, we minimize equation (12) only with respect to y_{N+1}, which leads to:

$$\hat{y}_{N+1} = k^T K^{-1} \hat{y}_N = k^T (I - S)\, \hat{y}_N \qquad (13)$$

where k_i = W_{N+1,i} = exp(−||x_{N+1} − x_i||² / 2σ²) is the approximate covariance between the new point and the i-th training point [14]. Note that when equation (3) has been minimized, the gradient in equation (10) satisfies ∇Ψ = α + K^{-1} y_N = 0. Thus, equation (13) can be rewritten as:

$$\hat{y}_{N+1} = -k^T \hat{\alpha} \qquad (14)$$

where α̂ = −K^{-1} ŷ_N. From equations (9) and (14) we can see that if a training point lies outside the margin and its latent variable satisfies y_i > 0, it contributes a positive weight −α_i k_i to the prediction function; conversely, a point outside the margin with y_i < 0 contributes a negative weight. Moreover, α_i tends to zero for very large y_i and approaches ±1/2 when y_i is just outside the margin. Points falling inside the margin have zero weight (equation (9)) and therefore do not affect the classification result in the prediction phase.
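Prediction (14) then reduces to a weighted sum over the training points. The RBF bandwidth σ, the helper names, and the choice of taking the sign of ŷ_{N+1} as the final class decision are assumptions of this sketch.

```python
import numpy as np

def predict(x_new, X_train, alpha_hat, sigma=0.1):
    """y_hat_{N+1} = -k^T alpha_hat (eq. 14), with k_i the RBF weight
    between the new point and each training point."""
    k = np.exp(-np.sum((X_train - x_new) ** 2, axis=1) / (2.0 * sigma ** 2))
    y_new = -k @ alpha_hat
    return np.sign(y_new)        # our simplification: sign as class decision
```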

3.5. Speeding Up the Training Algorithm

Although we have a clear and compact formulation for realizing the semi-supervised induction, a major problem is that the training cost scales as O(N³) in the training data size N. A simple way to reduce the O(N³) computation and the O(N²) memory requirement is to express the solution in terms of examples sampled from the training set [15] [4]. In this paper, we follow these two papers to speed up our semi-supervised training phase.

Figure 3. "Patient 1" segmentation results. a) Original T1 weighted image of slice 60 and the human interactive hints of object and background. b) Original T2 weighted image. c) Original PD weighted image. d) Handmin. e) Handmid. f) Handmax. g) The result of the GVF Snake (initial contour: yellow dotted line; final result: blue solid line). h) The result of Lazy Snapping. i) The result of SVM. j) The result of spectral clustering using the Nyström method. k) The un-post-processed SSGPI result. l) The post-processed SSGPI result.

In [4], the authors prove that the matrix D^{-1/2} W D^{-1/2} can be decomposed as U Λ U^T, where Λ is an M × M matrix and M is the sampling size. In the training phase of the Gaussian process, the inverse of ∇∇Ψ must be computed in each Newton-Raphson iteration (11). We rewrite the Hessian matrix as ∇∇Ψ = Π + K^{-1} = Π + I − D^{-1/2} W D^{-1/2} = Π + I − U Λ U^T. Applying the Woodbury formula [15] then gives:

$$(\nabla\nabla\Psi)^{-1} = (\Pi + I)^{-1} + (\Pi + I)^{-1} U \left(I - \Lambda U^T (\Pi + I)^{-1} U\right)^{-1} \Lambda U^T (\Pi + I)^{-1} \qquad (15)$$

The computational requirement of each iteration is thus O(M²N), since only the SVD of an M × M sub-matrix is needed.
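A sketch of the Woodbury-based inverse (15), assuming the Nyström factors U (N × M) and Λ (M × M) have already been computed as in [4]; in practice one would apply the result to the gradient vector rather than form the full N × N matrix.

```python
import numpy as np

def hessian_inverse_woodbury(pi, U, Lam):
    """(Pi + I - U Lam U^T)^{-1} via the Woodbury formula, eq. (15).
    pi: diagonal of the EBM Hessian; U: N x M; Lam: M x M."""
    a_inv = 1.0 / (pi + 1.0)                 # (Pi + I)^{-1} is diagonal
    AU = U * a_inv[:, None]                  # (Pi + I)^{-1} U, cost O(NM)
    M_small = np.eye(U.shape[1]) - Lam @ (U.T @ AU)   # only M x M to solve
    correction = AU @ np.linalg.solve(M_small, Lam @ AU.T)
    # For large N, multiply these factors against a vector instead of
    # materializing the N x N matrix below (our simplification).
    return np.diag(a_inv) + correction
```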

4. Experimental Results

In this section we present experimental results of our method, applied to real MR images of three patients [16]. Each patient sequence consists of 124 slices of 256×256 pixels, with voxel sizes of 0.94×0.94×1.5 mm³ for T1 and 0.47×0.47×5 mm³ for T2 and PD. The 3D T1, T2 and PD images were preprocessed by registration and removal of the extra-cranial tissues [16]; intensity inhomogeneity was also corrected in the pre-processing phase. Background pixels are ignored in the computation. Typical T1, T2 and PD images of patient 1 are shown in Fig. 3(a)-(c).

4.1. Graph Construction

In order to apply the graph regularization based prior and our framework to image segmentation, we use both intensity and spatial features to construct the edges of a graph G = (V, E). We take each voxel as a node and define the edge weight W_ij between voxels i and j as:

$$W_{ij} = \exp\left(\frac{-\|x_c(i) - x_c(j)\|^2}{2\sigma_c^2}\right) \cdot \exp\left(\frac{-\|x_p(i) - x_p(j)\|^2}{2\sigma_p^2}\right) \qquad (16)$$

where x_c is the vector of T1, T2 and PD intensities, x_p is the coordinate vector (x, y, z) of the voxel, and σ_c and σ_p are the parameters of our algorithm. All vectors are normalized to [0, 1].
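As an illustration, the weight of equation (16) can be computed per voxel pair as follows; the array names and shapes are our assumptions.

```python
import numpy as np

def edge_weight(xc_i, xc_j, xp_i, xp_j, sigma_c, sigma_p):
    """W_ij of eq. (16): product of intensity and spatial RBF terms.
    xc_*: normalized (T1, T2, PD) intensities; xp_*: normalized (x, y, z)."""
    wc = np.exp(-np.sum((xc_i - xc_j) ** 2) / (2.0 * sigma_c ** 2))
    wp = np.exp(-np.sum((xp_i - xp_j) ** 2) / (2.0 * sigma_p ** 2))
    return wc * wp
```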

4.2. The Golden Standard

To evaluate the segmentation results, we introduce golden standards as segmentation references. Suppose N experts give independent manual segmentations, denoted A_i, i = 1, 2, ..., N. We use the following three golden standards:

Minimum Area (Handmin): A_Handmin = A_1 ∩ A_2 ∩ ... ∩ A_N.

Maximum Area (Handmax): A_Handmax = A_1 ∪ A_2 ∪ ... ∪ A_N.

Majority Voting (Handmid): a voxel is regarded as pathological if most of the experts judge it to belong to a pathological region. The region of Handmid lies between the other two.

Fig. 3(d)-(f) show typical slices of "Patient 1" with the three golden standards.

4.3. Performance Evaluation

In order to compare with other published results, we use the same index definitions as [16], namely the Missing Detection Rate (MDR) and the False Detection Rate (FDR):

$$MDR = \frac{\#\,\text{False Negatives}}{\#\,\text{True Positives} + \#\,\text{False Negatives}}, \qquad FDR = \frac{\#\,\text{False Positives}}{\#\,\text{True Positives} + \#\,\text{False Negatives}} \qquad (17)$$

All experiments below are reported with respect to these two criteria. A low MDR means our result covers most of the region of the golden standard; a low FDR means our result lies mainly within that region. Together, the two criteria constrain the result to be consistent with the golden standard.
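The two rates of equation (17) are straightforward to compute from binary masks; this hedged sketch assumes boolean NumPy arrays of equal shape and a non-empty golden standard region.

```python
import numpy as np

def mdr_fdr(result, golden):
    """Missing and False Detection Rates of eq. (17) for boolean masks."""
    tp = np.sum(result & golden)     # true positives
    fn = np.sum(~result & golden)    # false negatives
    fp = np.sum(result & ~golden)    # false positives
    mdr = fn / (tp + fn)             # assumes the golden region is non-empty
    fdr = fp / (tp + fn)             # note: same denominator, per eq. (17)
    return mdr, fdr
```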

4.4. A 2D Example

First, the algorithm is applied to the 2D image of slice 60 of "Patient 1". All experiments are averaged over 10 runs. The labeled object and background points are fixed, and the unlabeled points are randomly selected from slice 60. Fig. 3(a) shows the interactive human hints for the segmentation. In addition, to remove speckles from the segmentation result, morphological open and close operators are applied as post-processing; they remove isolated regions and merge connected regions (see the sketch after Table 1).

Unlabeled Data: We test the algorithm with 200, 500, 1000, 2000, 5000 and 8000 additional unlabeled data points. Table 1 shows the average MDR and FDR with respect to the Handmin, Handmid and Handmax golden standards. The MDR measured against Handmax is larger than those against Handmin and Handmid, and the FDR is smaller, because the region of Handmax is larger than the other two. The MDR reaches its minimum at 2000 unlabeled points and the FDR at 5000. This is because we use the same parameters σc = 0.1 and σp = 0.1 throughout; as more unlabeled data are added, σp no longer remains a good discriminative value.

Table 1. The MDR and FDR of "Patient 1" with SSGPI.

Unlabeled Data   200      500      1000     2000     5000     8000
Handmin MDR      0.187    0.167    0.177    0.161    0.182    0.202
Handmid MDR      0.254    0.238    0.246    0.231    0.253    0.258
Handmax MDR      0.317    0.306    0.298    0.297    0.322    0.327
Handmin FDR      0.0296   0.0074   0.0112   0.0120   0.0005   0.0019
Handmid FDR      0.0239   0.0051   0.0079   0.0085   0.0001   0.0009
Handmax FDR      0.0143   0.0013   0.0037   0.0015   0        0
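The morphological post-processing mentioned above could be realized, for example, with SciPy; the structuring element and iteration count are our choices, not values given in the paper.

```python
from scipy.ndimage import binary_opening, binary_closing

def postprocess(mask, iterations=2):
    """Remove isolated speckles (opening), then merge nearby regions (closing)."""
    mask = binary_opening(mask, iterations=iterations)
    mask = binary_closing(mask, iterations=iterations)
    return mask
```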

Parameters: Although the parameters could be found via ML (maximum likelihood) estimation, we will show that this is not suitable for the tumorous tissue segmentation problem. Fig. 4 shows how the MDR and FDR vary with the two parameters σc and σp. Increasing σc raises the MDR and lowers the FDR, while increasing σp lowers the MDR and raises the FDR. As the figure shows, the intensity information is the dominant discriminative feature. To find an optimal parameter setting, we must balance MDR against FDR.

Figure 4. The MDR and FDR change with different parameters under the Handmid golden standard: (a) MDR, (b) FDR.

Running Time: We test the Nyström method that speeds up the training phase; the result is shown in Fig. 5. Both the training and the prediction time are linear in the training data size N. The labeled set contains 48 positive and 123 negative points. With 2000 added unlabeled data points, the training time is 1.13 seconds and the prediction time is 39.99 seconds for MATLAB code on a Pentium IV 2.4 GHz CPU.

Figure 5. "Patient 1" training time and prediction time: (a) training time, (b) prediction time.

Comparison: We first compare the segmentation results with SVM, spectral clustering, graph cut and active contour methods, and then give numerical comparisons using the MDR and FDR criteria. First, we test the GVF (Gradient Vector Flow) Snake algorithm [17]. The result after 80 iterations is shown in Fig. 3(g). We adopt a different kind of human hint for the GVF Snake algorithm, and its result contains many falsely detected voxels because the tumorous tissue diffuses into the normal tissues; active contour based methods have problems when the boundary of the object is not clear. The result of the graph cut based Lazy Snapping method [8] is shown in Fig. 3(h), and the result of SVM using only the interactive hints as training data is shown in Fig. 3(i). Using only the same labeled information is unfair to Lazy Snapping and SVM, since our method also uses unlabeled data; with more human hints and post-processing modifications, Lazy Snapping would give better results. However, this shows that our method can do better with fewer hints. Finally, the result of spectral clustering using the Nyström method [4] is shown in Fig. 3(j), and the un-post-processed and post-processed SSGPI (Semi-Supervised Gaussian Process Induction) results are shown in Fig. 3(k) and (l). The unsupervised spectral clustering algorithm tends to classify normal tissues as pathological, which makes the FDR very high. Our approach is closely related to spectral analysis and overcomes these problems by using some human hints in a semi-automated manner; the result is more robust and correct.

For numerical comparison, we pick the following methods: 1) the supervised method SVM [16]; different from the result in Fig. 3(i), SVM uses one or more typical slices of MR images to segment the other slices; 2) the unsupervised spectral clustering (SC) method with the Nyström approximation [4]; 3) the interactive GVF snake algorithm [17], using the same interactive hints as Fig. 3(g); and 4) SSGPI, proposed in this paper. The results are shown in Table 2. The parameters of spectral clustering for the three patients are σc1 = σp1 = 0.15, σc2 = 0.1, σp2 = 0.15 and σc3 = σp3 = 0.1; the parameters of SSGPI are σc1 = σp1 = 0.1, σc2 = σp2 = 0.08 and σc3 = σp3 = 0.06. All methods are compared against the golden standard Handmid. Since the spectral clustering algorithm and ours use sampling, we run these two 10 times, so the values are the average MDR and FDR of the post-processed results. The FDR of our algorithm is the best of all, which means the segmentation results of SSGPI lie mainly within the region of the hand-guided results, and the MDR of our algorithm is also competitive. For "Patient 2" and "Patient 3", GVF Snake and spectral clustering respectively give the best MDR, but their FDR values are very high. The reason is that, for these two patients, the boundaries of the tumorous tissues are more blurry (shown in Fig. 6). In particular, for spectral clustering on "Patient 3", where the tumorous region is very small, we find that the algorithm tries to segment the normal tissues, such as gray matter, white matter and the cerebrospinal fluid, rather than the pathological ones.

Table 2. Comparison with other methods under the Handmid golden standard (see Section 4 for the abbreviations).

                 "Patient 1" Slice 60    "Patient 2" Slice 52    "Patient 3" Slice 81
                 MDR      FDR            MDR      FDR            MDR      FDR
SVM              0.2744   0.0084         0.2433   0.0492         0.2333   0.0460
SC (Nyström)     0.3383   0.1192         0.2394   0.1432         0.1914   4.9087
GVF Snake        0.2898   0.0361         0.0981   0.2087         0.5373   0.0294
SSGPI            0.2400   0.0023         0.1836   0.0150         0.2594   0.0047


4.5. 3D Segmentation Results

We also test the proposed method on the 3D segmentation problem. Fig. 6 shows the segmentation results of the three patients. The first column of each sub-figure is the original T1 weighted image, with the labeled hints plotted at the top left; the middle column is the hand-guided ground truth Handmid. The parameters used for the three patients are again σc1 = σp1 = 0.1, σc2 = σp2 = 0.08 and σc3 = σp3 = 0.06. To segment the 3D data, we multiply the z axis by an extra weight; the weights for the three patients are 0.42, 0.1 and 0.1. We use slice 60 of "Patient 1", slice 52 of "Patient 2" and slice 81 of "Patient 3" to give the interactive hints of object and background, and slice 50 of "Patient 1", slice 44 of "Patient 2" and slice 93 of "Patient 3" to test the 3D segmentation. The unlabeled data are randomly sampled from the 3D volumes. The results are shown in the last column of Fig. 6. Our inductive method can thus directly handle 3D image data, while traditional spectral graph based methods (which are unsupervised or transductive) must segment slice by slice. The advantage of our approach is that we only need to label one slice to obtain the whole 3D segmentation.

5. Conclusion and Discussion

In this paper, we interpret the Bayesian framework of the induction problem as a non-parametric discriminative model. It can induce labels for unseen data outside the training set, which makes 3D semi-automated segmentation possible. The training time of our algorithm is short and the prediction time is acceptable, although not yet fast enough for fully interactive use. Hyper-parameter estimation might help; however, the labeled set in a semi-supervised problem is always small, so estimation based on the evidence of the examples is not robust. Furthermore, the MDR and FDR move in opposite directions as the parameters change, and it is not easy to balance these two rates. Averaging over the parameter space with an MCMC (Markov Chain Monte Carlo) method may be a better choice, but it is very time consuming.

Compared with others' results, our algorithm is competitive or better.

Figure 6. 3D segmentation results: (a) Patient 1, slices 60, 50; (b) Patient 2, slices 52, 44; (c) Patient 3, slices 81, 93. The left column of each sub-figure is the original T1 weighted image, the middle column the hand-guided segmentation (Handmid), and the right column the SSGPI result.

It is conservative in judging whether a voxel belongs to pathological tissue: the false detection rate is low while the missing detection rate is acceptable. One advantage is that it does not demarcate a large useless region, which would confuse the user during interaction; our approach only requires the user to label the most certain region in one of the slices. In the future, we plan to develop a fast sparse algorithm and a C++ implementation, which could make our algorithm usable interactively.

References

[1] M. Belkin, P. Niyogi and V. Sindhwani: On Manifold Regularization. AI & Statistics, pp. 17-24, 2005.
[2] Y. Boykov and M. Jolly: Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images. ICCV, Vol. I, pp. 105-112, 2001.
[3] F. Chung: Spectral Graph Theory. No. 92 in Regional Conference Series in Mathematics. American Mathematical Society, 1997.
[4] C. Fowlkes, S. Belongie, F. Chung and J. Malik: Spectral Grouping Using The Nyström Method. IEEE Trans. on PAMI, Vol. 26(2), pp. 214-225, 2004.
[5] S. Geman and D. Geman: Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images. IEEE Trans. on PAMI, Vol. 6(6), pp. 721-742, 1984.
[6] L. Grady and G. Funka-Lea: Multi-Label Image Segmentation for Medical Applications Based on Graph-Theoretic Electrical Potentials. Workshop on CVAMIA and MMBIA of ECCV, pp. 230-245, 2004.
[7] N. D. Lawrence and M. I. Jordan: Semi-Supervised Learning Via Gaussian Processes. NIPS, Vol. 17, pp. 753-760, 2005.
[8] Y. Li, J. Sun, C. Tang and H. Shum: Lazy Snapping. SIGGRAPH, pp. 303-308, 2004.
[9] R. Malladi, J. A. Sethian and B. Vemuri: Shape Modeling With Front Propagation: A Level Set Approach. IEEE Trans. on PAMI, Vol. 17(2), pp. 158-175, 1995.
[10] T. McInerney and D. Terzopoulos: Deformable Models in Medical Image Analysis: A Survey. Medical Image Analysis, Vol. 1(2), pp. 91-108, 1996.
[11] D. L. Pham, C. Xu and J. L. Prince: Current Methods in Medical Image Segmentation. Annual Review of Biomedical Engineering, Vol. 2, pp. 315-337, 2000.
[12] C. Rother, V. Kolmogorov and A. Blake: "GrabCut" - Interactive Foreground Extraction Using Iterated Graph Cuts. SIGGRAPH, Vol. 23(3), pp. 309-314, 2004.
[13] J. K. Udupa and S. Samarasekera: Fuzzy Connectedness and Object Definition: Theory, Algorithms and Applications in Image Segmentation. Graphical Models and Image Processing, Vol. 58(3), pp. 246-261, 1996.
[14] C. Williams and D. Barber: Bayesian Classification With Gaussian Processes. IEEE Trans. on PAMI, Vol. 20(12), pp. 1342-1351, 1998.
[15] C. Williams and M. Seeger: Using the Nyström Method to Speed Up Kernel Machines. NIPS, Vol. 13, pp. 682-688, 2001.
[16] Q. Wu, W. Dou, Y. Chen and J. Constans: Fuzzy Segmentation of Cerebral Tumorous Tissues in MR Images via Support Vector Machine and Fuzzy Clustering. IFSA.
[17] C. Xu and J. L. Prince: Snakes, Shapes and Gradient Vector Flow. IEEE Trans. on Image Proc., Vol. 7(3), pp. 359-369, 1998.
[18] D. Zhou, O. Bousquet, T. N. Lal, J. Weston and B. Schölkopf: Learning with Local and Global Consistency. NIPS, Vol. 16, pp. 321-328, 2003.
[19] S. C. Zhu: Statistical Modeling and Conceptualization of Visual Patterns. IEEE Trans. on PAMI, Vol. 25(6), pp. 691-712, 2003.
[20] X. Zhu, Z. Ghahramani and J. Lafferty: Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. ICML, Vol. 20, pp. 912-919, 2003.
[21] X. Zhu and Z. Ghahramani: Semi-Supervised Learning: From Gaussian Fields to Gaussian Processes. Technical Report, 2003.
[22] X. Zhu: Semi-Supervised Learning Literature Survey. Technical Report, 2005.
