
Machine Vision and Applications DOI 10.1007/s00138-008-0171-x

ORIGINAL ARTICLE

Interactive color image segmentation with linear programming Hongdong Li · Chunhua Shen

Received: 18 January 2008 / Accepted: 15 September 2008 © Springer-Verlag 2008

Abstract Image segmentation is an important and fundamental task for image and vision understanding. This paper describes a linear programming (LP) approach for segmenting a color image into multiple regions. Compared with the recently proposed semi-definite programming (SDP)-based approach, our approach has a simpler mathematical formulation and a far lower computational complexity. In particular, to segment an image of M × N pixels into k classes, our method requires only $O((MNk)^m)$ complexity, in sharp contrast to the $O((MNk)^{2n})$ complexity of the SDP method, where $m$ and $n$ are the polynomial degrees of the complexity of the corresponding LP solver and SDP solver, respectively (in general we have $m \ll n$). Such a significant reduction in computation readily enables our algorithm to process color images of reasonable sizes. For example, while the existing SDP relaxation algorithm is only able to segment a toy-size image of, e.g., 10 × 10 to 30 × 30 pixels in hours, our algorithm can process larger color images of, say, 100 × 100 to 500 × 500 pixels in much shorter time.

Keywords Interactive image segmentation · Linear programming · Object cutout

H. Li (corresponding author) · C. Shen
Australian National University, Canberra, ACT 0200, Australia
e-mail: [email protected]
C. Shen e-mail: [email protected]

H. Li · C. Shen
NICTA (National ICT Australia), Canberra Research Lab, Canberra, ACT 2601, Australia

1 Introduction

The task of segmenting a color image into multiple meaningful regions is of central importance to image processing and computer vision. It has many practical applications in real-world problems. For example, in visual tracking or video surveillance the input images are generally required to be segmented into two regions: foreground object and background. This is basically a binary segmentation problem. Many high-level image understanding tasks (e.g., object recognition) rely on the partitioning of an image into meaningful and color-homogeneous regions. There has been a large body of work published on the subject of image segmentation. Various algorithms have been proposed; the normalized cut [2], mean shift [1], graph cut [4,7], belief propagation [8], and convex optimization [9–11] are some of the most popular methods of choice. Different algorithms have different motivations and mathematical origins. The work presented in this paper belongs to the convex optimization family. In particular, the algorithm we are going to describe is largely inspired by the recently proposed semi-definite programming (SDP) algorithm, which appeared in a series of papers such as [10,11]. Our principal motivation is to improve significantly the computational efficiency of the SDP algorithm. To this end, we propose a much simpler convex formulation, which is in fact a linear programming (LP) formulation, of the color image segmentation problem. Besides being faster, such an LP approach also has the benefit of higher flexibility, in that prior knowledge (e.g., user interactions), if it is linearly representable, can be easily incorporated into the computation. In this paper we demonstrate this by experiments on interactive image segmentation.


1.1 Previous work on SDP-based image segmentation

Modern convex programming (e.g., SDP or LP) is a powerful tool for solving various optimization problems in science and engineering. It has also attracted much attention from the computer vision community. Paper [11] shows that the problem of multi-class image labeling (i.e., segmentation or partition), after certain approximation/relaxation, can be formulated as a convex SDP problem. By solving the resulting SDP, one can obtain an approximately globally optimal segmentation. Experiments on both 1D signals and 2D images have shown convincing success. However, the SDP-based approach suffers from a serious practical issue: compared with modern LP solvers, general off-the-shelf SDP solvers usually have a much higher computational complexity. This complexity issue manifests itself in two equally important aspects. The first is the polynomial degree of the complexity, and the second is the largest problem size that is numerically solvable. Firstly, while in theory both LP and SDP have polynomial complexities, for two problems of roughly the same size an LP solver is typically much faster than an SDP solver. Formally, let us assume that $N$ is the problem size (e.g., the number of unknown variables) of both problems. Suppose that the polynomial degrees of the complexities of an LP solver and an SDP solver are $m$ and $n$, respectively; then the LP's complexity and the SDP's complexity are $O(N^m)$ and $O(N^n)$. In general, we have $N^m \ll N^n$. Secondly, LP solvers are much more mature than SDP solvers. Even state-of-the-art SDP solvers (e.g., SeDuMi, CSDP) are only able to solve relatively small problems (with a few thousand variables and constraints). This has largely hampered the wider application of SDP algorithms. In contrast, modern LP solvers (e.g., CPLEX, MOSEK) are able to solve problems with millions of variables.
In the context of image segmentation, there is a third issue that makes the high complexity more problematic. For segmenting an image of M × N pixels into k classes, the SDP formulation in [11] results in an SDP problem of size $(MNk)^2$, i.e., quadratic in the number of label variables. For instance, if one wants to segment a very small image of 10 × 10 pixels into k = 3 classes, the SDP formulation in [11] generates $(10 \times 10 \times 3)^2 = 90{,}000$ variables. Solving such an SDP problem is already beyond the capability of the current best general SDP solvers such as CSDP or SeDuMi. Some remedies have been proposed to salvage the SDP-based approach. One popular solution is to simply sub-sample the image, i.e., to process only 1% of all pixels; see [10]. Another solution is to pre- and over-segment the image using some other algorithm to get a much-reduced number of so-called super-pixels, and then apply the SDP to the set of super-pixels. Both approaches result in a much smaller SDP problem. However, neither of them truly overcomes the computational complexity problem; they simply reduce the problem size. By doing so the effective image resolution is also lost. Even with the above remedies, the reported computational results are far from satisfactory. By using a special SDP solver (e.g., PENNON SDP) with the sub-sampling pre-processing technique, [11] reported segmenting a 32 × 32 image into three classes in about 4 h. Clearly, taking such a long time to process a toy-size problem is not practical for most real applications.

1.2 Other related work on color image segmentation

Colors play a significant role in human visual perception. This paper uses color information exclusively as the cue for image segmentation. Extending it to other features (e.g., texture) should be easy. There are several classical algorithms that are popularly adopted for color image segmentation. One simple idea is to apply k-means clustering to all the pixel values in a proper color space (e.g., RGB space or CIE-Lab space). An improved version of the clustering idea is Gaussian Mixture Model (GMM) estimation. The EM algorithm is commonly used for finding the unknown parameters of a GMM. The Normalized-Cut (N-cut) method is an important image segmentation algorithm [2]. It performs well, has sound theoretical foundations and a simple implementation, and hence has received much attention. Yet another commonly adopted algorithm is the Mean-Shift method [1]. Mean-shift proves to be very efficient in detecting multiple modes existing in a color feature space, each mode corresponding to a cluster of color pixels. The N-cut and Mean-Shift algorithms have a common drawback. They both ignore the local coherency among neighboring pixels, which is believed to be crucial in image analysis. These algorithms belong to the so-called global approach, which means that they operate directly on a bag of orderless color feature vectors.
None of them takes into account the local consistency issue, hence they often yield erroneous (e.g., over-segmented) results. To exploit such local coherence information, sophisticated algorithms using delicate graph structures have been used for image segmentation. For example, graph cut (G-Cut) and belief propagation (BP), based on the Markov random field (MRF) image model, have been applied to the problem of image segmentation, and very successful results have been obtained [4,7,8]. These two algorithms represent the state-of-the-art methods for image segmentation. However, both algorithms involve complicated optimization procedures, which are often non-convex. For example, in graph cut, ad hoc local swap operations are used, while in the belief propagation algorithm, iterative local message passing is necessary. The graph cut algorithm is able to converge to the true optimum when the energy function is sub-modular [4]. In addition, graph cut works remarkably fast; to process a 512 × 512 image it needs only about one second on a moderate PC. Compared with graph cut, our LP algorithm is slower. However, we gain the advantage of more flexibility. While offering (approximate) global optimality, we do not require the objective function to be sub-modular. Moreover, additional linear constraints may be easily incorporated. Our method requires the user to pre-specify some scribbles (or seed points) as prior knowledge. In this sense, our method is an instance of the semi-supervised or transductive learning techniques of the machine learning field. In fact, this learning technique has also been applied to the interactive image segmentation problem [15,16]. However, the computational devices used by the transduction techniques are different from our LP. Image segmentation is also closely related to natural image matting [13]. Compared with segmentation, matting is a more general image labeling task where the pixel labels to be determined can take any real value between 0 and 1 (instead of discrete class labels). Again, the computational device used by matting, e.g., the spectral (eigen-) technique in [13], is different from ours.

1.3 Our contribution

In view of the aforementioned issues that have hampered the practical application of the SDP algorithm, in this paper we develop a globally optimal, full-resolution, pixel-wise color image segmentation algorithm. We propose a much simpler LP-based algorithm in which the image segmentation task is naturally formulated as linear optimization under linear constraints. The solution can be found easily by an off-the-shelf LP solver.
Linear programming is a mature mathematical technique (more mature than SDP) and is widely adopted by researchers from both academia and industry. The formulation of our LP algorithm is much simpler than its SDP counterpart [9,11]. Solving an LP problem is usually much faster than solving an SDP problem of the same size. With our formulation, the resulting problem size is only linear in the image size. That is, to segment an image of size M × N into k classes we only need to solve an LP problem of size (MNk), in sharp contrast to the quadratic size of the SDP formulation. A numerical example gives an impression of the resulting linear complexity. Using our algorithm, segmenting a 200 × 150 image into three classes only requires solving an LP problem of 90,000 variables, which is a trivial task for any modern LP solver. Currently available industry-strength LP solvers such as CPLEX are able to solve a linear system with a million variables and constraints in realistic time. In addition to the above benefits, we also provide an LP algorithm specifically for solving the 2-class segmentation problem (i.e., k = 2, foreground/background separation or object cutout). By substituting the sum-to-one condition in advance, we further reduce the problem size to MN, instead of 2MN. Details are given in Sect. 4. This has significant practical implications, since two-class segmentation is a common task in computer vision. In order to segment a 512 × 512 color image, we only need to solve an LP of about 260,000 variables. As an example, in this paper we illustrate experimental results only for the two-class segmentation situation. Our LP formulation is natural and simple. It allows the user to easily incorporate different prior knowledge about the image (and the segmentation) into the optimization procedure. For example, both the local pair-wise pixel Markov random field (MRF) relationship and a global histogram constraint can be easily embedded.
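The problem sizes quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that only counts label variables (any auxiliary variables a particular LP encoding may introduce are not counted); the function names are hypothetical and chosen for illustration.

```python
def lp_size(rows: int, cols: int, k: int) -> int:
    """Label variables in the k-class LP: one per (pixel, class) pair."""
    return rows * cols * k


def lp_size_two_class(rows: int, cols: int) -> int:
    """Label variables in the 2-class LP after substituting the sum-to-one condition."""
    return rows * cols


def sdp_size(rows: int, cols: int, k: int) -> int:
    """Unknown entries of the lifted matrix in the SDP formulation: (MNk)^2."""
    return (rows * cols * k) ** 2


if __name__ == "__main__":
    for rows, cols, k in [(10, 10, 3), (200, 150, 3), (512, 512, 2)]:
        print(f"{rows}x{cols}, k={k}: "
              f"LP {lp_size(rows, cols, k):,} vs SDP {sdp_size(rows, cols, k):,}")
    print(f"512x512 two-class LP: {lp_size_two_class(512, 512):,}")
```

For a 10 × 10 image with k = 3 the SDP count is 90,000, the same number of variables the LP needs for a full 200 × 150 image with three classes.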

2 Mathematical formulation: k-class segmentation

In this section, we derive our LP formulation for the k-class image segmentation problem. The goal is to segment an input RGB color image (denoted by I) with M × N pixels into k classes. Color image segmentation is usually based on a certain optimization criterion, defined in terms of the color measurements and the spatial continuity of neighboring pixels' labels. In this work, these two properties are captured by a two-term MRF energy which serves as the objective function to be minimized:

$$E(I) = \sum_{i} C_i(I_i) + \sum_{i,j} V_{ij}(I_i, I_j), \tag{1}$$

where the first term, known as the data term, sums over all pixels and captures the color consistency, and the second term, known as the neighborhood (or separation) term, sums over all pair-wise neighborhoods and describes the MRF neighborhood dependency. $I_i$ is a feature vector of pixel i, explained below. This type of energy function is adopted by many vision algorithms, such as the graph cut algorithm [3,7].

We convert the input RGB color image into the CIE-Lab space. The reason for this conversion is that distance in the Lab space is closer to human color perception. Denote the color value (i.e., [L, a, b]) at pixel i as $I_i$. We then compute for each pixel an associated feature vector. This feature vector is used to characterize local properties of the pixel, e.g., intensity, chromaticity, and texture. These properties can be computed by applying some low-level image filters, such as a Gabor texton filter. In this paper, for simplicity we directly use the color values of the central pixel under consideration and of its 4-neighboring pixels, which form a 15-dimensional feature vector. With a slight abuse of notation, we also use $I_i$ to denote this feature vector.

We introduce a k-dimensional {0, 1} binary indicator vector $x_i$ to express the class label of pixel i, namely, $x_i = [x_{i1}, x_{i2}, \ldots, x_{ik}]$, where the $x_{ia}$ are {0, 1} binary variables with $x_{ia} = 1$ if pixel i belongs to class a, and $x_{ia} = 0$ otherwise. Clearly, we must have $\sum_a x_{ia} = 1$, i.e., the sum-to-one constraint. We stack all $x_i$ together to form a tall vector X. With this notation, the energy function is expressed as:

$$E(I) = \sum_{i=1}^{MN} \sum_{a=1}^{k} C_i(a)\, x_{ia} + \sum_{i,j} \sum_{a=1}^{k} \sum_{b=1}^{k} V_{ij}(a,b)\, x_{ia}\, x_{jb}. \tag{2}$$
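To make Eq. 2 concrete, the sketch below evaluates the energy of a one-hot labeling on a toy problem. The data costs, the Potts-style pairwise function `potts`, and the convention that each neighboring pair (i, j) is listed once are assumptions made for illustration; the source does not fix them.

```python
from itertools import product


def one_hot(label, k):
    """Indicator vector x_i with x_ia = 1 for the chosen class a."""
    return [1 if a == label else 0 for a in range(k)]


def energy(C, V, edges, x):
    """Evaluate Eq. 2.

    C[i][a]  : data cost of assigning class a to pixel i
    V(i,j,a,b): pairwise cost for neighboring pixels i, j with classes a, b
    edges    : list of neighboring pairs (i, j), each listed once (assumption)
    x[i][a]  : {0, 1} indicator variables satisfying the sum-to-one constraint
    """
    k = len(C[0])
    data = sum(C[i][a] * x[i][a] for i in range(len(C)) for a in range(k))
    pair = sum(V(i, j, a, b) * x[i][a] * x[j][b]
               for (i, j) in edges
               for a, b in product(range(k), repeat=2))
    return data + pair


def potts(i, j, a, b):
    """Hypothetical Potts-style penalty: constant cost when labels differ."""
    return 0.5 if a != b else 0.0


if __name__ == "__main__":
    C = [[0.0, 1.0], [0.4, 0.6], [1.0, 0.0]]   # three pixels in a row, two classes
    edges = [(0, 1), (1, 2)]
    x = [one_hot(label, 2) for label in [0, 0, 1]]
    print(energy(C, potts, edges, x))          # data term 0.4 plus one label change 0.5
```

Because each $x_i$ is one-hot, the double sum over (a, b) picks out exactly one pairwise cost per neighboring pair, which is why the second term of Eq. 2 is quadratic in X.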

Under some general conditions the above energy function can be reduced to a (very neat) form of constrained trace minimization:

$$\min_{X} \ \mathrm{Trace}(C X^T + P X D X^T) \tag{3}$$

$$\text{s.t.}\quad X e_k = e_n, \tag{4}$$

$$x_{ia} \in \{0, 1\}, \quad \forall\, 1 \le i \le MN,\ 1 \le a \le k, \tag{5}$$

where e is a vector with all-one elements (the subscript indicating its dimension), P and D are some (known) coefficient matrices (cf. [11]), and the task is to find the best X. The problem is a quadratic, non-convex, integer programming problem. Due to the integer constraints, solving it exactly is very hard. Recent progress in mathematical optimization theory has offered a very powerful approach, convex relaxation, to approximately solve such integer programming problems. This approach is very promising, and forms the foundation of the present work.

2.1 A brief review of the SDP approach

As we mentioned before, this paper is largely inspired by recent research on SDP-based image segmentation, for example [11]. In this SDP-based method, the product $X X^T$ of Eq. 3 is replaced by a single positive semi-definite (PSD) matrix Y. Then the trace optimization problem becomes a linear optimization in the augmented unknown matrix Y, subject to the PSD constraint. Mathematically, this leads to

$$\min_{Y} \ \mathrm{Trace}(LY) \quad \text{s.t.}\quad Y \succeq 0, \ \text{and} \ \mathrm{rank}(Y) = 1. \tag{6}$$

Exactly solving the above optimization problem under the rank-1 constraint proves to be extremely difficult. Therefore, a "relaxation" trick is suggested, which simply abandons the rank-1 constraint (an example of convex relaxation). After relaxation, the above problem becomes a typical convex SDP. Solving the SDP, one obtains the image segmentation. We now examine the complexity of the resulting SDP. As there are (M × N × k) dimensions in vector X, there will be $(M \times N \times k)^2$ unknown entries in Y to be computed. Therefore, the problem size is quadratic, and the overall computational complexity, polynomial in the problem size, is $O((MNk)^{2n})$. This complexity is very high even when the image size is only moderately large. For example, with the state-of-the-art SDP solvers it is impossible to segment a 512 × 512 image into k = 2 classes, because that would involve solving an SDP of $(512 \times 512 \times 2)^2 \approx 274 \times 10^9$ variables.

3 Our LP formulation: k-class

To reach a purely linear formulation, our key observation is that, under mild conditions, we can replace the second term in Eq. 2, which is quadratic in X, with a linear term. As such, we obtain the following energy function:

$$E(I) = \sum_{i=1}^{MN} \sum_{a=1}^{k} C_i(a)\, x_{ia} + \sum_{i,j} \sum_{a=1}^{k} w_{ij}\, |x_{ia} - x_{ja}|, \tag{7}$$

where $w_{ij}$ is a coefficient depending on the colors of pixels i and j. Possible choices of w include, e.g., $w_{ij} = \exp\left(-\|I_i - I_j\|^2 / (2\sigma^2)\right)$ or $w_{ij} = \|I_i - I_j\|^{-1}$. In essence, this formula is obtained by replacing the quadratic L-2 norm by a linear L-1 norm. Note that our approach is different from the linearization approach used by the SDP relaxation, where an unknown matrix $Y = X X^T$ is introduced. Such an L-1 approximation is justified by the Potts model used in deriving MRF models for images, and is adopted by the graph cut algorithm [7] too. We will show by experiments that the L-1 approximation does not noticeably compromise the final segmentation quality. Using the Markovian condition, we only need to consider the pair-wise interactions among the 4-neighboring pixels. In addition, we assume that we already have k prototype color feature vectors (denoted by $\mu_1, \ldots, \mu_k$), one representing each pixel class $a \in \{1, \ldots, k\}$. Now the energy minimization can be written as:

$$\min \ \sum_{i=1}^{MN} \sum_{a=1}^{k} \|I_i - \mu_a\|\, x_{ia} + \sum_{i,j} \sum_{a=1}^{k} w_{ij}\, |x_{ia} - x_{ja}|, \tag{8}$$

$$\text{s.t.}\quad \forall i: \ \sum_{a} x_{ia} = 1.0, \tag{9}$$

$$x_{ia} \in \{0, 1\}. \tag{10}$$

This is a typical linear program with (MNk) integer variables $x_{ia}$. One can efficiently solve it with any available LP solver (e.g., Matlab's linprog, MOSEK or CPLEX) by simply dropping the integrality constraints. After the LP computation, the integer solution can be recovered by applying a simple rounding process.

4 The 2-class LP formulation

The two-class segmentation problem is of particular interest in real applications, for example, to segment an object of interest from its background in the video surveillance scenario. In this section we show that for this special problem setting we can further reduce the size (complexity) of the resulting LP to (MN), i.e., the image size. This enables our algorithm to process images of larger size. For instance, with a modern LP solver our algorithm is capable of segmenting a one-million-pixel image in realistic time. Below we briefly describe our two-class LP-based segmentation algorithm. Following the derivation of our general k-class formulation, it is easy to understand how it works.

4.1 Foreground and background modeling

For a given image we use the Gaussian Mixture Model (GMM) to describe the object and the background regions. Denote the kth Gauss mode (component) of the object region f by $f_k$, with mean $\mu^f_k$ and covariance $\Sigma_{f,k}$. In experiments we fix the number of Gaussian modes to 5, i.e., k = 1, ..., 5. Similarly, the kth background GMM mode is denoted $b_k$, with $(\mu^b_k, \Sigma_{b,k})$. These two GMM models can be estimated by, for example, an EM algorithm, from two sets of user-specified scribbles in the foreground region and background region, respectively. We use scribbles in our experiments, hence our algorithm is a semi-supervised color image segmentation algorithm.

Fig. 1 Interactively build the GMM models for foreground and background from some user-specified strokes

Fig. 2 Convergence curves of the primal and dual objective functions (objective functions vs. number of iterations)

Fig. 3 Computation time (MOSEK's Linprog) versus number of pixels (horizontal axis: number of pixels, ×10^4; vertical axis: LP computation time in seconds)


4.2 Compute the δ-distances

To obtain the GMM models for foreground and background, one can either use a supervised learning process or rely on user interaction. We choose the second approach, hence an interactive image segmentation method. Our method requires the user to simply draw a few strokes on the image to be segmented, indicating the intended foreground and background regions. Figure 1 illustrates an example. Such an interactive approach has been adopted by many methods, such as the GrabCut algorithm [7] and some image matting algorithms [8].

Fig. 4 Some of our interactive color image segmentation results (from left to right: input image, user-specified strokes, segmented foreground region, segmented background region)


A potential difficulty with using only a few strokes is that the estimated GMM models may not faithfully represent the true color distributions of the whole image regions. To overcome this, in this paper we do not directly use the GMM probability functions. Instead, Mahalanobis distances between a given color and each of the Gauss modes are used. This approach increases the robustness of the segmentation, because it can account for visual occlusions and illumination changes. We define the foreground distance of a pixel i, denoted $d_f(I_i)$, as the minimum Mahalanobis distance from the pixel value to each of the foreground Gauss modes:

$$d_f(I_i) = \min_{k} \left( (I_i - \mu^f_k)^T\, \Sigma_{f,k}^{-1}\, (I_i - \mu^f_k) \right)^{1/2}. \tag{11}$$

Similarly, we define the pixel's background distance as:

$$d_b(I_i) = \min_{k} \left( (I_i - \mu^b_k)^T\, \Sigma_{b,k}^{-1}\, (I_i - \mu^b_k) \right)^{1/2}. \tag{12}$$

Combining these two, we define the so-called δ-distance as

$$\delta(I_i) = d_b(I_i) - d_f(I_i). \tag{13}$$

Note that while both $d_f(I_i)$ and $d_b(I_i)$ are non-negative scalars, $\delta(I_i)$ may take non-positive values.
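Equations 11–13 translate almost directly into array code. The following NumPy sketch assumes the GMM means and covariances have already been estimated (for example by EM on the scribbles); the function names and the array layout are hypothetical choices for illustration.

```python
import numpy as np


def min_mahalanobis(pixels, means, covs):
    """Minimum Mahalanobis distance from each pixel to a set of Gauss modes.

    pixels: (n, d) feature vectors; means: (K, d); covs: (K, d, d).
    Returns an (n,) array, as in Eqs. 11 and 12.
    """
    pixels = np.asarray(pixels, dtype=float)
    per_mode = []
    for mu, cov in zip(means, covs):
        diff = pixels - np.asarray(mu, dtype=float)                  # (n, d)
        # Solve Sigma * s = diff^T instead of forming the inverse explicitly.
        solved = np.linalg.solve(np.asarray(cov, dtype=float), diff.T).T
        sq = np.einsum("nd,nd->n", diff, solved)                     # squared distances
        per_mode.append(np.sqrt(np.maximum(sq, 0.0)))                # guard tiny negatives
    return np.min(np.stack(per_mode), axis=0)


def delta_distance(pixels, fg_means, fg_covs, bg_means, bg_covs):
    """Eq. 13: delta = d_b - d_f for every pixel."""
    d_f = min_mahalanobis(pixels, fg_means, fg_covs)
    d_b = min_mahalanobis(pixels, bg_means, bg_covs)
    return d_b - d_f


if __name__ == "__main__":
    # Toy setting: one foreground mode at the origin, one background mode at (3, 0, 0).
    pixels = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
    eye = np.eye(3)
    print(delta_distance(pixels, [[0, 0, 0]], [eye], [[3, 0, 0]], [eye]))
```

With this definition, δ is positive for pixels that lie closer to a foreground mode and negative for pixels that lie closer to a background mode.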

4.3 Form an LP problem

For the two-class segmentation problem, the variables to be determined are binary pixel labels $x_i$, with $x_i = 1$ if pixel i is a foreground pixel and $x_i = 0$ otherwise. The problem is then a binary 0–1 programming problem: find the best 0–1 variables $x_i$, i = 1, 2, ..., MN, such that the energy function is minimized. Solving a general 0–1 integer programming problem exactly is hard (in fact, it has been proved to be NP-hard). If, rather than seeking the exact global optimum, we settle for an approximate solution, then many efficient algorithms exist. Relaxation is one of them, and is adopted in this paper. Specifically, we relax the 0–1 constraints into bound constraints $0 \le x_i \le 1$. This leads to a simple LP with (MN) variables:

$$\min_{x_i} \ \sum_{i=1}^{MN} \delta_i x_i + \lambda \sum_{i=1}^{MN} \sum_{j \in N(i)} w_{ij}\, |x_i - x_j|, \tag{14}$$

$$\text{s.t.}\quad x_i = b_i \ \ \forall x_i \in L, \qquad 0 \le x_i \le 1 \ \ \forall i = 1, \ldots, MN,$$

where N(i) is the set of 4-neighbors of pixel i, λ is a user-specified regularization parameter (we use 0.2 in our experiments), L is the set of labeled pixels (i.e., scribbles), and $b_i$ are the corresponding labels. The linear cost function (to be minimized) consists of two linear terms. The first term is the data term. The second term is the sum of weighted label inconsistencies. From the above we know that the weight in the local term must be positive. Therefore we can effectively minimize the energy by minimizing its upper bounds.

In our algorithm the weight $w_{ij}$ is computed by

$$w_{ij} = \exp\left( -\frac{\|I_i - I_j\|^2}{2\sigma^2} \right), \tag{15}$$

where σ is a user-specified parameter. This weight formula is inspired by the colorization algorithm [5]. It embodies the belief that two neighboring pixels should have consistent labels, unless their difference (in feature/color) is too large. To choose the value of σ we adopt the method of [3], namely, $\sigma = 2\langle \|I_i - I_j\|^2 \rangle$, where $\langle \cdot \rangle$ denotes the expectation over all pixels. Now we have established a standard LP problem. With the LP there is no need for a good initialization, and no risk of local minima. Having a similar formulation, our 2-class algorithm can be seen as a concrete realization of the excellent purely theoretical work of [12].

4.4 Algorithm description

The proposed LP-based image segmentation algorithm (for the k = 2 case) is summarized in Algorithm 1.

Algorithm 1: Linear programming image segmentation

1. Convert the image to CIE-Lab space. Construct the feature vector $I_i$ at each pixel.
2. Use user-specified scribbles to estimate the foreground GMM and background GMM.
3. Compute the δ-distance $\delta_i$ at every pixel i, and the weights $w_{ij}$ for each of its 4-neighbor pixels.
4. Establish the LP formulation based on Eq. 14.
5. Solve this LP problem using any LP solver, and round the results to 0–1 variables.
6. Output the rounded labels as the segmentation result.

Fig. 5 More segmentation results: input and segmentation

Fig. 6 Comparison with normalized-cut: input (left column), the N-cut results (middle column), and our LP segmentation results (right column). The numbers in the brackets are execution times

4.5 Remarks

– Unlike many other optimization-based image segmentation methods such as graph cut or belief propagation, using the proposed LP form we are guaranteed to find the true global optimum (up to an approximation because of the relaxation). There is no need for an initial guess, and no risk of local minima.
– Since polynomial-time algorithms exist and are widely adopted for solving LP problems, our algorithm is also computationally efficient.
– Due to the LP formulation, we can easily incorporate prior knowledge about the scene into the segmentation process. For example, by letting $x_i = x_j$ we can specify that two pixels i and j have the same label.
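Steps 3 to 5 of Algorithm 1 can be sketched with SciPy as follows. This is an illustrative sketch under several stated assumptions, not the authors' Matlab/MOSEK implementation: the absolute values $|x_i - x_j|$ are linearized with one auxiliary variable $t_e$ per neighboring pair (a standard LP device that the text does not spell out); each 4-neighbor pair is counted once; rounding thresholds at 0.5; σ defaults to the rule quoted in Sect. 4.3; and the per-pixel data cost passed in is interpreted as the cost of labeling a pixel as foreground. On the last point, note that with $\delta = d_b - d_f$ and $x_i = 1$ for foreground, minimizing $\sum_i \delta_i x_i$ would push foreground-like pixels towards 0, so this sketch assumes the intended cost is $d_f - d_b = -\delta$.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse import coo_matrix


def grid_edges(rows, cols):
    """4-neighbor pairs (i, j) on a rows x cols grid, each pair listed once."""
    idx = np.arange(rows * cols).reshape(rows, cols)
    horiz = np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1)
    vert = np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1)
    return np.vstack([horiz, vert])


def segment_two_class(fg_cost, feats, seeds, lam=0.2, sigma=None):
    """Relaxed 2-class LP followed by rounding.

    fg_cost: (rows, cols) cost of labeling each pixel as foreground
             (assumed here to be d_f - d_b, i.e. minus the delta-distance)
    feats  : (rows, cols, d) per-pixel feature vectors
    seeds  : dict {flat pixel index: 0 or 1} taken from the user scribbles
    """
    rows, cols = fg_cost.shape
    n = rows * cols
    edges = grid_edges(rows, cols)
    m = len(edges)
    i, j = edges[:, 0], edges[:, 1]

    # Weights as in Eq. 15; sigma defaults to 2 * mean squared difference.
    f = feats.reshape(n, -1).astype(float)
    sq = np.sum((f[i] - f[j]) ** 2, axis=1)
    if sigma is None:
        sigma = 2.0 * sq.mean() if sq.mean() > 0 else 1.0
    w = np.exp(-sq / (2.0 * sigma ** 2))

    # Variables: [x_0 .. x_{n-1}, t_0 .. t_{m-1}] with t_e >= |x_i - x_j|.
    c = np.concatenate([fg_cost.ravel().astype(float), lam * w])
    e = np.arange(m)
    # Row e:      x_i - x_j - t_e <= 0
    # Row m + e:  x_j - x_i - t_e <= 0
    r = np.concatenate([e, e, e, m + e, m + e, m + e])
    col = np.concatenate([i, j, n + e, j, i, n + e])
    val = np.concatenate([np.ones(m), -np.ones(m), -np.ones(m),
                          np.ones(m), -np.ones(m), -np.ones(m)])
    A = coo_matrix((val, (r, col)), shape=(2 * m, n + m)).tocsc()

    bounds = [(0.0, 1.0)] * n + [(0.0, None)] * m
    for p, b in seeds.items():          # scribbled pixels are fixed to their labels
        bounds[p] = (float(b), float(b))

    res = linprog(c, A_ub=A, b_ub=np.zeros(2 * m), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return (res.x[:n] >= 0.5).astype(int).reshape(rows, cols)


if __name__ == "__main__":
    # Toy 1 x 4 "image": two dark pixels, two bright pixels, one seed at each end.
    fg_cost = np.array([[-1.0, -0.2, 0.2, 1.0]])
    feats = np.array([[[0.0], [0.0], [1.0], [1.0]]])
    print(segment_two_class(fg_cost, feats, {0: 1, 3: 0}))
```

The auxiliary variables add one unknown per neighboring pair, so the LP actually handed to the solver is somewhat larger than the MN label variables, but it remains linear in the image size. Extra prior knowledge, such as forcing two pixels to share a label, would enter as additional equality rows.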

5 Experiments

To verify the effectiveness of our new LP algorithm we have conducted image segmentation experiments on real images. These test images are taken from the Berkeley Segmentation Benchmark Dataset [6]. For each input image we specify the foreground region and background region by a few strokes. Our LP-based algorithm is implemented in Matlab. We use MOSEK's Linprog function as the LP solver. This solver is in fact a primal-dual interior point method. We found that our algorithm converges very quickly for images of moderate size. Usually, the LP converges in about 10–12 iterations. Figure 2 gives an example of the convergence curves (i.e., primal and (negative) dual objective functions vs. iteration).

To test the timing performance we resize the input image to different sizes. For segmenting an image of 100 × 100 pixels, the LP solver takes only 2.9 s on a moderate PC (Intel P4, 2.8 GHz, 1 GB RAM). The computation times for a 30 × 30 image and a 200 × 200 image are 0.7 and 22.0 s, respectively. In contrast, it takes the SDP algorithm [11] 4 h to process a 30 × 30 image. Running our algorithm repeatedly on images of different sizes we obtain the timing curve shown in Fig. 3. To segment an image of size 300 × 300 the LP solver takes only about 10 minutes. An image of such a size is certainly beyond the capability of the best SDP solver available. For this reason, we do not provide any comparison with SDP, as the latter simply does not work for big images.

To visually evaluate the segmentation performance, we run our algorithm on different images. Figure 4 shows some of our results. More results are shown in Fig. 5. Here the user wanted to extract a zebra or panda from a cluttered background. Clearly, the user's intention of foreground/background has been well captured, and the segmentation results are satisfactory.

We also compared our LP algorithm with the original normalized cut algorithm [2]. Such a comparison may seem unfair to the N-cut algorithm, because the original version of the N-cut algorithm is not able to incorporate the user's prior knowledge (e.g., the scribbles) into the computation (note that there have been later modifications that take this into account, for example [14]). To compensate for this, we allow the N-cut algorithm to segment the image into multiple regions (we used k = 6 in the experiments below), and evaluate the accuracy of the segmentation boundaries. Figure 6 gives some comparison results. All the test images are normalized to size 160 × 160. The Matlab code for the normalized-cut algorithm is adapted from [2] but modified to deal with color images. It is clear that our new algorithm outperforms the N-cut algorithm in terms of both accuracy and computation time.

Fig. 7 Sample frames of video-cutout (http://www.freefoto.com)


5.1 Video object cutout

The proposed still-image object cutout algorithm can be easily extended to video object cutout. The user is only required to specify some scribbles in key frame(s). The segmentation result of the key frame is then propagated to other frames without further human interaction. Some sample resulting frames are given in Fig. 7.

6 Conclusion

This paper has described a simple approach for color image segmentation. The problem is formulated naturally as a linear program. Such a formulation is easy to understand and easy to implement. The computation is efficient thanks to our novel and simpler LP formulation. At present we have used only an off-the-shelf LP solver (based on the interior point method). It would be worthwhile trying other LP solvers to compare their computational performance. Moreover, due to the special structure of the problem, e.g., every pixel is using at most two neighboring constraints, there should be other, more specific options that take this into account and offer more efficient computation for the task. Another promising future research direction is to directly optimize the solver using fast algorithms for special linear programming problems. In fact, the best-performing graph-cut algorithm is a special algorithm for solving max-flow LP problems. In addition, in future work we will consider whether the same LP framework allows a partial update scheme, similar to the concept of dynamic graph cut [17]. If so, our algorithm could be made even faster for dynamic video cutout tasks.

Acknowledgment NICTA is funded through the Australian Government's Backing Australia's Ability initiative, in part through the Australian Research Council. We wish to thank the anonymous reviewers for their very helpful comments.

References

1. Georgescu, B., Shimshoni, I., Meer, P.: Mean shift based clustering in high dimensions: a texture classification example. ICCV (2003)
2. Shi, J., Malik, J.: Normalized cuts and image segmentation. PAMI (2000)
3. Blake, A., Rother, C., Brown, M., Perez, P., Torr, P.: Interactive image segmentation using an adaptive GMMRF model. Proc. ECCV-2004 2, 428–441 (2004)
4. Boykov, Y., Jolly, M.: Interactive graph cuts for optimal boundary and region segmentation of objects in n-d images. Proc. ICCV-2001 1(July), 105–112 (2001)
5. Levin, A., Lischinski, D., Weiss, Y.: Colorization using optimization. ACM Transactions on Graphics. SIGGRAPH 2004, pp. 689–694 (2004)
6. Martin, D. et al.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Proc. ICCV-2001 2, 416–423 (2001)
7. Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. SIGGRAPH 23(3), 309–314 (2004)
8. Wang, J., Cohen, M.: An iterative optimization approach for unified image segmentation and matting. Proc. ICCV 1, 936–943 (2005)
9. Keuchel, J., Schnorr, J. et al.: Binary partitioning, perceptual grouping, and restoration with semidefinite programming. IEEE Trans. PAMI 25, 1364–1379 (2003)
10. Keuchel, J. et al.: Hierarchical image segmentation based on semidefinite programming. In: Pattern Recognition, Proc. of 26th DAGM Symposium, LNCS, August. Springer, Berlin (2004)
11. Keuchel, J.: Multiclass image labeling with semidefinite programming. In: Proc. ECCV'06, Graz, pp. 454–467 (2006)
12. Kleinberg, J., Tardos, E.: Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. J. ACM 49, 616–639 (2002)
13. Levin, A., Rav-Acha, A., Lischinski, D.: Spectral matting. IEEE Trans. PAMI (2008)
14. Yu, S.X., Shi, J.: Segmentation given partial grouping constraints. IEEE Trans. PAMI 26(2), 173–183 (2004)
15. Duchenne, O., Audibert, J., Keriven, R., Ponce, J., Segonne, F.: Segmentation by transduction. In: Proc. CVPR (2008)
16. Cui, J., Yang, Q., Wen, F., Wu, Q., Zhang, C., Van Gool, L., Tang, X.: Transductive object cutout. In: Proc. CVPR (2008)
17. Kohli, P., Torr, P.: Dynamic graph cuts for efficient inference in Markov random fields. IEEE Trans. PAMI 29(12), 2079–2088 (2007)

Author biographies

Hongdong Li is an academic staff member with the Computer Vision Group, Research School of Information Sciences and Engineering (RSISE) of ANU (The Australian National University). He is also a Senior Researcher with NICTA (National ICT Australia), Canberra Labs. His main research interests are computer vision, pattern recognition and image processing.

Chunhua Shen received the B.Sc. and M.Sc. degrees from Nanjing University, China, in 1999 and 2002, respectively, and the Ph.D. degree from the School of Computer Science, University of Adelaide, Australia, in 2005. He is currently a Researcher with NICTA, Canberra Labs. He is also an Adjunct Research Fellow at the Australian National University and an Adjunct Lecturer at the University of Adelaide. His main research interests include statistical pattern analysis and its application in computer vision.
