What Is Optimized in Tight Convex Relaxations for Multi-Label Problems?

Christopher Zach, Microsoft Research Cambridge, UK
Christian Häne, ETH Zürich, Switzerland
Marc Pollefeys, ETH Zürich, Switzerland
Abstract

In this work we present a unified view on Markov random fields and recently proposed continuous tight convex relaxations for multi-label assignment in the image plane. These relaxations are far less biased towards the grid geometry than Markov random fields. It turns out that the continuous methods are non-linear extensions of the local polytope MRF relaxation. In view of this result a better understanding of these tight convex relaxations in the discrete setting is obtained. Further, a wider range of optimization methods is now applicable to find a minimizer of the tight formulation. We propose two methods to improve the efficiency of minimization. One uses a weaker, but more efficient continuously inspired approach as initialization and gradually refines the energy where necessary. The other reformulates the dual energy, enabling smooth approximations to be used for efficient optimization. We demonstrate the utility of our proposed minimization schemes in numerical experiments.

1. Introduction

Assigning labels to image regions, e.g. in order to obtain a semantic segmentation, is one of the major tasks in computer vision. The most prominent approach to solve this problem is to formulate label assignment as a Markov random field (MRF) incorporating local label preference and neighborhood smoothness. Since in general label assignment is NP-hard, finding the true solution is intractable and approximate ones are determined. One promising approach to solve MRF instances is to relax the intrinsically difficult constraints to convex outer bounds. There are currently two somewhat distinct lines of research utilizing such convex relaxations: the direction mostly used in the machine learning community is based on a graph representation of image grids and uses variations of dual block-coordinate methods [10, 9, 16, 15] (usually referred to as message passing algorithms in the literature). The other set of methods is derived from the analysis of partitioning an image in the continuous setting, i.e. variations of the Mumford-Shah segmentation model [12, 1]. Using the principle of biconjugation to obtain tight convex envelopes, [5] obtains a convex relaxation of multi-label problems with generic (but metric) transition costs in the continuous setting. Subsequent discretization of this model to finite grids yields strong results in practice, but it was not fully understood what is optimized in the discrete setting. In this work we close the gap between convex formulations for MRFs and continuous approaches by identifying the latter methods as non-linear (but still convex) extensions of the standard LP relaxation of Markov random fields. This insight has several implications: (a) it becomes clearer why the model proposed in [5] is tighter than other relaxations proposed for similar labeling problems, and (b) a wider range of optimization methods becomes applicable, especially after obtaining equivalent convex programs utilizing redundant constraints. Thus, the results obtained in this work are of theoretical and practical interest.

2. Background

In the following section we summarize the necessary background on discrete and continuous relaxations of multi-label problems.
2.1. Notations

In this section we introduce some notation used in the following. For a convex set $C$ we will use $\imath_C$ to denote the corresponding indicator function, i.e. $\imath_C(x) = 0$ for $x \in C$ and $\infty$ otherwise. We use the short-hand notations $[x]_+$ and $[x]_-$ for $\max\{0, x\}$ and $\min\{0, x\}$, respectively. Finally, for an extended real-valued function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ we denote its convex conjugate by $f^*(y) = \max_x x^T y - f(x)$.

2.2. Label Assignment, the Marginal Polytope and its LP Relaxation

In the following we will consider only labeling problems with unary and pairwise interactions between nodes. Let $\mathcal{V}$ be a set of $V = |\mathcal{V}|$ nodes and $\mathcal{E}$ be a set of edges connecting nodes from $\mathcal{V}$. The goal of inference is to assign labels $\Lambda : \mathcal{V} \to \{1, \dots, L\}$ for all nodes $s \in \mathcal{V}$ minimizing the energy

$$E_{\text{labeling}}(\Lambda) = \sum_{s \in \mathcal{V}} \theta_s^{\Lambda(s)} + \sum_{(s,t) \in \mathcal{E}} \theta_{st}^{\Lambda(s),\Lambda(t)}, \qquad (1)$$

where $\theta_s^\cdot$ are the unary potentials and $\theta_{st}^\cdot$ are the pairwise ones. Usually the label assignment $\Lambda$ is represented via indicator vectors $x_s \in \{0,1\}^L$ for each $s \in \mathcal{V}$, and $x_{st} \in \{0,1\}^{L^2}$ for each $(s,t) \in \mathcal{E}$, leading to

$$E_{\text{MRF}}(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s,t,i,j} \theta_{st}^{ij} x_{st}^{ij} \qquad (2)$$

subject to normalization constraints $\sum_{i \in \{1,\dots,L\}} x_s^i = 1$ for each $s \in \mathcal{V}$ (one label needs to be assigned) and marginalization constraints $\sum_j x_{st}^{ij} = x_s^i$ and $\sum_i x_{st}^{ij} = x_t^j$. In general, enforcing $x_s^i \in \{0,1\}$ is NP-hard, hence the corresponding LP relaxation is considered,

$$E_{\text{LP-MRF}}(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s,t} \sum_{i,j} \theta_{st}^{ij} x_{st}^{ij} \qquad (3)$$
$$\text{s.t.} \quad \sum_j x_{st}^{ij} = x_s^i, \quad \sum_j x_{st}^{ji} = x_t^i, \quad x_s \in \Delta, \quad x_{st}^{ij} \ge 0 \quad \forall s, t, i, j,$$

where $\Delta$ denotes the unit (probability) simplex, $\Delta := \{x : \sum_i x^i = 1, x^i \ge 0\}$. There are several corresponding dual programs depending on the utilized (redundant) constraints. If we explicitly add the box constraints $x_{st}^{ij} \in [-1, 1]$, the corresponding dual is

$$E^*_{\text{LP-MRF}}(p) = \sum_s \min_i \Big\{ \theta_s^i + \sum_{t \in N_t(s)} p^i_{st \to s} + \sum_{t \in N_s(s)} p^i_{ts \to s} \Big\} + \sum_{s,t} \sum_{i,j} \min\big\{ 0, \; \theta_{st}^{ij} - \big| p^i_{st \to s} + p^j_{st \to t} \big| \big\},$$

where we defined $N_t(s) := \{t : (s,t) \in \mathcal{E}\}$ and $N_s(t) := \{s : (s,t) \in \mathcal{E}\}$. The particular choice of (redundant) box constraints $x_{st}^{ij} \in [-1, 1]$ in the primal program leads to an exact penalizer for the usually obtained capacity constraints. Different choices of primal constraints lead to different duals; we refer to Section 3.3 for further details.
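To make the discrete program concrete, the following minimal sketch (our illustration, not code from the paper; assumes NumPy and SciPy) sets up the LP relaxation Eq. 3 for a single edge with two labels and solves it with an off-the-shelf LP solver. All cost values are arbitrary toy numbers.

```python
# Minimal sketch: the LP relaxation (3) for one edge (s, t) with L = 2 labels.
# Variables stacked as [x_s^1, x_s^2, x_t^1, x_t^2, x_st^11, x_st^12, x_st^21, x_st^22].
import numpy as np
from scipy.optimize import linprog

theta_s = np.array([0.0, 1.0])        # unary costs at node s
theta_t = np.array([1.0, 0.0])        # unary costs at node t
theta_st = np.array([[0.0, 1.0],      # Potts pairwise costs theta_st^ij
                     [1.0, 0.0]])

c = np.concatenate([theta_s, theta_t, theta_st.ravel()])

A_eq, b_eq = [], []
# normalization: sum_i x_s^i = 1 and sum_j x_t^j = 1
A_eq.append([1, 1, 0, 0, 0, 0, 0, 0]); b_eq.append(1)
A_eq.append([0, 0, 1, 1, 0, 0, 0, 0]); b_eq.append(1)
# marginalization: sum_j x_st^ij = x_s^i
A_eq.append([-1, 0, 0, 0, 1, 1, 0, 0]); b_eq.append(0)
A_eq.append([0, -1, 0, 0, 0, 0, 1, 1]); b_eq.append(0)
# marginalization: sum_i x_st^ij = x_t^j
A_eq.append([0, 0, -1, 0, 1, 0, 1, 0]); b_eq.append(0)
A_eq.append([0, 0, 0, -1, 0, 1, 0, 1]); b_eq.append(0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq,
              bounds=[(0, None)] * 8, method="highs")
print(res.fun, res.x)  # expect optimal value 1.0 (tree-structured, so the LP is tight)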

2.3. Continuously Inspired Convex Formulations for Multi-Label Problems

In this section we briefly review the convex relaxation approach for multi-label problems proposed in [5]. In contrast to the graph-based label assignment problem in Eq. 3, Chambolle et al. consider labeling tasks directly in the (2D) image plane. Their proposed convex relaxation is formulated as the primal-dual saddle-point energy

$$E_{\text{CCP-I}}(u, q) = \sum_{s,i} \theta_s^i (u_s^{i+1} - u_s^i) + \sum_{s,i} (q_s^i)^T \nabla u_s^i \qquad (4)$$
$$\text{s.t.} \quad u_s^i \le u_s^{i+1}, \quad u_s^0 = 0, \quad u_s^{L+1} = 1, \quad u_s^i \ge 0, \quad \Big\| \sum_{k=i}^{j-1} q_s^k \Big\|_2 \le \theta^{ij} \quad \forall s, i, j,$$

which is minimized with respect to $u$ and maximized with respect to $q$. Here $u$ is a super-level function ideally transitioning from 0 to 1 at the assigned label, i.e. if label $i$ should be assigned at node (pixel) $s$, we have $u_s^{i+1} = 1$ and $u_s^i = 0$. Consequently, $u \in [0,1]^{VL}$ in the discrete setting of a pixel grid. $q \in \mathbb{R}^{2VL}$ are auxiliary variables. The stencil of $\nabla$ depends on the utilized discretization. We employ forward differences for $\nabla$ as also used in [5] (unless noted otherwise). $\theta^{ij}$ are the transition costs between labels $i$ and $j$ and can be assumed to be symmetric w.l.o.g., $\theta^{ij} = \theta^{ji}$ and $\theta^{ii} = 0$. At this point we have a few remarks:

1. The saddle-point formulation in combination with the quadratic number of "capacity" constraints $\|\sum_{k=i}^{j-1} q_s^k\|_2 \le \theta^{ij}$ makes it difficult to optimize efficiently. In [5] a nested, two-level iteration scheme is proposed, where the inner iterations are required to enforce the capacity constraints. In [14] Lagrange multipliers for the dual constraints are introduced in order to avoid the nested iterations, leading to a primal-dual-primal scheme. In Section 3.1 we will derive the corresponding purely primal energy, enabling a larger set of convex optimization methods to be applied to this problem.

2. The energy Eq. 4 handles triple junctions (i.e. nodes where at least 3 different phases meet) better than the (more efficient) approach proposed in [17]. Again, by working with the primal formulation one can give a clear intuition why this is the case (see Section 3.2).

3. The energy in Eq. 4 can be rewritten in terms of (soft) indicator functions $x_s$ per pixel, leading to the equivalent formulation (see the supplementary material or [14]):

$$E_{\text{CCP-II}}(x, p) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s,i} (p_s^i)^T \nabla x_s^i \quad \text{s.t.} \quad \|p_s^i - p_s^j\|_2 \le \theta^{ij}, \; x_s \in \Delta \quad \forall s, i, j. \qquad (5)$$

$x$ and $p$ are of the same dimension as $u$ and $q$.
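The switch between the super-level representation $u$ and the indicator representation $x$ is just a backward difference along the label axis (Section 2 of the supplementary material). A small sketch, assuming NumPy; the array shapes are our choice for illustration:

```python
# Sketch: converting between super-level u and indicator x representations,
# x_s^i = u_s^i - u_s^{i-1} with u_s^0 = 0 (backward differences along labels).
import numpy as np

u = np.array([0.0, 0.0, 1.0, 1.0])       # u_s^1..u_s^L for one pixel
x = np.diff(np.concatenate([[0.0], u]))  # indicator vector x_s^1..x_s^L
print(x)                 # u steps from 0 to 1 -> a single 1 at the step position
print(np.cumsum(x))      # back to u: [0. 0. 1. 1.]
```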

3. Convex Relaxations for Multi-Label MRFs Revisited

In this section we derive the connections between the standard LP relaxation for MRFs, $E_{\text{LP-MRF}}$, and the saddle-point energy $E_{\text{CCP-II}}$, and further analyze the relation between $E_{\text{CCP-II}}$ and a weaker, but more efficient relaxation. We will make heavy use of Fenchel duality, $\min_x f(x) + g(Ax) = \max_z -f^*(A^T z) - g^*(-z)$, where $f$ and $g$ are convex and l.s.c. functions, and $A$ is a linear operator (a matrix for finite-dimensional problems). We refer e.g. to [3] for a compact exposition of convex analysis.

3.1. A Primal View on the Tight Convex Relaxation

It seems that the saddle-point formulations in Eq. 4 and Eq. 5, respectively, were never analyzed from the purely primal viewpoint. Using Fenchel duality one can immediately state the primal form of Eq. 5, which has a more intuitive interpretation (detailed in Section 3.2):

$$E_{\text{tight}}(x, y) = \sum_{s,i} \theta_s^i x_s^i + \sum_s \sum_{i,j: i<j} \theta^{ij} \|y_s^{ij}\|_2 \qquad (6)$$
$$\text{s.t.} \quad \nabla x_s^i = \sum_{j: j<i} y_s^{ji} - \sum_{j: j>i} y_s^{ij}, \quad x_s \in \Delta \quad \forall s, i,$$

where $y_s^{ij} \in \mathbb{R}^2$ represents the transition gradient between a region with label $i$ and one with label $j$. $y_s^{ij}$ is 0 if there is no transition between $i$ and $j$ at node (pixel) $s$. The last set of constraints is the equivalent of marginalization constraints, linking transition gradients $y_s^{ij}$ and label gradients $\nabla x_s^i$ and $\nabla x_s^j$. The derivation of Eq. 6 is given in the supplementary material. Since $x_s^i \in [0,1]$ we have $\nabla x_s^i \in [-1,1]^2$, and we can safely add the additional constraints $y_s^{ij} \in [-1,1]^2$ to obtain an equivalent convex program. We obtain the interpretation that e.g. $(y_s^{ij})_1 = 1$ iff there is a horizontal transition from label $i$ to label $j$, and $(y_s^{ij})_1 = -1$ if the reverse is the case (analogously for the vertical direction). Consequently, the $y_s^{ij}$ variables correspond to signed binary pseudo-marginals, and proper pseudo-marginals can be obtained by setting (component-wise)

$$x_s^{ij} := [y_s^{ij}]_+ \qquad x_s^{ji} := -[y_s^{ij}]_-$$

for $i < j$. $x_s^{ii}$ is e.g. given by $x_s^{ii} = x_s^i - \sum_{j: j \ne i} x_s^{ij}$. Thus, the primal program equivalent to Eq. 6, but stated purely in terms of non-negative pseudo-marginals, reads as

$$E(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_s \sum_{i,j: i<j} \theta^{ij} \|x_s^{ij} + x_s^{ji}\|_2 \qquad (7)$$
$$\text{s.t.} \quad \nabla x_s^i = \sum_{j: j \ne i} x_s^{ji} - \sum_{j: j \ne i} x_s^{ij}, \quad x_s \in \Delta, \quad x_s^{ij} \ge 0.$$

This is very similar to the standard relaxation of MRFs (recall Eq. 3 after eliminating $x_{st}^{ii}$ in the marginalization constraints), the only difference being the smoothness term, which is $\theta^{ij} \|x_s^{ij} + x_s^{ji}\|_2$ instead of $\theta^{ij} x_{st}^{ij} + \theta^{ji} x_{st}^{ji}$.

Note that $\theta^{ij} x_s^{ij} + \theta^{ji} x_s^{ji}$ is equivalent to $\theta^{ij} \|x_s^{ij} + x_s^{ji}\|_1$ (the anisotropic $L^1$ norm), since $x_s^{ij}, x_s^{ji} \ge 0$. Hence the primal model Eq. 7 can be seen as an isotropic extension of the standard model Eq. 3 for regular image grids. Further, we have a complementarity condition for every optimal solution $x_s^{ij}$: $(x_s^{ij})^T x_s^{ji} = 0$, i.e. $(x_s^{ij})_1 (x_s^{ji})_1 = 0$ and $(x_s^{ij})_2 (x_s^{ji})_2 = 0$. It is easy to see that if the complementarity conditions do not hold, the overall objective can be lowered by subtracting the component-wise minimum from $x_s^{ij}$ and $x_s^{ji}$ (thereby satisfying complementarity) without affecting the marginalization constraint. Hence, we can also replace $\theta^{ij} \|x_s^{ij} + x_s^{ji}\|_2$ in the primal objective by

$$\theta^{ij} \left\| \begin{pmatrix} x_s^{ij} \\ x_s^{ji} \end{pmatrix} \right\|_2.$$
Finally, observe that all primal formulations have a number of unknowns that is quadratic in the number of labels $L$. This is not surprising, since the number of constraints on the dual variables is $O(L^2)$ per node.
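The complementarity argument above is easy to check numerically: subtracting the component-wise minimum from a pair $x_s^{ij}, x_s^{ji}$ keeps their difference (and hence the marginalization constraint) unchanged while never increasing the norm term. A small sketch of this reduction, assuming NumPy:

```python
# Sketch: enforcing complementarity (x_ij)_k (x_ji)_k = 0 lowers the smoothness
# term theta_ij * ||x_ij + x_ji||_2 without changing x_ji - x_ij.
import numpy as np

x_ij = np.array([0.7, 0.2])
x_ji = np.array([0.3, 0.5])
m = np.minimum(x_ij, x_ji)            # component-wise overlap
x_ij_new, x_ji_new = x_ij - m, x_ji - m

assert np.allclose(x_ji - x_ij, x_ji_new - x_ij_new)  # marginalization intact
print(np.linalg.norm(x_ij + x_ji), ">=", np.linalg.norm(x_ij_new + x_ji_new))
```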

3.2. Truncated Transition Costs

If the transition costs $\theta^{ij}$ have no structure, then one has to employ the full representations Eq. 6 or 7. In this section we consider the important case of truncated smoothness costs, i.e. $\theta^{ij} = \theta^*$ if $|i-j| \ge T$ for some $T$, and $\theta^{ij} < \theta^*$ if $|i-j| < T$. The two most important examples in this category are the Potts smoothness model ($T = 1$), and truncated linear costs with $\theta^{ij} = \min\{|i-j|, \theta^*\}$. It is tempting to combine the transition gradients corresponding to "large" jumps from label $i$ to label $j$ with $|i-j| \ge T$ into one vector $y_s^{i*}$, where the star $*$ indicates a wild-card symbol, i.e.

$$y_s^{i*} = \sum_{j: j-i \ge T} y_s^{ij} - \sum_{j: i-j \ge T} y_s^{ji}.$$

Thus, we can formulate a primal program using at most $O(TL)$ unknowns per pixel,

$$E_{\text{fast}}(x, y) = \sum_{s,i} \theta_s^i x_s^i + \sum_s \sum_{i,j: i<j, j-i<T} \theta^{ij} \|y_s^{ij}\|_2 + \frac{\theta^*}{2} \sum_s \sum_i \|y_s^{i*}\|_2 \qquad (8)$$
$$\text{s.t.} \quad \nabla x_s^i = \sum_{j: i-T < j < i} y_s^{ji} - \sum_{j: i < j < i+T} y_s^{ij} - y_s^{i*}$$

and $x_s \in \Delta$. Since a large jump is represented twice, via $y^{i*}$ and $y^{j*}$, the truncation value appears as $\theta^*/2$ above. For the truncated linear smoothness cost the number of required unknowns reduces further to $O(L)$:

$$E_{\text{fast}}(x, y) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s,i} \|y_s^{i,i+1}\|_2 + \frac{\theta^*}{2} \sum_{s,i} \|y_s^{i*}\|_2 \qquad (9)$$
$$\text{s.t.} \quad \nabla x_s^i = y_s^{i-1,i} - y_s^{i,i+1} - y_s^{i*}.$$

Figure 1. The triple junction inpainting example. Panels: (a) input image, (b) forward differences, (c) forward differences, (d) staggered grid, (e) staggered grid, (f) geo-cut. (b) and (d) use the weaker relaxation $E_{\text{fast}}$, and (c) and (e) are the results of $E_{\text{tight}}$. The geo-cut solution with a 32-neighborhood is shown in (f).

These models generalize the formulation proposed in [17] beyond the Potts smoothness cost. It is demonstrated in [5] that Eq. 8 is a weaker relaxation than Eq. 5 if three regions with different labels meet (see also Fig. 1). Before we analyze the difference between those models, we state an equivalence result:

Observation 1. If we use the 1-norm $\|\cdot\|_1$ in the smoothness term instead of the Euclidean one (i.e. we consider the standard LP relaxation of MRFs using horizontal and vertical edges), the formulations in Eqs. 6 and 8 are equivalent.

More generally, one can collapse the pairwise pseudo-marginals for standard MRFs on graphs in the case of truncated pairwise potentials, leading to substantial reductions in memory requirements. We presume this fact has probably been used in the MRF community, but for completeness we provide a proof in the supplementary material.

The situation is different in the Euclidean norm setting. In the following we consider the Potts smoothness cost. If we use forward differences for the gradient and compare the smoothness costs assigned by Eq. 8 and Eq. 5 for discrete label configurations, we find that for triple junctions the formulation in Eq. 8 underestimates the true cost: if label $i$ is assigned to a pixel $s$, and labels $j$ and $k$ are assigned to the forward neighbors (see Fig. 2), then we have $y_s^{i*} = (-1,-1)^T$, $y_s^{j*} = (1,0)^T$ and $y_s^{k*} = (0,1)^T$, and the smoothness contribution of $s$ according to Eq. 8 is

$$\frac{1}{2} \left( \left\| \begin{pmatrix} -1 \\ -1 \end{pmatrix} \right\|_2 + \left\| \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\|_2 + \left\| \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\|_2 \right) = 1 + \frac{\sqrt{2}}{2}$$

(see also Fig. 2(a)). On the other hand, the transition gradients according to Eq. 5 are $y_s^{ij} = (-1,0)^T$ and $y_s^{ik} = (0,-1)^T$, and its smoothness contribution is

$$\left\| \begin{pmatrix} -1 \\ 0 \end{pmatrix} \right\|_2 + \left\| \begin{pmatrix} 0 \\ -1 \end{pmatrix} \right\|_2 = 2$$

(cf. Fig. 2(b)). Eq. 8 appears to be a weaker model than Eq. 5 due to the different cost contributions, but the deeper reason is that the former formulation cannot enforce that all adjacent regions have opposing boundary normals. In the model Eq. 8 ($E_{\text{fast}}$) only interface normals $y_s^{i*}$ with respect to a particular label are maintained, whereas the tighter formulation Eq. 5 ($E_{\text{tight}}$) explicitly represents transition gradients $y_s^{ij}$ for all label combinations $(i,j)$. Another way to express the difference between the formulations is that $E_{\text{fast}}$ penalizes the length of segmentation boundaries (thereby being agnostic to neighboring labels), while $E_{\text{tight}}$ accumulates the length of interfaces between each pair of regions separately (i.e. label transitions have knowledge of both involved labels, see also Fig. 2(c)). The two models are different (after convexification) when using the Euclidean length measure, but not when using the anisotropic $L^1$ length measure. One might ask how graph cuts with larger neighborhoods (geo-cuts [4]) compare with the continuously inspired approaches Eq. 6 and Eq. 8 for the Potts smoothness model. Since in this case geo-cuts approximate the interface boundary similarly to Eq. 8, similar results are expected (which is experimentally confirmed in Fig. 1(f)). In Fig. 1(d) and (e) we illustrate the (beneficial) impact of using a staggered grid discretization (instead of forward differences) for the gradient $\nabla$.
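The two contributions above are easy to reproduce; a quick numeric check (our sketch, for the Potts cost $\theta^* = 1$, assuming NumPy):

```python
# Sketch: smoothness contribution of a triple junction under E_fast (Eq. 8)
# versus E_tight (Eq. 5/6) for the Potts model (theta* = 1).
import numpy as np

norm = np.linalg.norm
y_istar, y_jstar, y_kstar = np.array([-1, -1]), np.array([1, 0]), np.array([0, 1])
e_fast = 0.5 * (norm(y_istar) + norm(y_jstar) + norm(y_kstar))

y_ij, y_ik = np.array([-1, 0]), np.array([0, -1])
e_tight = norm(y_ij) + norm(y_ik)

print(e_fast, e_tight)   # -> 1.707... (= 1 + sqrt(2)/2) vs 2.0
```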

3.3. The Dual View

A standard approach for efficient minimization of MRF energies is to optimize the dual formulation instead of the primal one. Recalling Section 2.2, we observe that the dual energies have a number of unknowns that scales linearly with the number of labels (and nodes), but a quadratic number of terms (recall $E^*_{\text{LP-MRF}}$). Consequently, block coordinate methods for optimizing the dual are very practical, and those methods are often referred to as message passing approaches (e.g. [9, 16, 10, 15]). Thus, we consider in this section dual formulations of the tight convex relaxation Eq. 6 and of the more efficient, but weaker one, Eq. 8. The dual energy of $E_{\text{tight}}$ can be derived (via Fenchel duality) as

$$E^*_{\text{tight-I}}(p) = \sum_s \min_i \{\operatorname{div} p_s^i + \theta_s^i\} \quad \text{s.t.} \quad \|p_s^i - p_s^j\|_2 \le \theta^{ij}, \qquad (10)$$

with the divergence $\operatorname{div} = -\nabla^T$ consistent with the discretization of the gradient. Note that we have redundant constraints on the primal variables, $y_s^{ij} \in [-1,1] \times [-1,1]$ (since $x_s^i \in [0,1]$). One could compute the dual of $\theta^{ij} \|y_s^{ij}\|_2 + \imath\{\|y_s^{ij}\|_\infty \le 1\}$, but because of its radial symmetry the constraint $\|y_s^{ij}\|_2 \le \sqrt{2}$ seems to be more appropriate.

Figure 2. Three regions meet in one grid point. (a) The situation as handled in $E_{\text{fast}}$. (b) How $E_{\text{tight}}$ sees this situation. (c) The different counting of region boundaries. Top row: $E_{\text{fast}}$ simply sums the lengths of region boundaries. Bottom row: $E_{\text{tight}}$ considers interfaces between each pair of regions separately.

Via $\big(x \mapsto \theta|x| + \imath_{[0,B]}(x)\big)^*(y) = \max_{x \in [0,B]} \{xy - \theta|x|\} = B \max\{0, |y| - \theta\}$ and the radial symmetry of the terms in $y_s^{ij}$ we obtain for the dual energy in this setting

$$E^*_{\text{tight-II}}(p) = \sum_s \min_i \{\operatorname{div} p_s^i + \theta_s^i\} + \sum_s \sum_{i,j: i<j} \sqrt{2} \min\big\{0, \; \theta^{ij} - \|p_s^i - p_s^j\|_2\big\}, \qquad (11)$$

which has the same overall shape as $E^*_{\text{LP-MRF}}$ in Section 2.2. In contrast to Eq. 10, the dual energy Eq. 11 uses an exact penalizer on the constraints and always provides a finite value, which can be useful in some cases (e.g. to compute the primal-dual gap in order to have a well-established stopping criterion when using iterative first-order optimization methods). We finally state a variant of the dual energy with a Lagrange multiplier $q_s$ for the normalization constraint (see the supplementary material),

$$E^*_{\text{tight-III}}(p, q) = \sum_s q_s + \sum_{s,i} \big[\operatorname{div} p_s^i + \theta_s^i - q_s\big]_- + \sum_s \sum_{i,j: i<j} \sqrt{2} \min\big\{0, \; \theta^{ij} - \|p_s^i - p_s^j\|_2\big\}. \qquad (12)$$
Eq. 12 is much easier to smooth than Eq. 10 (which can be smoothed via a numerically more delicate log-barrier) or Eq. 11 (where the exact minimum can be replaced by a soft-minimum, e.g. using log-sum-exp). We discuss appropriate smoothing of Eq. 12 and the corresponding optimization in Section 4. For completeness we also state the dual of the weaker relaxation Eq. 8 in the constrained form:

$$E^*_{\text{fast}}(p) = \sum_s \min_i \{\operatorname{div} p_s^i + \theta_s^i\} \qquad (13)$$
$$\text{s.t.} \quad \|p_s^i - p_s^j\|_2 \le \theta^{ij} \quad \forall s, \forall i,j: |i-j| < T, \qquad \|p_s^i\| \le \theta^*/2 \quad \forall s, i.$$

In the dual, the constraint set in Eq. 13 is a superset of the constraints in the tight relaxation Eq. 10, hence we have $E^*_{\text{fast}} \le E^*_{\text{tight-I}}$ for the respective optimal solutions (recall that the dual energies are maximized with respect to $p$). In contrast to LP-MRF formulations we have non-linear capacity constraints in the duals presented above. Thus, optimizing these dual energies (in particular Eq. 10) via block coordinate methods is more difficult, and deriving message passing algorithms appears not promising. In the supplementary material we present the detailed derivations of the dual energies stated above and report additional forms of the dual energy.
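Since Eq. 11 is finite everywhere, it can be evaluated for any dual iterate, e.g. to monitor a primal-dual gap. A small sketch of evaluating Eq. 11 on a toy grid (our illustration, assuming NumPy; boundary handling follows the common convention that $p$ vanishes outside the grid):

```python
# Sketch: evaluating the penalized dual energy (11) for given dual variables p.
# p has shape (H, W, L, 2); theta has shape (H, W, L); trans[i, j] = theta^{ij}.
import numpy as np

def divergence(p):
    # div = -grad^T for a forward-difference gradient (backward differences,
    # assuming p is zero outside the image domain)
    div = np.zeros(p.shape[:3])
    div[1:, :, :] += p[1:, :, :, 0] - p[:-1, :, :, 0]
    div[0, :, :] += p[0, :, :, 0]
    div[:, 1:, :] += p[:, 1:, :, 1] - p[:, :-1, :, 1]
    div[:, 0, :] += p[:, 0, :, 1]
    return div

def dual_energy_11(p, theta, trans):
    e = np.sum(np.min(divergence(p) + theta, axis=2))
    L = theta.shape[2]
    for i in range(L):
        for j in range(i + 1, L):
            d = np.linalg.norm(p[:, :, i] - p[:, :, j], axis=2)
            e += np.sqrt(2.0) * np.minimum(0.0, trans[i, j] - d).sum()
    return e

H, W, L = 4, 4, 3
rng = np.random.default_rng(0)
p = 0.1 * rng.standard_normal((H, W, L, 2))
theta = rng.random((H, W, L))
trans = 1.0 - np.eye(L)          # Potts transition costs
print(dual_energy_11(p, theta, trans))
```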

3.4. First-Order Optimality Conditions

In order to certify optimality of a primal-dual pair and to construct e.g. the primal solution from the dual one, we state the generalized KKT conditions (see e.g. [3], Ch. 3): if we have the primal energy $E(x) = f(x) + g(Ax)$ for convex $f$ and $g$ and a linear map $A$, the dual energy is (subject to a qualification constraint) $E^*(z) = -f^*(A^T z) - g^*(-z)$. Further, a primal-dual pair $(x^*, z^*)$ is optimal iff $x^* \in \partial f^*(A^T z^*)$ and $Ax^* \in \partial g^*(-z^*)$. For the tight relaxation Eq. 10 these conditions translate to

$$(x^*)_s \in \partial \max_i \{-\operatorname{div}(p^*)_s^i - \theta_s^i\} \quad \text{and} \quad (y^*)_s^{ij} \in \partial \imath\big\{\|(p^*)_s^i - (p^*)_s^j\|_2 \le \theta^{ij}\big\}.$$

The first condition means that $-\operatorname{div}(p^*)_s^j - \theta_s^j < \max_i \{-\operatorname{div}(p^*)_s^i - \theta_s^i\}$ for a label $j$ implies $(x^*)_s^j = 0$ (label $j$ is strictly not assigned in the optimal solution at $s$). The second condition states that $\|(p^*)_s^i - (p^*)_s^j\|_2 < \theta^{ij}$ implies $(y^*)_s^{ij} = 0$ (there is no transition between labels $i$ and $j$ at pixel $s$). If $\|(p^*)_s^i - (p^*)_s^j\|_2 = \theta^{ij}$ we have $(y^*)_s^{ij} \propto (p^*)_s^i - (p^*)_s^j$. These generalized complementary slackness constraints can be used to set many values in the primal solution to 0. The second part of the KKT conditions, $Ax^* \in \partial g^*(-z^*)$, just implies that the primal solution has to satisfy the normalization and marginalization constraints.

4. Scalable Optimization Methods

The primal (Eqs. 6 and 7) and dual (Eqs. 10 and 11) programs of the tight relaxation are non-smooth convex and concave energies, respectively, and therefore any convex optimization method able to handle non-smooth programs is in theory suitable for minimizing these energies. The major complication with the tight convex relaxation is that it requires either a quadratic number of unknowns per pixel in the primal (in terms of the number of labels) or has a quadratic number of coupled constraints (respectively penalizing terms) in the dual. The nested optimization procedure proposed in [5] is appealing in terms of memory requirements (since only a linear number of unknowns is maintained per pixel, although the inner reprojection step temporarily consumes $O(L^2)$ variables), but as any other nested iterative approach it comes with the difficulty of determining when to stop the inner iterations. On the other hand, the methods described in [11, 14] have closed-form iterations, but require $O(L^2)$ variables. This is also the case if e.g. Douglas-Rachford splitting [8] (see also the recent survey [7]) is applied either on the primal problem Eq. 6 or on the always finite dual Eq. 11. We propose two methods for efficiently solving the tight relaxation: the first one addresses truncated smoothness costs (Section 3.2) and starts by solving the efficient (but slightly weaker) model Eq. 8. It subsequently identifies potential triple junctions and switches locally to the tight relaxation until convergence. The second proposed method applies a forward-backward splitting-like method on a smoothed version of the dual energy Eq. 11, and gradually reduces the smoothing parameter (and the allowed time step).

4.1. Subsequent Refinement of the Efficient Model

Our first proposed method to solve the tight convex relaxation in an efficient way is based on the intuition given in Section 3.2: the weaker relaxation $E_{\text{fast}}$ can only be potentially strengthened where three or more phases meet, i.e. at pixels $s$ such that $y_s^{i*} \ne 0$ for at least three labels $i$. For these pixels the weaker model underestimates the true smoothness costs and does not guarantee consistency of boundary normals (recall Fig. 2). For a pixel $s$ let $A_s$ denote the set of labels with $y_s^{i*} \ne 0$; at potentially problematic triple junctions we have $|A_s| \ge 3$. The underestimation of the primal smoothness translates to unnecessarily strong restrictions on $p_s^i$ for $i \in A_s$, i.e. all constraints $\|p_s^i\| \le \theta^*/2$ are strongly active for $i \in A_s$ (recall that $y_s^{i*} \ne 0$ is a generalized Lagrange multiplier for $\|p_s^i\| \le \theta^*/2$). Consequently, replacing the constraints $\|p_s^i\| \le \theta^*/2$ by the weaker ones of the corresponding tight relaxation, $\|p_s^i - p_s^j\| \le \theta^*$ for all $i \in A_s$, allows the dual energy to increase. In the primal this means that for active labels $i$ the indiscriminative transition gradient $y_s^{i*}$ is substituted by explicit transition variables $y_s^{ij}$ (for $j > i$) and $y_s^{ji}$ (for $j < i$). The marginalization constraint of $E_{\text{fast}}$ (Eq. 8),

$$\nabla x_s^i = \sum_{j: i-T < j < i} y_s^{ji} - \sum_{j: i < j < i+T} y_s^{ij} - y_s^{i*},$$

is replaced by the one in Eq. 6,

$$\nabla x_s^i = \sum_{j < i} y_s^{ji} - \sum_{j > i} y_s^{ij},$$
for active labels $i \in A_s$. After augmenting the energy for the problematic pixels, a new minimizer is determined. In practice most problematic pixels are fixed after the first augmentation step, but not all, and there is no guarantee (verified by experiments) that a global solution of the tight model Eq. 6 is already reached after just one augmentation. Hence, the augmentation procedure is repeated until no further refinement is necessary. This approach is guaranteed to find a global minimum of the tight relaxation:

Observation 2. If for a primal solution $(x^*, y^*)$ of the augmenting procedure the set of active labels $A_s = \{i : (y^*)_s^{i*} \ne 0\}$ has at most two elements for all pixels $s \in \Omega$ (i.e. at most two different labels meet at "non-augmented" pixels), then $x^*$ is also optimal for $E_{\text{tight}}$.

This means that all potential triple or higher-order junctions have been addressed by the augmentation steps. The correctness of this observation can be shown by verifying the first-order optimality conditions, i.e. that $(x^*, y^*)$ together with the dual variables $p^*$ forms an optimal primal-dual pair (recall Section 3.4; see the supplementary material for details). Since on planar grids at most four regions can meet in a single node (only 3 if $\nabla$ is discretized via one-sided finite differences), one expects the augmentation procedure to terminate with only a few pixels being enhanced. In theory, more phases could meet in a single pixel, since we have to allow fractional values for $x_s^i$; in a few cases (pixels) we observed $A_s = \{1, \dots, L\}$. In practice only a few augmentation steps are necessary, leading to an approximately 10% increase in memory requirements over the efficient model Eq. 8. We use the primal-dual method [6] for minimization. See Figs. 3(a-c) and 4(a,b) for the intermediate results and energy evolution, respectively. All methods reach relatively quickly a solution that is visually similar to the fully converged one, but achieving a small relative duality gap (e.g. < 0.01%) is computationally much more expensive for all methods.
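Detecting the candidate pixels for augmentation only requires the active label sets $A_s$ from the current $E_{\text{fast}}$ minimizer. A minimal sketch (our illustration, assuming NumPy; the array `y_star` holding the $y_s^{i*}$ variables and the tolerance are our choices):

```python
# Sketch: find pixels where three or more phases meet, i.e. |A_s| >= 3,
# given the wildcard transition gradients y_star of shape (H, W, L, 2).
import numpy as np

def augmentation_candidates(y_star, tol=1e-6):
    active = np.linalg.norm(y_star, axis=3) > tol   # (H, W, L): i in A_s
    return np.argwhere(active.sum(axis=2) >= 3)     # pixel coordinates

rng = np.random.default_rng(1)
y_star = rng.standard_normal((8, 8, 4, 2)) * (rng.random((8, 8, 4, 1)) < 0.2)
print(augmentation_candidates(y_star))
```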

4.2. Smoothing-Based Optimization

Recall that the dual energies of the tight relaxation (Eq. 10 or 11) have only $O(L)$ unknowns per pixel, but a quadratic number of constraints/terms in the objective. In terms of efficient memory use, a purely dual or primal-dual method is desirable. Chambolle et al. [5] utilize a primal-dual method requiring the projection into the non-trivial feasible set. This projection has no closed-form solution and needs to be solved via inner iterations (requiring temporarily $O(L^2)$ variables per pixel).

Figure 3. Stereo result using absolute color differences and the Potts discontinuity model. Panels: (a) $E_{\text{fast}}$, (b) after 1 augmentation, (c) after 2 augmentations, (d) smooth optimization, (e) exact solution $E_{\text{tight}}$. We want to emphasize that not the quality of the obtained disparity map, but the equivalence between (c), (d) and (e) is of importance.

The dual energy $E^*_{\text{tight-III}}$ with only penalizer terms (recall Eq. 12) allows the dual energy to be smoothed in a numerically robust way. A principled way to smooth non-smooth functions with bounds on the Lipschitz constant of the gradient is presented in [13]: for a non-smooth (convex) function $f$ and a smoothing parameter $\varepsilon > 0$, a smooth version $f_\varepsilon$ of $f$ with Lipschitz-continuous gradient (with Lipschitz constant $1/\varepsilon$) is given by $f_\varepsilon = (f^* + \varepsilon\|\cdot\|^2/2)^*$. In order to have convex instead of concave terms, we minimize $-E^*_{\text{tight-III}}$ with respect to $p$ and $q$,

$$-E^*_{\text{tight-III}}(p, q) = \sum_s -q_s + \sum_{s,i} \big[q_s - \operatorname{div} p_s^i - \theta_s^i\big]_+ + \sum_s \sum_{i,j: i<j} \sqrt{2} \big[\|p_s^i - p_s^j\|_2 - \theta^{ij}\big]_+. \qquad (14)$$

The second and third sums are non-smooth. First, the $[\cdot]_+ = \max(0, \cdot)$ expressions in the second sum can be replaced by a soft-maximum function. Especially in the machine learning literature the logistic soft-hinge, $\varepsilon \log(1 + e^{x/\varepsilon}) \stackrel{\varepsilon \to 0}{\longrightarrow} [x]_+$, is often employed, but the exponential and logarithm functions are slow to compute and require special handling for very small $\varepsilon$. Similar to the Huber cost, which is a smooth version of the magnitude function, the smooth version of $[\cdot]_+$ can easily be derived as

$$[x]_{+,\varepsilon} := \begin{cases} 0 & x \le 0 \\ x^2/(2\varepsilon) & 0 \le x \le \varepsilon \\ x - \varepsilon/2 & x \ge \varepsilon. \end{cases}$$

Obtaining a smooth variant of expressions of the shape $h^\theta(z) := \sqrt{2}\,[\|z\|_2 - \theta]_+$ appearing in the last summation is more involved, but can be shown to be

$$h^\theta_\varepsilon(z) = \begin{cases} 0 & \text{if } \|z\| \le \theta \\ \frac{(\|z\| - \theta)^2}{2\varepsilon} & \text{if } \theta \le \|z\| \le \theta + \sqrt{2}\,\varepsilon \\ \sqrt{2}\,(\|z\| - \theta) - \varepsilon & \text{if } \|z\| \ge \theta + \sqrt{2}\,\varepsilon. \end{cases}$$
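Both smoothed expressions are simple piecewise formulas; a direct implementation (our sketch, assuming NumPy) follows.

```python
# Sketch: the smoothed hinge [x]_{+,eps} and the smoothed h^theta_eps from above.
import numpy as np

def hinge_smooth(x, eps):
    # 0 for x <= 0, x^2/(2 eps) for 0 <= x <= eps, x - eps/2 for x >= eps
    return np.where(x <= 0, 0.0,
                    np.where(x >= eps, x - eps / 2.0, x**2 / (2.0 * eps)))

def h_theta_smooth(z, theta, eps):
    # smooth version of sqrt(2) * max(0, ||z|| - theta), z of shape (..., 2)
    n = np.linalg.norm(z, axis=-1)
    r = n - theta
    return np.where(n <= theta, 0.0,
                    np.where(n >= theta + np.sqrt(2) * eps,
                             np.sqrt(2) * r - eps, r**2 / (2.0 * eps)))

x = np.linspace(-1, 2, 7)
print(hinge_smooth(x, 0.5))
print(h_theta_smooth(np.array([[1.5, 0.0]]), theta=1.0, eps=0.1))
```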

We refer to the supplementary material for the derivation. Overall, the smooth energy corresponding to Eq. 14 reads as

$$-E^*_{\text{tight-III},\varepsilon}(p, q) = \sum_s -q_s + \sum_{s,i} \big[q_s - \operatorname{div} p_s^i - \theta_s^i\big]_{+,\varepsilon} + \sum_s \sum_{i,j: i<j} h^{\theta^{ij}}_\varepsilon(p_s^i - p_s^j). \qquad (15)$$
By using the chain rule, $\nabla_x f(Ax) = A^T \nabla_y f(y)|_{y=Ax}$ for a differentiable function $f$ and a matrix $A$, an upper bound on the Lipschitz constant of $\nabla_x f(Ax)$ is given by $L \le \|A\|_2^2 L_f$, where $L_f$ is the Lipschitz constant of $\nabla f$ and $\|A\|_2$ is the respective operator norm of $A$. Consequently, the Lipschitz constant of $\nabla E^*_{\text{tight-III},\varepsilon}$ can be bounded by $5(L+1)/\varepsilon$, since $\|A\|_2^2 \le 5(L+1)$ for the matrix $A$ mapping $(p, q)$ to their appearances in the respective summands (see the supplementary material for details). Thus, the largest allowed time step in forward-backward splitting and related accelerated gradient methods is required to be less than or equal to $\varepsilon/(5(L+1))$ in order to have convergence guarantees. Note that Eq. 15 is completely smooth, so the backward step e.g. in forward-backward splitting is a no-op. We considered and implemented different dual energies leading to a smooth and a non-smooth term in the objective, but none of these appears to be superior to Eq. 15. Hence, in Fig. 4(c) and (d) we report the energy evolution of Eq. 15 using the proximal gradient algorithm proposed in [2], and the Euclidean distance to a converged, ground-truth solution, respectively. Surprisingly, while the dual energy and the distance to the true solution seem to favor the smoothing-based approach over a primal-dual implementation for $E_{\text{tight}}$, the primal energy evolution is clearly inferior. Our conjecture is that the marginalization constraints in the primal are only slowly satisfied in the smooth formulation. The recurring peaks in the energy and distance graphs in Fig. 4(c,d) are due to the adjustment of $\varepsilon$ in an annealing scheme. A clear advantage of using FISTA for the smooth energy is the trivial implementation on GPUs, where we expect speedups of two orders of magnitude.
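The step-size rule is easy to sanity-check on the scalar smoothed hinge, whose derivative has slope at most $1/\varepsilon$; a short self-contained sketch (ours, assuming NumPy):

```python
# Sketch: the gradient of [x]_{+,eps} is (1/eps)-Lipschitz, which combined with
# the operator norm bound ||A||_2^2 <= 5(L+1) motivates the step size
# eps / (5 (L + 1)) for FISTA-type methods.
import numpy as np

def hinge_smooth_grad(x, eps):
    # derivative: 0 for x <= 0, x/eps for 0 <= x <= eps, 1 for x >= eps
    return np.clip(x / eps, 0.0, 1.0)

eps, L = 0.01, 16
x = np.linspace(-1.0, 1.0, 100001)
g = hinge_smooth_grad(x, eps)
slope = np.max(np.abs(np.diff(g) / np.diff(x)))
print(slope, "~ 1/eps =", 1.0 / eps)        # numerical Lipschitz estimate
print("max FISTA step:", eps / (5 * (L + 1)))
```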

Figure 4. Evolution of the energies and the respective Euclidean distances to a converged ground-truth solution for the tight model Eq. 5, the refinement strategy (a,b), and FISTA applied to $E^*_{\text{tight-III},\varepsilon}$ (c,d). Panels: (a,b) $E_{\text{tight}}$ vs. $E_{\text{fast}}$+refinement, (c,d) $E_{\text{tight}}$ vs. $E^*_{\text{tight-III},\varepsilon}$.

5. Conclusion

In [5] the question is raised whether there is a simple primal representation of the convex relaxation Eq. 4 for multi-label problems. In this work we are able to give an intuitive answer to that question, at least in the discrete, finite-dimensional setting. Thus, there is now a clearer understanding of what the tight convex formulation optimizes on a discrete image grid, and how to improve its computational efficiency. There are strong links between the local polytope relaxation for MRFs and the relaxations derived in a continuous and infinite-dimensional setting. We do not know whether it is easy to state the primal program in the continuous domain in a similar way to Eq. 6. For instance, the marginalization constraint in its difference form, $\nabla x^i = \sum_{j<i} y^{ji} - \sum_{j>i} y^{ij}$, would read just as a linear PDE, but there is the complication that $x_s^i$ is not smooth. Analyzing the continuous setting and further extensions of Eq. 6 are subject to future work.¹

References

[1] G. Alberti, G. Bouchitté, and G. Dal Maso. The calibration method for the Mumford-Shah functional and free-discontinuity problems. Calc. Var. Partial Differential Equations, 16(3):299-333, 2003.
[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2:183-202, 2009.
[3] J. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Springer, 2000.
[4] Y. Boykov and V. Kolmogorov. Computing geodesics and minimal surfaces via graph cuts. In Proc. ICCV, pages 26-33, 2003.
[5] A. Chambolle, D. Cremers, and T. Pock. A convex approach for computing minimal partitions. Technical report, Ecole Polytechnique, 2008.
[6] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, pages 1-26, 2010.
[7] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pages 185-212. Springer, 2011.
[8] J. Eckstein and D. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Programming, 55:293-318, 1992.
[9] A. Globerson and T. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In NIPS, 2007.
[10] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Anal. Mach. Intell., 28(10):1568-1583, 2006.
[11] J. Lellmann, D. Breitenreicher, and C. Schnörr. Fast and exact primal-dual iterations for variational problems in computer vision. In Proc. ECCV, 2010.
[12] D. Mumford and J. Shah. Optimal approximation by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math., 42:577-685, 1989.
[13] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Programming, 103:127-152, 2005.
[14] E. Strekalovskiy, B. Goldluecke, and D. Cremers. Tight convex relaxations for vector-valued labeling problems. In Proc. ICCV, 2011.
[15] Y. Weiss, C. Yanover, and T. Meltzer. MAP estimation, linear programming and belief propagation with convex free energies. In Uncertainty in Artificial Intelligence, 2007.
[16] T. Werner. A linear programming approach to max-sum problem: A review. IEEE Trans. Pattern Anal. Mach. Intell., 29(7), 2007.
[17] C. Zach, D. Gallup, J.-M. Frahm, and M. Niethammer. Fast global labeling for real-time stereo using multiple plane sweeps. In Vision, Modeling and Visualization Workshop (VMV), 2008.

1 Code is available at http://www.inf.ethz.ch/personal/chaene/. Supported by the 4DVideo ERC Starting Grant Nr. 210806.

What Is Optimized in Tight Convex Relaxations for Multi-Label Problems? Supplementary Material
Christopher Zach (MSR Cambridge), Christian Häne (ETH Zürich), Marc Pollefeys (ETH Zürich)
April 2, 2012

1 Notations

We recall the following definitions from the main paper. The original tight relaxation of multi-label problems [Chambolle et al., 2008] reads as

$$E_{\text{CCP-I}}(u, q) = \sum_{s,i} \theta_s^i (u_s^{i+1} - u_s^i) + \sum_{s,i} (q_s^i)^T \nabla u_s^i \qquad (1)$$
$$\text{s.t.} \quad u_s^i \le u_s^{i+1}, \quad u_s^0 = 0, \quad u_s^{L+1} = 1, \quad u_s^i \ge 0, \quad \Big\| \sum_{k=i}^{j-1} q_s^k \Big\|_2 \le \theta^{ij} \quad \forall s, i, j.$$
The corresponding version in terms of node-wise pseudo-marginals is given by

$$E_{\text{CCP-II}}(x, p) = \sum_{s,i} \theta_s^i x_s^i + \sum_{s,i} (p_s^i)^T \nabla x_s^i \quad \text{s.t.} \quad \|p_s^i - p_s^j\|_2 \le \theta^{ij}, \; x_s \in \Delta \quad \forall s, i, j. \qquad (2)$$

In the main paper we state the following primal energy equivalent to Eq. 2,

$$E_{\text{tight}}(x, y) = \sum_{s,i} \theta_s^i x_s^i + \sum_s \sum_{i,j: i<j} \theta^{ij} \|y_s^{ij}\|_2 \quad \text{s.t.} \quad \nabla x_s^i = \sum_{j: j<i} y_s^{ji} - \sum_{j: j>i} y_s^{ij}, \; x_s \in \Delta \quad \forall s, i, \qquad (3)$$

as well as this one,

$$E(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_s \sum_{i,j: i<j} \theta^{ij} \|x_s^{ij} + x_s^{ji}\|_2 \quad \text{s.t.} \quad \nabla x_s^i = \sum_{j: j \ne i} x_s^{ji} - \sum_{j: j \ne i} x_s^{ij}, \; x_s \in \Delta, \; x_s^{ij} \ge 0 \quad \forall s, i. \qquad (4)$$
2 Switching Between Superlevel and Indicator Representations

In this section we show the equivalence between Eq. 1 and Eq. 2. In the main paper we subsequently focus on Eq. 2.

We use $u_s^i$ to denote the superlevel representation and $x_s^i$ the indicator representation of a label assignment, i.e. $x_s = \partial_i u_s$, where we use backward differences for $\partial_i$ and $u_s^0 = 0$ as boundary condition. With these definitions we obtain $x_s^1 = u_s^1$ and $x_s^i = u_s^i - u_s^{i-1}$, as desired. We have (in 2 dimensions, but this generalizes to any dimension)

$$\nabla_x \partial_i u_s = \begin{pmatrix} (u_{s+(1,0)}^i - u_{s+(1,0)}^{i-1}) - (u_s^i - u_s^{i-1}) \\ (u_{s+(0,1)}^i - u_{s+(0,1)}^{i-1}) - (u_s^i - u_s^{i-1}) \end{pmatrix} = \begin{pmatrix} (u_{s+(1,0)}^i - u_s^i) - (u_{s+(1,0)}^{i-1} - u_s^{i-1}) \\ (u_{s+(0,1)}^i - u_s^i) - (u_{s+(0,1)}^{i-1} - u_s^{i-1}) \end{pmatrix} = \partial_i \nabla_x u_s.$$

Since we have $x_s = \partial_i u_s$,

$$\max_{p_s \in C} \langle p_s, \nabla_x x_s \rangle = \max_{p_s \in C} \langle p_s, \nabla_x \partial_i u_s \rangle = \max_{p_s \in C} \langle p_s, \partial_i \nabla_x u_s \rangle = \max_{p_s \in C} \langle \partial_i^T p_s, \nabla_x u_s \rangle,$$

where $C$ is the constraint set $C = \{p : \|p^i - p^j\| \le \theta^{ij}\}$. Explicitly we have

$$\partial_i u_s^k = \begin{cases} u_s^1 & \text{if } k = 1 \\ u_s^k - u_s^{k-1} & \text{if } 1 < k \le L. \end{cases}$$

For a 6-label problem the matrix corresponding to $\partial_i$ is

$$\begin{pmatrix} 1 & & & & & \\ -1 & 1 & & & & \\ & -1 & 1 & & & \\ & & -1 & 1 & & \\ & & & -1 & 1 & \\ & & & & -1 & 1 \end{pmatrix}.$$

For the adjoint operator $\partial_i^T$ we have

$$\partial_i^T p_s^k = \begin{cases} p_s^k - p_s^{k+1} & \text{if } 1 \le k < L \\ p_s^L & \text{if } k = L. \end{cases}$$

The solution of $\partial_i^T p_s = q_s$ is of the form $p_s^k = \sum_{l=k}^L q_s^l$ (like an antiderivative), and the constraints expressed in terms of $q_s$ are

$$\theta^{ij} \ge \|p_s^i - p_s^j\| = \Big\| \sum_{l=i}^L q_s^l - \sum_{l=j}^L q_s^l \Big\| = \begin{cases} \big\| \sum_{l=i}^{j-1} q_s^l \big\| & \text{if } i < j \\ \big\| \sum_{l=j}^{i-1} q_s^l \big\| & \text{if } i > j, \end{cases} \qquad (5)$$

which are exactly the constraints used in the super-level representation Eq. 1.
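This partial-sum identity is easy to verify numerically (our sketch, assuming NumPy):

```python
# Sketch: with p^k = sum_{l>=k} q^l, the constraint ||p^i - p^j|| equals the
# norm of the partial sum of q between labels i and j (Eq. 5).
import numpy as np

rng = np.random.default_rng(2)
L = 6
q = rng.standard_normal((L, 2))                  # q^1..q^L, 2D each
p = np.cumsum(q[::-1], axis=0)[::-1]             # p^k = sum_{l=k}^L q^l

i, j = 1, 4                                      # 0-based labels, i < j
lhs = np.linalg.norm(p[i] - p[j])
rhs = np.linalg.norm(q[i:j].sum(axis=0))         # sum_{l=i}^{j-1} q^l
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```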

3 The Primal of the Isotropic Tight Convex Relaxation

Since $E_{\text{CCP-II}}$ can be written as

$$E_{\text{CCP-II}}(x) = \sum_{s,i} \theta_s^i x_s^i + \sum_s \max_{p_s} \sum_i (p_s^i)^T \nabla x_s^i \quad \text{s.t.} \quad \|p_s^i - p_s^j\|_2 \le \theta^{ij}, \; x_s \in \Delta, \qquad (6)$$

we only need to consider the point-wise problem

$$\max_{p_s} \sum_i (p_s^i)^T \nabla x_s^i \quad \text{subject to} \quad \|p_s^i - p_s^j\|_2 \le \theta^{ij}. \qquad (7)$$

We will omit the subscript $s$ and derive the primal of

$$\max_p \sum_i (p^i)^T \nabla x^i \quad \text{subject to} \quad \|p^i - p^j\|_2 \le \theta^{ij} \quad \forall i < j.$$

Fenchel duality, $\max_p \{-f^*(-A^T p) - g^*(p)\} = \min_y \{f(y) + g(-Ay)\}$, leads to the primal

$$\min_y \sum_{i,j: i<j} \theta^{ij} \|y^{ij}\|_2 \quad \text{subject to} \quad Ay = \nabla x, \qquad (8)$$

since the conjugate of $f \equiv \imath\{\|\cdot\|_2 \le \theta\}$ is $\theta\|\cdot\|_2$, and the conjugate of $g \equiv a^T\cdot$ is $\imath\{\cdot = a\}$. The matrix $-A$ (which has rows corresponding to $p^i$ and columns corresponding to $y^{ij}$) has a $-1$ entry at position $(p^i, y^{ij})$ (for $i < j$) and a $+1$ entry at $(p^j, y^{ij})$ ($i > j$). Thus, the $i$-th row of $-Ay$ reads as

$$\sum_{j: j<i} y^{ji} - \sum_{j: j>i} y^{ij}, \qquad (9)$$

and the purely primal form of Eq. 7 is given by

$$\min_{y_s^{ij}} \sum_{i,j: i<j} \theta^{ij} \|y_s^{ij}\|_2 \quad \text{s.t.} \quad \nabla x_s^i = \sum_{j: j<i} y_s^{ji} - \sum_{j: j>i} y_s^{ij}. \qquad (10)$$

j:j>i

By replacing the inner maximization problem in Eq. 6 with this expression we obtain Etight . We can express the primal energy also in terms of non-negative pseudo-marginals. We start with the decoupled binary potentials from Eq. 4, X X

ji Es (x) = θij xij ı{xij (11) s + xs 2 + s ≥ 0} i,j:i
s.t.

∇xis

=

X

xji s

j:j6=i



X

i,j

xij s ,

j:j6=i

and dualize $E_s$. First, we note that every optimal solution satisfies the complementarity constraints $x_s^{ij} \perp x_s^{ji}$, i.e. $(x_s^{ij})_k (x_s^{ji})_k = 0$ (otherwise one could strictly decrease the norm term by setting $x_s^{ij} \leftarrow x_s^{ij} - \min\{x_s^{ij}, x_s^{ji}\}$ and $x_s^{ji} \leftarrow x_s^{ji} - \min\{x_s^{ij}, x_s^{ji}\}$ without affecting the marginalization constraint). Hence, we have

$$\|x_s^{ij} + x_s^{ji}\|_2 = \sqrt{\big((x_s^{ij})_1 + (x_s^{ji})_1\big)^2 + \big((x_s^{ij})_2 + (x_s^{ji})_2\big)^2}$$
$$= \sqrt{(x_s^{ij})_1^2 + (x_s^{ji})_1^2 + \underbrace{2 (x_s^{ij})_1 (x_s^{ji})_1}_{=0} + (x_s^{ij})_2^2 + (x_s^{ji})_2^2 + \underbrace{2 (x_s^{ij})_2 (x_s^{ji})_2}_{=0}}$$
$$= \big\| \big((x_s^{ij})_1, (x_s^{ij})_2, (x_s^{ji})_1, (x_s^{ji})_2\big)^T \big\|_2 = \left\| \begin{pmatrix} x_s^{ij} \\ x_s^{ji} \end{pmatrix} \right\|_2.$$

Consequently, $E_s$ above can be restated as

$$E_s(x) = \sum_{i,j: i<j} \left( \theta^{ij} \left\| \begin{pmatrix} x_s^{ij} \\ x_s^{ji} \end{pmatrix} \right\|_2 + \imath\{x_s^{ij} \ge 0\} \right) \quad \text{s.t.} \quad \nabla x_s^i = \sum_{j: j \ne i} x_s^{ji} - \sum_{j: j \ne i} x_s^{ij}, \qquad (12)$$

which seems to be more convenient to work with. Using the fact that the conjugate of $f(x) = \theta\|x\|_2 + \imath\{x \ge 0\}$ is $f^*(p) = \imath\{\|[p]_+\| \le \theta\}$ (see Section 4), we obtain the following dual of $E_s$,

$$E_s^*(p) = \sum_i (p_s^i)^T \nabla x_s^i \quad \text{s.t.} \quad \left\| \begin{pmatrix} [p_s^i - p_s^j]_+ \\ [p_s^j - p_s^i]_+ \end{pmatrix} \right\|_2 \le \theta^{ij} \qquad (13)$$
$$= \sum_i (p_s^i)^T \nabla x_s^i \quad \text{s.t.} \quad \|p_s^i - p_s^j\|_2 \le \theta^{ij}, \qquad (14)$$

which is exactly Eq. 7. Thus, we have shown the equivalence of the primal programs Eq. 3 and Eq. 4.

4 Dual Energies

If we consider the primal energy

$$E_{\text{tight}}(x, y) = \sum_{s,i} \theta_s^i x_s^i + \sum_s \sum_{i,j: i<j} \theta^{ij} \|y_s^{ij}\|_2 \quad \text{subject to} \quad \nabla x_s^i = \sum_{j: j<i} y_s^{ji} - \sum_{j: j>i} y_s^{ij}, \; x_s \in \Delta, \qquad (15)$$

the dual energy is given by

$$E^*_{\text{tight-I}}(p) = \sum_s \min_i \{\operatorname{div} p_s^i + \theta_s^i\} - \sum_s \sum_{i,j: i<j} \imath\big\{\|p_s^i - p_s^j\|_2 \le \theta^{ij}\big\}.$$

Note that we have redundant constraints on the primal variables, $y_s^{ij} \in [-1,1] \times [-1,1]$ (since $x_s^i \in [0,1]$). One could compute the dual of $\theta^{ij}\|y_s^{ij}\|_2 + \imath\{\|y_s^{ij}\|_\infty \le 1\}$, but because of its radial symmetry the constraint $\|y_s^{ij}\|_2 \le \sqrt{2}$ seems to be more appropriate. Via

$$\big(x \mapsto \theta|x| + \imath_{[0,B]}(x)\big)^*(y) = \max_{x \in [0,B]} \{xy - \theta|x|\} = B \max\{0, |y| - \theta\}$$

and the radial symmetry of the terms in $y_s^{ij}$ we obtain for the full dual energy in this setting

$$E^*_{\text{tight-II}}(p) = \sum_s \min_i \{\operatorname{div} p_s^i + \theta_s^i\} + \sum_s \sum_{i,j: i<j} \sqrt{2} \min\big\{0, \; \theta^{ij} - \|p_s^i - p_s^j\|_2\big\}.$$

The first term in $E^*_{\text{tight-II}}$, $\sum_s \min_i \{\operatorname{div} p_s^i + \theta_s^i\}$, can also be replaced by penalty terms: if we move the normalization constraint $\sum_i x_s^i = 1$ to the linear constraints and introduce a respective Lagrange multiplier $q_s$, we obtain via

$$\big(x \mapsto \theta x + \imath_{[0,1]}(x)\big)^*(y) = [y - \theta]_+ \quad \text{and} \quad (\imath_{\{x: Ax=b\}})^*(y) = \imath_{\operatorname{im}(A^T)}(y) + b^T \lambda \;\; \text{for } y = A^T \lambda$$

the dual energy in $p_s^i$ and $q_s$:

$$E^*_{\text{tight-III}}(p, q) = \sum_s q_s + \sum_{s,i} \big[\operatorname{div} p_s^i + \theta_s^i - q_s\big]_- + \sum_s \sum_{i,j: i<j} \sqrt{2} \min\big\{0, \; \theta^{ij} - \|p_s^i - p_s^j\|_2\big\}. \qquad (16)$$
5

Proof of Observation 1

This section shows that for graph-based MRFs with truncated smoothness costs a compact representation is equivalent to the full one. We have the full model, X X X Efull = θsi xis + θij xij st (s,t)∈E i,j

s,i

=

X

θsi xis +

s,i

X

(s,t)∈E

 

X

∗ θij xij st + θ

i,j:|i−j|
X

i,j:|i−j|≥T



 xij st

(17)

P P ij i i ij ∗ subject to the marginalization constraints j xij st = xs and i xst = xt . We assume θ = θ for |i − j| ≥ T (where T is the truncation point) and θij < θ∗ . The reduced program reads as   ∗ X X X X θ ∗i   (xi∗ (18) Ered = θsi xis + θij xij st + st + xst ) 2 i s,i (s,t)∈E

i,j:|i−j|
with the slightly different marginalization constraints X i∗ xis = xij and st + xst

X

xjt =

i,j:|i−j|
∗j xij st + xst .

i,j:|i−j|
If we have a minimizer of Efull , we can easily construct a solution of Ered with the same overall objective by setting X X xi∗ xij and x∗j xij st st = st , st = j:|i−j|≥T

i:|i−j|≥T

since the pairwise truncated smoothness costs are the same θ∗ X i∗ θ∗ X ∗j θ∗ X xst + xst = 2 i 2 j 2 i

X

xij st +

j:|i−j|≥T

θ∗ X 2 j

X

∗ xij st = θ

i:|i−j|≥T

X

xij st .

(19)

i,j:|i−j|≥T

If we have a minimizer $x$ of $E_{\text{red}}$, we have to construct a solution $\hat{x}$ of $E_{\text{full}}$ with the same objective. We set

$$\hat{x}_s^i = x_s^i \quad \text{and} \quad \hat{x}_{st}^{ij} = x_{st}^{ij} \quad \forall i,j: |i-j| < T.$$

Determining $\hat{x}_{st}^{ij}$ for $i,j: |i-j| \ge T$ is more difficult. In the following we consider a particular edge $st$ and omit the subscript. We use a north-west-corner-rule-like procedure to assign $\hat{x}^{ij}$ for $i,j: |i-j| \ge T$:

    x̄^{i*} ← x^{i*};  x̄^{*j} ← x^{*j}
    while some x̂^{ij} is not assigned do
        choose i and j (with |i−j| ≥ T) such that x̂^{ij} is not assigned
        x̂^{ij} ← min{x̄^{i*}, x̄^{*j}}     {x̂^{ij} ≥ 0}
        x̄^{i*} ← x̄^{i*} − x̂^{ij}         {x̄^{i*} ≥ 0; x_s^i = Σ_{(i,j) assigned} x̂^{ij} + x̄^{i*}}
        x̄^{*j} ← x̄^{*j} − x̂^{ij}         {x̄^{*j} ≥ 0; x_t^j = Σ_{(i,j) assigned} x̂^{ij} + x̄^{*j}}
    end while
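A direct implementation of this construction (our sketch, assuming NumPy; the example masses are our choice):

```python
# Sketch: north-west-corner-style filling of x_hat[i, j] for |i - j| >= T from
# the aggregated masses xi_star (rows) and xstar_j (columns). As argued in the
# proof, for an optimal reduced solution the remainders vanish.
import numpy as np

def nw_corner_fill(xi_star, xstar_j, T):
    L = len(xi_star)
    xb_i, xb_j = xi_star.astype(float).copy(), xstar_j.astype(float).copy()
    x_hat = np.zeros((L, L))
    for i in range(L):
        for j in range(L):
            if abs(i - j) >= T:
                m = min(xb_i[i], xb_j[j])   # assign as much mass as possible
                x_hat[i, j] = m
                xb_i[i] -= m
                xb_j[j] -= m
    return x_hat, xb_i, xb_j

xi = np.array([0.3, 0.0, 0.0, 0.2])
xj = np.array([0.2, 0.0, 0.3, 0.0])
x_hat, ri, rj = nw_corner_fill(xi, xj, T=2)
print(x_hat, ri, rj)   # remainders are zero here; x_hat sums reproduce xi, xj
```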

The updates ensure that $\hat{x}^{ij}$, $\bar{x}^{i*}$ and $\bar{x}^{*j}$ stay non-negative and that the following modified marginalization constraints are still satisfied after each iteration:

$$\hat{x}_s^i = \sum_{j: |i-j| < T} \hat{x}^{ij} + \sum_{j: |i-j| \ge T} \hat{x}^{ij} + \bar{x}^{i*} = \sum_j \hat{x}^{ij} + \bar{x}^{i*}$$
$$\hat{x}_t^j = \sum_{i: |i-j| < T} \hat{x}^{ij} + \sum_{i: |i-j| \ge T} \hat{x}^{ij} + \bar{x}^{*j} = \sum_i \hat{x}^{ij} + \bar{x}^{*j}.$$

We show that all $\bar{x}^{i*}$ and $\bar{x}^{*j}$ are 0 after termination of this algorithm. First, it cannot be that $\bar{x}^{i*} > 0$ and $\bar{x}^{*j} > 0$ for some $i$ and $j$: if this is the case for $i,j: |i-j| < T$, we can increase $\hat{x}^{ij}$ and simultaneously strictly lower the overall smoothness cost, thus contradicting that our initial solution was optimal. If $\bar{x}^{i*} > 0$ and $\bar{x}^{*j} > 0$ for some $i,j: |i-j| \ge T$, this contradicts the instructions ($\hat{x}^{ij} \leftarrow \min\{\bar{x}^{i*}, \bar{x}^{*j}\}$, $\bar{x}^{i*} \leftarrow \bar{x}^{i*} - \hat{x}^{ij}$, $\bar{x}^{*j} \leftarrow \bar{x}^{*j} - \hat{x}^{ij}$) in the algorithm above, which set one of $\bar{x}^{i*}$ or $\bar{x}^{*j}$ to zero. W.l.o.g. assume some of the $\bar{x}^{i*}$ are strictly greater than 0, but all $\bar{x}^{*j}$ are 0. We have

$$1 = \sum_i \hat{x}_s^i = \sum_i \Big( \sum_j \hat{x}^{ij} + \bar{x}^{i*} \Big) = \sum_j \hat{x}_t^j + \sum_i \bar{x}^{i*} = 1 + \sum_i \bar{x}^{i*},$$

which is a contradiction. Hence all $\bar{x}^{i*}$ and $\bar{x}^{*j}$ have to be 0 at the end of the algorithm. We further have

$$\sum_{j: |i-j| \ge T} \hat{x}^{ij} = x^{i*} \quad \text{and} \quad \sum_{i: |i-j| \ge T} \hat{x}^{ij} = x^{*j},$$

and the pairwise smoothness costs are the same for $x$ and $\hat{x}$ (similar to Eq. 19), so both overall objectives $E_{\text{full}}(\hat{x})$ and $E_{\text{red}}(x)$ coincide. Thus, we have proved Observation 1.
6 Proof of Observation 2
We show that if we are given an optimal primal/dual solution pair generated by the refinement procedure satisfying the assumption stated in the observation, a primal-dual pair of optimality certificates can be constructed for the tight model $E_{\text{tight}}$. Note that the only difference between the dual of the tight model,

$$E^*_{\text{tight-I}}(p) = \sum_s \min_i \{\operatorname{div} p_s^i + \theta_s^i\} \quad \text{s.t.} \quad \|p_s^i - p_s^j\|_2 \le \theta^{ij}, \qquad (20)$$

and the weaker model for truncated costs,

$$E^*_{\text{fast}}(p) = \sum_s \min_i \{\operatorname{div} p_s^i + \theta_s^i\} \quad \text{s.t.} \quad \|p_s^i - p_s^j\|_2 \le \theta^{ij} \;\; \forall s, \forall i,j: |i-j| < T, \quad \|p_s^i\| \le \theta^*/2 \;\; \forall s, i, \qquad (21)$$

is the set of constraints. We assume that $\theta^{ij} = \theta^*$ for $|i-j| \ge T$ in Eq. 20 and that $\theta^* \ge \theta^{ij}$, since we consider truncated smoothness costs. Consequently, the constraints in Eq. 21 are a superset of those in Eq. 20, since $\|p_s^i\| \le \theta^*/2$ implies $\|p_s^i - p_s^j\| \le \theta^*$. The essential fact needed to prove Observation 2 is that if only two phase transitions are active, i.e. $y_s^{i_1*} \ne 0$ and $y_s^{i_2*} \ne 0$ for some $i_1$ and $i_2$, it must hold that $y_s^{i_1*} = -y_s^{i_2*}$ (the boundary normal of the entering phase must be opposite to the one of the leaving phase). This is easily seen and intuitive for the Potts smoothness cost. Extending this fact to general truncated smoothness priors can be seen as follows:

$$0 = \nabla \sum_i x_s^i = \sum_i \nabla x_s^i = \sum_i \Big( \sum_{j: i-T<j<i} y_s^{ji} - \sum_{j: i<j<i+T} y_s^{ij} - y_s^{i*} \Big) = \sum_{i,j: i<j<i+T} y_s^{ij} - \sum_{i,j: i<j<i+T} y_s^{ij} - \sum_i y_s^{i*} = -y_s^{i_1*} - y_s^{i_2*}.$$

Note that the normalization constraint $\sum_i x_s^i = 1$ implies $\nabla \sum_i x_s^i = 0$. Further, by assumption we have $y_s^{i*} = 0$ for $i \ne i_1, i_2$. The first-order optimality conditions $y_s^{i*} \in \partial \imath\{\|{-p_s^i}\|_2 \le \theta^*/2\}$ (i.e. $y_s^{i_1*} \propto -p_s^{i_1}$ and $y_s^{i_2*} \propto -p_s^{i_2}$) imply that $p_s^{i_1} = -p_s^{i_2}$. Together with $\|p_s^{i_1}\| = \|p_s^{i_2}\| = \theta^*/2$ we obtain $\|p_s^{i_1} - p_s^{i_2}\| = \theta^*$. In the following we assume $i_1 < i_2$ w.l.o.g.

Given now the primal solution obtained from the refinement approach, we construct a feasible primal solution for the tight energy, i.e. we have to determine $y_s^{ij}$ for $i,j: |i-j| \ge T$. We set in this case $y_s^{i_1 i_2} = y_s^{i_1*}$, and $y_s^{ij} = 0$ for all other $i,j: |i-j| \ge T$. It can be easily checked that this choice for $y_s^{ij}$ satisfies the marginalization constraints, i.e. one half of the optimality conditions. The dual variables $p$ are a certificate for optimality, since $y_s^{i_1 i_2} \ne 0$ implies $\|p_s^{i_1} - p_s^{i_2}\| = \theta^*$ (i.e. the inequality constraint is tight), and for the remaining $i,j: |i-j| \ge T$ we have $y_s^{ij} = 0$ and $\|p_s^i - p_s^j\| \le \theta^*$. Overall, the other half of the optimality conditions, $y_s^{ij} \ne 0 \implies \|p_s^i - p_s^j\| = \theta^{ij}$, holds, and we have shown optimality of the constructed solution with respect to the tight energy $E_{\text{tight}}$.
7 Notes on smoothing-based optimization

7.1 A smooth version of $h^\theta(z) = \sqrt{2}\,[\|z\|_2 - \theta]_+$

By construction we know that the convex conjugate of $h^\theta$ is given by

$$(h^\theta)^*(x) = \theta\|x\|_2 + \imath\{\|x\| \le \sqrt{2}\}.$$

Thus, a smooth version of $h^\theta$ is the convex conjugate of

$$(h^\theta_\varepsilon)^*(x) = \theta\|x\|_2 + \imath\{\|x\|_2 \le \sqrt{2}\} + \frac{\varepsilon}{2}\|x\|_2^2.$$

Consequently,

$$h^\theta_\varepsilon(z) = \max_{x: \|x\|_2 \le \sqrt{2}} x^T z - \theta\|x\|_2 - \frac{\varepsilon}{2}\|x\|_2^2.$$

If we fix $\|x\|$, then an $x$ collinear with $z$ maximizes the expression, hence we can reduce the problem by restricting $x$ to be $x = cz$ for some $c \ge 0$. Hence, the above maximization problem is equivalent to

$$h^\theta_\varepsilon(z) = \max_{c \ge 0: c\|z\|_2 \le \sqrt{2}} c\|z\|_2^2 - c\theta\|z\|_2 - \frac{\varepsilon}{2} c^2 \|z\|_2^2.$$

We have $h^\theta_\varepsilon(0) = 0$, and in the following we assume $z \ne 0$, i.e. $\|z\|_2 > 0$. We have to analyze three cases:

• $c \in (0, \sqrt{2}/\|z\|_2)$: First-order conditions on $c$ yield $\|z\|_2^2 - \theta\|z\|_2 - \varepsilon c\|z\|_2^2 = 0$, i.e.

$$c = \frac{\|z\|_2 - \theta}{\varepsilon\|z\|_2} \quad \text{and} \quad h^\theta_\varepsilon(z) = \frac{1}{2\varepsilon}\big(\|z\|_2 - \theta\big)^2$$

in this case. Note that $c > 0$ if $\|z\|_2 > \theta$.

• $c = 0$: This case is effective if $\|z\|_2 \le \theta$, and in this case we have $h^\theta_\varepsilon(z) = 0$.

• $c = \sqrt{2}/\|z\|_2$: In this case we obtain

$$h^\theta_\varepsilon(z) = \sqrt{2}\big(\|z\|_2 - \theta\big) - \varepsilon.$$

This case is in effect if $\frac{\|z\|_2 - \theta}{\varepsilon\|z\|_2} \ge \frac{\sqrt{2}}{\|z\|_2}$, i.e. $\|z\| \ge \theta + \sqrt{2}\,\varepsilon$.
Overall we obtain the smooth version of hθ as stated in the main text.
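The closed form can be cross-checked against a brute-force evaluation of the defining maximization (our sketch, assuming NumPy):

```python
# Sketch: verify the closed form of h_eps^theta against grid search over c in
# max_{0 <= c <= sqrt(2)/||z||} c ||z||^2 - c theta ||z|| - eps/2 c^2 ||z||^2.
import numpy as np

def h_closed(nz, theta, eps):
    if nz <= theta:
        return 0.0
    if nz >= theta + np.sqrt(2) * eps:
        return np.sqrt(2) * (nz - theta) - eps
    return (nz - theta) ** 2 / (2 * eps)

theta, eps, nz = 1.0, 0.2, 1.15
c = np.linspace(0.0, np.sqrt(2) / nz, 200001)
brute = np.max(c * nz**2 - c * theta * nz - 0.5 * eps * c**2 * nz**2)
print(h_closed(nz, theta, eps), brute)   # both ~0.05625
```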

7.2 Bound on the operator norm of A

To get the Lipschitz constant we again look at the matrix $A$ and obtain an upper bound on $\|A\|_2$ via $\|A\|_2^2 \le \|A\|_1 \|A\|_\infty$. Note that $\|A\|_1$ is the maximum absolute column sum, and $\|A\|_\infty$ is the maximum absolute row sum. The columns of $A$ are indexed by the unknowns ($(p_s^i)_1$, $(p_s^i)_2$ and $q_s$), and the rows of $A$ correspond to the terms in $E^*_{\text{tight-III}}$ (or its smooth version),

$$E^*_{\text{tight-III}}(p, q) = \sum_s q_s + \sum_{s,i} \big[\operatorname{div} p_s^i + \theta_s^i - q_s\big]_- + \sum_s \sum_{i,j: i<j} \sqrt{2} \min\big\{0, \; \theta^{ij} - \|p_s^i - p_s^j\|_2\big\}.$$

Since all occurrences of $p_s^i$ and $q_s$ have a $+1$ or $-1$ coefficient, it is sufficient to just count the occurrences of each variable. Since at most 5 variables appear in one term (the rows corresponding to $[\operatorname{div} p_s^i + \theta_s^i - q_s]_-$), we have $\|A\|_\infty = 5$. $q_s$ appears in $L+1$ terms (in $\sum_s q_s$ and in $\sum_i [\operatorname{div} p_s^i + \theta_s^i - q_s]_-$), and e.g. $(p_s^i)_1$ also occurs in at most $L+1$ terms (in the divergence terms with respect to $s$ and $s-(1,0)$, and in the $L-1$ expressions $\sqrt{2}\min\{0, \theta^{ij} - \|p_s^i - p_s^j\|_2\}$), hence $\|A\|_1 = L+1$. Overall we have the bound $\|A\|_2^2 \le 5(L+1)$.
7.3 Extracting the primal solution from the smooth dual

We recall the smooth dual energy and indicate the correspondence between the terms in the dual energy and the respective primal variables,

$$-E^*_{\text{tight-III},\varepsilon}(p, q) = \sum_s -q_s + \underbrace{\sum_{s,i} \big[q_s - \operatorname{div} p_s^i - \theta_s^i\big]_{+,\varepsilon}}_{\leadsto\, x_s^i} + \underbrace{\sum_s \sum_{i,j: i<j} h^{\theta^{ij}}_\varepsilon(p_s^i - p_s^j)}_{\leadsto\, y_s^{ij}}. \qquad (22)$$

First-order optimality conditions require that the corresponding primal unknowns are given by

$$x_s^i = \frac{d}{dz}\big[z - \theta_s^i\big]_{+,\varepsilon}\Big|_{z = q_s - \operatorname{div} p_s^i} \quad \text{and} \quad y_s^{ij} = \nabla_z h^{\theta^{ij}}_\varepsilon(z)\Big|_{z = p_s^i - p_s^j}.$$

This allows one to obtain primal estimates for iterative dual optimization methods, but the marginalization constraints between $x_s$ and $y_s$ will only be fulfilled after convergence.
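These derivative formulas are cheap to implement; a sketch (ours, assuming NumPy) for the per-pixel recovery:

```python
# Sketch: primal estimates from the smooth dual via the gradients of the
# smoothed terms in Eq. 22.
import numpy as np

def x_from_dual(z, eps):
    # derivative of [z]_{+,eps}: 0 (z <= 0), z/eps (0 <= z <= eps), 1 (z >= eps);
    # here z = q_s - div p_s^i - theta_s^i
    return np.clip(z / eps, 0.0, 1.0)

def y_from_dual(z, theta, eps):
    # gradient of h_eps^theta at z = p_s^i - p_s^j (z of shape (..., 2))
    n = np.linalg.norm(z, axis=-1, keepdims=True)
    scale = np.where(n <= theta, 0.0,
                     np.where(n >= theta + np.sqrt(2) * eps,
                              np.sqrt(2), (n - theta) / eps)) / np.maximum(n, 1e-12)
    return scale * z

print(x_from_dual(np.array([-0.1, 0.05, 0.3]), eps=0.1))
print(y_from_dual(np.array([[1.5, 0.0]]), theta=1.0, eps=0.1))
```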

Figure 1: Energy evolution and distance to the final solution for the Tsukuba stereo pair. (a) Primal and dual energy evolution over time (curves: primal and dual for the iterative refinement method and for the tight model). (b) Average Euclidean ($L^2$) distance to a fully converged solution over time, for the iterative refinement method and the tight model.

8 Numerical Convergence and Visual Comparison Between $E_{\text{tight}}$ and $E_{\text{fast}}$

We use the standard Tsukuba stereo pair for illustration. The data term (unary potential) is

$$\lambda \sum_{c \in \{R,G,B\}} \big| I^c_{\text{left}}(x) - I^c_{\text{right}}(x + d) \big|.$$

In Fig. 1 the evolution of the energies and of the distance to a converged solution is depicted (with $\lambda = 20$ and the Potts smoothness prior). The graphs are shown for direct optimization of the full model Eq. 3 and for the iterative refinement method (Section 4.1 in the main submission). Although there is very little difference in the visual results after a few hundred iterations, numerical convergence is slow (as usual for first-order methods applied to non-strictly convex problems). Fig. 2 illustrates the visual difference between the tight and the efficient model for truncated linear smoothness costs. The values of $\lambda$ vary for the different truncation values in order to have roughly the same visual appearance. In real situations the differences between the tight and the efficient relaxations are smaller than for the triple junction inpainting example (due to the presence of the unary data term).
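Computing this data term is a one-liner per disparity; a sketch (ours, assuming NumPy and a synthetic rectified pair; border handling via wrap-around is a crude simplification):

```python
# Sketch: absolute color difference unary costs lambda * sum_c |I_l(x) - I_r(x+d)|
# for a rectified stereo pair; disparities shift the right image horizontally.
import numpy as np

def unary_costs(left, right, num_disp, lam=20.0):
    H, W, _ = left.shape
    cost = np.zeros((H, W, num_disp))
    for d in range(num_disp):
        shifted = np.roll(right, -d, axis=1)    # right image sampled at x + d
        cost[:, :, d] = lam * np.abs(left - shifted).sum(axis=2)
    return cost

rng = np.random.default_rng(3)
left = rng.random((10, 12, 3))
right = np.roll(left, 2, axis=1)                # synthetic pair, disparity 2
print(unary_costs(left, right, 4).argmin(axis=2))  # -> 2 everywhere here
```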


Figure 2: Visual comparison between the efficient and the tight relaxation. Top row: Efast , bottom row: Etight . 1st column: Potts model, λ = 5. 2nd column: truncated linear with truncation at 2, λ = 10. 3rd column: truncated linear with truncation at 4, λ = 15.

