High-Order Methods for Overcoming Degeneracy in Interior-Point Methods for Quadratic Programming

Nicholas Gould (Rutherford Appleton Laboratory)
Dominique Orban (École Polytechnique de Montréal)
Daniel Robinson (Oxford University Mathematical Institute)

Santiago, Chile, July 29, 2010
Outline
Introduction and Motivation
Problem of interest (QP)

  minimize_x  cᵀx + ½ xᵀHx
  subject to  Ax = b,  x ≥ 0

Interested in general H, but require H ⪰ 0 for some results
Assume that A has full row rank

First-order optimality conditions
  Hx* + c = Aᵀy* + z*
  Ax* = b
  min(x*, z*) = 0

max(x*_i, z*_i) > 0 for all i   ⟹  nondegenerate
max(x*_i, z*_i) = 0 for some i  ⟹  degenerate
Interior-point subproblem

  minimize_x  cᵀx + ½ xᵀHx − µ Σ_{i=1}^n ln(x_i)
  subject to  Ax = b

µ > 0 is the barrier parameter
If (x(µ), y(µ)) is the minimizer, then generally (x(µ), y(µ), z(µ)) → (x*, y*, z*) as µ → 0,
where z(µ) := µ X(µ)⁻¹e and X := diag(x₁, x₂, . . . , xₙ)

Notation: v = (x, y, z), v(µ) = (x(µ), y(µ), z(µ)), etc.

Key result (Stoer/Wechs/Mizuno 1998, Wright/Orban 2002)
  ‖v(µ) − v*‖ = Θ(µ)    nondegenerate
  ‖v(µ) − v*‖ = Θ(√µ)   degenerate
Our motivation

  ‖v(µ) − v*‖ = Θ(µ)    nondegenerate
  ‖v(µ) − v*‖ = Θ(√µ)   degenerate

1. "one-off" solve of problem QP: µ = 1.0e−6 ⟹ ‖v(µ) − v*‖ ≈ 1.0e−3 in the degenerate case
2. improve cross-over methods for solving QP: use x(µ) to predict the optimal active set for an active-set phase
3. improve large-scale SQP methods (S2QP): solve a convex QP to predict an optimal active set, which is used to compute an additional accelerator step
4. efficiency: higher-order methods ⟹ fewer factorizations
Nondegenerate QP
Trajectory of minimizers (Fiacco and McCormick)

Let v* be a nondegenerate second-order sufficient local minimizer of QP and assume that A has full row rank. Then there exists a continuously differentiable path v(µ) such that
  lim_{µ→0} v(µ) = v*
and v(µ) minimizes the barrier problem for all µ sufficiently small.

[Figure: central path of barrier minimizers converging to v* as µ → 0]
Question: Given v(µ) and µ̄ ∈ (0, µ), how do we find v(µ̄) efficiently?
Answer: Apply Newton's method to the perturbed optimality conditions

Barrier optimality conditions          Equivalent conditions
1. Hx + c = Aᵀy + z                    1. Hx + c = Aᵀy + z
2. Ax = b                              2. Ax = b
3. x = µ̄ Z⁻¹e                          3. Xz = µ̄e
4. min(x, z) > 0                       4. min(x, z) > 0

Newton system:
  [ H     −Aᵀ   −I   ] [Δxᴺ]     [ Hx(µ) + c − Aᵀy(µ) − z(µ) ]
  [ A      0     0   ] [Δyᴺ] = − [ Ax(µ) − b                 ]
  [ Z(µ)   0    X(µ) ] [Δzᴺ]     [ X(µ)z(µ) − µ̄e             ]
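A minimal sketch of this Newton step on a hypothetical two-variable instance (H = I, c = 0, A = [1 1], b = 1; the instance and names are ours, chosen so the central path is known in closed form: x(µ) = (0.5, 0.5), z(µ) = 2µe):

```python
import numpy as np

# Toy standard-form QP: minimize 0.5*(x1^2 + x2^2) s.t. x1 + x2 = 1, x >= 0.
H = np.eye(2); c = np.zeros(2)
A = np.array([[1.0, 1.0]]); b = np.array([1.0])

def newton_step(x, y, z, mu_bar):
    """One Newton step on the perturbed optimality conditions, targeting mu_bar."""
    n, m = len(x), len(b)
    K = np.block([[H,          -A.T,             -np.eye(n)],
                  [A,          np.zeros((m, m)), np.zeros((m, n))],
                  [np.diag(z), np.zeros((n, m)), np.diag(x)]])
    rhs = -np.concatenate([H @ x + c - A.T @ y - z,   # dual feasibility
                           A @ x - b,                 # primal feasibility
                           x * z - mu_bar * np.ones(n)])  # perturbed complementarity
    d = np.linalg.solve(K, rhs)
    return x + d[:n], y + d[n:n + m], z + d[n + m:]

# Start on the path at mu = 1e-2 and aim at mu_bar = 1e-3.
mu, mu_bar = 1e-2, 1e-3
x, y, z = np.array([0.5, 0.5]), np.array([0.5 - 2 * mu]), np.array([2 * mu, 2 * mu])
x1, y1, z1 = newton_step(x, y, z, mu_bar)
print(x1, z1, x1 * z1)  # complementarity products should be close to mu_bar
```

On this symmetric instance the step lands back on the path exactly; in general the bilinear complementarity equation leaves a second-order error of size ΔxᴺΔzᴺ.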
Question: Given v(µ), is there a method to compute accurate estimates of v(µ̄) for 0 < µ̄ ≪ µ?
Answer: Yes, use a higher-order Taylor approximation for v(µ̄), i.e.,

  v(µ̄) ≈ v(µ) + Σ_{j=1}^p v⁽ʲ⁾(µ)(µ̄ − µ)ʲ / j!  :=  vₚ(µ̄)

v₁(µ̄) ≡ Δvᴺ
existence of derivatives follows from Fiacco and McCormick
derivatives are obtained by repeated differentiation of
  Hx(µ) + c = Aᵀy(µ) + z(µ)
  Ax(µ) = b
  X(µ)z(µ) = µe
these results assume that v(µ) is computed exactly
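The repeated implicit differentiation can be illustrated on the talk's nondegenerate scalar example, minimize ½x² subject to x ≥ 1, where the path satisfies x(µ)(x(µ) − 1) = µ, a scalar stand-in for the system above (a sketch; helper names are ours):

```python
import math

def x_of_mu(mu):
    # central path of: minimize 0.5*x^2 s.t. x >= 1, from x*(x-1) = mu
    return 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * mu))

def taylor_predict(mu, mu_bar):
    """Second-order Taylor predictor vp(mu_bar) with p = 2."""
    x = x_of_mu(mu)
    d1 = 1.0 / (2.0 * x - 1.0)   # differentiate x(x-1) = mu:  x'(2x - 1) = 1
    d2 = -2.0 * d1 ** 3          # differentiate again: x''(2x - 1) + 2 x'^2 = 0
    h = mu_bar - mu
    return x + d1 * h + 0.5 * d2 * h * h

mu, mu_bar = 1e-2, 1e-4
err_taylor = abs(taylor_predict(mu, mu_bar) - x_of_mu(mu_bar))
# first-order predictor = the Newton step of the previous slide
err_newton = abs(x_of_mu(mu) + (mu_bar - mu) / (2 * x_of_mu(mu) - 1) - x_of_mu(mu_bar))
print(err_taylor, err_newton)
```

The second-order predictor is noticeably more accurate than the first-order (Newton) one, consistent with the error tables that follow.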
Question: Given an approximation to v(µ), is there a method to compute accurate approximations to v(µ̄) for 0 < µ̄ ≪ µ?
Answer: Yes, use approximations to multiple points on the trajectory
  compute (µⱼ, xⱼ) such that xⱼ ≈ x(µⱼ) for j = 1, . . . , p
  solve the least-squares problem
    minimize_c  ½ Σ_{j=1}^p ‖p_r(µⱼ) − xⱼ‖²
  for the coefficient vector c, where p_r(µ) := Σ_{l=0}^r c_l µˡ and r ≤ p
  use the approximation p_r(µ̄) ≈ x(µ̄)

For most of the talk we focus on higher-order approximations, but the key concepts may be adapted to a least-squares approach
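This least-squares fit can be sketched with a polynomial fit along the path of the nondegenerate example, minimize ½x² subject to x ≥ 1 (the sample points and degrees below are illustrative choices, not from the talk):

```python
import numpy as np

def x_of_mu(mu):
    # central path of: minimize 0.5*x^2 s.t. x >= 1
    return 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * mu))

mus = np.array([8e-3, 4e-3, 2e-3, 1e-3])   # p = 4 samples (mu_j, x_j)
xs = x_of_mu(mus)

coeffs = np.polyfit(mus, xs, deg=2)        # degree r = 2 <= p least-squares fit
mu_bar = 1e-5
pred = np.polyval(coeffs, mu_bar)          # use p_r(mu_bar) ~ x(mu_bar)
print(abs(pred - x_of_mu(mu_bar)))         # extrapolation error at mu_bar
```

Because the path is analytic in µ here, a low-degree fit extrapolates to much smaller µ̄ with small error; the degenerate example below shows this fails when the path is only analytic in √µ.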
Nondegenerate example

  minimize_x  ½x²   subject to  x ≥ 1

Errors for various approximation schemes, µ = 1.0e−5

  Scheme      x(0)       x(µ/2)     x(µ/10)
  Interp 2    9.99e−10   9.99e−10   8.90e−10
  Interp 3    1.99e−12   1.99e−12   1.77e−12
  Interp 4    4.86e−14   4.84e−14   4.30e−14
  Hermite 2   0.00e+00   0.00e+00   0.00e+00
  Spline 2    9.99e−10   9.99e−10   8.90e−10
  Spline 3    1.99e−12   1.99e−12   1.77e−12
  Spline 4    4.86e−14   4.86e−14   4.30e−14
  Taylor 2    1.99e−15   1.99e−15   1.33e−15
Degenerate QP
Degenerate example

  minimize_x  ½x²   subject to  0 ≤ x ≤ 1

Errors for various approximation schemes, µ = 1.0e−5

  Scheme      x(0)       x(µ/2)     x(µ/10)
  Interp 4    2.34e−03   2.33e−03   1.42e−03
  Hermite 3   1.40e−03   1.39e−03   5.99e−04
  Hermite 4   1.40e−03   1.39e−03   5.98e−04
  Spline 4    2.34e−03   2.33e−03   1.42e−03
  Taylor 2    1.18e−03   1.17e−03   4.19e−04
  Taylor 3    9.88e−04   9.78e−04   2.75e−04
  Taylor 4    8.64e−04   8.54e−04   1.93e−04
  Taylor 5    7.78e−04   7.68e−04   1.42e−04
The main idea – go Puiseux!

Based on the result that ‖v(µ) − v*‖ = Θ(√µ), we postulate

  v(µ) = v* + Σ_{j=1}^∞ cⱼ µ^{j/2},

and by defining ρ = √µ, we have

  v̄(ρ) := v* + Σ_{j=1}^∞ cⱼ ρʲ = v(µ)

Hope that v̄(ρ) is analytic
Use higher-order or least-squares approximations to v̄(ρ)
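The effect of the substitution is easy to see on the degenerate example, whose central path is x(µ) = √µ: a polynomial in µ fits this badly, but after ρ = √µ the path is x̄(ρ) = ρ, which a polynomial captures exactly (a sketch with illustrative sample points, not the talk's experiment):

```python
import numpy as np

mus = np.array([1e-2, 1e-3, 1e-4, 1e-5])
xs = np.sqrt(mus)                              # samples x_j = x(mu_j) = sqrt(mu_j)

fit_mu = np.polyfit(mus, xs, deg=2)            # fit in the mu parametrization
fit_rho = np.polyfit(np.sqrt(mus), xs, deg=2)  # fit in the rho = sqrt(mu) parametrization

mu_bar = 1e-8                                  # extrapolate far down the path
err_mu = abs(np.polyval(fit_mu, mu_bar) - np.sqrt(mu_bar))
err_rho = abs(np.polyval(fit_rho, np.sqrt(mu_bar)) - np.sqrt(mu_bar))
print(err_mu, err_rho)   # the Puiseux (rho) fit is many orders more accurate
```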
Degenerate example revisited

  minimize_x  ½x²   subject to  0 ≤ x ≤ 1

Errors for various approximation schemes in ρ = √µ, µ = 1.0e−5

  Scheme      x(0)       x(µ/2)     x(µ/10)
  Interp 4    6.99e−09   6.96e−09   4.15e−09
  Hermite 3   4.87e−13   4.82e−13   1.73e−13
  Hermite 4   1.27e−14   1.26e−14   4.44e−15
  Spline 4    6.99e−09   6.96e−09   4.15e−09
  Taylor 2    1.19e−08   1.17e−08   3.79e−09
  Taylor 3    5.49e−13   5.42e−13   1.29e−13
  Taylor 4    1.38e−13   1.36e−13   2.07e−14
  Taylor 5    4.97e−16   4.88e−16   5.05e−17
Comparison of "µ" and "ρ" parametrizations, µ = 1.0e−5

              ——— x(µ) parametrization ———     ——— x̄(ρ) parametrization ———
  Scheme      x(0)      x(µ/2)    x(µ/10)     x(0)      x(µ/2)    x(µ/10)
  Interp 4    2.34e−03  2.33e−03  1.42e−03    6.99e−09  6.96e−09  4.15e−09
  Hermite 3   1.40e−03  1.39e−03  5.99e−04    4.87e−13  4.82e−13  1.73e−13
  Hermite 4   1.40e−03  1.39e−03  5.98e−04    1.27e−14  1.26e−14  4.44e−15
  Spline 4    2.34e−03  2.33e−03  1.42e−03    6.99e−09  6.96e−09  4.15e−09
  Taylor 2    1.18e−03  1.17e−03  4.19e−04    1.19e−08  1.17e−08  3.79e−09
  Taylor 3    9.88e−04  9.78e−04  2.75e−04    5.49e−13  5.42e−13  1.29e−13
  Taylor 4    8.64e−04  8.54e−04  1.93e−04    1.38e−13  1.36e−13  2.07e−14
  Taylor 5    7.78e−04  7.68e−04  1.42e−04    4.97e−16  4.88e−16  5.05e−17
Give credit where credit is due

We are not the first, but few seem aware of this result
  Stoer/Wechs/Mizuno (1998) proved that x̄(ρ) is analytic for the sufficient horizontal complementarity problem
  Stoer/Wechs/Mizuno (1998) suggest different strategies for using x̄(ρ) and indicator functions as the basis of an algorithm
  Zhao/Sun (1999) suggested a "predictor-corrector" method and a single-step method based on perturbed trajectories
We do not know of any numerical results
We do not know of any software that incorporates this idea
We have our own ideas for implementation
Implementation
Strategy 1: the "lunge"

1. Compute v(µ) for µ ≈ 10⁻⁶ using an interior-point solver
2. Approximate v* by a sixth-degree Puiseux expansion
3. Obtain an estimate of an optimal active set
4. Verify optimality
5. Use "cross-over" to resolve multipliers for general constraints

May use the Puiseux expansion to "skip down the path"
Strategy has been added to QPB (a two-stage method for solving indefinite QP)
Highly accurate solutions are typically returned
Active-set identification appears to be improved
Requires staying "close" to the path
Strategy 2: perturbed trajectory + Zhang + Zhao/Sun

1. Given v ≈ v(µ) where µ = (xᵀz)/n, compute σ ∈ (0, 1)
2. Approximate v(σµ) by a Puiseux expansion
3. Perform a linesearch to ensure
     X⁺z⁺ ≥ γµ⁺e   and   µ⁺ ≥ ν ‖(Hx⁺ + c − Aᵀy⁺ − z⁺, Ax⁺ − b)‖

Single phase
Intended for convex QP
Higher-order friendly
Dynamic updating of µ
Implemented as CQP in GALAHAD
May use a final "lunge"

[Figure: step from v ≈ v(µ) through v⁺ toward v(σµ) along the trajectory to v*]
Algorithm CQP on degenerate problem using Taylor expansion

  It   p-feas    d-feas    com-slk   obj       step      mu
   0   0.0E+00   0.0E+00   1.0E+00   5.0E-01             1.0E-02
   1   0.0E+00   0.0E+00   1.5E-01   7.3E-02   1.0E+00   1.5E-03
   2   0.0E+00   0.0E+00   2.1E-02   1.1E-02   1.0E+00   2.1E-04
   3   0.0E+00   0.0E+00   3.1E-03   1.6E-03   1.0E+00   3.1E-05
   4   0.0E+00   0.0E+00   4.6E-04   2.3E-04   1.0E+00   4.6E-06
   5   0.0E+00   0.0E+00   6.7E-05   3.4E-05   1.0E+00   5.5E-07
   .   .         .         .         .         .         .
  20   0.0E+00   0.0E+00   1.2E-17   5.9E-18   1.0E+00   4.0E-26
  21   0.0E+00   0.0E+00   1.7E-18   8.3E-19   1.0E+00   2.1E-27
  22   0.0E+00   0.0E+00   2.3E-19   1.2E-19   1.0E+00   1.1E-28
  23   0.0E+00   0.0E+00   3.3E-20   1.6E-20   1.0E+00   5.9E-30
  24   0.0E+00   0.0E+00   4.6E-21   2.3E-21   1.0E+00   3.1E-31
  25   0.0E+00   0.0E+00   6.5E-22   3.2E-22   1.0E+00   1.6E-32
  26   0.0E+00   3.2E-27   9.1E-23   4.5E-23   1.0E+00   8.7E-34
  27   0.0E+00   8.1E-28   1.3E-23   6.4E-24   1.0E+00   4.6E-35
  28   0.0E+00   4.0E-28   1.8E-24   9.0E-25   1.0E+00   2.4E-36
  29   0.0E+00   2.0E-28   2.5E-25   1.3E-25   1.0E+00   1.3E-37

Optimal solution value = 5.0259E-13
Algorithm CQP on degenerate problem using Puiseux expansion

  It   p-feas    d-feas    com-slk   obj       step      mu
   0   0.0E+00   0.0E+00   1.0E+00   5.0E-01             1.0E-02
   1   0.0E+00   0.0E+00   2.2E-04   1.1E-04   1.0E+00   2.2E-06
   2   0.0E+00   1.7E-18   5.0E-08   2.5E-08   1.0E+00   1.1E-11
   3   0.0E+00   8.1E-20   5.6E-15   2.8E-15   1.0E+00   4.2E-22
   4   0.0E+00   0.0E+00   7.1E-29   3.5E-29   1.0E+00   6.0E-43

Optimal solution value = 8.4205E-15
Summary
Summary

Highly accurate solution of degenerate QP is not possible by standard higher-order or curve-fitting methods
Degeneracy in QP may be overcome by using a "fractional" Puiseux expansion that is analytic (convex QP)
Posed two strategies for incorporating the Puiseux expansion into an interior-point framework
Appears to work well for nonconvex QP, although not covered by existing theory
Extends to classes of LCP
Higher-order Puiseux expansions improve efficiency by reducing the number of overall factorizations, although generally more system solves are required
References
Josef Stoer, Martin Wechs, and Shinji Mizuno.
High-order infeasible-interior-point methods for solving sufficient linear complementarity problems.
Mathematics of Operations Research, 23:832–862, 1998.

Yin Zhang.
On the convergence of a class of infeasible interior-point methods for the horizontal linear complementarity problem.
SIAM Journal on Optimization, 4(1):208–227, 1994.

Gongyun Zhao and Jie Sun.
On the rate of local convergence of high-order infeasible path-following algorithms for P*-linear complementarity problems.
Computational Optimization and Applications, 14:293–307, 1999.