Online Supplement to: "Stochastic Trust-Region Response-Surface Method (STRONG) – A New Response-Surface Framework for Simulation Optimization"

Kuo-Hao Chang, Dept. of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu 30013, Taiwan

L. Jeff Hong, Dept. of Industrial Engineering & Logistics Management, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

Hong Wan, School of Industrial Engineering, Purdue University, West Lafayette, IN 47906

December 21, 2011
This document provides the full STRONG algorithm and some proofs that are not included in the main body of the article.
A Appendix

A.1 The STRONG Algorithm
The main framework of STRONG is provided in Figure 1, and the details of Stages I and II and the inner loop are provided in Figure 2.

Step 0. Set the iteration count $k = 0$. Select an initial solution $x_0$, initial sample sizes of the center point $n_0$ and of the gradient estimator $m_0$, an initial trust-region size $\Delta_0$, a switch threshold $\tilde\Delta$ satisfying $\Delta_0 > \tilde\Delta > 0$, and constants $\eta_0$, $\eta_1$, $\gamma_1$, and $\gamma_2$ satisfying $0 < \eta_0 < \eta_1 < 1$ and $0 < \gamma_1 < 1 < \gamma_2$.

Step 1. Let $k = k + 1$. If $\Delta_k > \tilde\Delta$, go to Stage I; otherwise, go to Stage II.

Step 2. If the termination criterion is satisfied, stop and return the solution. Otherwise, go to Step 1.
Figure 1: STRONG-Main Framework
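The main framework of Figure 1 can be sketched in a few lines of Python. Here `stage_one` and `stage_two` are hypothetical callbacks standing in for the Stage I/II logic of Figure 2 (each returns the next iterate and the updated trust-region size), and a simple iteration budget stands in for Step 2's termination criterion; none of these names come from the paper.

```python
# A minimal sketch of the STRONG main framework (Figure 1).
def strong_main(x0, delta0, delta_switch, stage_one, stage_two, max_iter=100):
    x, delta = x0, delta0
    for k in range(max_iter):         # Step 2: a simple iteration budget
        if delta > delta_switch:      # Step 1: large trust region -> Stage I
            x, delta = stage_one(x, delta)
        else:                         # small trust region -> Stage II
            x, delta = stage_two(x, delta)
    return x
```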
A.2 Proof of Lemma 1
Proof. To simplify the presentation, in this proof we use $\hat g_k$, $\hat\nabla g_k$ and $\hat H_k$ to denote $\hat g_k(x_k)$, $\hat\nabla g_k(x_k)$ and $\hat H_k(x_k)$, respectively.
Stage I. First-order polynomial (for the $k$th iteration)
1. Build a first-order polynomial.
2. Solve the subproblem and obtain a new solution $x_k^*$.
3. Take $n_0$ replications at the new solution $x_k^*$.
4. Conduct the ratio-comparison and sufficient-reduction tests and update the size of the trust region.
   - If $\rho_k < \eta_0$, let $x_{k+1} = x_k$, $\Delta_{k+1} = \gamma_1\Delta_k$ and return to Step 1 in the main framework.
   - If $\eta_0 \le \rho_k < \eta_1$, perform the sufficient-reduction test with Type I error $\alpha_k$. If the test is passed, let $x_{k+1} = x_k^*$ and $\Delta_{k+1} = \Delta_k$, and return to Step 1 in the main framework. Otherwise, let $x_{k+1} = x_k$, $\Delta_{k+1} = \gamma_1\Delta_k$ and return to Step 1 in the main framework.
   - If $\rho_k \ge \eta_1$, perform the sufficient-reduction test with Type I error $\alpha_k$. If the test is passed, let $x_{k+1} = x_k^*$ and $\Delta_{k+1} = \gamma_2\Delta_k$, and return to Step 1 in the main framework. Otherwise, let $x_{k+1} = x_k$, $\Delta_{k+1} = \gamma_1\Delta_k$ and return to Step 1 in the main framework.

Stage II. Second-order polynomial (for the $k$th iteration)
1. Build a second-order polynomial.
2. Solve the subproblem and obtain a new solution $x_k^*$.
3. Take $n_0$ replications at the new solution $x_k^*$.
4. Conduct the ratio-comparison and sufficient-reduction tests and update the size of the trust region.
   - If $\rho_k < \eta_0$, go to the inner loop.
   - If $\eta_0 \le \rho_k < \eta_1$, perform the sufficient-reduction test with Type I error $\alpha_k$. If the test is passed, let $x_{k+1} = x_k^*$ and $\Delta_{k+1} = \Delta_k$, and return to Step 1 in the main framework. Otherwise, go to the inner loop.
   - If $\rho_k \ge \eta_1$, perform the sufficient-reduction test with Type I error $\alpha_k$. If the test is passed, let $x_{k+1} = x_k^*$ and $\Delta_{k+1} = \gamma_2\Delta_k$, and return to Step 1 in the main framework. Otherwise, go to the inner loop.

Inner Loop (for the $i$th loop in the $k$th iteration)
1. Set the subiteration count $i = 0$. Let $n_{k_0} = n_0$, $m_{k_0} = m_0$, and $\Delta_{k_0} = \Delta_k$.
2. Let $i = i + 1$. Increase the sample sizes for the current and new solutions as Section 3.4 describes.
3. Increase the sample size of the gradient estimator as Section 3.4 describes.
4. Build a second-order polynomial.
5. Solve the subproblem and obtain a new solution $x_{k_i}^*$.
6. Conduct the ratio-comparison and sufficient-reduction tests and update the size of the trust region.
   - If $\rho_{k_i} < \eta_0$, let $\Delta_{k_{i+1}} = \gamma_1\Delta_{k_i}$, and go back to Step 2 in the inner loop.
   - If $\eta_0 \le \rho_{k_i} < \eta_1$, perform the sufficient-reduction test with Type I error $\alpha_k$. If the test is passed, let $x_{k+1} = x_{k_i}^*$ and $\Delta_{k+1} = \Delta_k$, and return to Step 1 in the main framework. Otherwise, go back to Step 2 in the inner loop.
   - If $\rho_{k_i} \ge \eta_1$, perform the sufficient-reduction test for $x_{k_i}^*$ with Type I error $\alpha_k$. If the test is passed, let $x_{k+1} = x_{k_i}^*$ and $\Delta_{k+1} = \gamma_2\Delta_k$, and return to Step 1 in the main framework. Otherwise, go back to Step 2 in the inner loop.
Figure 2: STRONG Algorithm
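The step-4 update rule of Stage I in Figure 2 can be sketched as follows. This is an illustrative reading of the bullet list above: `sr_passed` stands in for the outcome of the sufficient-reduction test, and the default thresholds are placeholders, not values prescribed by the paper.

```python
# A sketch of the Stage I trust-region update (step 4 of Figure 2).
# gamma1 < 1 < gamma2 shrink/expand the region; eta0 < eta1 are the
# acceptance thresholds for the ratio rho.
def stage_one_update(x, x_new, delta, rho, sr_passed,
                     eta0=0.01, eta1=0.3, gamma1=0.5, gamma2=2.0):
    if rho < eta0:                    # poor model fit: reject, shrink
        return x, gamma1 * delta
    if not sr_passed:                 # reduction not significant: reject, shrink
        return x, gamma1 * delta
    if rho < eta1:                    # acceptable fit: accept, keep region size
        return x_new, delta
    return x_new, gamma2 * delta      # very good fit: accept, expand
```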
When a first-order polynomial is used, it is easy to find that
\[
d_k = \arg\min\left\{ \hat g_k + \hat\nabla g_k^T d : \|d\| \le \Delta_k \right\} = -\frac{\hat\nabla g_k}{\|\hat\nabla g_k\|}\,\Delta_k
\]
and $\tau_k = 1$. Then, $x_k^* = x_k + d_k$ and
\[
r_k(x_k^*) = \hat g_k - \frac{\hat\nabla g_k^T \hat\nabla g_k}{\|\hat\nabla g_k\|}\,\Delta_k .
\]
Because $r_k(x_k) = \hat g_k$, we have $r_k(x_k) - r_k(x_k^*) = \|\hat\nabla g_k\| \cdot \Delta_k$. The conclusion of the lemma holds for first-order polynomials.

When a second-order polynomial is used, by the analysis of the deterministic trust-region method (Page 69 of Nocedal and Wright (1999)), we have
\[
d_k = -\tau_k \frac{\hat\nabla g_k}{\|\hat\nabla g_k\|}\,\Delta_k,
\qquad
\tau_k =
\begin{cases}
\min\left\{1,\ \|\hat\nabla g_k\|^3 \big/ \left[\Delta_k\, \hat\nabla g_k^T \hat H_k \hat\nabla g_k\right]\right\}, & \text{if } \hat\nabla g_k^T \hat H_k \hat\nabla g_k > 0, \\
1, & \text{otherwise.}
\end{cases}
\]
Furthermore, by the second-order polynomial, we have
\[
r_k(x_k^*) = r_k(x_k) - \frac{\hat\nabla g_k^T \hat\nabla g_k}{\|\hat\nabla g_k\|}\,\tau_k \Delta_k + \frac{1}{2}\, \frac{\hat\nabla g_k^T \hat H_k \hat\nabla g_k}{\|\hat\nabla g_k\|^2}\,\tau_k^2 \Delta_k^2 . \tag{1}
\]
Now we consider three cases. In the first case, $\hat\nabla g_k^T \hat H_k \hat\nabla g_k \le 0$. Then, $\tau_k = 1$. By Equation (1), we have
\[
r_k(x_k) - r_k(x_k^*) \ge \|\hat\nabla g_k\| \cdot \Delta_k .
\]
In the second case, $\hat\nabla g_k^T \hat H_k \hat\nabla g_k > 0$ and $\|\hat\nabla g_k\|^3 / [\Delta_k\, \hat\nabla g_k^T \hat H_k \hat\nabla g_k] \ge 1$. Then, $\tau_k = 1$. By Equation (1), we have
\[
r_k(x_k) - r_k(x_k^*) \ge \frac{1}{2}\, \|\hat\nabla g_k\| \cdot \Delta_k .
\]
In the third case, $\hat\nabla g_k^T \hat H_k \hat\nabla g_k > 0$ and $\|\hat\nabla g_k\|^3 / [\Delta_k\, \hat\nabla g_k^T \hat H_k \hat\nabla g_k] < 1$. Then, $\tau_k = \|\hat\nabla g_k\|^3 / [\Delta_k\, \hat\nabla g_k^T \hat H_k \hat\nabla g_k]$. By Equation (1), we have
\[
r_k(x_k) - r_k(x_k^*) = \frac{1}{2}\, \frac{\|\hat\nabla g_k\|^4}{\hat\nabla g_k^T \hat H_k \hat\nabla g_k} \ge \frac{1}{2}\, \frac{\|\hat\nabla g_k\|^2}{\|\hat H_k\|},
\]
where the last inequality follows from a property of the Euclidean norm, i.e., $\|AB\| \le \|A\| \cdot \|B\|$ (Page 456 of Kuang (2004)). Combining all three cases, we conclude that the lemma holds for second-order polynomials, covering both Stage II and the inner loop. This concludes the proof of the lemma.
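The three-case bound above can be checked numerically. The sketch below is an illustration, not part of the proof: it evaluates the model reduction along the Cauchy step for random symmetric 2-D quadratics and verifies the Lemma 1 bound. It bounds $\|\hat H_k\|$ by the Frobenius norm, which dominates the spectral norm and therefore only makes the right-hand side smaller.

```python
# Numerical check of the Cauchy-decrease bound of Lemma 1 in 2-D:
# with d = -tau * Delta * g/||g||, the model reduction satisfies
# r(x) - r(x+d) >= 0.5 * ||g|| * min(||g||/||H||, Delta).
import math, random

def cauchy_reduction(g, H, delta):
    gnorm = math.hypot(g[0], g[1])
    gHg = (g[0] * (H[0][0] * g[0] + H[0][1] * g[1])
           + g[1] * (H[1][0] * g[0] + H[1][1] * g[1]))
    tau = 1.0 if gHg <= 0 else min(1.0, gnorm ** 3 / (delta * gHg))
    # model reduction along the Cauchy step (Equation (1))
    return tau * delta * gnorm - 0.5 * tau ** 2 * delta ** 2 * gHg / gnorm ** 2

def lemma1_bound(g, H, delta):
    gnorm = math.hypot(g[0], g[1])
    # Frobenius norm >= spectral norm, so this bound is the weaker one
    hnorm = math.sqrt(sum(H[i][j] ** 2 for i in range(2) for j in range(2)))
    return 0.5 * gnorm * min(gnorm / hnorm, delta)

random.seed(0)
for _ in range(1000):
    g = [random.uniform(-2, 2), random.uniform(-2, 2)]
    a, b, c = (random.uniform(-1, 1) for _ in range(3))
    H = [[a, b], [b, c]]               # symmetric Hessian estimate
    delta = random.uniform(0.1, 2.0)
    assert cauchy_reduction(g, H, delta) >= lemma1_bound(g, H, delta) - 1e-12
```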
A.3 Proof of Lemma 2
Proof. First note that Lemma 2 is for the inner loop. Given any iteration $k$, $\{\Delta_{k_i}, i = 0, 1, 2, \ldots\}$ is a deterministic sequence. The statement "w.p.1" is with respect to the observation noise. Since both statements can be proved by the same scheme, we only prove the first statement. By Chebyshev's inequality (Billingsley, 1995),
\[
\Pr\left\{ \left|\hat g_{k_i}(x_k) - g(x_k)\right| > \Delta_{k_i}^2 \right\} \le \frac{\sigma^2(x_k)}{n_{k_i}\Delta_{k_i}^4} \le \frac{\sup_{x\in\mathbb{R}^p}\sigma^2(x)}{n_{k_i}\Delta_{k_i}^4} . \tag{2}
\]
Since $n_{k_{i+1}} \ge \left(1 + \lceil \gamma_1^{-4} \rceil\right) n_{k_i}$ and $\Delta_{k_{i+1}} = \gamma_1 \Delta_{k_i}$, then
\[
\frac{\sup_{x\in\mathbb{R}^p}\sigma^2(x)}{n_{k_{i+1}}\Delta_{k_{i+1}}^4} \le \frac{\gamma_1^{-4}}{1+\gamma_1^{-4}} \cdot \frac{\sup_{x\in\mathbb{R}^p}\sigma^2(x)}{n_{k_i}\Delta_{k_i}^4},
\qquad \frac{\gamma_1^{-4}}{1+\gamma_1^{-4}} < 1 .
\]
By the series ratio test (Rudin, 1976), we know that the series
\[
\sum_{i=1}^{\infty} \frac{\sup_{x\in\mathbb{R}^p}\sigma^2(x)}{n_{k_i}\Delta_{k_i}^4} < \infty .
\]
Then, by the first Borel-Cantelli Lemma in Billingsley (1995),
\[
\Pr\left\{ \left|\hat g_{k_i}(x_k) - g(x_k)\right| > \Delta_{k_i}^2 \ \text{i.o.} \right\} = 0 .
\]
Therefore, the first statement holds. This concludes the proof of the lemma.

Note that Lemma 2 can also be proved for DOE-based gradient estimators by applying the same argument to each dimension of the gradient estimator. First observe that the covariance matrix of the OLS-based gradient estimator is $\sigma^2(x)(X^T X)^{-1}$ (Myers et al., 2009), which is dominated by $\sup_{x\in\mathbb{R}^p}\sigma^2(x)\,(X^T X)^{-1}$. Since $X$ is an orthogonal design matrix, $X^T X$ is a diagonal matrix, and so is $(X^T X)^{-1}$. Therefore, the variance of each dimension of the OLS-based gradient estimator is dominated by $\sup_{x\in\mathbb{R}^p}\sigma^2(x)/N$, where $N$ is the total number of design points in the inner loop. Suppose that $u$ is the first element of $\hat\nabla g_{k_i}(x_k)$ and $v$ is the first element of $\nabla g(x_k)$. Using the same scheme as above, we can show that
\[
\Pr\left\{ |u - v| > \Delta_{k_i}/\sqrt{p} \ \text{i.o.} \right\} = 0 .
\]
Finally, we can also prove the second statement for DOE-based gradient estimators.
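The geometric decay that drives the ratio test can be illustrated numerically. In the sketch below, the values of $\gamma_1$, $\sigma^2$, $n_0$ and $\Delta_0$ are arbitrary choices for illustration; only the growth rule $n_{k_{i+1}} = (1+\lceil\gamma_1^{-4}\rceil)n_{k_i}$ and the shrinkage $\Delta_{k_{i+1}} = \gamma_1\Delta_{k_i}$ come from the lemma.

```python
# Successive terms sigma^2 / (n_i * Delta_i^4) of Equation (2) decay at
# least geometrically under the inner-loop sample-size rule, so their
# series is summable (the premise of the ratio test used in the proof).
import math

gamma1, sigma2 = 0.5, 1.0          # illustrative values, not from the paper
n, delta = 10, 1.0                 # illustrative n0 and Delta_0

# the geometric factor bounding the ratio of successive terms
ratio_bound = gamma1 ** -4 / (1 + math.ceil(gamma1 ** -4))

terms = []
for _ in range(50):
    terms.append(sigma2 / (n * delta ** 4))
    n *= 1 + math.ceil(gamma1 ** -4)   # sample-size growth rule
    delta *= gamma1                    # trust-region shrinkage
```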
A.4 Proof of Lemma 3
Proof. By the same arguments used in proving the first statement of Lemma 2, we can also prove that
\[
\Pr\left\{ \left|\hat g_{k_i}(x_{k_i}^*) - g(x_{k_i}^*)\right| > \Delta_{k_i}^2 \ \text{i.o.} \right\} = 0 . \tag{3}
\]
Note that
\begin{align*}
\left| r_{k_i}(x_{k_i}^*) - \hat g_{k_i}(x_{k_i}^*) \right|
&\le \left| r_{k_i}(x_{k_i}^*) - g(x_{k_i}^*) \right| + \left| \hat g_{k_i}(x_{k_i}^*) - g(x_{k_i}^*) \right| \\
&= \Big| \hat g_{k_i}(x_k) + \hat\nabla g_{k_i}^T(x_k)(x_{k_i}^* - x_k) + \tfrac{1}{2}(x_{k_i}^* - x_k)^T \hat H_{k_i}(x_k)(x_{k_i}^* - x_k) \\
&\qquad - g(x_k) - \nabla g^T(x_k)(x_{k_i}^* - x_k) - \tfrac{1}{2}(x_{k_i}^* - x_k)^T H(\xi_i)(x_{k_i}^* - x_k) \Big| + \left| \hat g_{k_i}(x_{k_i}^*) - g(x_{k_i}^*) \right| \\
&\le \left|\hat g_{k_i}(x_k) - g(x_k)\right| + \left\|\hat\nabla g_{k_i}(x_k) - \nabla g(x_k)\right\| \cdot \left\|x_{k_i}^* - x_k\right\| \\
&\qquad + \tfrac{1}{2}\left\|\hat H_{k_i}(x_k) - H(\xi_i)\right\| \cdot \left\|x_{k_i}^* - x_k\right\|^2 + \left| \hat g_{k_i}(x_{k_i}^*) - g(x_{k_i}^*) \right| \\
&\le \left|\hat g_{k_i}(x_k) - g(x_k)\right| + \left| \hat g_{k_i}(x_{k_i}^*) - g(x_{k_i}^*) \right| + \left\|\hat\nabla g_{k_i}(x_k) - \nabla g(x_k)\right\| \cdot \Delta_{k_i} + \frac{\beta_1 + \kappa}{2}\,\Delta_{k_i}^2,
\end{align*}
where the last inequality holds because $\|x_{k_i}^* - x_k\| \le \Delta_{k_i}$, $\|H(x)\| \le \beta_1$ and $\|\hat H_{k_i}(x_k)\| \le \kappa$. Let $c = 3 + (\beta_1 + \kappa)/2$. By Assumption 2, along with Lemma 2 and Equation (3), we complete the proof.
A.5 Proof of Lemma 4
Proof. To show that STRONG can always find a new satisfactory solution, we need to show that, for any center point $x_k \in \mathbb{R}^p$ with $\|\nabla g(x_k)\| = \varepsilon > 0$, STRONG can find a new solution $x_{k_i}^*$ that can pass both the RC and SR tests when $i$ is sufficiently large w.p.1.

We first consider the RC test. Note that
\[
\rho_{k_i} - 1 = \frac{\hat g_{k_i}(x_k) - \hat g_{k_i}(x_{k_i}^*)}{r_{k_i}(x_k) - r_{k_i}(x_{k_i}^*)} - 1 = \frac{r_{k_i}(x_{k_i}^*) - \hat g_{k_i}(x_{k_i}^*)}{r_{k_i}(x_k) - r_{k_i}(x_{k_i}^*)} .
\]
By Lemma 1 and because $\|\hat H_{k_i}(x_k)\| \le \kappa$, we have
\[
|\rho_{k_i} - 1| \le \frac{\left| r_{k_i}(x_{k_i}^*) - \hat g_{k_i}(x_{k_i}^*) \right|}{\frac{1}{2}\|\hat\nabla g_{k_i}(x_k)\| \cdot \min\left\{ \|\hat\nabla g_{k_i}(x_k)\|/\kappa,\ \Delta_{k_i} \right\}} . \tag{4}
\]
Suppose that the RC test fails infinitely often, i.e., $\rho_{k_i} < \eta_0$ i.o. Then, $m_{k_i} \to \infty$ and $\Delta_{k_i} \to 0$, which imply that $\Pr\{\|\hat\nabla g_{k_i}(x_k)\| < \varepsilon/2 \ \text{i.o.}\} = 0$ and $\Pr\{\Delta_{k_i} > \delta \ \text{i.o.}\} = 0$ for any $\delta > 0$. Then, by Lemma 3,
\[
\Pr\left\{ \frac{\left| r_{k_i}(x_{k_i}^*) - \hat g_{k_i}(x_{k_i}^*) \right|}{\frac{1}{2}\|\hat\nabla g_{k_i}(x_k)\| \cdot \min\left\{ \|\hat\nabla g_{k_i}(x_k)\|/\kappa,\ \Delta_{k_i} \right\}} > \frac{4c\delta^2}{\varepsilon \min\{\varepsilon/(2\kappa),\ \delta\}} \ \text{i.o.} \right\} = 0 . \tag{5}
\]
Let $\delta = \max\left\{\sqrt{(1-\eta_0)/(8c\kappa\varepsilon)},\ (1-\eta_0)\varepsilon/(4c)\right\}$. We can easily find that
\[
\frac{4c\delta^2}{\varepsilon \min\{\varepsilon/(2\kappa),\ \delta\}} \ge 1 - \eta_0 .
\]
Then, by Equations (4) and (5), we have $\Pr\{|\rho_{k_i} - 1| > 1 - \eta_0 \ \text{i.o.}\} = 0$, which implies that $\Pr\{\rho_{k_i} < \eta_0 \ \text{i.o.}\} = 0$. This is a contradiction. Therefore, when $i$ is sufficiently large, $x_{k_i}^*$ can always pass the RC test w.p.1.

Now we consider the SR test. Note that only the solutions that pass the RC test take the SR test. Therefore, we know that $\rho_{k_i} \ge \eta_0$ for all $x_{k_i}^*$ in the SR test, which implies that
\[
\hat g_{k_i}(x_k) - \hat g_{k_i}(x_{k_i}^*) \ge \eta_0 \left( r_{k_i}(x_k) - r_{k_i}(x_{k_i}^*) \right) \ge \eta_0 \zeta_{k_i},
\]
where the second inequality follows from Lemma 1. Then, the test statistic
\[
t_{k_i}^* = \frac{\hat g_{k_i}(x_k) - \hat g_{k_i}(x_{k_i}^*) - \eta_0^2 \zeta_{k_i}}{S_{k_i}} \ge \frac{\eta_0(1-\eta_0)\zeta_{k_i}}{S_{k_i}}, \tag{6}
\]
where $S_{k_i}^2 = S^2(x_k, n_{k_i})/n_{k_i} + S^2(x_{k_i}^*, n_{k_i}^*)/n_{k_i}^*$.

Suppose that $x_{k_i}^*$ always fails the SR test, i.e., $t_{k_i}^* \le t_{1-\alpha_k,\mathrm{df}}$ for all $i = 1, 2, \ldots$. Then, both $n_{k_i} \to \infty$ and $n_{k_i}^* \to \infty$; thus, $S_{k_i}^2 \to 0$ w.p.1 and $t_{1-\alpha_k,\mathrm{df}} \to z_{1-\alpha_k}$. Furthermore, from the analysis
of the RC test, we know that $x_{k_i}^*$ can always pass the RC test when $i$ is sufficiently large. Therefore, $\Delta_{k_i}$ is bounded away from zero w.p.1. Then, it is easy to see that
\[
\zeta_{k_i} = \frac{1}{2}\,\|\hat\nabla g_{k_i}(x_k)\| \min\left\{ \frac{\|\hat\nabla g_{k_i}(x_k)\|}{\|\hat H_{k_i}(x_k)\|},\ \Delta_{k_i} \right\}
\]
is also bounded away from zero w.p.1 when $i$ is sufficiently large. Therefore, by Equation (6), $t_{k_i}^* \to \infty$ as $i \to \infty$, which implies that $t_{k_i}^* > t_{1-\alpha_k,\mathrm{df}}$ when $i$ is sufficiently large. This is a contradiction. Therefore, $x_{k_i}^*$ will pass the SR test at some $i$ w.p.1. This concludes the proof of the lemma.
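For intuition, the SR test statistic of Equation (6) can be sketched as a one-sided t-type test on the observed reduction. The function below is an illustrative reading, not the paper's exact implementation: the quantile $t_{1-\alpha_k,\mathrm{df}}$ is passed in directly rather than derived from a degrees-of-freedom rule, and the argument names are hypothetical.

```python
# A sketch of the sufficient-reduction (SR) test: the observed reduction
# mean(obs_center) - mean(obs_new) must exceed eta0**2 * zeta by a
# statistically significant margin, where zeta is the Lemma 1 lower bound
# on the predicted reduction.
import math, statistics

def sr_test(obs_center, obs_new, eta0, zeta, t_quantile):
    n, m = len(obs_center), len(obs_new)
    reduction = statistics.mean(obs_center) - statistics.mean(obs_new)
    s2 = (statistics.variance(obs_center) / n
          + statistics.variance(obs_new) / m)      # S_{k_i}^2
    t_star = (reduction - eta0 ** 2 * zeta) / math.sqrt(s2)
    return t_star > t_quantile                      # pass if significant
```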
A.6 Proof of Lemma 5
Proof. First note that $\Delta_k'$ is the trust-region size when the inner loop at iteration $k$ is terminated; $\{\Delta_k', k \ge 0\}$ is a stochastic sequence since each iteration $k$ may have a different number of inner loops. In light of the algorithm, we know that for every iteration $k$, $\Delta_k \ge \gamma_1 \tilde\Delta$. Now, if $\|\hat\nabla g_k'(x_k)\| > 0$, then, similarly to the proof of Lemma 4, we know that for any $k$, when $\Delta_{k_i} < \delta = \max\{\sqrt{(1-\eta_0)/(8c\kappa\varepsilon)},\ (1-\eta_0)\varepsilon/(4c)\}$, the RC test will be passed and the trust region will no longer shrink. Note that $\delta$ does not depend on $k$. Therefore, there exists a lower bound for $\Delta_k'$, i.e., $\liminf_{k\to\infty} \Delta_k' > 0$. This concludes the proof of the lemma.
A.7 Proof of Lemma 6
Proof. Let $m'_{k_j}$ denote the total number of replications required for estimating the gradient in the inner loop of iteration $k_j$. We first prove that if $\lim_{j\to\infty} \|\hat\nabla g'_{k_j}(x_{k_j})\| = 0$, then $m'_{k_j} \to \infty$ as $j \to \infty$.

We prove this by contradiction. Assume that there exists an $\tilde m$ such that $m'_{k_j} \le \tilde m$ for all $k_j$. Let $A_{k_j} = \{\omega : \|\hat\nabla g'_{k_j}(x_{k_j})\| > \epsilon\}$, where $\epsilon$ is an arbitrarily small positive constant. Since $m'_{k_j} \le \tilde m$ for all $k_j$, $\inf_{k_j} \Pr(A_{k_j}) = \tilde p > 0$. It follows that $\sum_{k_j} \Pr(A_{k_j}) \ge \sum_{j=1}^{\infty} \tilde p = \infty$. By the second Borel-Cantelli Lemma (Billingsley, 1995),
\[
\Pr\left(A_{k_j} \ \text{i.o.}\right) = \Pr\left\{ \omega : \|\hat\nabla g'_{k_j}(x_{k_j})\| > \epsilon \ \text{i.o.} \right\} = 1,
\]
which contradicts $\lim_{j\to\infty} \|\hat\nabla g'_{k_j}(x_{k_j})\| = 0$. Therefore, the assumption is not true and $m'_{k_j} \to \infty$ as $j \to \infty$. Further, by Assumption 2, the gradient estimator $\hat\nabla g'_{k_j}$ is uniformly convergent. Therefore, w.p.1,
\[
\lim_{j\to\infty} \|\hat\nabla g'_{k_j}(x_{k_j})\| = \lim_{j\to\infty} \|\nabla g(x_{k_j})\| = 0 .
\]
This concludes the proof of the lemma.
A.8 Proof of Lemma 7
Proof. Let $W_m = X_m^T X_m$. If $X_m$ is an orthogonal design matrix, then $W_m$ is a diagonal matrix. The $i$th diagonal element, denoted as $W_m^{ii}$, equals $\sum_{u=1}^m x_{ui}^2$. Therefore, as $m \to \infty$, $W_m^{ii} \to \infty$ for all $i$. It then follows that $(W_m)^{-1} \to 0$, since each of its diagonal elements is the reciprocal of the corresponding diagonal element of $W_m$ and its off-diagonal elements are all zeros.
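The argument can be illustrated with a replicated $2^2$ factorial design (an illustrative choice of orthogonal design, not one prescribed by the paper): the off-diagonal entries of $X_m^T X_m$ vanish by orthogonality, and the diagonal grows linearly in the number of design points, so the inverse shrinks to zero.

```python
# X^T X for a 2^2 factorial design (intercept + two +-1 factors),
# replicated `reps` times. Columns are mutually orthogonal, so the
# product is diagonal with entries 4 * reps.
base = [[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]

def xtx(reps):
    X = base * reps                   # replicate the orthogonal design
    p = len(base[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]

W = xtx(5)                            # 20 design points
# off-diagonal entries of W are exactly zero; diagonal entries equal 20
```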
A.9 Proof of Theorem 1
Proof. Note that Theorem 1 is analyzed for the outer loop, where $x_k$ is random in the decision space. The "w.p.1" is with respect to the randomness of the algorithm. Also note that, if $\liminf_{k\to\infty} \|\hat\nabla g_k'(x_k)\| = 0$, then there exists a subsequence of $\{x_k\}_{k=0}^{\infty}$ such that $\lim_{j\to\infty} \|\hat\nabla g_{k_j}'(x_{k_j})\| = 0$. By Lemma 6, we have $\lim_{j\to\infty} \|\nabla g(x_{k_j})\| = 0$ w.p.1, which implies that $\liminf_{k\to\infty} \|\nabla g(x_k)\| = 0$ w.p.1. Therefore, it suffices to prove that
\[
\liminf_{k\to\infty} \|\hat\nabla g_k'(x_k)\| = 0 \quad \text{w.p.1.} \tag{7}
\]
We prove Equation (7) by contradiction. Suppose that $\liminf_{k\to\infty} \|\hat\nabla g_k'(x_k)\| > 0$. Then, by Lemma 5, $\liminf_{k\to\infty} \Delta_k' > 0$ w.p.1. Then, for almost all sample paths, there exists a constant $K_1 > 0$ such that $\|\hat\nabla g_k'(x_k)\| \ge \varepsilon$ and $\Delta_k' \ge \Delta_L$ when $k \ge K_1$ for some $\varepsilon > 0$ and $\Delta_L > 0$. Let
\[
A_k = \Big\{ \omega : g(x_k) - g(x_{k+1}) \le \eta_0^2 \|\hat\nabla g_k'(x_k)\| \cdot \Delta_k' \ \text{for Stage I, or}
\]
\[
g(x_k) - g(x_{k+1}) \le \frac{1}{2}\eta_0^2 \|\hat\nabla g_k'(x_k)\| \cdot \min\left\{ \frac{\|\hat\nabla g_k'(x_k)\|}{\|\hat H_k'(x_k)\|},\ \Delta_k' \right\} \ \text{for Stage II or inner loop} \Big\}.
\]
The SR test guarantees that $\Pr(A_k) \le \alpha_k$. Since $\sum_k \alpha_k < \infty$, by the first Borel-Cantelli Lemma (Billingsley, 1995), $\Pr(A_k \ \text{i.o.}) = 0$. Then, for almost all sample paths, there exists a constant $K_2 > 0$ such that
\[
g(x_k) - g(x_{k+1}) > \eta_0^2 \|\hat\nabla g_k'(x_k)\| \cdot \Delta_k' \ \text{for Stage I, or}
\]
\[
g(x_k) - g(x_{k+1}) > \frac{1}{2}\eta_0^2 \|\hat\nabla g_k'(x_k)\| \cdot \min\left\{ \frac{\|\hat\nabla g_k'(x_k)\|}{\|\hat H_k'(x_k)\|},\ \Delta_k' \right\} \ \text{for Stage II or inner loop}
\]
when $k \ge K_2$. Let $K = \max\{K_1, K_2\}$. Then, for almost all sample paths,
\[
\sum_{k=K}^{\infty} \left[ g(x_k) - g(x_{k+1}) \right] \ge \eta_0^2\, \varepsilon\, n_{\mathrm{I}}(K)\, \tilde\Delta + \frac{1}{2}\eta_0^2\, \varepsilon\, n_{\mathrm{II}}(K) \min\left\{ \frac{\varepsilon}{\kappa},\ \Delta_L \right\}, \tag{8}
\]
where $n_{\mathrm{I}}(K)$ and $n_{\mathrm{II}}(K)$ are the numbers of successful iterations of Stages I and II after iteration $K$, $\tilde\Delta$ is the lower bound of the trust-region size for STRONG to switch from Stage I to II, and $\kappa$ is the upper bound of $\|\hat H_k'(x_k)\|$. Therefore, it is clear that, for almost all sample paths,
\[
\sum_{k=K}^{\infty} \left[ g(x_k) - g(x_{k+1}) \right] = \infty .
\]
However, note that
\[
\sum_{k=K}^{\infty} \left[ g(x_k) - g(x_{k+1}) \right] \le g(x_K) - \liminf_{k\to\infty} g(x_k) < \infty
\]
because g(x) is bounded below by Assumption 1. This is a contradiction. Therefore, Equation (7) holds. This concludes the proof of the theorem.
References

Billingsley, P. 1995. Probability and Measure. John Wiley and Sons, New York.

Kuang, J. 2004. Applied Inequalities. 3rd ed. Shandong Science and Technology Press.

Myers, R.H., D.C. Montgomery, C.M. Anderson-Cook. 2009. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley and Sons, New York.

Nocedal, J., S.J. Wright. 1999. Numerical Optimization. Springer.

Rudin, W. 1976. Principles of Mathematical Analysis. McGraw-Hill, New York.