ICES Report 04-22

FULLY AUTOMATIC hp-ADAPTIVITY IN THREE DIMENSIONS

W. Rachowicz∗, D. Pardo∗∗, and L. Demkowicz∗∗

∗ Section of Applied Mathematics, Cracow University of Technology
∗∗ Institute for Computational Engineering and Sciences, The University of Texas at Austin

Abstract

The paper presents a three-dimensional implementation of the fully automatic hp-adaptive mesh refinement strategy proposed in [11]. The methodology is based on minimization of the projection-based interpolation error for a fixed number of degrees-of-freedom. A (coarse) grid is first uniformly refined in both h and p to yield a corresponding fine grid. A two-grid solver is used to solve the problem on the fine grid, or to iterate toward the fine grid solution. The fine grid (possibly only partially converged) solution then serves as a reference solution to be interpolated on the original coarse mesh, from which the next, optimal coarse grid is determined. The difference between the coarse and fine grid solutions provides an excellent estimate of the coarse grid error. The procedure is repeated, with the coarse and fine meshes generated simultaneously during the adaptive process, until a prescribed tolerance for the coarse grid error is met. The solution obtained on the last fine grid is considered to be the final product delivered by the adaptive procedure.

Key words: hp finite elements, automatic adaptivity
AMS subject classification: 65N30, 35L15

Acknowledgment. The work of the first author has been supported partially by the J.T. Oden ICES Fellowship program and by the Polish Committee for Scientific Research under Grant 7 T11F 014 20. The work of the second and third authors has been supported by the Air Force under Contract FA9550-04-1-0050. The computations reported in this work were done through the National Science Foundation's National Partnership for Advanced Computational Infrastructure.


1 Introduction

It was shown almost two decades ago by Babuška and his collaborators [2, 18] that finite element discretizations in which the element size h and the locally variable polynomial order p are distributed in an optimal way deliver exponential convergence rates in terms of error versus the number of degrees-of-freedom (d.o.f.). Designing algorithms that deliver a sequence of optimally refined hp meshes, and the exponential convergence, in a fully automatic mode with no user interaction, has been the "holy grail" of hp computations for a decade and a half; see the Introduction in [10]. The present work is an extension of [16], based on the concept of minimizing the projection-based interpolation error, which allows for a systematic generalization of our original ideas to two and three space dimensions. The 2D version of the presented algorithm was presented in [11] for elliptic problems, and in [5] for 2D Maxwell equations. The theory of projection-based interpolation in 3D [4, 7] differs considerably from its 2D version [6], and it was not at all clear at the beginning of the reported project that the hp algorithm would work in 3D. Well, we are happy to report that it has. The results on hp mesh optimization presented here confirm that the idea of hp mesh optimization based on minimizing the projection-based interpolation error is correct, and we hope to convince the reader that it is also implementable.

Preliminaries. Given a bounded Lipschitz domain Ω ⊂ R³, we consider a model elliptic boundary-value problem,

$$ \left\{ \begin{aligned} -\frac{\partial}{\partial x_i}\Big(a_{ij}\frac{\partial u}{\partial x_j}\Big) + b_i\frac{\partial u}{\partial x_i} + cu &= f && \text{in } \Omega, \\ u &= u_0 && \text{on } \Gamma_D, \\ n_i a_{ij}\frac{\partial u}{\partial x_j} + \beta u &= g && \text{on } \Gamma_N. \end{aligned} \right. \tag{1.1} $$

The variational formulation of (1.1) is as follows:

$$ u \in \tilde u_0 + V, \qquad b(u,v) = l(v) \quad \forall v \in V, \tag{1.2} $$

where ũ₀ is a lift of u₀ into H¹(Ω), and the bilinear and linear forms are defined as follows,

$$ b(u,v) = \int_\Omega \Big( a_{ij}\frac{\partial u}{\partial x_j}\frac{\partial v}{\partial x_i} + b_i\frac{\partial u}{\partial x_i}\, v + cuv \Big)\, dx + \int_{\Gamma_N} \beta u v \, dS, \qquad l(v) = \int_\Omega f v \, dx + \int_{\Gamma_N} g v \, dS, \tag{1.3} $$

with

$$ V = \{ v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_D \}. \tag{1.4} $$

We assume that domain Ω is covered with an hp-adaptive finite element mesh which defines the corresponding finite element space V_hp ⊂ V. The finite element solution is obtained by solving the approximate variational problem,

$$ u_{hp} \in \tilde u_0 + V_{hp}, \qquad b(u_{hp}, v_{hp}) = l(v_{hp}) \quad \forall v_{hp} \in V_{hp}. \tag{1.5} $$

In one and two dimensions the corresponding formulations take an analogous form, with the range of indices for the spatial variables reduced to 1 or 2, respectively. Under standard assumptions on the material data a_ij, b_i, c, β that guarantee continuity and V-coercivity of the bilinear form, Céa's lemma expresses the optimality of the Galerkin approximation,

$$ \|u - u_{hp}\|_{1,\Omega} \le C \inf_{v_{hp}\in V_{hp}} \| u - (\tilde u_0 + v_{hp}) \|_{1,\Omega}, \tag{1.6} $$

with a mesh-independent constant C > 0. In particular, the inequality holds if v_hp is an interpolant of u − ũ₀. This provides the standard motivation to consider the interpolation error, rather than the true error, as the target of the mesh optimization process: we look for an hp-adaptive mesh such that, for a fixed number of degrees-of-freedom, an estimate of the corresponding interpolation error is minimized. The last quantity is localized, in the sense that the interpolation error can be computed for a given element independently of the rest of the mesh.

The presented hp algorithm is based on the search for an optimal mesh to interpolate the exact solution u(x). Of course, in general, u(x) is not available, and it must be replaced with an approximation of the exact solution that is essentially better than the actual FE solution on the current mesh. We refine the current coarse grid uniformly in both h and p, i.e., each hexahedral element in the mesh is divided into eight element sons, and the order of each node is raised by one, uniformly in all directions. The fine grid solution u_{h/2,p+1} then replaces the unknown exact solution in the mesh optimization process. The next optimal coarse grid then replaces the old one, and the whole process is repeated until a prescribed error tolerance is met. The error is estimated simply by evaluating the difference between the coarse and fine grid solutions. A two-grid solver is used to solve the fine grid problem. As we will demonstrate in our numerical experiments, the fine grid solution need not be fully converged; it is sufficient to perform only a few two-grid iterations. As long as the convergence error is an order of magnitude smaller than the (estimated) approximation error from the previous step, the overall mesh optimization process and the resulting convergence history seem to be unaffected.

Mesh regularity assumptions. We refer to [10] for details of the 3D hp code within which the presented strategy has been implemented. Out of many technical assumptions, we mention only three. The hp meshes supported in the code must be 1-irregular. In practice this means that no element can be h-refined unless all its nodes are regular. A special algorithm enforces "unwanted" h-refinements to eliminate the hanging nodes and make the h-refinement possible. Both h- and p-refinements may be fully anisotropic: elements can be broken into two, four, or eight element sons, and the polynomial order used in the trivariate approximation can be different in different directions. The anisotropy is crucial for achieving optimal convergence rates in 3D. Finally, at any point of the game, the meshes must satisfy the minimum rule: the order for edges and faces is always set to the minimum of the orders of the adjacent elements.

Scope of the presentation. We will begin by reviewing the fundamental concept of projection-based interpolation and discuss its generalization to piecewise polynomial spaces. The bulk of the presentation is devoted to the discussion of the presented hp mesh optimization algorithm. After a short overview of the algorithm, we discuss the "competition" between h- and p-refinements resulting (along with the use of isotropy flags) in the topology of the new mesh, and discuss details of routine OPTIMIZE, forming the core of the implemented algorithm. After summarizing the algorithm, we present two numerical examples: the Fichera problem, whose solution exhibits both edge and vertex singularities, and a made-up "shock problem" in which the solution is analytic but very high gradients make it indistinguishable from singular problems in the preasymptotic range. Two-dimensional equivalents of these examples, the L-shaped domain problem and a 2D version of the shock problem, were used in our 2D experiments. Finally, we conclude the paper with a short presentation of the two-grid solver, which enables the application of the presented adaptive strategy to practical problems.
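To fix ideas, the overall coarse/fine iteration described above can be summarized in a few lines of pseudo-Python. The routines solve, refine_uniformly_hp, estimate_error and optimize_hp are hypothetical stand-ins for the components discussed in the remainder of the paper; this is a sketch of the control flow, not the actual code.

```python
# Sketch of the top-level hp-adaptive loop (hypothetical helper routines).
def hp_adapt(coarse_mesh, tol):
    while True:
        u_coarse = solve(coarse_mesh)
        # Global refinement: h -> h/2 (eight sons per hexahedron), p -> p+1.
        fine_mesh = refine_uniformly_hp(coarse_mesh)
        # A few two-grid iterations suffice; full convergence is not needed.
        u_fine = solve(fine_mesh, initial_guess=u_coarse, solver="two-grid")
        # The coarse grid error is estimated by the coarse/fine difference.
        if estimate_error(u_fine, u_coarse) <= tol:
            return fine_mesh, u_fine   # the final product of the procedure
        # Determine the next optimal coarse mesh by minimizing the
        # projection-based interpolation error of u_fine (Sections 3-6).
        coarse_mesh = optimize_hp(coarse_mesh, u_fine)
```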

2 Projection Based Interpolation

Let K denote an arbitrary (not necessarily unit) hexahedron in R³, with faces f and edges e. The idea of projection-based interpolation rests on three fundamental assumptions.

Locality: the interpolant Π_K(u) should be constructed using the restriction of function u to element K only.

Conformity: the union of the element interpolants of an H¹-conforming function must itself be H¹-conforming.

Optimality: the interpolation error ‖u − Π_K u‖_{H¹(K)} should exhibit the same convergence rates (in both h and p) as the best approximation error.

We consider a general hexahedral element of variable order. We introduce an element local system of coordinates ξ₁, ξ₂, ξ₃. For each of the six faces of the element, we introduce the corresponding local face coordinates ξ_{f,1}, ξ_{f,2} using lexicographic ordering:


• for faces ξ₁ = 0, 1: ξ_{f,1} = ξ₂, ξ_{f,2} = ξ₃,
• for faces ξ₂ = 0, 1: ξ_{f,1} = ξ₁, ξ_{f,2} = ξ₃,
• for faces ξ₃ = 0, 1: ξ_{f,1} = ξ₁, ξ_{f,2} = ξ₂.

Finally, for each of the twelve edges e, we use the element coordinate ξ_i with axis i parallel to edge e as the edge local coordinate ξ_e. The face and edge local coordinates are thus implied by the element local coordinates. With each element edge e we associate a polynomial degree p_e; with each face f, a polynomial degree p_f = (p_{f,1}, p_{f,2}); and, finally, a degree p = (p₁, p₂, p₃) is associated with the (interior of the) element. We assume that the edge, face and interior orders satisfy the minimum rule:

• the order of element faces must not exceed the element order; more precisely, p_{f,i} ≤ p_j, where j is the element axis parallel to face axis i;

• the order of element edges must not exceed the order of the adjacent element faces, i.e., p_e ≤ p_{f,i}, where face axis i is parallel to edge e, for both faces f adjacent to edge e.

In practice, the conditions are enforced by setting the face and edge orders to the minimum order of all adjacent elements, hence the name: the minimum rule.

The element space of shape functions X(K) = Q^p_{p_f,p_e}(K) consists of all trivariate polynomials from Q^p = Q^{(p₁,p₂,p₃)} = P^{p₁} ⊗ P^{p₂} ⊗ P^{p₃} whose traces on element faces f reduce to polynomials of possibly lower degree from Q^{p_f} = Q^{(p_{f,1},p_{f,2})}, and whose traces on element edges e reduce to polynomials of possibly lower order p_e. The element space of shape functions is a well-defined subspace of Q^{(p₁,p₂,p₃)}. To indicate the dependence on the face and edge orders, we denote it shortly by Q^p_{p_f,p_e} = Q^p_{p_f,p_e}(K). The corresponding trace spaces for faces f will be denoted by Q^{p_f}_{p_e} = Q^{p_f}_{p_e}(f). Finally, P^{p_e} will denote the space of traces on edge e.

Let now u be a function defined on element K, and let 0 < ε < 1/2 denote an arbitrarily small constant. The projection-based interpolant u^p = Π_K u [13, 9, 6, 4, 7] is defined in the following steps.

Interpolation at vertices. The locality and conformity conditions imply that we have little choice: the interpolant u^p must match function u at the vertices,

$$ u^p(a) = u(a) \quad \text{for each vertex } a. $$

We lift the vertex values using any polynomial extension that lives in the element space. It is natural to assume, though, that the lift is independent of all other polynomial degrees p, p_f, p_e. This leads naturally to the simplest, trilinear interpolant u₁ ∈ Q^{(1,1,1)}(K).

Edge projections. We subtract the trilinear interpolant u₁ from function u. The difference u − u₁ vanishes at the element vertices. We project the difference onto the space P^{p_e}_{−1}(e) of edge polynomials of order p_e vanishing at the vertices (the edge bubbles):

$$ u^p_{2,e} \in P^{p_e}_{-1}(e), \qquad \|u - u_1 - u^p_{2,e}\|_{\epsilon,e} \to \min. $$

The projection is done in the H^ε(e)-norm, ‖u‖_{ε,e} = ‖u‖_{H^ε(e)}. Each edge projection u^p_{2,e} is then lifted to the element using the corresponding edge shape functions; the lift is of order p_e in the direction of the edge, linear in the two other directions, and vanishes over all faces f that are not adjacent to edge e. The sum of the lifts of the edge projections is the edge interpolant $u_2^p = \sum_e \tilde u^p_{2,e}$.

Face projections. We subtract the edge interpolant from u − u₁ and project the resulting difference u − u₁ − u₂^p onto the space Q^{p_f}_{−1}(f) of face polynomials of order p_f vanishing along the face edges (the face bubbles):

$$ u^p_{3,f} \in Q^{p_f}_{-1}(f), \qquad |u - u_1 - u_2^p - u^p_{3,f}|_{\frac12+\epsilon,f} \to \min. $$

The projection is done in the H^{1/2+ε}(f)-seminorm, $|u|_{\frac12+\epsilon,f} = |u|_{H^{\frac12+\epsilon}(f)}$. Each face projection u^p_{3,f} is then lifted to the element using the face shape functions. The extension is of order p_{f,1}, p_{f,2} in the two face directions, linear in the direction perpendicular to the face, and vanishes over the remaining faces. The sum of the lifts of the face projections is the face interpolant $u_3^p = \sum_f \tilde u^p_{3,f}$.

Interior projection. We subtract the face interpolant from u − u₁ − u₂^p and project the resulting difference u − u₁ − u₂^p − u₃^p onto the space Q^p_{−1,−1}(K) of element polynomials of order p vanishing on the element boundary (the interior bubbles):

$$ u_4^p \in Q^{p}_{-1,-1}(K), \qquad |u - u_1 - u_2^p - u_3^p - u_4^p|_{1,K} \to \min. $$

The projection is done in the H¹-seminorm. Each edge, face or element minimization problem is equivalent to the solution of a system of linear equations. The final interpolant is defined as the sum of the vertex, edge, face and interior contributions,

$$ u^p = \Pi_K u = u_1 + u_2^p + u_3^p + u_4^p. $$

Notice that the final interpolant is independent of the choice of lifts from edges and faces, as long as they live in the element space of shape functions. The face and edge projections require only an extra ε of regularity: the interpolated function must "live" in the space H^{1+ε}(K). The most demanding regularity assumption comes from the interpolation at vertices: the interpolated function must be in H^{3/2+ε}(K). Introducing a quasi-local interpolation at vertices, see [17], makes it possible to reduce the regularity assumptions to a minimum, at the expense of trading locality for quasi-locality. The discussed H¹-interpolation operator belongs to a family of operators forming the de Rham diagram [9]. The use of seminorms rather than norms in the face and element projections is motivated by the commutativity of the interpolation operators. The following result was proved in [4, 7] for a tetrahedral element.
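To illustrate the remark that each minimization problem reduces to a linear system, the toy sketch below (our illustration, not the authors' code) performs a single edge projection on a straight unit edge, using polynomial bubbles φ_k(ξ) = ξ^{k−1}(1 − ξ), k = 2,...,p_e, and an H¹-type edge product; w_prime is assumed to return the derivative of w = u − u₁ along the edge.

```python
import numpy as np

# Solve the edge projection as a small linear system G c = rhs.
def edge_projection(w_prime, pe, nq=32):
    xi, wq = np.polynomial.legendre.leggauss(nq)
    xi, wq = 0.5 * (xi + 1.0), 0.5 * wq      # Gauss rule mapped to (0,1)
    # Derivatives of the edge bubbles phi_k = xi^(k-1) * (1 - xi).
    B = np.array([(k - 1) * xi**(k - 2) - k * xi**(k - 1)
                  for k in range(2, pe + 1)])
    G = (B * wq) @ B.T              # Gram matrix in the H^1-type product
    rhs = (B * wq) @ w_prime(xi)    # load: (w', phi_k')
    return np.linalg.solve(G, rhs)  # coefficients of the projection
```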

THEOREM 1 The following estimate holds,

$$ \|u - \Pi u\|_{1,K} \le C \left( \min_{v \in P^{p}_{p_f,p_e}(K)} \|u - v\|_{1,K} + \sum_f \min_{v \in P^{p_f}_{p_e}(f)} \|u - v\|_{\frac12+\epsilon,f} + \sum_e \min_{v \in P^{p_e}_{-1}(e)} \|u - u_1 - v\|_{\epsilon,e} \right), \tag{2.7} $$

for every u ∈ H^{3/2+ε}(K), with u₁ denoting the linear interpolant of u. The constant C = O(ε^{−1/2}) is independent of the polynomial orders p, p_f, p_e.

The result leads to an ε-(sub)optimal p-estimate and an optimal hp-estimate for the projection-based interpolation error. We believe that a similar result can be established for the hexahedral element.

Interpolation on one-irregular meshes. The described projection-based interpolation procedure produces a globally conforming interpolant on regular meshes. This is a consequence of the hierarchical construction of the interpolant: values of the interpolant at vertices, on edges and on faces are common to all adjacent elements. One may perform the interpolation element-wise or in a global fashion: all vertices first, then all edges, then faces and, finally, all element interiors. The situation is more complicated for irregular meshes, where elements may share only a portion of an edge or face. In order to ensure global continuity, d.o.f. for constrained vertex, edge or face nodes have to be derived from the corresponding constraining (parent) edges or faces. This implies that the global loops listed above have to be modified: the interpolation has to be done by cycling through vertices, edges, and faces several times before the interpolation over elements is possible. Each time we visit an edge with constrained vertices, the edge interpolation is conditional upon first completing the interpolation for the parent nodes (constraining edges or faces). A similar situation holds for faces.

Interpolation with piecewise polynomials. The idea of projection-based interpolation can easily be generalized to spaces of piecewise polynomials. Let K again denote an arbitrary hexahedron in R³, with faces f, edges e and vertices a. Let K be covered with a FE mesh consisting of one, two, four or eight elements of variable order, corresponding to all possible h-refinements of a single hexahedron into two, four, or eight element-sons. Let X_hp(K) denote the finite element space corresponding to the hp mesh covering the hexahedron, X_hp(f) the corresponding space of traces on a face f, and X_hp(e) the corresponding space of traces on an edge e. Let V_hp(K), V_hp(f), V_hp(e) denote the corresponding spaces of test functions,

$$ \begin{aligned} V_{hp}(K) &= \{v_{hp} \in X_{hp}(K) : v_{hp} = 0 \text{ on } \partial K\}, \\ V_{hp}(f) &= \{v_{hp} \in X_{hp}(f) : v_{hp} = 0 \text{ on } \partial f\}, \\ V_{hp}(e) &= \{v_{hp} \in X_{hp}(e) : v_{hp} = 0 \text{ on } \partial e\}. \end{aligned} $$

Given a function u ∈ H^r(K), r > 3/2, we define its projection-based interpolant Π_hp u = u_hp ∈ X_hp(K) by the following steps.

• Interpolation at vertices a,

$$ u_{hp}(a) = u(a) \quad \forall a. \tag{2.8} $$

• Projection along edges e,

$$ (u_{hp} - u, v_{hp})_{\epsilon,e} = 0 \quad \forall v_{hp} \in V_{hp}(e), \ \forall e. \tag{2.9} $$

• Projection over faces f,

$$ (u_{hp} - u, v_{hp})_{\frac12+\epsilon,f} = 0 \quad \forall v_{hp} \in V_{hp}(f), \ \forall f. \tag{2.10} $$

• Projection over element K,

$$ (u_{hp} - u, v_{hp})_{1,K} = 0 \quad \forall v_{hp} \in V_{hp}(K). \tag{2.11} $$

Here (u,v)_{ε,e}, (u,v)_{1/2+ε,f}, (u,v)_{1,K} denote the H^ε(e) inner product and the H^{1/2+ε}(f), H¹(K) semi-inner products, respectively. The interpolation can again be viewed as a succession of local Dirichlet problems on the edges, the faces and the hexahedron, with the results from the previous steps determining the approximation of the Dirichlet data. For polynomial spaces, the described procedure reduces to the original interpolation. Again, the interpolation can be generalized to the case of meshes with hanging nodes.

Choice of norms. The choice of norms used in the element, face and edge projections is dictated by the Trace Theorem and reflects minimum regularity assumptions. The constants ε present in the definition of the edge norms and face seminorms can be dropped (see [4, 7]). Additionally, we replace the H^{1/2}-face and L²-edge inner products with scaled H¹-products. For an edge e parametrized with x(ξ), ξ ∈ (0,1), we use

$$ (u,v)_e = \int_e \frac{du}{ds}\frac{dv}{ds}\, ds = \int_0^1 \frac{du}{d\xi}\frac{dv}{d\xi}\, \underbrace{\Big(\frac{ds}{d\xi}\Big)^{-2}}_{\sim h^{-2}} \underbrace{\frac{ds}{d\xi}}_{\sim h}\, d\xi. \tag{2.12} $$

Notice that for straight edges the corresponding norm scales with the edge size h identically to the L²-norm. Similarly, for a face parametrized with (0,1)² ∋ ξ = (ξ₁,ξ₂) → x(ξ) ∈ R³, we use the following face inner product,

$$ (u,v)_f = \int_0^1\!\!\int_0^1 \frac{\partial u}{\partial \xi_i}\frac{\partial v}{\partial \xi_j}\, g^{ij} A^{3/2}\, d\xi_1 d\xi_2, \tag{2.13} $$

where

$$ A = \Big|\frac{\partial x}{\partial \xi_1} \times \frac{\partial x}{\partial \xi_2}\Big|, \qquad g_{ij} = \frac{\partial x_k}{\partial \xi_i}\frac{\partial x_k}{\partial \xi_j}, \qquad \{g^{ij}\} = \{g_{ij}\}^{-1}. $$

Again, for a square face, the corresponding norm scales in terms of the face size h identically to the H^{1/2}-norm. The use of H¹-norms for the face and edge projections does not preclude optimal convergence rates: as long as the interpolated function is sufficiently regular, u ∈ H^{2+ε}(K), the projections in the H¹-norms yield the same convergence rates in order p as the projections in the L² and H^{1/2}-norms. The right scaling in the element size h also guarantees optimal h-convergence rates.
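The scaled product (2.12) is straightforward to evaluate numerically. The sketch below (our own naming and simplifications) computes it for a parametrized edge; du and dv return the derivatives du/dξ, dv/dξ at the quadrature points, and dx returns the 3-vector dx/dξ.

```python
import numpy as np

# Scaled H^1 edge product (2.12) by Gauss quadrature on (0,1).
def edge_product(du, dv, dx, nq=16):
    xi, w = np.polynomial.legendre.leggauss(nq)
    xi, w = 0.5 * (xi + 1.0), 0.5 * w
    s = np.linalg.norm(dx(xi), axis=0)   # ds/dxi at the quadrature points
    # Integrand (du/dxi)(dv/dxi)(ds/dxi)^(-2) (ds/dxi) = du dv / (ds/dxi).
    return np.sum(w * du(xi) * dv(xi) / s)
```

For a straight edge, ds/dξ is the constant edge length h, and the product reduces to h⁻¹ ∫₀¹ u′v′ dξ.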

3 An Overview of the Mesh Optimization Algorithm

The main logic behind the construction of the hp mesh optimization algorithm is based on splitting the algorithm into three steps:

• mesh optimization on coarse mesh edges,
• mesh optimization on coarse mesh faces,
• mesh optimization in the interiors of coarse mesh elements.

Only the first step of the algorithm involves choosing between h- and p-refinements. Once all the edges selected to be broken have been marked, the requested edge refinements are executed by breaking all adjacent elements in an appropriate way. Given an element with one or more edges marked to be broken, the corresponding minimal h-refinement flag¹ is possibly upgraded by taking into account the element isotropy flag, which reflects the directional behavior of the error function. Once the element refinement flags are known, the requested refinements are executed using the 1-irregular mesh refinement algorithm [10]. This may cause some unwanted refinements resulting from the enforcement of the 1-irregularity rule.

¹ Recall that an element can be broken in seven different ways into two, four, or eight element sons.


At this point, the topology of the new coarse mesh is fully known, and it remains only to determine the optimal orders of approximation for edges, faces, and element interiors. Topology-wise, we might say that the algorithm is edge driven.

In selecting the optimal orders of approximation for element edges, we follow our original algorithm from [16]. First, for each coarse grid edge i with n_i degrees of freedom, we define the corresponding cost function e_i(n_i). If an edge has not been broken, n = p, and e is identified with the interpolation error for the edge of order p + 1. If the edge has been h-refined, n degrees of freedom can be realized with different orders p₁, p₂ for the edge sons, and the value e(n) then corresponds to the allocation of the d.o.f. that produces the minimal interpolation error. Defining the function e(n) thus involves comparing various local p-refinements. Once the functions e_i(n_i) are known for each edge, we formulate a global optimization problem aimed at minimizing the global error with the number of d.o.f. fixed². Find the distribution of n_i, i = 1,...,M, such that

$$ \sum_{i=1}^{M} e_i^2(n_i) \to \min, \qquad \sum_{i=1}^{M} n_i = \text{const}. \tag{3.14} $$

Here M denotes the number of edges. It has been proved in [16] that the strategy that leads to solving this minimization problem consists in selecting the edges for which the reduction of the error per number of d.o.f. added is the largest possible. That is, if a certain initial distribution n_i, i = 1,...,M, is given, then we look for the i-th edge and the argument n_i* such that the rate of reduction of the error is maximal,

$$ -\frac{e_i^2(n_i) - e_i^2(n_i^*)}{n_i - n_i^*} \to \max, \tag{3.15} $$

and we perform the enrichment of the i-th edge, leading to the increase n_i → n_i* in the number of d.o.f. Note that this principle is different from the rule of refining the element with the maximum error; we look instead for the maximum rate of error reduction. In economic terms, we might say that we invest in the element which brings the maximum profit: the largest error reduction per number of invested degrees-of-freedom. The maximum rate strategy (3.15) is the basis not only for the algorithms determining the optimal orders of approximation for edges, faces and element interiors, but also for the algorithm choosing between h- and p-refinements of edges, discussed next.

² Equivalently, the dual problem consists of minimizing the global number of d.o.f. with a bound on the total error.
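In code, the maximum-rate criterion (3.15) is a simple arg-max over candidate enrichments. The sketch below is our condensed illustration; each edge is supplied as a tuple (n_i, e_i², candidates), where candidates lists the (n_i*, e_i²(n_i*)) pairs of its admissible enrichments.

```python
# Pick the refinement with the largest error decrease per added d.o.f.
def best_refinement(edges):
    best = None
    for i, (n, e2, candidates) in enumerate(edges):
        for n_star, e2_star in candidates:
            rate = (e2 - e2_star) / (n_star - n)   # rate of error reduction
            if best is None or rate > best[0]:
                best = (rate, i, n_star)
    return best   # (maximum rate, edge index, enriched d.o.f. count)
```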

4 Establishing the direction of refinement: h or p?

The optimal refinements are determined by computing the interpolation errors of the fine mesh solution u_{h/2,p+1}. This suggests limiting the possible p-refinements of an edge to just one, increasing the order from p to p + 1. Consistently with the definition of the functions e(n) introduced in the previous section, we then compare the p-refinement with all possible h-refinements that result in a unit increase of the number of d.o.f. For an edge of order p, p different cases must be examined, corresponding to orders of approximation (1, p), (2, p − 1), ..., (p, 1) for the edge sons. For each of the p cases, the corresponding interpolation error is determined, and the best competitive h-refinement is identified with the corresponding interpolation error. The edge is marked for breaking if the best h-refinement produces a smaller interpolation error than the p-refinement. We emphasize that this competition between h- and p-refinements is used only for establishing, what we call, the direction of refinement: h or p. The decision which edges of the mesh should actually be refined, and in what way, is based on the principles sketched in the previous section and is discussed in full detail in the next section.
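The competition can be sketched as follows; interp_error is a hypothetical routine returning the projection-based interpolation error of u_{h/2,p+1} for a trial order distribution on the edge (one entry) or on its two sons (two entries).

```python
# h-versus-p competition for a single edge of order p.
def refinement_direction(p, interp_error):
    e_p = interp_error([p + 1])     # p-refinement: raise the order to p+1
    # Competing h-refinements with the same d.o.f. count:
    # son orders (1, p), (2, p-1), ..., (p, 1).
    e_h = min(interp_error([k, p + 1 - k]) for k in range(1, p + 1))
    return "h" if e_h < e_p else "p"
```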

5 Routine OPTIMIZE

Let us recall that once the topology of the mesh is established, then for all three stages of optimization — on edges, faces and interiors (middle nodes) of coarse mesh elements — we must decide which of the nodes should be refined and what the optimal distribution of their orders is. By "nodes" we mean here mid-edge nodes, mid-face nodes or middle nodes, depending on the stage of optimization. For each stage we may formulate an optimization problem similar to (3.14), with appropriate definitions of the errors. In the following discussion we use the term "submesh" to refer either to an original (coarse mesh) node or to the set of nodes-sons of the same type resulting from breaking the coarse node (if such a refinement has occurred).

• Optimization on edges. We consider a collection of M submeshes consisting either of a single original (coarse) edge or of a pair of edge-sons of the coarse edge. A particular (trial) selection of the orders p of the edges of the submesh defines a finite element space V_hp(e) for the (possibly) refined edge. We evaluate the interpolation error by solving the local variational problem (2.9) with the (·,·)_e-product of (2.12), and we set

$$ e^2 = \|u - u_{hp}\|_e^2. \tag{5.16} $$

• Optimization on faces. We consider a collection of M submeshes consisting either of a single original (coarse mesh) face or of m = 2, 3, 4 face-sons of the coarse face. A trial selection of the orders p of these face-sons defines a finite element space V_hp(f) for the (possibly) refined face. We evaluate the corresponding interpolation error by solving the local variational problem (2.10) with the (·,·)_f-product of (2.13), and we set

$$ e^2 = \|u - u_{hp}\|_f^2. \tag{5.17} $$

• Optimization in the interiors of (coarse mesh) elements (middle nodes). We consider a collection of M submeshes consisting either of a single original (coarse) element or of m = 2, 4, 8 element-sons of the coarse element. A trial selection of the orders p of these element-sons defines a finite element space V_hp(K) for the (possibly) refined element. We evaluate the corresponding interpolation error by solving the local variational problem (2.11), and we set

$$ e^2 = \|u - u_{hp}\|_{1,K}^2. \tag{5.18} $$

In order to be consistent with condition (3.15), we should always consider all possible combinations of enrichments of the nodes constituting the submesh, including directional enrichments of mid-face and middle nodes. This would be prohibitively expensive. Instead, we propose a selective search for the distribution of orders within a submesh resulting in the optimal rate of reduction of the error. This procedure is called OPTIMIZE in our algorithm, and it is based on the following principles.

1. First, an initial distribution of orders is selected.

• For h-refined edges, the initial orders p₁, p₂ correspond to the most competitive h-refinement, determined during the competition between the p- and h-refinements.

• For faces and middle nodes, the initial orders are the smallest orders which satisfy the minimum rule (with the previously established edge or face orders, respectively).

2. Then we attempt to enrich the nodes constituting the submesh, following the p-adaptive strategy of enriching the nodes with the largest errors.

• We enrich all the nodes whose error exceeds 70% of the maximum element error in the submesh,

$$ e_K^2 \ge 0.70 \max_L e_L^2. \tag{5.19} $$

• We possibly perform a directional enrichment (instead of an isotropic one) if the error within an element of the submesh shows significant directionality. We discuss the details of this issue later.

The enrichments continue until the maximum allowed order is reached. In this way, we establish a path of refinements with a corresponding convergence curve determining the relation between the error and ndof.


3. In the process of performing the p-enrichments described above, we investigate the rate of reduction of the interpolation error. We begin investigating the behavior of the error only after it drops below the error corresponding to the coarse mesh (edge, face or element). We call the submesh satisfying this condition the reference point (on the path of refinements). Selecting a reference point is necessary because the finite element space for the initial orders does not necessarily contain the original space for the coarse mesh element.

4. We define the notion of a minimum guaranteed rate for a given submesh. By this we understand the maximum rate of error reduction obtained along the whole path of p-refinements between the reference point and the current mesh. The enrichments are stopped when all the elements indicated for enrichment according to Step 2 reach the maximum allowed order. Evaluation of the minimum guaranteed rates is performed in Version A of routine OPTIMIZE.

5. Having established the minimum guaranteed rates r_{g,i}, i = 1,...,M, for all the submeshes, we are ready to decide which submeshes should actually be refined. Ideally, we should refine only the single submesh which satisfies (3.15). This would, however, lead to a very slow refinement process. For this reason we select for refinement all submeshes for which the minimum guaranteed rate is large enough relative to a reference quantity r_m,

$$ r_{g,i} \ge \tfrac{1}{3}\, r_m, \tag{5.20} $$

where r_m is defined as the maximum,

$$ r_m := \max_i r_{g,i}. \tag{5.21} $$

6. Having decided which submeshes are to be refined, we need to find the optimal distribution of orders of the nodes constituting the submeshes. This is done in Version B of the routine OPTIMIZE. It works analogously to Version A discussed above (steps 1-4), with the difference that in the process of p-enrichments we monitor when the current error reduction rate drops below the threshold value of (1/3) r_m. The corresponding distribution of orders at this point is returned as the optimal one.

Below we present in detail the steps of the algorithm. We introduce the following notation:

m — number of elements in the submesh,
p — orders of approximation of the nodes of a submesh, p = {p_K^i}, i = 1,...,dim, K = 1,...,m,
r — current rate of reduction of the error = (decrease of square of error)/(increase of ndof),
r_g — minimum guaranteed rate,
e_c² — error (squared) on a coarse grid element,
n — number of active (internal) d.o.f. of a submesh,
n_ref — ndof for a reference point,
e_ref² — error (squared) for a reference point,
p_opt — optimal orders,
p_old, e_old², n_old — orders, error and ndof corresponding to the previous step of p-enrichments,
k, b — stiffness matrix and load vector for the variational problem of interpolation,
dim — dimension of the problem: dim = 1, 2, 3 for edges, faces and middle nodes, respectively,
TOL — target relative accuracy of approximation on the coarse mesh,
β_K^i — directionality flags, i = 1,...,dim, K = 1,...,m; β_K^i = 1 if element K is to be enriched in the i-th direction, β_K^i = 0 otherwise.

Routine OPTIMIZE. Version A: find the minimum guaranteed rate; Version B: find the optimal distribution of p.

INPUT: e_c², n_c, r_m

1. Initialize: p from the hp-competition for edges and from the minimum rule for faces and middle nodes; r_g := −10³⁰ (Version A); p_old := p, e_old² := e_c², n_old := n_c (Version B).

2. Evaluate k, b and n corresponding to the orders p.

3. Solve for the interpolant u_hp.

4. Evaluate:
   • the directionality flags β_K^i, i = 1,...,dim, K = 1,...,m,
   • the element errors (squared) e_K²,
   • the total error (squared) e² = Σ_K e_K².

5. Attempt to set a reference point:

   if (no reference point has been set and e² ≤ 1.01 e_c²) then
       n_ref := n
       e_ref² := e²
   endif

6. Evaluate the current rate (if a reference point has already been set):

   if (reference point has been set) then
       if (n > n_ref) then
           r := −(e² − e_ref²)/(n − n_ref)        (forward rate)
       elseif (n_ref ≠ n_c) then
           r := −(e_c² − e_ref²)/(n_c − n_ref)    (backward rate)
       else
           r := 0
       endif
   endif

7. Update the minimum guaranteed rate (Version A only):

   r_g := max(r_g, r)

8. Attempt to finish the search for p_opt (Version B only):

   if (r > 0 and r < (1/3) r_m) then
       if (e_old² < 1.01 e_c²) then
           p_opt := p_old
       else
           p_opt := p
       endif
       RETURN
   endif

9. Save the current parameters (Version B only, if the search for p_opt has not been completed yet):

   n_old := n,  p_old := p,  e_old² := e²

10. Perform the enrichments (the next step along the path of refinements of the submesh):

    (a) set e_max² := max_K e_K²
    (b) for all elements K such that e_K² > 0.7 e_max²:
            for i = 1,...,dim:
                if (p_K^i < p_max and β_K^i = 1) then p_K^i := p_K^i + 1
            endfor
            if no enrichment of K occurred, set e_K² := 0
        endfor
    (c) if (all elements flagged for enrichment hit p_max) then
            p_opt := p_old (Version B only)
            RETURN
        endif
    (d) if some K has been enriched, go to Step 2.
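The following condensed Python sketch mirrors the main flow of OPTIMIZE under simplifying assumptions: isotropic per-element orders, the forward rate only, and no directionality flags. The routine solve_interpolant is a hypothetical stand-in returning the number of active d.o.f. and the element errors for a trial order distribution.

```python
# Condensed sketch of OPTIMIZE; Version A if r_m is None, else Version B.
def optimize(p, e2_c, n_c, r_m=None, p_max=9):
    version_b = r_m is not None
    r_g, ref, old = -1e30, None, (list(p), e2_c, n_c)
    while True:
        n, e2_K = solve_interpolant(p)          # steps 2-4
        e2 = sum(e2_K)
        if ref is None and e2 <= 1.01 * e2_c:
            ref = (n, e2)                       # step 5: reference point
        if ref is not None and n > ref[0]:
            r = -(e2 - ref[1]) / (n - ref[0])   # step 6: forward rate
            r_g = max(r_g, r)                   # step 7
            if version_b and 0 < r < r_m / 3:   # step 8: below threshold
                p_old, e2_old, _ = old
                return p_old if e2_old < 1.01 * e2_c else list(p)
        old = (list(p), e2, n)                  # step 9
        e2_max = max(e2_K)                      # step 10: enrich the nodes
        enriched = False                        # with the largest errors
        for K in range(len(p)):
            if e2_K[K] > 0.7 * e2_max and p[K] < p_max:
                p[K] += 1
                enriched = True
        if not enriched:                        # flagged nodes hit p_max
            return list(p) if version_b else r_g
```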

5.1 Directionality flags

Elements are designated for breaking based on whether some of their edges are to be broken. We assume that an element may be refined in one shot either isotropically into eight element-sons, anisotropically into four element-sons (producing elongated elements), or anisotropically into two element-sons (producing flat elements). The best way to decide which kind of refinement results in the optimal mesh would seem to be a competition between all of them: after selecting a trial topology (way of breaking), one should search for the best rates associated with p-enrichments of the nodes-sons. Such a procedure would be fairly expensive, and we have replaced it with a simple verification of the directional behavior of the error.

1. We evaluate the directionality indicators,

$$ d_{ij} = \int_K \Big(\frac{\partial u}{\partial \xi_i} - \frac{\partial u_h}{\partial \xi_i}\Big)\Big(\frac{\partial u}{\partial \xi_j} - \frac{\partial u_h}{\partial \xi_j}\Big)\, g^{ij}\, dx, \qquad i,j = 1,\dots,\text{dim}, \tag{5.22} $$

where g^{ij} is the inverse metric tensor given by (2.13) in two dimensions, and

$$ g^{ij} = \frac{\partial \xi_i}{\partial x_k}\frac{\partial \xi_j}{\partial x_k} \tag{5.23} $$

in three dimensions.

2. In the two-dimensional case, we demand a directional breaking only if the error is close to one-dimensional, in the sense that

• d_{ij} ≤ 0.01 max_k d_{kk}, i ≠ j (small off-diagonal terms), and
• d_{11} ≤ 0.01 d_{22} or d_{22} ≤ 0.01 d_{11}.

3. In the three-dimensional case, we first try to identify a situation which might be close to a two-dimensional one, i.e., the error changes little in only one, say the m-th, direction:

$$ d_{mi} \le 0.01\, d_{kl}, \quad k, l \ne m, \ \text{for any } i = 1, 2, 3. \tag{5.24} $$

Next, if such a direction is identified, we use the 2D rules in the k, l-directions. If (5.24) is not satisfied for any m, we refine isotropically.
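A direct transcription of the 3D test (5.24) might look as follows; d is the 3 × 3 matrix of indicators (5.22), and the returned direction m is the one in which the error is nearly constant (an assumption-laden sketch, not the production code).

```python
# Decide between isotropic and directional refinement from (5.22)-(5.24).
def refinement_kind(d, c=0.01):
    for m in range(3):
        others = [k for k in range(3) if k != m]
        # All entries involving direction m must be small relative to the
        # entries of the remaining 2x2 block.
        if all(d[m][i] <= c * d[k][l] for i in range(3)
               for k in others for l in others):
            return ("directional", m)   # apply the 2D rules in k, l
    return ("isotropic", None)
```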


6 The hp mesh optimization algorithm

As we have indicated before, the hp refinements of the current mesh are based on investigating the interpolation of the fine mesh solution u(x) := u_{h/2,p+1}(x) on the coarse mesh. The hp refinement process then consists in repeating the steps of obtaining the fine mesh solution and hp-refining the coarse mesh. We perform these operations within an adaptive code which can handle only one mesh at a time. The presence of two meshes (coarse and fine) in the procedure is handled by dumping the data structures for these meshes out to, and in from, external files. The hp-adaptive algorithm uses the procedures presented in the previous sections, and it consists of the following steps.

PREPARING DATA FOR hp MESH OPTIMIZATION

1. Solve for the coarse mesh solution u_hp(x).

2. Dump out the coarse mesh.

3. Generate the fine mesh: refine all hexahedral elements isotropically into 8 element-sons and enrich all the nodes p → p + 1.

4. Solve for the fine mesh solution u(x) := u_{h/2,p+1}(x).

5. For all coarse elements K compute the H¹-seminorm of the solution error, e_K = |u(x) − u_hp(x)|_{1,K}, the seminorm of the solution itself, and the error directionality indicators. STOP if the tolerance criterion is met,

$$ \Big(\sum_K e_K^2\Big)^{1/2} \big/\, |u|_{1,\Omega} \le TOL. $$

6. Store the fine mesh solution corresponding to the subsequent coarse elements.

OPTIMIZATION ON EDGES

7. Dump in the coarse mesh.

8. For all coarse edges, decide about the h or p "direction of refinement."

9. Evaluate the minimum guaranteed rates for all coarse edges (Version A of routine OPTIMIZE).

10. Generate a list of the coarse edges designated for h- or p-refinements, according to (5.20).

11. Find out which coarse elements should be h-refined to break the edges from the list, and perform (possibly anisotropic) h-refinements of these elements.

12. Investigate which coarse edges have undergone unwanted h-refinements and determine the orders for the new mid-edge nodes (see the remark below).

13. Find the optimal enrichments of the edge-sons resulting from the breaking of coarse edges (Version B of routine OPTIMIZE).

14. Perform p-enrichments of the coarse edges (which have not been involuntarily subdivided) and of the edge-sons of the broken coarse edges.

OPTIMIZATION ON FACES

15. Evaluate the minimum guaranteed rates for all coarse faces (Version A of routine OPTIMIZE).

16. Generate a list of the submeshes corresponding to the coarse faces (subdivided or not) designated for p-refinements, according to (5.20).

17. Find the optimal enrichments of the mid-face coarse nodes or of the mid-face nodes-sons resulting from the breaking of coarse faces (Version B of routine OPTIMIZE).

18. Perform p-enrichments of the coarse faces (which have not been subdivided) and of the face-sons of the broken coarse faces.

OPTIMIZATION ON MIDDLE NODES

19. Evaluate the minimum guaranteed rates for all coarse grid middle nodes (Version A of routine OPTIMIZE).

20. Generate a list of the submeshes corresponding to the coarse middle nodes (subdivided or not) designated for p-refinements, according to (5.20).

21. Find the optimal enrichments of the coarse grid middle nodes or of the middle nodes-sons resulting from the breaking of the coarse grid middle nodes (Version B of routine OPTIMIZE).

22. Perform p-enrichments of the coarse grid middle nodes (which have not been subdivided) and of the middle nodes-sons of the broken coarse grid middle nodes.

23. Restart the algorithm at Step 1.

Remark: The edge-sons that have resulted from the involuntary breaking of coarse grid edges which have not been designated for any refinement are enriched to orders p₁ and p₂ which allow for at most a tol increase of the error (squared) of their coarse grid ancestor, with

$$ tol = 0.01\, \frac{\sum_i \Delta e_i^2}{n}, \tag{6.25} $$

where the summation extends over all edges designated for refinement and n is their total number.


7 Numerical Examples

We applied the automatic hp-adaptive strategy to two model problems for the Laplace operator in three dimensions,

$$ \left\{ \begin{aligned} -\Delta u &= f && \text{in } \Omega, \\ u &= u_0 && \text{on } \Gamma_D, \\ \frac{\partial u}{\partial n} &= g && \text{on } \Gamma_N. \end{aligned} \right. \tag{7.26} $$

The first problem illustrates the performance of the adaptive algorithm for an analytic solution which is, however, significantly irregular due to localized, very high derivatives; we refer to it as the shock problem. The second is a model problem with singularities caused by a three-dimensional reentrant corner: we consider the Laplace equation in the "Fichera cube", a three-dimensional counterpart of the famous two-dimensional benchmark, the L-shaped domain problem.

7.1 The shock problem

We solve the Poisson equation in the unit cube Ω = [0,1]³. We impose Dirichlet boundary conditions on the three faces located on the three coordinate planes of the Cartesian system, and Neumann conditions on the remaining three faces. The data for the volume load f and the boundary conditions u₀ and g correspond to the exact solution, a shock-like function,

$$ u(r) = \tan^{-1}(\alpha\, |r - r_0|), \qquad r_0 = (-1/4, -1/4, -1/4), \quad \alpha = 20. \tag{7.27} $$
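Since the load f must be manufactured from (7.27), a symbolic computation is a convenient check. The snippet below (an illustration, not from the paper) derives f = −Δu with SymPy.

```python
import sympy as sp

# Derive the volume load f = -Laplace(u) for the shock solution (7.27).
x, y, z = sp.symbols("x y z")
alpha = 20
r = sp.sqrt((x + sp.Rational(1, 4))**2 +
            (y + sp.Rational(1, 4))**2 +
            (z + sp.Rational(1, 4))**2)      # |r - r0|, r0 = (-1/4, -1/4, -1/4)
u = sp.atan(alpha * r)
f = -sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2) + sp.diff(u, z, 2))
print(f)   # right-hand side for -Laplace(u) = f
```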

Since the load varies rapidly, we use adaptive integration for evaluating the finite element load vectors. The solution is displayed in Fig. 1. Figures 2 and 3 present an optimal hp-adaptive mesh: a general view of the mesh, and a view after removing a few elements. The convergence curves are presented in Fig. 4. We use the log-log scales for the number of degrees-of-freedom N and the error e = |u − u_hp|_{1,Ω}, as well as the N^{1/5} versus log e scales. The first curve bends downward, as expected for convergence better than algebraic; the second is close to a straight line, allowing one to identify exponential convergence of the type

$$ e \le a\, e^{-b N^{1/5}}. \tag{7.28} $$

In this simulation we achieved a relative accuracy of 0.49% with 14,891 d.o.f. The maximum order of elements is 8.


7.2 The Fichera cube problem

We solve the Laplace equation in the domain obtained by subdividing a "large" cube into eight equal small cubes and removing one of them,

$$ \Omega = (-1,1)^3 \setminus [0,1]^3. \tag{7.29} $$

The homogeneous Dirichlet boundary condition u = 0 is imposed on the three square faces located on the coordinate planes,

$$ \Gamma_D = [0,1]\times[0,1]\times\{0\} \,\cup\, [0,1]\times\{0\}\times[0,1] \,\cup\, \{0\}\times[0,1]\times[0,1]. \tag{7.30} $$

The problem is driven by Neumann boundary conditions applied on the remaining faces of Ω. The data g in formulation (7.26) has been selected so that it corresponds to the sum of the two-dimensional exact solutions for the L-shaped domain problem in the XY, YZ and XZ planes,

$$ u_{2D} = r^{2/3}\cos\frac{2\theta}{3}, \qquad r = \sqrt{\xi^2 + \eta^2}, \quad \theta = \tan^{-1}(\xi/\eta). \tag{7.31} $$

Here ξ, η stand for the x, y, or y, z, or x, z components, respectively. This data is smooth on Γ_N, and it generates a solution with features analogous to the two-dimensional solution for the L-shaped domain. The volume load is set to zero, f = 0. The initial mesh consists of only 7 hexahedral elements of order p = 3. In the adaptive process, the symmetry of the mesh and of the solution with respect to the three planes bisecting the octants is preserved. The solution of the problem is shown in Fig. 5. The final optimal hp-adaptive mesh is presented in Figures 6-10: a general view and parts of the interior. Fig. 11 presents the history of convergence on the log-log scale, as well as using the N^{1/5} and log e scaling. The relative error is reduced to 1.2% with 63,166 d.o.f. The maximum order of elements is p = 7.

8 A 3D Two Grid Solver for hp-Finite Elements

Critical to the success of the presented adaptive strategy is the solution of the fine grid problem. Typically, in 3D, the global hp-refinement increases the problem size by at least one order of magnitude, making the use of iterative solvers inevitable. With a multigrid solver in mind, we first implemented a two-grid solver based on the interaction between the coarse and fine hp meshes. The choice is quite natural: the coarse meshes are minimal in size, and the use of a direct solve on the coarse grid should be efficient. Also, for time-harmonic wave propagation problems, the size of the coarsest mesh in a multigrid algorithm is limited by the condition that the mesh has to resolve all eigenvalues below the frequency of interest. Consequently, the sequence of multigrid meshes may be limited to just a few meshes. In this section, we present a short discussion and implementation details of an efficient two-grid solver algorithm for 3D hp meshes, suitable for elliptic problems. The two-grid solver forms a crucial part of the hp-adaptive strategy.

We begin with a formulation of the algorithm and a short discussion of its convergence, related to standard domain decomposition methods and the Schwarz framework [19]. Next, we present a few essential implementation details. Finally, numerical studies illustrate the efficiency of the solver. Among the implementation and theoretical issues that we address in this section, we focus on the question: is it possible to guide the optimal hp-refinements with a partially converged fine grid solution only, and to what extent?

8.1 Formulation and convergence theory

We are interested in solving a variational problem in the standard form,

$$ u \in V, \qquad a(u,v) = l(v) \quad \forall v \in V. \tag{8.32} $$

Here:

• V is a Hilbert space;
• l ∈ V′ is a linear continuous functional on V;
• a is a bilinear, symmetric form, assumed to be coercive and continuous on space V.

Thus, we can define an inner product on V as (u,v) := a(u,v), with the corresponding norm denoted by ‖u‖.

Space decomposition. We assume that space V can be represented as an algebraic (in general not direct) sum of subspaces V_k, k = 0,...,M,

$$ V = V_0 + V_1 + \dots + V_M. $$

The corresponding inclusions and their transposes are denoted by

$$ \iota_k : V_k \to V, \qquad \iota_k^T : V^* \to V_k^*, $$

and the elliptic projections P_k : V → V_k are defined by

$$ P_k u \in V_k, \qquad a(P_k u - u, v) = 0 \quad \forall v \in V_k. $$

The associated subspace problems are defined as

$$ u_k \in V_k, \qquad a(u_k, v_k) = l(v_k) \quad \forall v_k \in V_k. \tag{8.33} $$

Given an approximation u^{(n)} ∈ V to the solution of (8.32), and the corresponding residual r^{(n)}(v) = a(u^{(n)}, v) − l(v), we define u_k^{(n)} as the solution of the following problem,

$$ u_k^{(n)} \in V_k, \qquad a(u_k^{(n)}, v_k) = r^{(n)}(v_k) \quad \forall v_k \in V_k. \tag{8.34} $$

Notice that, formally, solution u_k and test function v_k should be replaced with their inclusions into V (the forms are defined on the whole space).

Two grid solver algorithm. Given a current solution u^{(n)}, we determine the next iterate u^{(n+1)} by performing the following two steps.

1. Coarse grid correction,

$$ u^{(n+\frac12)} = u^{(n)} + \iota_0\, u_0^{(n)}. $$

2. Fine grid smoothing,

$$ u^{(n+1)} = u^{(n+\frac12)} + \alpha^{(n+\frac12)} \sum_{k=1}^{M} \iota_k\, u_k^{(n+\frac12)}, $$

where the relaxation coefficient α^{(n+1/2)} is determined by minimizing the residual (the Steepest Descent (SD) algorithm).

Convergence. One can show [15] that

$$ \frac{\|u - u^{(n+1)}\|}{\|u - u^{(n)}\|} \le \frac{\kappa - 1}{\kappa + 1}, \qquad \text{where} \quad \kappa = \frac{\lambda_{\max}\big(\sum_{k=0}^{M} P_k\big)}{\lambda_{\min}\big(\sum_{k=0}^{M} P_k\big)}. $$
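A minimal dense-algebra illustration of one iteration is given below, assuming an SPD fine grid matrix A, a load vector b, a prolongation matrix P (coarse to fine), and blocks given as lists of d.o.f. indices; the actual implementation works with sparse CCS matrices, as described in Section 8.2.

```python
import numpy as np

# One two-grid iteration: coarse correction + relaxed block-Jacobi smoothing.
def two_grid_step(A, b, P, blocks, u):
    r = b - A @ u
    # 1. Coarse grid correction via the Galerkin coarse matrix P^T A P.
    u = u + P @ np.linalg.solve(P.T @ A @ P, P.T @ r)
    # 2. Fine grid smoothing: additive block Jacobi ...
    r = b - A @ u
    s = np.zeros_like(u)
    for idx in blocks:
        s[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    # ... with the steepest-descent relaxation alpha = (r, s) / a(s, s).
    alpha = (r @ s) / (s @ (A @ s))
    return u + alpha * s
```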

Note that the two-grid iterations may additionally be accelerated using the Conjugate Gradient (CG) method.

Stopping criterion. Choosing the right stopping criterion is one of the most sensitive issues for all iterative methods. We consider two sources of numerical error:

1. the discretization error, representing the difference between the exact and (fully converged) discrete solutions, ‖u − u_{h,p}‖, and

2. the iterative solver error, given by ‖u_{h,p} − u_{h,p}^{(n)}‖.

Typically, the fine grid discretization error is smaller than the coarse grid discretization error by at least one order of magnitude,

$$ 0.01 \le \frac{\|u - u_{h/2,p+1}\|}{\|u - u_{h,p}\|} \le 0.1. $$

Therefore, it makes sense to impose the same error bounds for the approximation u_{h/2,p+1}^{(n)} to the solution of the fine grid problem,

$$ 0.01 \le \frac{\|u_{h/2,p+1}^{(n)} - u_{h/2,p+1}\|}{\|u_{h,p} - u_{h/2,p+1}\|} \le 0.1. $$

As we always start the two-grid iterations with the coarse grid solution prolongated to the fine grid, u_{h/2,p+1}^{(0)} = u_{h,p}, the last criterion reduces to

$$ 0.01 \le \frac{\|u_{h/2,p+1}^{(n)} - u_{h/2,p+1}\|}{\|u_{h/2,p+1}^{(0)} - u_{h/2,p+1}\|} \le 0.1. \tag{8.35} $$

The upper bound indicates the maximum acceptable error, while the lower bound points to the value below which the discretization error will dominate the convergence error by an order of magnitude, and further iterations make little sense. Obviously, this quantity is not computable, and a stopping criterion can only be based on an approximation to it.

8.2 Implementation details

This section is devoted to a discussion of several implementation aspects, such as assembling, the selection of blocks for the block-Jacobi smoother, and the construction of the prolongation operator.

To assemble or not to assemble. A standard finite element code generates a number of dense element stiffness matrices. These matrices can be stored in an element-by-element fashion, or they can be assembled into a global stiffness matrix. The main advantages of assembling the global stiffness matrix are:

1. Considerable savings in storage for low order p meshes. Storing the (unassembled) element matrices results in increased memory requirements, due to the repetition of nodes shared by neighboring elements. The amount of extra storage becomes significant in 3D, but it decreases with increasing order p. We illustrate this assertion with a grid composed of 4 × 4 × 4 brick elements, shown in Fig. 12. For uniform order of approximation p, Table 1 compares the number of nonzero entries in a globally assembled stiffness matrix against the number of nonzero entries in the corresponding set of element stiffness matrices.

p | Global Stiffness Matrix | Element Stiffness Matrices | % Savings
1 | 2197                    | 4096                       | 47 %
2 | 35937                   | 46656                      | 23 %
3 | 226981                  | 262140                     | 13.5 %
4 | 912673                  | 1000000                    | 8.8 %

Table 1: Number of nonzero entries for a 4 × 4 × 4 element grid in 3D.

2. The logical complexity of manipulations with the assembled matrix is lower than that for the unassembled element matrices. Matrix-vector multiplication in the element-by-element fashion requires two extra subroutines: one that extracts element vectors from a global vector, and another that assembles the element vectors back into a global vector.

On the other hand, the main disadvantages of assembling are:

1. Assembling the global matrix requires a sparse storage pattern. A global (assembled) stiffness matrix is sparse, and we should therefore store it using a sparse storage pattern. In h-adaptive codes, this is usually implemented with a bandwidth storage pattern. For hp-adaptive elements, the bandwidth depends not only upon the topology of the mesh but upon the order of approximation as well, and it is determined by the elements of highest order in the mesh. This would result in prohibitive storage requirements. Using the example of Fig. 12: if we take the bandwidth corresponding to p = 4, with the majority of elements, however, being of first order only, we store over 40,000% more entries than necessary! Motivated by these simple observations, we have decided to use the Compressed Column Storage (CCS) pattern [3], also known as the Harwell-Boeing sparse matrix format [12].

2. The parallelization becomes more challenging. A standard parallel finite element code is based on assigning a number of elements to each processor. Therefore, it is more natural to assign element matrices to each processor, as opposed to partitioning the global matrix between processors. Nevertheless, a partitioning of the global stiffness matrix (or partial assembling) can be performed, and efficient parallel linear algebra libraries can be utilized [20].

In summary, the decision whether to assemble or not seems to be mostly a matter of personal preference. In our case, we have decided to use the CCS pattern and assemble the global matrix.

Assembling of the stiffness matrix. We present now the assembling algorithm for the stiffness matrix:

• Description: Assemble element matrices into a CCS matrix.

• Input: Set of element stiffness matrices.

• Output: CCS matrix.

• High Level Implementation:

  – Construct a global enumeration of the element d.o.f. We use the concept of the natural ordering of nodes (and the corresponding d.o.f.) discussed in [10]. The goal is to determine, for each node, the number (in the global ordering) of the first d.o.f. associated with the node.
  – Using the element nodal connectivities, create for each node a list of interacting nodes. Notice that element nodes include both regular nodes and parent nodes of the element's constrained nodes; cf. the notion of the modified element discussed in [10].
  – Allocate space (number of columns).
  – Save the number of nonzero entries per column.
  – Allocate space (number of row entries).
  – Save the row position of each entry.
  – Reorder the row positions for each column according to the global ordering of d.o.f.
  – Deallocate space (interaction between nodes).
  – Allocate space (values of the stiffness matrix).
  – Generate the entries of the global stiffness matrix.

Construction of the smoother matrix. We also assemble the smoother into a global matrix. The necessary steps are as follows.

1. Determine the blocks. Typically, we define blocks by considering the d.o.f. corresponding to all basis functions whose supports are contained within already specified patches of elements. The corresponding subspace V_k can then be identified not only as the span of the selected basis functions but also as the subspace of all FE functions that vanish outside the patch. This yields a definition of space V_k that is independent of the choice of shape functions and, consequently, simplifies the convergence analysis. The first two of our choices follow this idea: a block corresponds to the span of all fine grid basis functions whose supports are contained within

• the support of a coarse grid vertex basis function, or
• the support of a fine grid vertex basis function.


In our third choice, the blocks are defined by considering all d.o.f. corresponding to a particular (modified) element. The corresponding space V_k does depend upon the way the basis functions extend into neighboring elements, and it is not independent of the selection of basis functions. In each of the discussed cases, a block is specified by listing all the nodes contributing to it. Advantages and disadvantages of each of the mentioned smoothers are discussed in [15], where it is shown that the third choice of blocks is the most efficient one. Thus, in the remainder of the paper, we discuss only this last type of blocks.

2. Extract each block from the assembled stiffness matrix. This operation requires the construction, for each block, of the corresponding list of global d.o.f.

3. Invert each block. Since the order of approximation p in our code cannot exceed a rather small number (p = 9), the size of the blocks is relatively small. For example, for blocks defined by element basis functions, the size of the blocks is equal to (p + 1)³.

4. Assemble the global smoother. We use the same logic as for assembling the global stiffness matrix, and assemble the inverted block matrices back into a global CCS matrix. In the case of the third choice of patches, the stiffness matrix and the smoother share the same topology, expressed in terms of identical CCS integer arrays.

The prolongation (restriction) matrix. Determination of the prolongation matrix reduces to determining all coarse grid basis functions in terms of fine grid basis functions. This corresponds exactly to the initiation of new d.o.f. during h-refinements, and it is related to the constrained approximation technique. Consequently, we generate the prolongation matrix coefficients node by node, as we refine the coarse mesh elements. The only technicality comes from the fact that, during the global hp-refinement, some intermediate nodes may appear, i.e., parent nodes of fine mesh nodes may not necessarily belong to the coarse mesh themselves, but may result from an earlier refinement of the coarse grid nodes. The d.o.f. corresponding to the intermediate nodes are eliminated using a logic similar to that used for implementing multiply constrained nodes [8].
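As an illustration of the two assembly procedures just described, the sketch below uses SciPy's CSC format (the same layout as CCS) in place of the code's own arrays; element_dofs maps each element to its global d.o.f., and blocks lists the d.o.f. of each smoother block. This is our simplified rendering, not the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp

# Assemble element matrices into a global CSC matrix (duplicates are summed).
def assemble_global(element_matrices, element_dofs, ndof):
    rows, cols, vals = [], [], []
    for Ke, dofs in zip(element_matrices, element_dofs):
        r, c = np.meshgrid(dofs, dofs, indexing="ij")
        rows.append(r.ravel()); cols.append(c.ravel()); vals.append(Ke.ravel())
    return sp.csc_matrix((np.concatenate(vals),
                          (np.concatenate(rows), np.concatenate(cols))),
                         shape=(ndof, ndof))

# Assemble the inverted diagonal blocks into a global block-Jacobi smoother.
def assemble_smoother(A, blocks, ndof):
    rows, cols, vals = [], [], []
    for idx in blocks:
        Binv = np.linalg.inv(A[np.ix_(idx, idx)].toarray())
        r, c = np.meshgrid(idx, idx, indexing="ij")
        rows.append(r.ravel()); cols.append(c.ravel()); vals.append(Binv.ravel())
    return sp.csc_matrix((np.concatenate(vals),
                          (np.concatenate(rows), np.concatenate(cols))),
                         shape=(ndof, ndof))
```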

8.3 Numerical results

This section is devoted to a numerical study of the convergence, performance, and efficiency of our two-grid solver. The study is based on the two model elliptic problems discussed in the previous section: the Fichera problem and the shock problem. Using these examples, we try to address the following issues:

• the importance of the optimal relaxation parameter,
• error estimation for the two-grid solver,
• scalability and efficiency, and
• the possibility of guiding the optimal hp-refinements with a partially converged solution.

Optimal relaxation parameter. We study the effect of using different relaxation parameters on a collection of (initial) uniform hp grids for the Fichera problem, changing the number of elements and their order of approximation. We have used the third choice of blocks (spans of element basis functions) and focused on studying the smoothing operation only. We compare block Jacobi iterations for different fixed values of the relaxation parameter with the optimal relaxation. Using the optimal relaxation is equivalent to "accelerating" the standard block Jacobi method (with no relaxation) with the Steepest Descent (SD) method. Fig. 13 summarizes the results of the experiment in terms of the number of iterations required to attain a given (fixed) error tolerance, as a function of mesh size and order of approximation. From these plots we conclude the following:

• For a fixed relaxation parameter, whether the method converges or not depends almost exclusively upon p (and not upon h).
• For a fixed relaxation parameter, the convergence rate of the method (provided that the method converges) depends almost exclusively upon h (and not upon p).
• The optimal relaxation guarantees faster convergence than any fixed relaxation parameter.

In summary, without the optimal relaxation parameter, the method diverges for sufficiently large order of approximation p. The condition number of the stiffness matrix preconditioned with the block Jacobi smoother seems to be almost insensitive to p, but it does depend upon the mesh size. These observations are consistent with the results on preconditioning of hp methods, see e.g. [1].

Error estimation. In the following, we focus on error estimation for the selected stopping criterion given by (8.35). We approximate:

2 k uh/2,p+1 − uh/2,p+1 k

(1)



2 − uh/2,p+1 k k uh/2,p+1

(n+ 1 )

(n+1)

(1)

(n+1)

2 k uh/2,p+1 − uh/2,p+1 k

(8.36)

2 − uh/2,p+1 k k uh/2,p+1
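For illustration only, the estimate (8.36) can be evaluated from three stored iterates; a minimal sketch, assuming an `energy_norm` routine supplied by the discretization:

```python
def relative_error_estimate(u_half, u_first, u_next, energy_norm):
    """Error estimate I for the two grid stopping criterion, cf. (8.36).

    u_half  -- intermediate iterate u^(n+1/2)
    u_first -- first iterate        u^(1)
    u_next  -- latest iterate       u^(n+1)

    The unknown fine grid solution is replaced by the latest iterate,
    so only differences of stored iterates are needed.
    """
    return energy_norm(u_half - u_next) / energy_norm(u_first - u_next)
```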

A correction term, based on the optimal relaxation parameter and the convergence theory, can be added to the error estimate (I) as described in [14]. By doing so, the resulting error estimate (II) becomes significantly more accurate than the one given by (8.36). Fig. 14 shows the accuracy of both estimates for an hp-grid with 13897 unknowns corresponding to the Fichera problem. At each step n, we can estimate the error corresponding to step n − 1 without performing any additional matrix-vector operations.

Scalability and efficiency. Several operations are performed during the execution of the two grid solver algorithm, and depending upon the particular hp-grid being solved, some of them become rather expensive in terms of computational time. Below, we enumerate the most expensive operations and their scalability with respect to the number of elements N, the polynomial order of approximation p, and the number of V-cycles V (a toy cost model illustrating these terms is sketched at the end of this subsection):

1. coarse grid solve: O((p − 1)^3 (N/8)^2),
2. construction of the block Jacobi smoother: O(p^9 N),
3. matrix-vector multiplication: O(p^6 N V).

Notice that if N is large enough, the CPU time used by the coarse grid solve will dominate the total time of the solver. On the other hand, if N is small enough, the relation between V and p^3 will determine the most expensive operation. In Fig. 15, we display the time (in seconds) used by the two grid solver versus the number of d.o.f., for a sequence of hp-grids produced by the fully automatic hp-adaptive strategy. As expected, the total time required by the two grid solver does not grow linearly with respect to the number of d.o.f. As p grows, the time required for the construction of the block Jacobi smoother (patch inversion) becomes dominant. It is also expected that, for larger grids, the cost of the coarse grid solve will eventually dominate (if p is bounded). From Fig. 16, we conclude that for p = 2, the solution of a problem with 2.15 million d.o.f. requires just 8 minutes. For p = 4 and over 2 million d.o.f., the coarse grid solve becomes prohibitively expensive, indicating that a full multigrid solver should be implemented. Finally, in Fig. 17 we display the time used by the two grid solver (on an algebraic scale) versus the relative error (in percent, on a logarithmic scale). The straight line indicates exponential convergence, with respect to time, of the fully automatic hp-adaptive strategy combined with the two grid solver.
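To illustrate how the three enumerated terms trade off, here is a toy cost model; the constants c1, c2, c3 are arbitrary placeholders, not measured timings:

```python
def two_grid_cost(N, p, V, c1=1.0, c2=1.0, c3=1.0):
    """Toy model of the dominant two grid solver costs:
    coarse grid solve, smoother construction, matrix-vector products.
    Constants c1, c2, c3 are illustrative, not measured."""
    coarse_solve = c1 * (p - 1) ** 3 * (N / 8) ** 2
    smoother     = c2 * p ** 9 * N
    matvec       = c3 * p ** 6 * N * V
    return coarse_solve, smoother, matvec

# For large N the quadratic coarse grid solve dominates; for small N,
# the balance between V and p**3 decides which term is largest.
print(two_grid_cost(N=10**5, p=2, V=10))
print(two_grid_cost(N=10**3, p=8, V=10))
```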

8.4 Guiding hp-refinements with partially converged solutions

We now try to adjust the admissible error tolerance to make it as large as possible, without affecting the overall exponential convergence property of the whole hp algorithm.


Thus, we come to the most crucial question addressed in this section: how much can we relax the convergence tolerance for the two grid solver without losing the exponential convergence of the overall hp-adaptive strategy? Obviously, this is a rather difficult question, and we will try to reach a conclusion via numerical experimentation only. We work this time with the Fichera problem (very similar results were obtained for the shock problem). We run the hp-adaptive strategy using three different strategies to solve the fine grid problem:

1. a direct (frontal) solver,
2. the two grid solver with error tolerance set to 0.01 (i.e., error estimate ≤ 0.01), and
3. the two grid solver with error tolerance set to 0.1.

For the Fichera problem the exact solution is unknown; thus, we use an approximation to the error in the energy norm. By Galerkin orthogonality of the error, the exact relative error in the energy norm is given by

\[
\frac{\| x_{h,p} - x_{\mathrm{exact}} \|}{\| x_{\mathrm{exact}} \|}
= \frac{\sqrt{\| x_{\mathrm{exact}} \|^2 - \| x_{h,p} \|^2}}{\| x_{\mathrm{exact}} \|}.
\]

Thus, a good approximation to it is given by

\[
\frac{\sqrt{\| x_{\mathrm{exact}} \|^2 - \| x_{h,p} \|^2}}{\| x_{\mathrm{exact}} \|}
\;\approx\;
\frac{\sqrt{\| x_{\mathrm{best}} \|^2 - \| x_{h,p} \|^2}}{\| x_{\mathrm{best}} \|},
\]

where x_best is the best approximate solution that we can trust; we use for x_best the last fine grid solution. Finally, we also report the number of two grid iterations necessary to achieve the required tolerance.
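The approximation above needs only the two energy norms; a minimal sketch (the norms would be supplied by the FE code) could read:

```python
import math

def relative_energy_error(norm_xhp, norm_xbest):
    """Approximate relative error in the energy norm:
    sqrt(||x_best||^2 - ||x_hp||^2) / ||x_best||,
    with the last fine grid solution x_best standing in for x_exact.
    The max(...) guards against round-off when the grid is nearly converged."""
    return math.sqrt(max(norm_xbest ** 2 - norm_xhp ** 2, 0.0)) / norm_xbest
```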

From Fig. 18 we draw the following conclusions.

• The two grid solver with 0.01 error tolerance generates a sequence of hp-grids with convergence rates similar to those of the sequence obtained with a direct solver.
• As we increase the two grid solver error tolerance up to 0.1, the convergence rates of the corresponding sequence of hp-grids decrease only slightly (or not at all). At the same time, the number of iterations decreases dramatically.
• The number of iterations required by the two grid solver does not increase as the number of degrees of freedom increases; it remains at the level of 10 iterations per grid.

To summarize, it appears safe to relax the error tolerance to 0.1 without losing the exponential convergence rates of the overall hp mesh optimization procedure.

9 Conclusions

The presented work represents a two-and-a-half-year research effort to generalize and implement the automatic hp-adaptive strategy in three dimensions. We are proud of the very fact that we have managed to overcome the complexity of the algorithm and coding. The presented results confirm our long-held conviction that it is possible to design a fully automatic hp-adaptive strategy that delivers exponential convergence both in terms of problem size (error vs. number of d.o.f.) and CPU time. Our most recent implementation of hp data structures, supporting anisotropic hp refinements and constrained approximation, has passed the ultimate test of adaptivity.

Nevertheless, the presented 3D algorithm and implementation represent just a first step in our three-dimensional effort. In the meantime, our two-dimensional algorithm has undergone four revisions, including a generalization to Maxwell problems. A few simple lessons that we have learned from the three-dimensional computations, besides fighting the complexity, are as follows.

• Minimization of the error in the energy (H^1) norm results in higher values of order p much earlier than for lower-dimensional problems. Solution of the 1D Laplace equation with singular solution u = x^{0.6} to 1 percent error requires almost 100 h-refinements with the highest order p_max = 3 (for large elements away from the singularity). Solution of the L-shaped domain problem, with the same accuracy of 1 percent, results in 10 h-refinements and p_max = 5. Solution of the Fichera problem, again with 1 percent accuracy, requires only 5 levels of h-refinement and p_max = 7. This increasing importance of p-refinements with growing spatial dimension reflects the nature of H^1 spaces. In 1D, functions in H^1 are automatically Hölder continuous; in 2D, the finite energy condition does not imply continuity, but already slightly more regular functions u ∈ H^{1+ε} are continuous; in 3D, one needs u ∈ H^{3/2+ε} to imply continuity. In all our codes, the order of approximation has been limited to p ≤ 9 (data structure restrictions). We may have to rethink this assumption for 3D problems.

• Anisotropic refinements are crucial. An efficient resolution of boundary layers (flat elements) and edge singularities (needle elements) requires the use of anisotropic refinements. Choosing anisotropic over isotropic refinements is part of the mesh optimization process, and the current use of anisotropy flags will eventually have to be replaced with staging a competition between the p- and h-refinements, in a similar way as we do currently for edges. In 2D, all singularities are point singularities, and they can be efficiently resolved using isotropic refinements only. In 3D, the presence of edge singularities makes the use of anisotropic refinements absolutely necessary.

• Determination of interpolation (projection) errors for submeshes realizing the max error refinement paths has to be done more efficiently, utilizing the fact that the corresponding spaces are nested. At the moment, this is the most expensive part of the algorithm in terms of CPU time, as we solve each

projection problem starting from scratch.

• Convergence of the two grid solver is sensitive to anisotropy in meshes. If the anisotropies arise from optimal mesh refinements, the solver performance is excellent. If, however, elongated or flat elements are introduced as a result of a clumsy initial mesh generation that reflects only the geometry of the domain, and not the shape of the solution, the two grid solver may fail to converge. This remark is relevant for problems that we have not reported in this paper.

• The numerical evidence indicates that it is possible to guide the optimal hp-refinements with a partially converged solution.

Our current work goes in three directions.

Cutting down complexity. In the presented implementation, the optimization on edges is immediately followed by the execution of the optimal edge refinements. The 1-irregularity rule results in unwanted edge refinements which severely complicate the logic of the algorithm and the implementation. In the process of developing the 2D hp algorithm for Maxwell equations, we have learned that the projection-based interpolation operators do not commute on 1-irregular meshes, unless projections on constrained elements are replaced with projections on the corresponding, partially broken element-fathers, see [5] for details. This motivated us to ignore the constraints in the mesh optimization algorithm. The simplified version of the algorithm has delivered overall almost identical meshes. The 2D experience suggests that we should try the same concept in three dimensions. Once the constraints are ignored, the whole mesh optimization process, including optimization on edges, faces, and element interiors, can be done before executing the optimal refinements. In fact, the mesh optimization algorithm can then be implemented as a stand-alone package. This is the direction that we want to pursue.

Generalization to Maxwell problems. Solutions to Maxwell equations exhibit stronger singularities than those of elliptic problems. The fields themselves, as opposed to their derivatives only (for elliptic problems), "blow up" to infinity at diffracting corners and edges. This makes the hp methodology especially attractive for this class of problems. The corresponding mesh optimization algorithm is fully analogous, but the implementation is considerably more complex, as it involves manipulating H^1- and H(curl)-conforming elements at the same time.

Two grid solver. The multigrid technology makes the presented ideas practical. The solver has already been generalized to both 2D and 3D Maxwell equations [14] and applied to test hp-adaptivity in two dimensions. We hope to report new results on the three-dimensional hp algorithm soon.


References

[1] M. Ainsworth. A hierarchical domain decomposition preconditioner for hp finite element approximation on locally refined meshes. SIAM J. Sci. Comput., 17(6):1395-1414, 1996.

[2] I. Babuška and M. Suri. The p and hp versions of the finite element method, basic principles and properties. SIAM Review, 36:578-632, 1994.

[3] R. Barrett, M. Berry, T.F. Chan, J. Demmel, J.M. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, 1994.

[4] L. Demkowicz. Projection based interpolation. In Transactions on Structural Mechanics and Materials. Cracow University of Technology Publications, Cracow, 2004. Monograph 302, a special issue in honor of the 70th birthday of Prof. Gwidon Szefer; see also ICES Report 04-03.

[5] L. Demkowicz. Fully automatic hp-adaptivity for Maxwell's equations. Comput. Methods Appl. Mech. Engrg., 194:605-624, 2005.

[6] L. Demkowicz and I. Babuška. p interpolation error estimates for edge finite elements of variable order in two dimensions. SIAM J. Numer. Anal., 41(4):1195-1208 (electronic), 2003.

[7] L. Demkowicz and A. Buffa. H^1, H(curl) and H(div)-conforming projection-based interpolation in three dimensions. Quasi-optimal p-interpolation estimates. Comput. Methods Appl. Mech. Engrg., 194:267-296, 2005.

[8] L. Demkowicz, K. Gerdes, Ch. Schwab, A. Bajer, and T. Walsh. HP90: A general and flexible Fortran 90 hp-FE code. Computing and Visualization in Science, 1:145-163, 1998.

[9] L. Demkowicz, P. Monk, L. Vardapetyan, and W. Rachowicz. de Rham diagram for hp finite element spaces. Comput. Math. Appl., 39(7-8):29-38, 2000.

[10] L. Demkowicz, D. Pardo, and W. Rachowicz. 3D hp-adaptive finite element package (3Dhp90), version 2.0. Technical Report 24, TICAM, 2002.

[11] L. Demkowicz, W. Rachowicz, and Ph. Devloo. A fully automatic hp-adaptivity. Journal of Scientific Computing, 17(1-3):127-155, 2002.

[12] I. Duff, R. Grimes, and J. Lewis. Sparse matrix test problems. ACM Trans. Math. Soft., 15:1-14, 1989.

[13] J.T. Oden, L. Demkowicz, W. Rachowicz, and T.A. Westermann. Toward a universal hp adaptive finite element strategy. Part 2: A posteriori error estimation. Comput. Methods Appl. Mech. Engrg., 77:113-180, 1989.

[14] D. Pardo. Integration of hp-Adaptivity with a Two Grid Solver: Applications to Electromagnetics. PhD thesis, The University of Texas at Austin, April 2004.

[15] D. Pardo and L. Demkowicz. Integration of hp-adaptivity and multigrid. I. A two grid solver for hp finite elements. Comput. Methods Appl. Mech. Engrg., 2005.

[16] W. Rachowicz, J.T. Oden, and L. Demkowicz. Toward a universal hp-adaptive finite element strategy. Part 3: Design of hp meshes. Comput. Methods Appl. Mech. Engrg., 77:181-212, 1989.

[17] J. Schoeberl. Commuting quasi-interpolation operators for mixed finite elements. Technical Report 10, Dept. of Mathematics, Texas A&M University, 2001.

[18] Ch. Schwab. p- and hp-Finite Element Methods. Clarendon Press, Oxford, 1998.

[19] B.F. Smith, P.E. Bjørstad, and W.D. Gropp. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, New York, 1996.

[20] R. van de Geijn. Using PLAPACK: Parallel Linear Algebra Package. MIT Press, 1997.

Figure 1: The shock problem, contour map of the solution

Figure 2: An hp adaptive optimal mesh for the shock problem. The general view of the mesh. Colors (shades of gray) correspond to orders p between p = 1 and p = 8

Figure 3: An hp adaptive optimal mesh for the shock problem. The view of the mesh after removing a number of elements

Figure 4: The shock problem: convergence curves with log N − log e and N^{1/5} − log e scales

Figure 6: An hp adaptive optimal mesh for the Fichera cube problem. The general view of the mesh. Colors (shades of gray) correspond to orders p between p = 1 and p = 8

Figure 7: An hp adaptive optimal mesh for the Fichera cube problem. The view of the mesh after removing a number of elements

Figure 8: An hp adaptive optimal mesh for the Fichera cube problem. The view of the cube [0, 1] × [−1, 0] × [−1, 0]

Figure 9: An hp adaptive optimal mesh for the Fichera cube problem. The view of the cube [−1, 0] × [−1, 0] × [−1, 0]

Figure 10: An hp adaptive optimal mesh for the Fichera cube problem. The back view of the mesh

Figure 11: The Fichera cube problem: convergence curves with log N − log e and N^{1/5} − log e scales

Figure 12: 3D grid with 64 elements

Figure 13: Fichera problem. Convergence of the smoother using different relaxation parameters (0.6, 0.5, 0.4, and the optimal one): number of iterations vs. order of approximation, for grids of 56, 448, and 3584 elements

Figure 14: Fichera problem, hp-grid with NRDOF = 13897. Comparing the exact error (red) vs. error estimate I (light blue) vs. error estimate II (dark blue): log of the exact error vs. number of iterations

Figure 15: 3D shock-like solution problem. Number of d.o.f. vs. time (in seconds) used by the two grid solver for a sequence of optimal hp-grids, with the breakdown into patch inversion, matrix-vector multiplication, integration, coarse grid solve, and other. In-core computations, AMD Athlon 1 GHz processor

Figure 16: The shock problem. Efficiency analysis of the two grid solver (cost breakdown: patch inversion, matrix-vector multiplication, integration, coarse grid solve, assembling, other) for three cases: p = 2 (Nrdofs ≈ 2.15 million, total time ≈ 8 minutes, memory ≈ 1.0 GB), p = 8 (Nrdofs ≈ 0.27 million, ≈ 10 minutes, ≈ 2.0 GB), and p = 4 (Nrdofs ≈ 2.15 million, ≈ 50 minutes, ≈ 3.5 GB). In-core computations, IBM Power4 1.3 GHz processor. Memory = memory used by nonzero entries of the stiffness matrix

Figure 17: The shock problem. Error vs. time used by the two grid solver: relative error in percent (logarithmic scale) vs. time in seconds (algebraic scale). Exponential convergence is obtained by using the fully automatic hp-adaptive strategy combined with the two grid solver

47

40 0.1 0.01

Frontal Solver 0.1 0.01

33.1

35

20.1 30

12.2

NUMBER OF ITERATIONS

RELATIVE ENERGY NORM ERROR (IN %)

54.6

7.38 4.48 2.71

25

20

15

1.64

10

1

5

0.60 1

32

243

1024 3125 NUMBER OF DOF

7776

16807

0

0

Energy error estimate

1

2 3 4 NUMBER OF DOF IN THE FINE GRID

5

6 5

x 10

Number of iterations

Figure 18: Fickera problem. Guiding hp-re£nements with a partially converged solution.
