Design Specific Joint Optimization of Masks and Sources on a Very Large Scale

K. Lai, M. Gabrani, D. Demaris, N. Casati, A. Torres(a), S. Sarkar, P. Strenski, S. Bagheri, D. Scarpazza(b), A. E. Rosenbluth, D. O. Melville, A. Wächter, J. Lee, V. Austel, M. Szeto-Millstone, K. Tian, F. Barahona, T. Inoue, M. Sakamoto
IBM Corporation, New York, USA
(a) Mentor Graphics Inc., California, USA
(b) currently at D. E. Shaw Research, New York, USA

contact: [email protected]

Keywords: Source Mask Optimization, SMO, Joint optimization, Large scale source optimization, progressive deletion, pattern selection, pattern clustering, lithographic difficulty, pattern generation, optimization parallelization

ABSTRACT

Joint optimization (JO) of source and mask together is known to produce better SMO solutions than sequential optimization of the source and the mask. However, large scale JO problems are very difficult to solve because the global impact of the source variables causes an enormous number of mask variables to be coupled together. This work presents innovations that minimize this runtime bottleneck. The proposed SMO parallelization algorithm allows separate mask regions to be processed efficiently across multiple CPUs in a high performance computing (HPC) environment, despite the fact that a truly joint optimization is being carried out with source variables that interact across the entire mask. Building on this engine, a progressive deletion (PD) method was developed that can directly compute "binding constructs" for the optimization, i.e. our method can essentially determine the particular feature content which limits the process window attainable by the optimum source. This method allows us to minimize the uncertainty, inherent to different clustering/ranking methods that rely on heuristic metrics, in seeking an overall optimum source. An objective benchmarking of the effectiveness of different pattern sampling methods was performed during post-optimization analysis. The PD serves as a gold standard for us to develop optimum pattern clustering/ranking algorithms. With this work, it is shown that it is not necessary to exhaustively optimize the entire mask together with the source in order to identify these binding clips. If the number of clips to be optimized exceeds the practical limit of the parallel SMO engine, one can start with a pattern selection step to achieve high clip-count compression before SMO. With this large scale source optimization (LSSO) capability one can address the challenging problem of layout-specific design, or improve the technology source as cell layouts and sample layouts replace lithography test structures in the development cycle.
1. SMO overview and the challenges of large scale joint optimization

Source Mask Optimization (SMO) has been proposed and demonstrated as a RET for extending optical lithography even in the absence of further increases in the scanner NA [1, 2, 3]. For some mask levels at the 22nm node and beyond, SMO is able to recover forbidden pitches and non-printable 2D features by co-optimizing all such difficult design patterns jointly with the source. As has been reported by our group and others, source optimization is improved when the simultaneous interactions of all mask variables with the source variables are directly taken into account during a joint optimization (JO) step [1, 4, 6]. Previous SMO publications have focused on small-scale joint optimization, in which only a few generic or critical clips are co-optimized with the source [6, 7]. However, for a fab to provide finely tuned support for different customers whose layouts may exhibit significant variation, there are potential risks of missing critical features (i.e. unanticipated legal constructs) and of ignoring the variable context of constructs as they appear in real designs. In order to sample more clips, an automatic selection of a broad and encompassing clip population from an actual design is preferred. When SMO uses a process window objective, only a very small proportion of features in a layout is likely to be binding on the common process window, and it is only these binding features which are determinative of the optimum source for the layout. The clips that contain these features are called Binding Constructs. These can formally be defined as any clip that contains at least one binding constraint (of any kind) in the optimization. Other features will not contribute binding

Optical Microlithography XXIV, edited by Mircea V. Dusa, Proc. of SPIE Vol. 7973, 797308 · © 2011 SPIE · CCC code: 0277-786X/11/$18 · doi: 10.1117/12.879787

Proc. of SPIE Vol. 7973 797308-1 Downloaded from SPIE Digital Library on 18 Apr 2011 to 129.34.20.23. Terms of Use: http://spiedl.org/terms

constraints when included in the optimization, and so will not affect the source solution [1]. However, without knowing these binding features in advance, it is preferable to extract as many unique clips as possible to represent the design. Unfortunately, during joint source mask optimization every mask variable is coupled quite closely to every other mask variable via the source variables (since each source pixel illuminates all parts of the mask), presenting SMO systems with a very challenging optimization problem as the mask area is increased to include a large number of clips. This may be contrasted with mask-only design procedures like OPC, in which mask variables that are widely separated are only linked by indirect residual interactions that propagate through the very long chain of variables occupying intervening positions; in practice these interactions are more or less negligible. With mask-only design procedures it is therefore straightforward (in broad outline) to distribute different portions of a layout to separate CPUs for processing in parallel; with JO, however, such straightforward parallelization is not possible. Lithographic requirements like SRAF print-through avoidance imply that the problem constraint count will scale proportionately with mask area, as will the number of variables, and in joint optimization even constraint requirements that involve widely separated mask regions can potentially exhibit strong trade-offs with one another. This long-range coupling makes JO of an extensive pattern set especially challenging, since the run time scales as the cube of problem size in the large-area limit [8, 9]. Table 1 uses a degree-of-freedom analysis to illustrate the bottleneck of really large scale JO for SMO.

Table 1. Degree-of-freedom analysis comparison between small scale and large scale JO (clip size = 400nm x 400nm)

                                              small scale (10 clips)    large scale (10,000 clips)
  # source variables (16 pixels per radius)   ~200                      ~200
  # mask variables (15nm pixel size)          7,100                     7.1 million!
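The mask-variable counts in Table 1 follow directly from the stated clip and pixel sizes, as a quick arithmetic check confirms:

```python
# Sanity check of the Table 1 degree-of-freedom estimates.
clip_nm = 400            # clip edge length (nm)
mask_pixel_nm = 15       # mask variable pixel size (nm)

mask_vars_per_clip = (clip_nm / mask_pixel_nm) ** 2   # ~711 variables per clip

small = 10 * mask_vars_per_clip       # 10-clip problem
large = 10_000 * mask_vars_per_clip   # 10,000-clip problem

print(round(small))   # ~7,100 mask variables
print(round(large))   # ~7.1 million mask variables
# The ~200 source variables are fixed by the source pixelation,
# independent of how many clips are included.
```

The source variable count stays constant while the mask variable count grows linearly with clip count, which is exactly why the source coupling dominates the large-scale problem.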

Fig. 1. A simple linear optimization counter-example: maximize x + 2y subject to x > 0, y > 0, x + y < 1. Starting at (0.25, 0), a sequential optimizer (x then y) gets stuck at (1, 0), while the true optimum is at (0, 1); the optimizer needs JO in order to move in a direction whose x gradient alone would degrade quality.

In the face of this difficult scaling, it is tempting to consider a more scaling-friendly sequential optimization of mask and source, in which the optimization is done iteratively between source alone and mask alone until a final solution is converged, despite the poor performance that has been reported in the literature for such sequential procedures [1, 4]. However, at a fundamental level joint optimization has strong inherent advantages in finding a true optimum compared to sequential modes. This can be illustrated by the simple linear counter-example shown in Fig. 1, which involves only a single "source" variable x and "mask" variable y. A simple objective function z = x + 2y and linear constraints are employed, such that the overall optimization problem is

    Max z = x + 2y   such that   x > 0, y > 0, x + y < 1        (1)
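Problem (1) is small enough to step through in code; a minimal sketch of why the coordinate-wise pass stalls:

```python
# Toy reproduction of the Fig. 1 counter-example: maximize z = x + 2y
# subject to x >= 0, y >= 0, x + y <= 1.
# A sequential (coordinate-wise) optimizer stalls; the joint optimum is (0, 1).

def best_x(y):
    # With y fixed, z grows with x, so push x to its upper bound 1 - y.
    return 1.0 - y

def best_y(x):
    # With x fixed, z grows with y, so push y to its upper bound 1 - x.
    return 1.0 - x

# Sequential "x then y" loop from the starting point (0.25, 0).
x, y = 0.25, 0.0
for _ in range(10):
    x = best_x(y)   # the first pass drives x to 1.0
    y = best_y(x)   # y is then pinned at 0.0 by the constraint x + y <= 1
print((x, y), x + 2 * y)   # stuck at (1.0, 0.0) with z = 1

# A joint optimizer can trade x down while raising y, reaching (0, 1), z = 2.
```

At the stuck point, no single-variable move improves z, yet the simultaneous move (decrease x, increase y) doubles the objective.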

As shown in Fig. 1, if we use (0.25, 0) as a starting point and attempt sequential optimization beginning with the x variable, we first drive the solution to the right corner of the solution space at (1, 0), where the sequential optimizer gets stuck, since any further move in the y direction alone is ruled out by the constraint x + y < 1. A JO is needed in order to move toward the true global optimum at (0, 1). This is of course not a realistic example problem, but in SMO terms the optimum adjustment can include a source change (x) that is desirable despite degrading EPE, given that EPE can be restored by a simultaneous mask change (y) that may also be beneficial. Our SMO methodology adopts a JO approach in order to obtain the best optimization result.

2. Fundamental questions to be answered

In the previous section it was noted that a large scale optimization can be made feasible by extracting representative constructs from a real design. It is essential to consider what is needed to test the validity of such an argument. Basically, this can be done by validating the following three hypotheses on which our methodology is based, namely,


#1 The extracted constructs will generate a source comparable to the one obtained if the full design were used

#2 Sources based on a set of constructs extracted from a real design could provide better litho performance for that design than the source based on the set of constructs used in technology definition (generalization)

#3 There exist pattern selection methods that generate additional clips which are binding on the source, improving the overall source performance

The validation results will be shown in the following sections.

3. Unique large scale joint optimization method by parallelization

As mentioned earlier, the run time of a serial JO scales on the order of Area^3. Parallel processing is a promising key to solving this otherwise infeasible large scale problem, but there are severe challenges in processing mask clips in parallel while choosing a source and at the same time still maintaining a JO. The core idea of the parallel SMO engine is to distribute the optimization to overcome the scaling bottleneck. Our algorithm generates subproblems and distributes them to many nodes to solve locally, while maintaining a truly joint and non-sequential optimization when the subproblem results are recombined. Fig. 2 shows a schematic of this JO parallelization algorithm. Our parallelization algorithm can directly exploit massively parallel computing platforms with a large number of cluster nodes and achieves very good scaling behavior, as shown in Fig. 3. With only 120 cores, a ~2500X improvement in run time was already achieved.

Fig. 2 Schematic showing the flow of the parallel Joint Optimization in SMO: the original JO problem is split into subproblems, solved by a distributed joint optimization solver, and recombined into a complete JO solution.

Fig. 3 Run time scaling performance for parallel JO w.r.t. number of CPU cores (4000 CA testcase, runtime in seconds; the serial JO run takes ~50k sec (14 hrs) for an area of 168 um2). Acceleration is enabled even at low CPU count.
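The paper does not disclose the decomposition algorithm itself. The sketch below shows one standard mechanism (an assumption for illustration, not the authors' method) by which a joint step over many per-clip mask variables plus a few global source variables can be distributed while remaining truly joint: the coupled linear system has an "arrowhead" structure, so each clip's mask block can be factored independently (in parallel), and only a tiny Schur-complement system in the source variables is solved centrally.

```python
# Illustrative arrowhead/Schur-complement decomposition (an assumption,
# not the authors' disclosed algorithm): per-clip mask blocks A_k couple
# only through a small set of source variables, so the joint solve
# parallelizes over clips with one small central reduction.
import numpy as np

rng = np.random.default_rng(0)
n_clips, n_mask, n_src = 3, 4, 2

# Per-clip SPD mask blocks A_k, couplings B_k to the source, source block C.
A = [np.eye(n_mask) * 2 + rng.standard_normal((n_mask, n_mask)) * 0.1 for _ in range(n_clips)]
A = [0.5 * (Ak + Ak.T) for Ak in A]
B = [rng.standard_normal((n_mask, n_src)) * 0.2 for _ in range(n_clips)]
C = np.eye(n_src) * 3
b = [rng.standard_normal(n_mask) for _ in range(n_clips)]
c = rng.standard_normal(n_src)

# "Worker" step, one per clip (embarrassingly parallel in a real system):
Ainv_b = [np.linalg.solve(A[k], b[k]) for k in range(n_clips)]
Ainv_B = [np.linalg.solve(A[k], B[k]) for k in range(n_clips)]

# Central reduction: small n_src x n_src Schur complement for the source.
S = C - sum(B[k].T @ Ainv_B[k] for k in range(n_clips))
rhs = c - sum(B[k].T @ Ainv_b[k] for k in range(n_clips))
s = np.linalg.solve(S, rhs)

# Back-substitute each clip's mask update locally.
m = [Ainv_b[k] - Ainv_B[k] @ s for k in range(n_clips)]

# Check against solving the full coupled system directly (the joint answer).
full = np.zeros((n_clips * n_mask + n_src,) * 2)
rhs_full = np.concatenate(b + [c])
for k in range(n_clips):
    i = k * n_mask
    full[i:i+n_mask, i:i+n_mask] = A[k]
    full[i:i+n_mask, -n_src:] = B[k]
    full[-n_src:, i:i+n_mask] = B[k].T
full[-n_src:, -n_src:] = C
direct = np.linalg.solve(full, rhs_full)
assert np.allclose(np.concatenate(m + [s]), direct)
```

The distributed answer matches the monolithic solve exactly, which is the essential property claimed for the parallel engine: parallelism without degrading to a sequential source-then-mask scheme.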

However, it is very important to confirm that the solution quality provided by this JO parallelization can match that of a high quality small-scale optimizer which solves the JO problem in the usual serial mode. A set of 2000 enumerated 22nm contact hole clips was used to run both the parallel JO and the small scale optimizer with different numbers of clips included. Fig. 5 compares the Common Process Window (CPW) obtained by this optimization parallelization technique with that of the serial JO. The achieved CPW values match exceptionally well, indicating that the parallelization method is performing true JO, not sequential SO and MO.

4. Methodology to determine Binding Constructs

As a matter of basic optimization theory, only those constraints which are active are needed to define (bind) the solution. All inactive inequality constraints have no impact on the converged objective function, so their inclusion or exclusion does not change the validity of the result as an optimal solution. In SMO, clips that are determined to be binding constructs share this property with the binding constraints they contain. The optimality of the final source is completely determined by the set of binding constructs only; all other clips are essentially rendered redundant. If the binding constructs can be extracted systematically, one can use them (and their associated source solution) to do further analysis, or to train the clustering algorithm to make the pattern selection task more effective. However, extracting binding constructs is not an easy task. First, one needs to be able to run the optimization at a large enough scale that the set of binding constraints can be exhaustive. Second, the extraction method has to be able to extract the binding clips even when near-degeneracy exists among the constraints, which is very common in SMO. When degeneracy occurs, many optimizers only assess optimality under the degenerate constraints in an ill-defined way that does not represent the common constraints' true impact. Also, constraints that are not exactly binding can still become binding after later


modifications to the problem, since the criteria for deciding a binding constraint are not universal across different constraint types, which have different units. Taking advantage of our fast parallel SMO engine as a building block, we developed a prototype functionality called Progressive Deletion (PD) that can extract all binding constructs based on the constraints that are binding or near binding, without having to solve the full problem. It can essentially determine the particular feature content which limits the process window attainable by the optimum source. One aim of PD is to repeatedly flush out different occurrences of similarly difficult content (difficult within problem contexts defined by similar overall clip populations), even though these features are present in variant forms within different clips. The criteria for identifying binding constructs are set by certain threshold values on some of the SMO internal parameters. An experiment was done to demonstrate the concept of PD using enumerated clips for a contact hole layer, as described in ref. [2]. A JO is repeatedly performed on the enumerated clips and rerun multiple times, after the binding clips from the previous iteration have been deleted. This loop repeats until, e.g., the Common Process Window (CPW) objective reaches a less-rapidly-changing value or the iteration count reaches a predetermined number. In Fig. 4, the blue curve plots the CPW over the first 70 iterations and the red curve zooms in on only the first 15 iterations. Accompanying the curves are the set of binding constructs and the corresponding optimized source for each iteration. It can be seen that at first the CPW is increasing, indicating that the deleted clips are indeed binding, and it partly saturates after ~15 iterations, once the most difficult clips have been removed from the population. The sources also change slowly, with the main source features nearly stabilizing after the first 15 iterations. The set of deleted clips can thus be considered to constitute critically binding or near-binding constructs, since further removal of clips will not change the CPW at such a dramatic rate. This heuristic provides a clip set which contains a collection of patterns that are sufficiently difficult as to be "almost binding" in problems that resemble the problem defined by the overall population, this collection being broader than the smaller set of clips that are exactly binding only in the precise problem defined by the initial optimization.
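The PD loop above can be sketched on a toy model (an assumption for illustration: a fake stand-in for the SMO engine in which each clip has a scalar "difficulty" and the common process window is limited by the hardest clip still in the population):

```python
# Minimal sketch of the Progressive Deletion (PD) loop on a toy model.
def toy_smo(clips):
    """Stand-in for joint optimization: CPW is set by the most difficult clip,
    and clips within 5% of that difficulty are reported as (near-)binding."""
    worst = max(clips, key=lambda c: c["difficulty"])
    cpw = 1.0 / worst["difficulty"]
    binding = [c for c in clips if c["difficulty"] >= 0.95 * worst["difficulty"]]
    return cpw, binding

clips = [{"id": i, "difficulty": d}
         for i, d in enumerate([5.0, 4.9, 3.0, 2.0, 1.5, 1.2, 1.1, 1.0])]

deleted, history = [], []
for it in range(4):
    cpw, binding = toy_smo(clips)
    history.append(cpw)
    deleted.extend(binding)                      # accumulate (near-)binding clips
    clips = [c for c in clips if c not in binding]

# CPW rises as binding clips are removed; `deleted` is the extracted
# binding/near-binding construct set after 4 PD iterations.
print(history)
```

As in Fig. 4, the objective improves rapidly while genuinely binding clips are being removed, which is the signal PD uses to decide when the extracted set is complete.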

Fig. 4 Concept of Progressive Deletion using Parallel SMO on 2000 contact hole level enumerated clips (iteration 0 includes all clips). The PW objective (~1.35 to 1.55) is plotted against deletion iterations 0 to 70, with a zoom on iterations 0 to 15, alongside the sources and the removed clips per iteration; after the first 15 deletions, the PW stabilizes.

With a tool like PD we are able to test our hypothesis #1 using an enumerated contact hole testcase. PD was run to obtain a set of binding and near-binding constructs, and those constructs alone were used to run full SMO, each time with a different number of binding constructs, compared against the reference case that ran the total population of 1000 clips. Fig. 6 shows the relative CPW from SMO runs using from 19 to 60 clips. We observed that the CPW is within 1% of the reference case even with only 19 clips. These data validate hypothesis #1: a small set of extracted clips can produce a source comparable to that from the full design.


Fig. 5 Plots of CW for various numbers of clips optimized (2000 CA testcase, parallel vs. serial JO; common window ~0.02-0.12 over 0-600 clips), showing excellent match between our parallel JO and the commercial serial JO.

Fig. 6 The CW from using 19, 36 and 61 binding constructs (CA 1000 testcase) matches the full 1000-clip population's CW (~0.145) to within +/-1%.

Because binding constructs come directly from an optimization that uses what is typically a massive set of diverse clips (constituting a surrogate for the entire technology or layout), they have a very desirable property that other heuristic "litho difficulty" estimates can't easily provide. In particular, they take into account the collective "clip interaction" effects that are inherent to the problem as a whole, or to similar variants. In order to explain this important concept, let us start with a simple example of optimizing two features, an 80nm-pitch and a 160nm-pitch grating, in a 193nm 1.35NA scenario. From simple litho difficulty estimation, the 80nm pitch (P80) is much more difficult than the 160nm pitch (P160) because of the higher spatial frequency content it comprises. Imagine for purposes of discussion that we had to compress the number of clips down to one; we would certainly pick P80. In that case the optimum source would be a dipole customized to P80, but such a result would clearly be detrimental to P160. In contrast, a typical SMO run would try to strike a balance between these clips, and thus a non-dipole source would be obtained under typical tolerances. In a more complex case where layout content emphasizes both of these underlying pitches in variant forms, the problem difficulty is different from that with either pitch alone, but the combination of the two pitches as simple clips (along, in practice, with a very small number of other well-chosen elementary patterns [6]) can constitute an effective surrogate during source design for more complex layout content containing these pitches. It is, of course, the need to choose these representative patterns well which makes the problem non-trivial. This clip interaction effect will be illustrated through the use of binding constructs in a more realistic example later in this paper.
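The difficulty gap between the two pitches follows from standard Rayleigh k1 arithmetic (k1 = half-pitch * NA / wavelength, with k1 = 0.25 as the theoretical single-exposure limit):

```python
# Back-of-envelope k1 factors for the P80/P160 example
# (193 nm wavelength, NA = 1.35).
wavelength, NA = 193.0, 1.35

def k1(pitch_nm):
    return (pitch_nm / 2) * NA / wavelength

print(k1(80))    # ~0.28: close to the k1 = 0.25 limit, needs a dipole-like source
print(k1(160))   # ~0.56: comfortably resolvable with many source shapes
```

P80 sits just above the hard limit, so it dictates a dipole, while P160 leaves broad freedom; a source optimized for both must compromise, which is exactly the clip-interaction effect described above.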
5. Expanding SMO to full chip by pattern selection

Our optimization parallelization method achieves very good scaling in speed compared to our serial version. However, due to the inherent complexity of a JO, it is still challenging to handle more than hundreds of thousands of clips (equivalent to thousands of um2 of area) with an affordable number of CPUs. In order to expand this SMO capability to handle a full chip design and obtain the associated optimum source, a process which can achieve a very high compression ratio on clip count while still sampling critical clips correctly is required. There are several ways to achieve the compression, with different degrees of precision and bandwidth. The staging of these individual components depends on the applicability of each component in different scenarios. The simplest form of LSSO just involves a parallelizable SMO engine that can handle many clips already screened by engineers as qualifying features. A true JO is then performed, and the run time bottleneck lies in the SMO process. However, a few thousand clips may still not be sufficient if the intuitively preselected designs are not truly representative. Most of the time the users have no rigorous idea of the clips' relative importance ahead of time. In this case, an added step of automatic pattern selection from a large design allows many unique clips (~100K to ~billions) to be extracted and down-selected to representative elements, with our clustering methods providing a very high compression ratio. So a full chip capable LSSO might include a first step of automatic pattern selection, which could employ some heuristics of "litho difficulty" in the sampling to reduce the total clip count to several thousand, followed by the parallelized SMO engine to do exact JO within the timing budget. This flow is shown in Fig. 7. These heuristic methods in pattern selection might not provide exact correlation between representative elements and binding constructs, but


they are often still effective in reducing the total clip count to a manageable level while not missing critical binding clips; moreover, training of the algorithm is possible to achieve better correlation with binding constructs. More details of pattern selection and its use of heuristics will be elaborated later.


Fig. 7 A possible expanded flow showing the pattern selection blocks (baseline dataprep, layout counting, layout characterizing, layout clustering, representative element selection, and layout composing, taking ~10K's of unique clips down to 1000's of representative clips with context) in yellow, and the parallel SMO engine (LSSO) block, which outputs the optimum source, in pink.

6. Concept of Pattern Selection

There are many ways to sample a large design. The simplest one is to randomly sample the full chip, but such sampling is useful only if the sample size is very large. To make the process more sophisticated, one can extract all clips from a design and perform a pattern clustering process to find a representative element in each cluster. However, in a real design nowadays, especially at the 22nm node and beyond, the number of fully unique patterns can easily be on the order of billions. There are many non-critical areas and also many clips that are identical or mirror-invariant. In order to perform the selection efficiently, one can break it down into different steps that handle the selection with approaches having different compression efficiency and precision. Our pattern selection method is composed of several sub-blocks, namely pattern counting, pattern transformation, pattern clustering and pattern compose. Each individual sub-block is discussed in more detail below.

6.1 Pattern Counting

Pattern counting is the very first step in the pattern selection task. It can also be viewed as a clustering process that clusters redundant clips. It basically extracts clips of a certain specified size, centered on reference points called anchor points. Since most unique clips are 2D in nature and long lines or spaces are similar to each other, anchor points are usually determined by searching from the vertices of polygons while scanning the design. In order to minimize the number of clip outputs and the clip redundancy, a scheme is adopted that uses projection points from vertices and then simplifies the anchor selection to reduce the number of final anchor points. Fig. 8 shows an example of how the clips are identified.
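Pattern counting can also fold together clips that are equivalent under mirroring or (when the source symmetry permits) 90-degree rotation, since such clips constrain the source identically. A minimal sketch of such canonicalization (an illustration, not the authors' implementation), mapping each clip's pixel grid to a canonical representative over the dihedral group:

```python
# Deduplicate clips under mirror/rotation symmetry (illustrative sketch).
def transforms(grid):
    """All 8 mirror/rotation variants of a square binary grid (tuple of tuples)."""
    g = [list(row) for row in grid]
    out = []
    for _ in range(4):
        g = [list(row) for row in zip(*g[::-1])]      # rotate 90 degrees
        out.append(tuple(tuple(r) for r in g))
        out.append(tuple(tuple(r[::-1]) for r in g))  # mirror of that rotation
    return out

def canonical(grid):
    return min(transforms(grid))   # lexicographically smallest variant

clips = [
    ((1, 0), (0, 0)),   # corner contact
    ((0, 1), (0, 0)),   # same clip, mirrored
    ((0, 0), (0, 1)),   # same clip, rotated 180 degrees
    ((1, 1), (0, 0)),   # a genuinely different clip
]
unique = {canonical(c) for c in clips}
print(len(unique))   # 2 unique clips after symmetry folding
```

Using the canonical form as a dictionary key gives the symmetry reduction in a single pass over the extracted clips.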
The number of unique clips can also be minimized if one considers mirroring and (in cases where corresponding source symmetries are prescribed) takes 90-degree rotation symmetry of the clips into account, since such clips contribute to the optimum source in an identical way.

6.2 Pattern Transformation and Clustering


Pattern transformation is the process of transforming the representation of clips from vertices into some abstract space so that they can be efficiently clustered by an existing clustering method. Since the abstract data representation is tightly coupled with the clustering method, both processes are discussed together. Fig. 9 illustrates the basic concept of pattern transformation and clustering. After the pattern counting step, a set of 2D pattern clips with a geometrical representation is obtained. Each clip is transformed into a single element or a multidimensional vector, usually based on a frequency domain representation such as diffraction order amplitudes. The abstract quantity then goes through a projection process that maps the metric into an N-dimensional space, and the clustering algorithm groups data points that are "geometrically" close to each other into clusters. In order to represent a large data set efficiently, it is common practice to make some frequency domain decomposition by segmenting the data into observations and either sampling or exhaustively scanning. In our case, the observations correspond to local layout configurations up to the optical radius. Since a lithographic layout is dominated by the worst yielding pattern, sampling entails a significant risk and is to be avoided in favor of exhaustive scanning. Previous work in testing of OPC introduced the use of k-means clustering with distance constraints on vectors of orthogonal Walsh decompositions [10, 11]. In general, clustering divides the observations into sets which are represented by a single element or some abstract vector derived from local observations. If the data observations are not naturally clustered but rather are uniformly distributed in the space, clustering reduces to vector quantization, in which the subsets have approximately the same number of observations. In either case, we can speak of an effective compression ratio of the full set to the number of observations (layout clips) used to represent the full layout for the purposes of source optimization. Many clustering algorithms exist and generally trade off performance for representation error. Due to the large number of observations in a design and the requirement for tight cluster bounds (to avoid missing dominating or binding patterns), a clustering algorithm emphasizing performance is chosen. Most of the time the users do not have a-priori knowledge of the granularity of the data, so an unsupervised hierarchical agglomerative clustering method [13] is adopted to let the algorithm determine the optimum number of clusters. However, our algorithm still retains the capability to control the number of clusters (compression ratio) when there is a need. Fig. 10 shows two example clusters, each of which contains a different number of cluster elements.
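A toy version of the transform-then-cluster flow can make the idea concrete (the details here are assumptions for illustration: real tools use richer diffraction-order features and tuned linkage criteria, not this naive single-linkage loop):

```python
# Transform clips to frequency-domain vectors, then agglomerate with a
# distance cutoff so the data (not a preset k) fixes the cluster count.
import numpy as np

def transform(clip):
    """Map a small pixel clip to a mirror-insensitive frequency vector:
    sorted magnitudes of its 2D DFT (a stand-in for diffraction orders)."""
    spectrum = np.abs(np.fft.fft2(np.asarray(clip, dtype=float)))
    return np.sort(spectrum.ravel())

def agglomerate(vectors, threshold):
    """Naive single-linkage agglomerative clustering with a distance cutoff."""
    clusters = [[i] for i in range(len(vectors))]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(vectors[i] - vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if d < threshold:
                    clusters[a] += clusters.pop(b)   # merge the two clusters
                    merged = True
                    break
            if merged:
                break
    return clusters

clips = [
    [[1, 0], [0, 1]],   # diagonal pair
    [[0, 1], [1, 0]],   # same geometry mirrored -> identical spectrum
    [[1, 1], [1, 1]],   # dense block, very different spectrum
]
vecs = [transform(c) for c in clips]
groups = agglomerate(vecs, threshold=0.5)
print(len(groups))   # 2 clusters: the two diagonal clips merge
```

The two mirrored clips land in one cluster because their frequency signatures coincide, illustrating why a frequency-domain representation compresses far better than raw geometry.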

Fig. 8 Layout counting example.

Fig. 9 Pattern transformation and clustering flow: each clip (2D pattern tile) is transformed into an N-dimensional vector [x0, x1, ..., xN] and, via a projection process, mapped into N-D space (shown in 2D for visualization); the ideal result has low within-cluster scatter (WCS) and high between-cluster scatter (BCS).

6.3 Pattern ranking

Pattern ranking is very useful in enabling us to select the most difficult clips to use. The concept of lithography difficulty (or friendliness) is widely used in pattern analysis when the source and RET are known: the source is known and the actual diffraction orders are accurately known. For example, a source-dependent method computes a modulation transfer function to describe the filtering effect of the RET (including the source geometry, if pre-determined) and computes the mask spectrum. If the mask spectrum contains energy outside the allowed frequency regions defined by the modulation transfer function, imaging will be impossible [12]. However, it is less obvious that there is any correlation with real difficulty during the early stages of an SMO process, when the source is not known and no a priori diffraction order information exists. A new heuristic method of difficulty estimation


was explored, and it is to some extent complementary to the above one. We call this metric the Litho Difficulty Estimate (LDE). LDE operates using a generic source assumption and can be modified to work with a pre-determined source. Our LDE efficiently formulates a fast estimation problem based on, among other things, the diffraction order (DO) content, both amplitude and phase, of the patterns. LDE considers the tile's frequency content beyond what is allowable by lithography tools, for a better difficulty assessment. The LDE output for each tile is, in general, a vector that characterizes the difficulty of that tile from different "perspectives", where each perspective refers to a particular measure of difficulty. Equation (2) shows a generic form of LDE; all quantities involved are calculated from the diffraction orders, and Fig. 11 shows plots of the diffraction order information used in calculating LDE.

Fig. 10 Two example clusters for Metal features (cluster representative elements selected by LDE).

Fig. 11 Example DO information of a clip.

Compared to the known-source method, our metric is more independent of the source geometry, but it bounds the highest spatial frequencies and applies weights based on the presence of layout features which impact the common window and overlap objectives. It also includes the randomness of the layout as a form of difficulty, since in general more random layouts are harder to optimize than more regular layouts, regardless of pattern density. In general, LDE can be a function of many heuristic terms, such as

    LDE = f(DO amplitudes, DO phases, clip randomness, radial weighting function, angular weighting function, process variation conditions, ...)        (2)
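The paper gives only the generic form (2); the toy score below is a deliberately simplified, assumed instance of one such term, weighting the clip's diffraction-order (DFT) energy by normalized spatial frequency so that clips whose energy sits near the optical cutoff score as harder:

```python
# Toy LDE-style difficulty term (an assumed simplification, not the
# authors' formula): high-spatial-frequency energy raises the score.
import numpy as np

def toy_lde(clip, cutoff=0.5):
    a = np.asarray(clip, dtype=float)
    spectrum = np.abs(np.fft.fft2(a)) ** 2
    fy = np.fft.fftfreq(a.shape[0])[:, None]
    fx = np.fft.fftfreq(a.shape[1])[None, :]
    radius = np.hypot(fx, fy)            # normalized spatial frequency of each DO
    weight = radius / cutoff             # 0 at DC, 1 at the assumed cutoff
    total = spectrum.sum()
    return float((weight * spectrum).sum() / total) if total else 0.0

coarse = [[1] * 4 + [0] * 4] * 8    # wide line: low-frequency content
fine = [[1, 0] * 4] * 8             # dense 1-pixel grating: Nyquist content
assert toy_lde(fine) > toy_lde(coarse)
```

The dense grating scores higher because its energy sits at the frequency limit, matching the intuition behind the P80/P160 example earlier in the paper.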

With LDE giving a numerical value, it opens up the possibility of systematically ranking patterns in order to facilitate clustering.

6.4 Pattern Compose

Pattern Compose is basically the re-extraction of the final representative elements at the clip size specified by user input, and it may include the context around each clip, depending on how SMO is run. This provides an efficient input data structure for SMO, as well as the flexibility for the user to select another layer, which can be different from the layer used in clustering. For example, clustering can use a retargeted and OPC'ed mask layer while the input to SMO is the design layer, or vice versa.

6.5 Selection of Representative Elements

The selection of the set of representative elements is very critical, because it defines the success of the source for the whole chip [6]. In the current approach, this subset of clips is selected by clustering the full set and taking one (or more) representatives from each cluster. The basic idea is that clustering groups together patterns that are very similar, and a representative element algorithm then picks one (or more) of them to represent the whole group during source design by SMO. Hence, the REs chosen to drive SMO are determined by how the clustering technique defines which clips are similar enough to be clustered together. In order to potentially further constrain the number of training clips to a number feasible for an optimization process such as source design, an optional third pass, or meta-clustering, is introduced: the centroids of the clusters are clustered again into a possibly much smaller set to satisfy a runtime constraint. For this special case, k-means clustering [13], which accepts the number of clusters as a parameter, is used.

7. Experimental hypothesis validation data

Proc. of SPIE Vol. 7973 797308-8 Downloaded from SPIE Digital Library on 18 Apr 2011 to 129.34.20.23. Terms of Use: http://spiedl.org/terms

In section 4, hypothesis #1 has been validated. In order to validate hypotheses #2 and #3, a large testcase containing thousands of design-legal clips was synthesized, because during the process and RET development phase few random logic designs are at our disposal. The features that are available are usually used to qualify a patterning process and are important to process integration. We use a method called pattern enumeration to generate clips by a combinatorial use of a set of design rules, so that the resulting clips are design-rule legal. The generation can be done in an exhaustive combinatorial way if there are only a few rules to conform to, or by statistical (random) generation if there are many rules, in order to control the number of clips output. A reflective boundary condition was chosen. 22nm Metal level design rules were used to enumerate thousands of clips as our testcase. Fig. 12 shows sample enumerated clip outputs with the reflective boundary condition adopted. The validation metrics for these experiments need to be picked carefully. In this case an Optical Rule Check (ORC) was run for all the clips, and the Process Variation (PV) bands were measured for a small number of selected technology qualification features for which the critical CD site locations are known. The DVI/PVI values [14] were also extracted for quantitative comparison. In ORC we set up markers for different types of violations: Nom_x(y) and PW_x(y) represent a nominal fail and a process-window fail, respectively, for a feature type x, where the CD criterion is given by the value y and x is a width, space, pinch or bridge type. Fig. 13 shows an example of catastrophic markers (red) and less severe markers (yellow) in a failed enumerated clip. In order to quantitatively compare the similarity of output sources, we use the metric Source Correlation (SrcCorr, or C_{a,b}), which is defined by [6]

C_{a,b} = ( \sum_i s_{a,i} s_{b,i} ) / \sqrt{ \sum_i s_{a,i}^2 \sum_i s_{b,i}^2 }   (3)

where s_{a,i} and s_{b,i} are pixel i of sources A and B, respectively, and 0 ≤ C_{a,b} ≤ 1.
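Eq. (3) is a normalized inner product of the two pixelated source intensity vectors. A minimal sketch of the computation, assuming the sources are supplied as flat lists of non-negative pixel intensities:

```python
# Source-correlation metric of Eq. (3): compare two pixelated sources a
# and b by the normalized inner product of their pixel intensity vectors.
# For non-negative intensities the result lies in [0, 1].
from math import sqrt

def source_correlation(src_a, src_b):
    """Return C_{a,b} = sum_i a_i*b_i / sqrt(sum_i a_i^2 * sum_i b_i^2)."""
    num = sum(a * b for a, b in zip(src_a, src_b))
    den = sqrt(sum(a * a for a in src_a) * sum(b * b for b in src_b))
    return num / den if den else 0.0

# sources with no overlapping illuminated pixels are fully uncorrelated
print(source_correlation([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

Identical sources give C = 1; the values of roughly 0.38-0.42 reported below indicate substantially different source geometries relative to POR.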

Fig. 12 Examples of enumerated Metal level clips and their reflective boundary condition
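The combinatorial enumeration described above can be sketched in one dimension: build every clip cross-section whose alternating line/space widths satisfy a minimum-width and minimum-space rule and exactly fill the clip span. The rule values, grid, and span below are illustrative only, not the actual 22nm-node design rules.

```python
# Hypothetical sketch of design-rule-driven pattern enumeration: generate
# all 1-D clip cross-sections of alternating line/space widths that obey
# a minimum width and minimum space rule on a fixed layout grid and fill
# the clip span exactly. Rule values here are illustrative placeholders.
def enumerate_clips(span, min_width, min_space, grid=10):
    """Yield tuples of alternating (line, space, line, ...) widths."""
    def extend(prefix, remaining, want_line):
        if remaining == 0 and prefix:
            yield tuple(prefix)          # clip span exactly filled
            return
        lo = min_width if want_line else min_space
        for w in range(lo, remaining + 1, grid):
            yield from extend(prefix + [w], remaining - w, not want_line)
    yield from extend([], span, True)    # first segment is a line

clips = list(enumerate_clips(span=120, min_width=40, min_space=40, grid=10))
```

With many rules, the same recursion could be sampled statistically instead of exhausted, which is how the clip count is kept under control.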

Fig. 13 Example plot of ORC markers. Red markers are for catastrophic fails and yellow markers are for less severe fails

Several experiments with varying numbers of clips and SMO parameters, such as CD tolerance, constraint types, etc., have been tried. Table 2 summarizes the most important results of the validation experiments; the table shows the percentage change in ORC counts in each category relative to the POR case. Both qualitative (pass/fail of process qualification patterns) and quantitative metrics are shown. The middle columns correspond to the case in which source optimization and verification are done with the training clip set: case A uses the REs from the clustering run, and case B uses the clips from PD. Both cases show a significant improvement in ORC counts in the training set compared to the POR source, except for a small increase in bridging through process variation. The most impressive improvement is that some ORC error types are completely removed. Pattern selection by clustering seems to provide a benefit over a small-scale optimization; note that this clustering is not trained from the PD BC clips. The columns to the right correspond to the case in which the same sources are used but verification is done on a realistic random-logic product design. Results from both the case A and case B sources are shown. An even larger improvement over the POR source is seen, with all ORC errors reduced or removed. This suggests that the source trained on a set of binding constructs from PD gives a better optimization not only on the training set but also extrapolates well to realistic random designs. This is no surprise to us, since our PD block directly computes the binding constructs from the training set in multiple perturbed-difficulty variants, without relying on more heuristic ranking metrics. The implication of this finding will


be discussed in more detail in the coming paragraphs. Fig. 14 shows the sources for these three cases, and apparent differences are observed. The source correlation values referenced to the POR source do not correlate well with the actual lithographic performance, but the main dipole-like feature of the source is still observed in all three cases.

Table 2 Percentage change in ORC counts in different optimization testcases compared to the reference POR. The left four source columns are verified against the traditional clips used in RET definition (training set); the right four against random logic macro design cases.

                          | Traditional clips used in RETs definition     | Random logic macro design cases
Source                    | POR | Random Sel. | LDE-RE Sel. | BC Sel.     | POR  | Random Sel. | LDE-RE Sel. | BC Sel.
Semi-iso space            | No fail | fail | No fail | No fail           | NA   | NA | NA | NA
Other critical            | No fail | No fail | No fail | No fail        | NA   | NA | NA | NA
Nom_space (catastrophic)  | 73  | 109 (+45%)  | 5 (-93%)    | 26 (-64%)   | 670  | 737         | 200 (-70%)  | 160 (-76%)
Nom_width (catastrophic)  | 0   | 0           | 0 (0%)      | 0 (0%)      | 0    | 0 (0%)      | 2           | 0 (0%)
PW_pinch                  | 0   | 0           | 0 (0%)      | 0 (0%)      | 1    | 1 (0%)      | 0 (-100%)   | 0 (-100%)
PW_space (catastrophic)   | 87  | 2 (-98%)    | 1 (-99%)    | 0 (-100%)   | 436  | 155 (-64%)  | 196 (-55%)  | 99 (-77%)
PW_width (catastrophic)   | 18  | 0 (-100%)   | 0 (-100%)   | 0 (-100%)   | 33   | 38 (+15%)   | 73 (+121%)  | 6 (-82%)
PW_bridge                 | 0   | 1           | 4           | 0           | 0    | 0 (0%)      | 0 (0%)      | 0 (0%)
PW_space (less severe)    | 301 | 73 (-76%)   | 28 (-91%)   | 23 (-92%)   | 2288 | 1688 (-26%) | 672 (-71%)  | 99 (-96%)
PW_width (less severe)    | 145 | 3 (-98%)    | 5 (-97%)    | 0 (-100%)   | 648  | 131 (-80%)  | 266 (-59%)  | 89 (-86%)
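The percentage-change cells in Table 2 are straightforward bookkeeping against the POR reference; a minimal sketch of that cell formatting (the function name is ours, not from the paper):

```python
# Sketch of the % change bookkeeping used in Table 2: each source's ORC
# count per violation category is compared against the POR reference.
def pct_change(count, por_count):
    """Return a 'count (+/-NN%)' cell string, or just the count when the
    POR reference is zero (percentage undefined)."""
    if por_count == 0:
        return str(count)
    return f"{count} ({(count - por_count) / por_count:+.0%})"

print(pct_change(26, 73))  # → 26 (-64%)
```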

Fig. 14 (a) POR source, src_corr=1; (b) source from random sampling, src_corr=0.3937; (c) source from clustering, src_corr=0.3826; (d) source from binding constructs, src_corr=0.4244

Besides the comparison using ORC marker counts, the Calibre PVI/DVI values (see Table 3) and the PVband data for the process qualification patterns were also obtained. Fig. 15 shows the results for both right-way (RW) and wrong-way (WW) features, and for both lines and spaces through pitch. The PVband data for the champion source are similar to those of the POR source. We conclude that PVband measurements from only the regular training features (qualification features) may not have the resolution to distinguish the performance of the two sources; the ORC markers magnify the subtle differences between the sources on 2D patterns in different contexts. This raises an important issue of context sensitivity in SMO, which will be elaborated further below. From Table 3, for the two sources we observe the same trend of PVI reduction as in the ORC counts.

Table 3 PVI/DVI values comparing different sources using the source training clip set

Source from: | POR    | Random sampling | Clustering     | Binding Constructs
PVI          | 0.2483 | 0.2574 (+3.6%)  | 0.2390 (-3.6%) | 0.2362 (-4.9%)
DVI          | 0.0292 | 0.0296          | 0.031          | 0.0290


[Fig. 15 consists of four panels (RW through pitch, WW through pitch, RW space through pitch, WW space through pitch), each plotting PV band width (nm) through pitch, up to the isolated case, for the POR source and the BC source against the spec.]

Fig. 15 Similar PVband performance of through-pitch lines and spaces between the POR source and the source from binding constructs

This general trend of the ORC and PVI/DVI results shows a consistent performance difference between the sources from the different pattern selection methods. The difference correlates well with the low overlap of clips among the methods before any training of the clustering method, as shown in Fig. 16.

8. Discussion of improvement to optimization

From the validation experiment some important learning was obtained that may allow us to obtain a better optimization that is unique to LSSO. Fig. 17 shows how the context of a clip affects optimization. The clip set for SMO in the POR case includes an infinite grating of 80nm pitch; however, its performance on a finite 80nm-pitch grating with a non-repeating context (as appears in real layouts) still shows a printing violation. In contrast, in the "Binding Constructs" source case, our use of a large number of contexts from a large set of enumerated clips has allowed us to recover this finite 80nm-pitch feature. This illustrates that the context around a clip is useful information for SMO in general, and taking context effects into account in LSSO could provide even greater benefit.

[Fig. 16 is a Venn diagram of the overlap among LDE-based clustering REs, scatter-based clustering REs, random clips, and the BC set; the overlap counts include 35, 7, 23, 29, 2, 26, 27, 1 and 22. Fig. 17 compares the finite 80nm-pitch feature (P80) under the POR source and the champion BC source on an example enumeration set.]

Fig. 16 Venn diagram showing the clip overlap among the representative elements, random clips and the binding constructs set. The population of each set is approximately 180.

Fig. 17 The perturbation of the 80nm-pitch structure by the context effect, indicating the importance of context.

9. Conclusion on benchmarking among different critical-construct generation methods

Thus far we have discussed our pattern selection scheme for generating representative clips for LSSO, and have also developed a unique way to directly compute binding constructs. The performance of the different methods that generate binding


constructs, either by pattern selection or by direct computation using PD, was compared. These data suggest that the use of directly computed BCs can yield a better optimization when the same number of clips is compared. This agrees with our earlier argument that the exclusion of pattern interaction in some litho-difficulty estimates for ranking and clustering might reduce the chance of obtaining a sufficiently optimal source. In the literature, most of the suggested algorithms belong to this category [15,16]. Our PD method not only identifies binding constructs but can also be used as a gold standard for benchmarking and calibrating other pattern selection techniques that use various forms of heuristics that may trade off accuracy for speed.
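The idea behind progressive deletion can be sketched conceptually: clips whose removal leaves the attainable objective unchanged are not binding and can be dropped, and the survivors are the binding constructs. This is a greedy toy illustration only, not the authors' implementation; the `optimize` callback, which stands in for a full SMO run, is a user-supplied assumption.

```python
# Conceptual sketch (not the paper's implementation) of progressive
# deletion: greedily drop clips whose removal leaves the optimized
# objective unchanged; the survivors are the "binding constructs" that
# limit the optimum source. `optimize` stands in for a full SMO run.
def progressive_deletion(clips, optimize, tol=1e-6):
    """Return the subset of clips that bind the optimized objective."""
    best = optimize(clips)              # objective with all clips present
    survivors = list(clips)
    for clip in list(clips):
        trial = [c for c in survivors if c != clip]
        if trial and abs(optimize(trial) - best) <= tol:
            survivors = trial           # clip was not binding; delete it
    return survivors

# toy objective: the worst (minimum) per-clip printability score binds
scores = {"A": 0.9, "B": 0.4, "C": 0.7}
binding = progressive_deletion(list(scores), lambda cs: min(scores[c] for c in cs))
```

In this toy example only clip "B" limits the objective, so it alone survives; in the real flow each `optimize` call is expensive, which is why the parallel JO engine is a prerequisite for PD.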

CONCLUSION

Joint optimization (JO) of source and mask together is known to produce better SMO solutions than sequential optimization of the source and the mask. However, large-scale JO problems are very difficult to solve because the global impact of the source variables causes an enormous number of mask variables to be coupled together. This work presents innovations that minimize this runtime bottleneck. The proposed SMO parallelization algorithm allows separate mask regions to be processed efficiently across multiple CPUs in a high-performance computing (HPC) environment, despite the fact that a truly joint optimization is being carried out with source variables that interact across the entire mask. Building on this engine, a progressive deletion (PD) method was developed that can directly compute the "binding constructs" for the optimization. This method allows us to minimize the uncertainty, inherent to different clustering/ranking methods that rely on heuristic metrics, in seeking an overall optimum source. An objective benchmarking of the effectiveness of different pattern sampling methods was performed during post-optimization analysis; PD serves as a gold standard for developing optimum pattern clustering/ranking algorithms. With this work, it is shown that it is not necessary to exhaustively optimize the entire mask together with the source in order to identify these binding clips. If the number of clips to be optimized exceeds the practical limit of the parallel SMO engine, one can start with a pattern selection step to achieve a high clip-count compression before SMO. With this LSSO capability one can address the challenging problem of layout-specific design, or improve the technology source as cell layouts and sample layouts replace lithography test structures in the development cycle. Throughout this paper we validated the basic assumptions, and thus justified our approach to a practical LSSO.
Overall, this paper shows how such an approach can be used to address variability among different users' designs, with the aim of delivering a better RET process for specific layouts. These sources can be realized using pixelated illumination from a programmable illuminator, or alternatively with a custom diffractive optical element (DOE) [17].

ACKNOWLEDGMENTS

The authors would like to thank Scott Mansfield, Thom Sandwick, Tim Farrell and Ronald Luijten of IBM, and Roman Gafiteanu and John Sturtevant of Mentor Graphics, for management support. Special thanks to Kostas Adam, Aasutosh Dave, Alexander Tritchkov and Gandharv Bhatara of Mentor Graphics and Alex Wei of IBM for very helpful technical discussions. This work was performed at various International Business Machines (IBM) research facilities, IBM's Semiconductor Research and Development Center, and Mentor Graphics' Design to Silicon Center. Some layout and dataprep verification was supported by the independent alliance programs for SOI technology development and bulk CMOS technology development.

REFERENCES

[1] A. E. Rosenbluth et al., "Intensive optimization of mask and sources for 22nm lithography", Proc. SPIE vol. 7274, 727409 (2009)
[2] D. O. Melville et al., "Demonstrating the Benefits of Source-Mask Optimization and Enabling Technologies through Experiment and Simulations", Proc. SPIE vol. 7640, 764006 (2010)
[3] K. Tian et al., "Applicability of Global Source Mask Optimization to 22/20nm Node and Beyond", Proc. SPIE vol. 7973 (2011)
[4] S. Hsu et al., "An innovative Source-Mask co-Optimization (SMO) method for extending low k1 imaging", Proc. SPIE vol. 7140, 714010 (2008)
[5] V. Tolani, P. Hu, D. Peng, T. Cecil, R. Sinn, L. Pang, "Source Mask Co-optimization (SMO) using Level Set Method", Proc. SPIE vol. 7488, 74880Y (2009)
[6] K. Tian et al., "Benefits and Trade-Offs of Global Source Optimization in Optical Lithography", Proc. SPIE vol. 7274, 72740C (2009)
[7] A. E. Rosenbluth et al., "Optimum Mask and Source Patterns to Print a Given Shape", JM3 1, no. 1, p. 13 (2002)


[8] A. Wächter and L. T. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming", Mathematical Programming 106, no. 1 (2006)
[9] INFORMS Computing Society Prize, http://www.informs.org/Recognize-Excellence/Award-Recipients/AndreasWachter
[10] D. DeMaris et al., "An Information Retrieval System for the Analysis of Systematic Defects", Proc. ICTAI 2004, p. 216-223
[11] D. DeMaris et al., "Automated Regression Test Selection for Optical Proximity Correction", ISMI Symposium on Manufacturing Effectiveness (2006)
[12] A. Torres, "Challenges for the 28 nm half node: is the optical shrink dead?", Proc. SPIE vol. 7488, 74882A (2009)
[13] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation", Technical Report No. 380, University of Washington, October 2000
[14] J. A. Torres and C. N. Berglund, "Integrated circuit DFM framework for deep sub-wavelength processes", Proc. SPIE vol. 5756, 39 (2005)
[15] J. Ghan, N. Ma, S. Mishra, C. Spanos, K. Polla, N. Rodriguez, L. Capodieci, "Clustering and pattern matching for automatic hotspot classification and detection system", Proc. SPIE vol. 7275, 727516 (2009)
[16] Y. Cao, W. Shao, J. Ye, R. J. G. Goossens, US Patent US2010/0122225 A1
[17] K. Lai et al., "Experimental Result and Simulation Analysis for the use of Pixelated Illumination from Source Mask Optimization for 22nm Logic Lithography Process", Proc. SPIE vol. 7274, 72740A (2009)

