
INFERRING REPEATED PATTERN COMPOSITION IN NEAR REGULAR TEXTURES

Yunliang Cai and George Baciu
GAMA Lab, Department of Computing
Hong Kong Polytechnic University, Hung Hom, Hong Kong

ABSTRACT

Visual patterns generated by color patches, texture regions, and repetitive textons in an image can be organized into higher-level structural forms such as geometric shapes, arrays, and partition groups. Understanding the information content formed by these visual pattern compositions is important both from a theoretical point of view and for the robust implementation of many image processing applications. In this paper we propose a new method for building pattern compositions and inferring the high-level structural forms over near regular textures. We exploit the shape geometry of repeated patterns to interpret pairwise connections between patterns and generate the abstract structural form by unifying the local connections. The inferred structure can reflect the organization of multiple repeated patterns and can be used in the classification of texture structures.

Index Terms: Repetitive patterns, Near regular texture, Shape completion fields

1. INTRODUCTION

In this paper we address the question of how to organize local patterns into high-level structural forms. Composition of patterns is an important objective in computer vision, especially when a higher-level pattern representation can improve the performance of pattern recognition and enhance the efficiency of visual data organization. We are particularly interested in the composition problem in near-regular textures (NRTs). This particular problem is very common in real-world texture analysis and man-made material evaluation. The NRTs in an image often have local generative properties. They also have sufficient structural organization globally. Extracting structural forms from NRTs reveals higher-level structural information pertaining to object formation, such as shape and category relationships.
This is an extremely important texton-based composition framework. To the best of our knowledge, there are no algorithms to date that search for and extract connectivity relations between low-level NRT patches and higher-level structural forms for computer vision problems.

Understanding the pattern composition of an image is an active research area in computer vision. Advances in image parsing [1], higher-level segmentation [2], and symmetry detection [3] provide various methods to unify isolated patterns into meaningful high-level structures. However, these methods often rely on the appearance information of the patterns of patches, and require an additional training process or complex pre-defined templates. Instead of following the general composition problem, we find that pattern compositions can be recovered from the geometry of repeated patterns, especially in NRT images. We show that unsupervised composition can be achieved by measuring the geometric coherence of locally repeated structures.

Our main contribution in this paper is the structure inference method based on repetitive patterns. The repeated patterns are considered as textons [4], extracted by the detection method in [5]. We measure the geometric coherence among the repeated patches detected from an NRT image. This is obtained by constructing the completion fields using a method similar to [6]. The completion fields provide structure-guided connections between adjacent patches. Our resulting structures depend only on the orientation parameters extracted from the detection of NRTs.

Our results on composition inference are similar to the symmetry detection performed in [3]. However, our work differs significantly from the previously cited work in the way the problem is approached. Symmetry detection is often solved by deformable template matching, where the higher-level structure is defined as a template in order to find the optimal matching in the input image. Symmetry detection attempts to 'verify' a certain symmetry structure rather than to 'infer' it from low-level cues. The composition inference is related to the similarity segmentation in [2], where both methods depend on patterns that have significant repetition in images. However, [2] focuses on measuring similarity, and does not provide an explicit combinational structure over the detected patterns. Our work efficiently detects and extracts explicit combinational structures of low-level NRTs.

2. RELATED WORK

The pattern composition problem has deep roots in many computer vision problems. The unit structure pattern, called a texton, has been found very useful in pattern recognition [4]. Using unit patterns and their geometric organization, one can infer much higher-level image structure. Tu et al. [1] propose the concept of a parsing graph, utilizing low-level segmentation cues to infer particular object shapes. They use object-specific knowledge to improve low-level segmentation. Bagon et al. [2] apply the composition idea in image segmentation, unifying small object parts into meaningful object regions. However, in all these methods the composed patterns are collected in sets, and no high-level structural forms are provided. In general pattern composition, Kemp and Tenenbaum [7] provide a voting method for grouping discrete patterns into unified structures such as trees, rings, and grids. They use generative graph grammars and adjacency information to calculate the posterior probability of different structures, then pick the best structure among them. In particular, symmetry detection [3] can be considered as a special model of pattern composition, where unit patterns are assigned to a global structure if they follow the same symmetry rules.

Fig. 1. Overview of the pattern composition algorithm. (A): Repetitive pattern detection [5]; (B): extract significant orientations from each patch; (C): generate oriented structure completion fields (SCF); (D): extract local descriptor for neighbor topology in SCF; (E): structure classification & visualization. Details of each step are discussed in sections 4 and 5.

3. THE MAIN IDEA

We refine the pattern composition problem into the structural form discovery problem for NRTs, because an NRT image contains sufficient information for structural representation, and the structural forms can be used for real applications, as summarized in [3]. Non-regular or stochastic textures have little structural information, so pattern composition inference could be meaningless for them. The composition inference is accomplished by a series of steps in a bottom-up analysis. We first extract the repeated unit pattern in the image by the detection method in [5].
The detection result includes a set of quadrilaterals and the corresponding geometric parameters for representing the repetitive local structures. The resulting parameters describe the Lie group action of the quadrilateral windows, encoding the specific affine transforms between window pairs. The Lie-group-based parameters lend the connectivity information to adjacent pattern patches. Suppose two adjacent patches belong to the same global structure. The shape variation between them should be sufficiently small, and the difference of the corresponding parameters should be small too. We define an articulated orientation for each patch, and generate

the structure completion fields (SCF) using a method similar to [6] to constrain the local patch connections. The local patch connection can be obtained by analyzing the geodesic distance between any spatially adjacent patch pair in the SCF. The geodesic information is extracted from the SCF using front propagation, and the neighbor set for each detected patch can be generated accordingly. Since the topology of local neighbors reflects the local property of a certain global structure, we can classify the composition structure by the common neighbor topology. Figure 1 shows the overall process of the composition inference.

4. DETECTION OF REPETITIVE PATTERNS

We now briefly introduce the detection method. For an image $I : \Omega \to \mathbb{R}$, we define the vector-valued function $I_p : \Omega \to \mathbb{R}^{d(p)}$, which represents the local patch centered at $p \in \Omega$. Here $d(p)$ is the number of pixels in $I_p$, which depends solely on the shape of $I_p$; $I_p$ is a free-form quadrilateral. Let $G_p$ be the affine transform that warps an $m \times m$ regular square patch to the quadrilateral $I_p$, such that $I_p \circ G_p^{-1}$ is an $m \times m$ square. We assume a near-regular texture is generative, and that there exists an $m \times m$ invariant texton $T$ that satisfies

$$\| T - I_p \circ G_p^{-1} \|^2 < \epsilon \qquad (1)$$

for each $p$ in the texture area, with $\epsilon$ sufficiently small. $G_p$ can be written as a $3 \times 3$ matrix obtained by the exponential map

$$G_p = \mathrm{Exp}(a_p^k E_k) \qquad (2)$$

in Einstein notation, where $E_k$ for $k = 1, \ldots, 6$ are the matrix basis of the tangent space [5]. The algorithm starts with an initial patch as template $T$ and a seed region $\Lambda \subset \Omega$; it then applies the following steps:

• Use $\{T \circ G_p\}_{p \in \Lambda}$ as templates, detect perspective repetitive pattern patches by normalized correlated convolution (NCC), record the NCC response $R$, and update $\Lambda$.

• For each $p \in \Lambda$, align the detected patches in $\Lambda$ by minimizing the functional

$$F_p(a_p) = \frac{1}{2} \sum_{q \in N(p)} \| I_p - \tilde{I}_q(a_p) \|^2 + \| I_p - \tilde{T}(a_p) \|^2 + \| L A_p \|_F^2 + |G_p|^2 \qquad (3)$$

where $a_p = (a_p^1, a_p^2, \ldots, a_p^6)$ is the parametrization used to represent $\tilde{T}(a_p) = T \circ G_p(a_p)$ and $\tilde{I}_q(a_p) = I_q \circ G_q^{-1} \circ G_p(a_p)$. $L A_p$ is the Laplacian of $A_p = \mathrm{Log}(G_p)$ over the matrix space spanned by $E_1, \ldots, E_6$.

• Go back to the first step until no more patches are detected. Output $\{a_p\}_{p \in \Lambda}$ and the NCC response $R$ as the result.

In equation (3), the first and second terms handle the matching among image patches. The third term regularizes the patch shapes, and the last term prevents shape singularity. Additional details of (3) can be found in [5].

5. THE PATTERN COMPOSITION ALGORITHM

The connectivity information among repetitive patterns is yet to be discovered after the detection.

Structure Completion Field. We use a method similar to [6] for connectivity inference. The detected patches are first converted into quadrilateral windows and then embedded in the NCC response $R$ obtained from Sec. 4. In $R$, we dilate each quadrilateral so that it covers a larger area overlapping its nearest neighbors. The dilated window over $R$ is denoted as the patch $R_p : \Omega \to \mathbb{R}^{d(p) r^2}$, where $r > 1$ is the dilation ratio. Given the texton size $m \times m$ for $T$, $R_p$ is warped back to an $mr \times mr$ square with the transform $G_p(a_p)$ obtained from (3), as shown in Figure 2A. The orientations of neighbors can then be extracted by applying the Radon transform to $R_p \circ G_p^{-1}$ and picking the maximum-response direction lines through the patch center. Obtaining the neighbor orientations for each $R_p$ similarly, we obtain the mean patch among $\{R_p \circ G_p^{-1}\}_{p \in \Lambda}$, as Figure 2B shows. The $k$ most significant orientations are selected from the mean neighbor orientations via K-means, which leads to the truncated orientation set $\{\phi_1, \phi_2, \ldots, \phi_k\}$. We represent the orientations on $[-\frac{mr}{2}, \frac{mr}{2}]^2 \subset \mathbb{R}^2$ as the beamlets set

$$B_k = \{ b_{\phi_1}, b_{\phi_2}, \ldots, b_{\phi_k} \}. \qquad (4)$$
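The K-means selection of the $k$ significant orientations can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes the candidate orientations have already been extracted (for example, as Radon-transform peaks), and the function name `cluster_orientations`, the doubled-angle embedding used to handle the fact that $\phi$ and $\phi + \pi$ describe the same line, and the evenly spaced initialization are all assumptions introduced here.

```python
import numpy as np

def cluster_orientations(angles, k, iters=50):
    """K-means over line orientations, which are defined modulo pi.

    Hypothetical helper: `angles` are candidate neighbor orientations
    in radians; the result is k cluster orientations in [0, pi).
    """
    angles = np.asarray(angles, dtype=float)
    # Assumption: doubled-angle embedding, so phi and phi + pi coincide.
    pts = np.stack([np.cos(2 * angles), np.sin(2 * angles)], axis=1)
    # Deterministic initialization with k evenly spaced orientations.
    init = np.arange(k) * np.pi / k
    centers = np.stack([np.cos(2 * init), np.sin(2 * init)], axis=1)
    for _ in range(iters):
        # Assign every sample to its nearest center.
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = pts[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Map the centers back to orientations in [0, pi).
    phis = (np.arctan2(centers[:, 1], centers[:, 0]) / 2) % np.pi
    return np.sort(phis)
```

For a grid-like texture one would expect two dominant orientations, so a call such as `cluster_orientations(peaks, k=2)` should return values close to the two lattice directions.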

$b_{\phi_i}(x)$ is a 2D function defined on $[-\frac{mr}{2}, \frac{mr}{2}]^2$, as shown in Figure 2C, whose support is a line segment crossing the origin. In the discrete setting, $b_{\phi_i}(x)$ has value $1/\|b_{\phi_i}\|$ on the line segment and 0 elsewhere, so that $\int b_{\phi_i}(x)^2 \, dx = 1$. We extend the parametrization of $b_{\phi_i}$ by setting $b_{\phi_i}(x, \theta) = b_{\phi_i + \theta}(x)$ to represent the variation around the original orientation. The stochastic completion field is obtained by the convolution

$$C_{i,p}(u) = \int_{\mathbb{R}^2} \int_{\mathbb{R}} R(u - x) \, b_{\phi_i} \circ G_p(x, \theta) \cdot N_{(0,\sigma)}(\theta) \, e^{-\|x - u\|} \, d\theta \, dx \qquad (5)$$

where $N_{(0,\sigma)}$ is the Gaussian p.d.f. with zero mean and standard deviation $\sigma$. We require $\sigma < \pi/3$.

Fig. 2. Structural completion fields construction. (A): the set $\{R_p \circ G_p^{-1}\}_{p \in \Lambda}$; (B): the mean regular patch; (C): the beamlets set $B_k$; (D): the set $\{b_{\phi_i} \circ G_p\}_{p \in \Lambda}$.

Oriented Nearest Neighbors. The completion fields $\{C_i\}_{i=1,\ldots,k}$ encode the oriented probability among the detected patches. We consider the geodesic distance from $p$ to $q$ in field $C_i$ as the length of the curve $\gamma(t)$ that minimizes

$$L^i_{p,q}(\gamma) = -\int_0^1 \big( C_{i,p}(\gamma(t)) + C_{i,q}(\gamma(t)) \big) \, dt \qquad (6)$$

where $\gamma(0) = p$ and $\gamma(1) = q$. The distance can be efficiently computed by front propagation. We can define the oriented nearest neighbors using the geodesic distance. For field $C_i$, we seek the nearest neighbors of $p \in \Lambda$ along the two opposite orientations $\phi_i$ and $\pi + \phi_i$, respectively:

$$N^{i+}_{\epsilon,r}(p) = \{ q \mid L^i_{p,q}(\gamma) < r, \; \gamma'(0) \cdot (\cos\phi_i, \sin\phi_i) > \epsilon \}$$
$$N^{i-}_{\epsilon,r}(p) = \{ q \mid L^i_{p,q}(\gamma) < r, \; \gamma'(0) \cdot (\cos\phi_i, \sin\phi_i) < -\epsilon \} \qquad (7)$$

$N^{i+}_{\epsilon,r}(p)$ and $N^{i-}_{\epsilon,r}(p)$ can be empty if $\epsilon$ is large enough or $r$ is sufficiently small. Applying this rule to all $C_i$, we obtain the oriented nearest neighbor set

$$N_{\epsilon,r}(p) = \bigcup_{i=1}^{k} \{ N^{i+}_{\epsilon,r}(p) \cup N^{i-}_{\epsilon,r}(p) \}. \qquad (8)$$
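The selection rule in equation (7) can be illustrated with a small sketch for a single field $C_i$. It is not the authors' implementation and rests on stated assumptions: the geodesic costs are taken as a precomputed matrix (the paper obtains them by front propagation), the initial tangent $\gamma'(0)$ is approximated by the unit vector of the straight segment from $p$ to $q$, and the names `oriented_neighbors`, `points`, and `cost` are hypothetical.

```python
import numpy as np

def oriented_neighbors(points, cost, phi, eps, r):
    """Return (n_plus, n_minus): per-patch arrays of neighbor indices.

    points : (n, 2) patch centers (hypothetical input).
    cost   : (n, n) precomputed geodesic costs for one field C_i.
    phi    : orientation of the field, in radians.
    eps, r : direction and distance thresholds as in equation (7).
    """
    points = np.asarray(points, dtype=float)
    cost = np.asarray(cost, dtype=float)
    u = np.array([np.cos(phi), np.sin(phi)])
    n_plus, n_minus = [], []
    for p in range(len(points)):
        diff = points - points[p]
        norm = np.linalg.norm(diff, axis=1)
        norm[p] = 1.0  # avoid division by zero for the patch itself
        # Assumption: gamma'(0) is approximated by the straight direction
        # from p to q instead of the true geodesic tangent.
        proj = (diff / norm[:, None]) @ u
        near = cost[p] < r
        near[p] = False  # a patch is not its own neighbor
        n_plus.append(np.flatnonzero(near & (proj > eps)))
        n_minus.append(np.flatnonzero(near & (proj < -eps)))
    return n_plus, n_minus
```

Running this once per orientation $\phi_i$ and taking the union of the returned index sets for each patch would give the set in equation (8).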

Classification of Structures. After collecting the oriented nearest neighbor sets for all $p \in \Lambda$, we can start the structural form classification to assign the optimal composition form for the detected patches. In practice, we limit the visual structural form to grids, links, and random partitions, as these forms are popular in NRTs. Tree structures are not considered because trees imply hierarchical relations among local patches, while in texture images all patches should be equivalent. We define the histogram for each patch $I_p$:

$$h_{\epsilon,r}(p) = \left( \frac{|N_{\epsilon,2r}| - |N_{\epsilon,r}|}{|N_{\epsilon,2r}|}, \ldots, \frac{|N_{\epsilon,(n+1)r}| - |N_{\epsilon,nr}|}{|N_{\epsilon,(n+1)r}|} \right)^T. \qquad (9)$$

The histogram vector above encodes the local topology of each $p \in \Lambda$; thus, if $\Lambda$ follows a certain global structure, all $h_{\epsilon,r}(p)$ will share a common and unique value. The average vector $\bar{h}_{\epsilon,r} = \sum_p h_{\epsilon,r}(p) / |\Lambda|$ can serve as the feature vector for structure classification. The overall algorithm is as follows.

• Initialize with $\{I_p\}_{p \in \Lambda}$, $\{a_p\}_{p \in \Lambda}$, and the NCC response $R$ using the algorithm described in Sec. 4.
• Collect $\{R_p \circ G_p^{-1}\}_{p \in \Lambda}$ out of $R$, and compute the patch mean $\bar{R}$. Extract $k$ significant orientations.
• Generate structure completion fields $C_{i,p}$ for each $p$ and the $k$ orientations using (5).
• Extract the oriented nearest neighbors $N^i_{\epsilon,r}(p)$ for all $p$.
• Extract the histogram $h_{\epsilon,r}(p)$ for all $p$.
• Classify the structural form by $\bar{h}_{\epsilon,r}$, connecting all structurally valid patch pairs.

Fig. 3. Examples of composition results. Left to right: the inferred structure, and the structure completion fields $C_1$, $C_2$, and $C_3$. The rows from top to bottom correspond to the grid, ring, and partition structures.

Fig. 4. Comparison with non-SCF guided meshes. Left to right: the original image, direct connection, our method.

Fig. 5. Additional results in the PSU-NRT database.

6. EXPERIMENTS

We test the composition with images from the PSU-NRT database [footnote 1]. From the database, the albums Buildings, Normal-NRT, Cloth, and Natural-single center are used in our experiments. The first experiment tests the composition algorithm by generating the structure completion fields for different structures. The resulting completion fields are shown in Figure 3, where we illustrate three oriented SCFs for each structural form. In the second experiment we compare the regularity of the structure-guided mesh and the mesh directly generated by adjacent patch connections, as shown in Figure 4. The success rates (valid output/all) of our algorithm are 109/117 in Buildings, 55/62 in Normal-NRT, 72/86 in Cloth, and 14/24 in Natural-single center. To the best of our knowledge, there is no algorithm so far that can provide similar bottom-up inference or structural classification. Thus, a comparison with other methods is not directly applicable. Figure 5 shows a number of results we obtained during the test.

Footnote 1: http://vivid.cse.psu.edu/texturedb/

7. CONCLUSIONS

We propose a pattern composition method, which combines the parameterized patches in near regular textures extracted by [5] into meaningful structural forms. The composition is based on the geometric coherence between adjacent patches and is constructed from oriented structure completion fields; the structural form is obtained from the local topology of the oriented nearest neighbor sets. An immediate application of our method is the structural classification of texture images.

8. ACKNOWLEDGEMENTS

This work was partially supported by the Hong Kong RGC GRF Grant PolyU 5101/09E, the HKRITA/ITF ITP/002/10TP, and Hong Kong PolyU G-YK54.

References

[1] Z. Tu, X. Chen, A.L. Yuille, and S.C. Zhu, "Image parsing: Unifying segmentation, detection, and recognition," IJCV, 2005.
[2] S. Bagon, O. Boiman, and M. Irani, "What is a good image segment? A unified approach to segment extraction," ECCV, 2008.
[3] Y. Liu, "Computational symmetry in computer vision and computer graphics," Found. and Trend in Computer Graphics and Vision, 2009.
[4] S.C. Zhu, C.E. Guo, Y. Wang, and Z. Xu, "What are textons?," IJCV, 2005.
[5] Y. Cai and G. Baciu, "Detection of repetitive patterns in near regular texture images," IEEE IVMSP, 2011.
[6] L.R. Williams and D.W. Jacobs, "Stochastic completion fields: a neural model of illusory contour shape and salience," Neural Computation, 1997.
[7] C. Kemp and J.B. Tenenbaum, "The discovery of structural form," PNAS, 2008.
