Measuring Compliance and Deviations in A Template-Based Service Contract Development Process

Vijil Chenthamarakshan*, Rafah A. Hosn*, Shajith Ikbal**, Nanda Kambhatla**, Debapriyo Majumdar**, Soumitra Sarkar*

* IBM T. J. Watson Research Center, New York, USA
** IBM Research - India, Bangalore, India
{ecvijil, rhosn, sarkar} at us.ibm.com, {shajmoha, kambhatla, debapriyo} at in.ibm.com

Abstract—Asset-based approaches, involving the use of standardized reusable components (as opposed to building custom solutions), are increasingly being adopted by IT service industries to achieve higher standardization, quality and cost reduction goals. In this paper, we address issues related to the use of an asset-based approach for authoring service contracts, where standard templates are defined for each type of service offered. The success of such an approach relies on a compliance checking system. We focus on three key components of such a system. The first measures how well actual contracts comply with the standard templates. The second analyzes compliant contracts containing moderate deviations and reports on the consistent patterns of deviations observed for each template to help identify necessary modifications required in templates to keep them up-to-date with evolving business requirements and customer needs. The third analyzes noncompliant contracts and identifies groups within them such that members of each group have enough similarity to each other to warrant consideration for development of new templates for each group. We describe the architecture of the proposed system, our experience in the use of various text analysis techniques to prototype different system components, and the lessons learned. Keywords- service contract; standards compliance; automatic standard evolution; deviation analysis; document tree matching;

I. INTRODUCTION

In recent years, the IT services business has shifted from a captive market controlled by a few large organizations to a highly competitive one where many players of various sizes compete. As a consequence, standardization, improved quality, lower costs and shorter delivery times have become top priorities for companies seeking to maintain or increase market share. To achieve these goals, IT companies today are moving from labor-based approaches to asset-based approaches. The centerpiece of this movement is the development of standard reusable components created by domain experts, which can easily be tailored to a customer’s requirements. This leads to higher quality standardized products and services with shorter turnaround times than creating custom solutions ‘from scratch’ in the field, even with practitioners who are not subject-matter experts.

In this paper, we consider the problem of adopting an asset-based approach for the service contract development process. In this space, two key elements are necessary for success. The first is the creation of standard contracts that reflect up-to-date business requirements and customer needs, each tailored for a given service (e.g., setting up a

Voice-over-IP system). The second is a governance process that ensures compliance with the process of using standard contracts in the field when creating new contracts.

Contract standardization is achieved with the use of templates. For each service, the corresponding template defines various structural and content requirements for contracts. For example, templates typically standardize the title, section headings, and the text of various sections (with appropriate placeholders for customization) such as Scope of Work, Service Provider Responsibilities, Terms and Conditions, etc. A template also defines a work breakdown structure of how the service is to be delivered, in terms of the sequence of activities and tasks to be performed.

A critical component of such a standardization effort is a compliance checking system that measures how well actual contracts adhere to standard templates. In addition, to ensure effective and prolonged adherence to the standardization process, such a system should facilitate continuous improvements to existing templates and the identification of new templates to reflect up-to-date business requirements and customer needs, through the analysis of consistent patterns of deviation between new contracts and existing templates. In this paper, we describe a system developed for the above sub-tasks of the contract standardization process.

Note that our system is not based on tools that allow only restrictive editing of existing templates. This is because: 1) it is not practical to force domain experts in large organizations to adhere strictly to such restrictive tools, and 2) such a setup is not well suited to our aim of evolving templates over time (as explained above, to reflect up-to-date business requirements), including the creation of new templates from completely noncompliant contracts.
In the following sections, we describe the business problems addressed in this paper, the challenges they pose, the architecture of the system we developed, the techniques used in the design of its components, and our experience in creating an initial prototype of the system.

II. BUSINESS PROBLEMS ADDRESSED

The compliance system described in this paper addresses three main business problems. First, seasoned IT practitioners who are used to writing their own custom service contracts might be resistant to such standardization attempts. Therefore, continuous monitoring of compliance becomes necessary to track how well

the strategy is being communicated and acted upon. Second, well-designed templates must reflect market trends and changes in client demands. Therefore, the system should be able to track systematic deviations of newly written contracts from standard templates to uncover gaps between perceived and current market demands, which in turn could be used by the domain experts to refine the templates as needed. Finally, for contracts that do not match any template, the system should identify the groupings (clusters) of contracts that address the same type of nonstandard service, to indicate the need for creation of a new template for that service.

III. KEY TECHNICAL CHALLENGES

The contract compliance system described in this paper uses state-of-the-art text analysis techniques. Several technical challenges imposed by the nature of the data, which are typical across organizations and not specific to our environment, had to be addressed. These are described below.

Contract templates developed by domain experts are preserved as Microsoft Word documents and have a detailed structure, defined via the use of specific types of headers. However, contracts themselves are typically archived in a content management system, stored as scanned images of the final approved versions in PDF format. Figures 1 and 2 respectively show examples of a template and a contract built from it.

A contract can be compliant even if some of its sections do not appear in the same order as in the corresponding template. For example, the contract in Figure 2 does not have the ‘Due diligence’ task that is present in the corresponding template in Figure 1. Additionally, because a contract for a given service type is targeted to a specific customer, it is inevitable that different contracts derived from the same template will have valid content deviations from it. The main challenge in building an effective compliance system is to differentiate between valid and invalid deviations, to differentiate between structural and content deviations, and to strike the right balance in weighting these component deviations to reliably compute the final compliance score.
Such a detailed deviation analysis requires inferring the document tree structure (i.e., identifying segments such as sections and subsections and their hierarchical relationships) inherent in a contract and a template, matching those trees in order to align segments with similar content, and subsequently computing a document similarity measure as a combination of the structural deviations and the content deviations anchored around them. This is fundamentally different from the full document-level similarity measures commonly used in search systems, and there are no off-the-shelf techniques that can be easily applied to solve this problem.

Inferring the document tree is not a difficult problem for templates due to the availability of various Word-to-HTML tools. However, this can be quite challenging for contracts

due to the noise present in text documents created from archival images using optical character recognition (OCR) tools, and the complete loss of the header metadata that was present in the original document.

Document tree matching is expensive and infeasible to compute in a practical amount of time for large trees, especially because structural deviations could lead to the alignment of any contract tree node with any template tree node in an unconstrained manner. Some robust techniques we have developed to tackle these challenges are explained in detail in Section IV.

Comparison of a contract with multiple candidate templates at the document tree structure level is an expensive process, since there can be hundreds of contract templates corresponding to different types of services offered by a large IT services company across diverse technical areas such as networking, servers, storage, mainframe systems, etc. Therefore, comparing a contract to all templates using detailed structural level analysis is inefficient. To prune the search space, an efficient method of eliminating most of the templates from consideration must be developed. The use of various document-level similarity measurement techniques, at the whole-document as well as the partial-document level, is an option that was explored.

Finally, for contracts that do not match any template, clustering techniques have to be applied to accurately group together the contracts that are sufficiently similar to each other to warrant manual inspection to assess their suitability for identifying new templates. The challenge is to identify contracts that deal with similar subjects (service types).

In the next section we describe the architecture of our service contract compliance system and its components.

IV. SYSTEM ARCHITECTURE AND COMPONENTS

Figure 3 shows the architecture of the service contract compliance checking system that we have developed.
The input to the system consists of contract images (PDF files) and standard templates (Microsoft Word documents). The output is two sets of reports: one listing contracts that do not match any template, grouped by similarity, and the other showing, per template, the match percentage and deviation trends. Contract images are processed by an OCR tool, which generates plain text and HTML output carrying information about the font sizes used in various parts of the text. The segment analyzer attempts to recover the document tree structure from the text and HTML inputs. It uses a combination of statistical techniques and heuristic rules to reconstruct the original document tree, as described in Section IV-A. The heuristics-based filter, described in Section IV-B, uses simple rules to determine if a contract clearly matches a specific template or if it is a complete mismatch with all the templates. Contracts that do not satisfy either condition are further processed by the content similarity-based filter module, which is described in Section IV-C, to identify for each contract the top candidate templates to be considered for detailed structural matching. The document tree matcher module compares the document tree representations of a contract and a template

Figure 1. Sample template.

Figure 2. Sample contract.

Figure 3. Architecture of service contract compliance system.

using a tree matching algorithm to identify deviations and to compute the combined structural and content similarity score. The deviation analysis module analyzes the deviation pattern between a contract and the best matching template (if one exists) for each set of contracts associated with a given template, to produce insights on the types of customizations being made to specific templates when creating contracts from them, and to explore modifications needed in templates based on consistently observed deviation patterns. These components are described in Sections IV-D and IV-E. The clustering module groups together noncompliant contracts that deal with similar service types, in order to identify new service types that can potentially be standardized. Preliminary techniques used to address this problem are described in Section IV-F. A detailed description of each module is given in the following subsections.

A. Segment Analyzer

The aim of the segment analyzer module is to recover the tree representation of a contract’s original document structure, i.e., to identify the various segments and their hierarchical relationships, such as sections, subsections, and some internal ‘labels’ that are not sections, such as activities and tasks. As explained in Section III, this is quite a challenging task because of the noisy OCR output and the loss of header metadata. The output of the OCR tool is the plain text of the contract images along with some additional

information about the font sizes used in various parts of the text.

Typically, segments (sections and sub-sections) of contracts are written with a section heading followed by the related content. Hence, identifying the original document segments is equivalent to identifying the segment headings that act as segment change-over points. The text used in such headings in contracts across multiple IT services tends to follow a particular statistical distribution, i.e., specific phrases are usually used as headings, such as ‘Scope of services’, ‘Service provider responsibilities’, ‘Charges’, ‘Terms and conditions’, etc., and their variations. Additionally, headings are usually accompanied by section numbers and are written in larger font sizes (although this information can be noisy and less reliable). All these features of heading text are utilized in our implementation of the segment analyzer, which uses a combination of statistical text modeling techniques and heuristic rules to recover the original document tree structure.

Algorithm 1 gives pseudo-code of our document tree structure recovery algorithm. Two statistical language models [12] are built, one for heading text and the other for content text, using training data extracted from templates and a small set of real contracts. These models use the unigram and bigram probabilities of word sequences to assign a probability to each line observed in the contract, and each line is classified as heading or content according to whichever model assigns the higher probability. The set of lines classified as headings by the statistical language models is further pruned using a set of heuristic rules, including the presence of section numbers, larger text font size, capitalization rules, and sentence length.
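The two-model heading classifier described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `InterpolatedBigramLM`, the equal mixing of bigram and add-one-smoothed unigram estimates, and the toy rule in `passes_heuristics` (at most eight words, no trailing period) are all assumptions made here for illustration, and font-size evidence is ignored.

```python
import math
from collections import Counter


class InterpolatedBigramLM:
    """Word-level model mixing bigram and add-one unigram estimates (assumed form)."""

    def __init__(self, lines, bigram_weight=0.5):
        self.bigram_weight = bigram_weight
        self.unigrams = Counter()
        self.bigrams = Counter()
        for line in lines:
            tokens = self._tokens(line)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.total = sum(self.unigrams.values())
        self.vocab = len(self.unigrams) + 1  # one extra slot for unseen words

    @staticmethod
    def _tokens(line):
        return ["<s>"] + line.lower().split() + ["</s>"]

    def score(self, line):
        """Average log-probability per word transition, so long lines are not penalized."""
        tokens = self._tokens(line)
        log_prob = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            p_uni = (self.unigrams[word] + 1) / (self.total + self.vocab)
            seen_prev = self.unigrams[prev]
            p_bi = self.bigrams[(prev, word)] / seen_prev if seen_prev else 0.0
            p = self.bigram_weight * p_bi + (1 - self.bigram_weight) * p_uni
            log_prob += math.log(p)
        return log_prob / (len(tokens) - 1)


def passes_heuristics(line, max_words=8):
    """Hypothetical pruning rule: headings are short and do not end like a sentence."""
    stripped = line.strip()
    return 0 < len(stripped.split()) <= max_words and not stripped.endswith(".")


def is_heading(line, lm_heading, lm_content):
    """A line is a heading candidate if the heading model wins and the heuristics agree."""
    return lm_heading.score(line) > lm_content.score(line) and passes_heuristics(line)
```

With a heading model trained on phrases such as ‘Scope of Services’ and a content model trained on ordinary contract sentences, `is_heading` accepts the former and rejects lines that read like body text. Averaging the log-probability per transition is a design choice made here so that short heading lines and long content lines are compared on the same scale.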
The candidate segment headings obtained as above are used in a tree fitting algorithm to construct the document tree representing the hierarchical relationships between them, i.e., to decide whether they are top-level sections or lower-level sections (subsections, sub-subsections, and so on). For this purpose, the section level (tree depth) is inferred from the evidence obtained from the section number (derived from the heading line, if present) and font sizes (compared to text in other parts of the document). Examples of document trees are given in Figure 4.

Interestingly, the process of constructing the tree itself cleans up section headings wrongly identified by the previous steps. For example, assume the following partial list of candidate segment headings:

• 4. Services
• 4.1. Project Management Services
• 1. describing service provider responsibilities
• 4.1.1. Our Project Management Responsibilities

The third candidate, which appears to have been wrongly chosen as a heading, possibly from an itemized list, would get discarded during tree construction as it does not fit in the sequence of the preceding and following section numbers, ‘4.1’ and ‘4.1.1’ respectively. In addition, the document tree also facilitates robust identification of headings that are missing from the initial candidate list. For example, if due to OCR errors section number 5 is missed in the initial candidate

list (say, because words are corrupted with some erroneous special characters), the tree provides an easy framework to identify the fact that section 5 is missing, and hence we can look into the part of the text between section 4 and section 6 with relaxed constraints on the statistical language model and heuristic rules to identify it reliably.

Algorithm 1 Segment analyzer - tree construction algorithm.

  LM_heading ← Language_Model.train(data_heading)
  LM_content ← Language_Model.train(data_content)
  new Tree doc_tree
  for all line in document do
    score_heading ← LM_heading.score(line)
    score_content ← LM_content.score(line)
    flag_heur ← Satisfy_Heuristics_For_Headings(line)
    if score_heading > score_content and flag_heur then
      [sec_no, font] ← Get_Section_Number_Font_Size(line)
      depth ← Infer_Tree_Depth(sec_no, font)
      n ← Generate_Tree_Node(sec_no, depth, line)
      doc_tree.insert_node(n)
    end if
  end for
  for all subsequent node pairs (n1, n2) ∈ doc_tree do
    doc_{n1}^{n2} ← Get_Content_Between_Nodes(n1, n2, document)
    n ← Search_Potential_New_Node_Relax_LM_Heur_Constraint(doc_{n1}^{n2})
    doc_tree.insert_node(n)
  end for
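To make the clean-up effect of tree fitting concrete, the following sketch filters candidate headings by checking whether each section number can directly follow the previously accepted one. It is a simplified illustration under stated assumptions: the function names are hypothetical, depth is taken from the section number alone (font size is ignored), the first numbered candidate is trusted, and the relaxed re-search for missing headings in the second loop of Algorithm 1 is not reproduced.

```python
import re

# Matches lines such as "4. Services" or "4.1.1. Our Responsibilities".
SECTION_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.*)$")


def parse_heading(line):
    """Return (section number as a tuple of ints, title), or None if unnumbered."""
    match = SECTION_RE.match(line)
    if not match:
        return None
    number = tuple(int(part) for part in match.group(1).split("."))
    return number, match.group(2).strip()


def is_valid_successor(prev, cur):
    """True if section number `cur` can directly follow `prev` in document order."""
    if cur == prev + (1,):  # first child of the previous section
        return True
    # Next sibling of the previous section or of one of its ancestors.
    for depth in range(len(prev), 0, -1):
        if cur == prev[:depth - 1] + (prev[depth - 1] + 1,):
            return True
    return False


def fit_tree(candidates):
    """Keep candidates whose numbers form a consistent sequence.

    Returns a list of (section number, tree depth, title) tuples.
    """
    accepted = []
    for line in candidates:
        parsed = parse_heading(line)
        if parsed is None:
            continue
        number, title = parsed
        if not accepted or is_valid_successor(accepted[-1][0], number):
            accepted.append((number, len(number), title))
    return accepted


candidates = [
    "4. Services",
    "4.1. Project Management Services",
    "1. describing service provider responsibilities",
    "4.1.1. Our Project Management Responsibilities",
]
fitted = fit_tree(candidates)
```

On the example list from the text, the candidate numbered ‘1.’ is dropped because it is neither a child nor a following sibling of ‘4.1’, while ‘4.1.1’ is kept as its first child. The strict successor rule used here would also reject a genuine heading that follows a missed one, which is the situation the relaxed search in Algorithm 1 is meant to handle.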

B. Heuristics-based filter

This module uses two simple heuristic rules (arrived at by manual inspection of several contracts and templates) to decide definitively on one of two possibilities:

1) One common feature present in all templates is a work breakdown structure describing how to deliver the service. These sections always contain the keywords ‘Activity’ and ‘Task’. An absence of these words in the text of a contract implies that it is clearly noncompliant with every template, and hence the further steps of detailed deviation analysis against templates can be bypassed.
2) All the templates have unique identifiers. If the identifier of a template is present in a contract, the contract has clearly been developed from that template, and hence the further steps of deviation analysis need to be performed only against that template.

Contracts that do not satisfy either of the above heuristics are taken through the next step, the content similarity-based filter, which is explained next.

C. Content similarity-based filter

The purpose of this module is to prune the set of templates that a given contract has to be compared against in the subsequent, computationally expensive tree matching module, to improve the efficiency of the overall compliance checking pipeline. In order to identify the top candidate templates, this component uses document-level similarity measures. We considered using the standard cosine similarity in a vector space model [8] (treating every document as a vector) and the latent semantic indexing (LSI) [9] technique (which can deal with noise effectively and capture the concept-based similarity of documents). For a

detailed description of the vector space model and LSI, please refer to the book by Frakes and Baeza-Yates [7].

Two approaches were considered for identifying the best candidate templates for a given contract. The first approach selects the K top-ranked templates (where K is small, e.g., 3), with the ranking based on cosine similarity scores (computed with and without LSI). The second approach selects all templates whose similarity to the contract is above a given threshold. We summarize the experimental results in Section V-B. For more details on the experiments, please refer to [17]. The candidate templates chosen by the content similarity-based filter module for each contract are further analyzed by the document tree matcher module for a detailed deviation analysis, as described next.

D. Document tree matcher

The aim of the document tree matcher module is to align segments of a contract with similar segments of a template in order to identify deviations (if any) and to perform a detailed structure-plus-content level comparison to compute a similarity score. This similarity score is used to make a final determination of whether a contract matches a template (based on a threshold) and which template is the best match. Figure 4 illustrates an ideal contract-template alignment. As can be seen from this figure, alignment is a complex task because: 1) sections in a contract can appear in a different order than their counterparts in the matching template, 2) a section and its sub-tree in a contract can be similar to a sub-section and its sub-tree in the template at a different tree depth, and 3) the heading and content of similar sections may not be exactly the same in terms of textual similarity. Algorithm 2 gives pseudo-code of the document tree matching algorithm. The segment alignment problem is formulated as a search for the path through a 2-dimensional matrix that yields the best accumulated score.
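Returning briefly to the content similarity-based filter of Section IV-C, its two candidate selection strategies (top-K and threshold) can be sketched as follows. This is an illustrative sketch only: it uses raw term frequencies as the document vectors, omits LSI, and the function names and the toy template names in the usage example are assumptions, not part of the described system.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies (a simple stand-in for a weighted vector space model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_templates(contract_text, templates):
    """Return (score, template name) pairs sorted from most to least similar."""
    contract_vec = term_vector(contract_text)
    scored = [(cosine(contract_vec, term_vector(text)), name)
              for name, text in templates.items()]
    return sorted(scored, reverse=True)


def top_k(contract_text, templates, k=3):
    """First strategy: keep the K best-ranked templates."""
    return [name for _, name in rank_templates(contract_text, templates)[:k]]


def above_threshold(contract_text, templates, threshold):
    """Second strategy: keep every template scoring above the threshold."""
    return [name for score, name in rank_templates(contract_text, templates)
            if score > threshold]


templates = {
    "voip": "install voice over ip phone system",
    "storage": "configure storage replication environment",
    "network": "install network switch and router",
}
contract = "install voice over ip system for customer"
candidates_top2 = top_k(contract, templates, k=2)
candidates_thr = above_threshold(contract, templates, 0.5)
```

The top-K variant always passes K templates to the tree matcher, whereas the threshold variant can pass fewer (here only one) at the risk of passing none.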
The rows and columns of the matrix are indexed respectively by the segments of the contract and the template being compared. Each cell (i, j) of the matrix holds the similarity between contract segment i and template segment j. The segment pair similarity score is calculated as a weighted sum of segment heading similarity and segment content similarity. Heading similarity is computed as the average of the edit distance [13], normalized by string length, and the cosine similarity. Edit distance is word-order sensitive while cosine similarity is not. Content similarity is the cosine similarity of the content text.

Assuming the size of the matrix is N × M, searching for the best path can be computationally very expensive, especially because any contract segment can be aligned to any template segment. A brute force search, which enumerates all possible pairings of contract and template segments, is infeasible to complete in a practical amount of time for large values of N and M. Hence, we use some heuristics to reduce the search space. For every row and column in the matrix,

the highest valued elements in the matrix are selected. For each of those elements, if there is no conflict (i.e., no higher value exists) in the row and column in which that element occurs, then that element is selected as a component of the best path, resulting in a reduction of the matrix size. Similarly, if an element in the matrix has a value less than a manually determined threshold, then that element need not be considered as part of the path, since it can be assumed that there is a large mismatch between the corresponding segment pair. Once the tree alignment is complete, the final contract-template matching score is computed as the total accumulated score divided by the total number of segments in the contract.

Algorithm 2 Document tree matcher algorithm.

  new Matrix score_mat
  for all nodes n_c ∈ doc_tree_contract do
    for all nodes n_t ∈ doc_tree_template do
      s_h ← Heading_Text_Similarity(n_c.heading, n_t.heading)
      s_c ← Content_Text_Similarity(n_c.content, n_t.content)
      score_mat[n_c, n_t] ← ½ (s_h + s_c)
    end for
  end for
  new List best_path
  new Float score_accumulated ← 0.0
  for all row and column [r, c] in score_mat do
    flag_r ← score_mat[r, c] > score_mat[l, c] ∀ l : l ≠ r
    flag_c ← score_mat[r, c] > score_mat[r, l] ∀ l : l ≠ c
    if flag_r and flag_c then
      score_accumulated ← score_accumulated + score_mat[r, c]
      best_path.add([r, c])
      disable all elements in row r and column c of score_mat
    end if
    if score_mat[r, c] < Threshold_score then
      disable element [r, c] in score_mat
    end if
  end for
  score_mat ← Discard_Paths_Through_Disabled_Elements(score_mat)
  [sco_tmp, path_tmp] ← Search_Best_Accumulated_Score_Path(score_mat)
  score_accumulated ← score_accumulated + sco_tmp
  best_path.merge(path_tmp)

Figure 4. Example of tree matching.

E. Deviation analysis

This module performs an aggregate analysis of contracts for which tree matching has already been performed, the best template (if any) has been identified, and the list of deviations from the template, at segment level, has been computed. The segment level deviations computed for all the contracts matching a specific template are analyzed, and trends regarding specific types of deviations (e.g., for template X, Activity Y is omitted in N% of the contracts) are reported. Such an analysis can be performed using commercial tools such as Cognos [14]. Domain experts could analyze consistent deviation patterns found using this module to explore if they reflect any new business requirements and/or customer needs, and if templates can be modified to include them as a part.

F. Clustering of non-matching contracts

The aim of the clustering module is to process all the noncompliant contracts in order to identify groups in them, such that members of each group belong to a common type of service (e.g., setting up a storage replication environment using a

specific type of storage system). In the current prototype, we considered two clustering algorithms: a standard off-the-shelf clustering algorithm and a custom algorithm. The input to both clustering approaches is the set of contract document term vectors from the concept-document matrix produced by the singular value decomposition (SVD) step of latent semantic indexing (LSI) [7] (described in Section IV-C). The term vector size (representing the number of concepts) was set to 60.

One of the most common clustering approaches, K-means [15] clustering, cannot be usefully applied to this problem since it requires the user to specify (guess) the number of clusters, which is precisely what is to be determined. We used the standard Expectation Maximization (EM) [16] algorithm, which can determine the optimal number of clusters. However, it yielded a very small number of clusters, which is inaccurate according to the manually determined ground truth.

We also explored a custom clustering algorithm, as follows. For each pair of contracts, the cosine similarity is computed between their term vectors from the SVD concept-document matrix, and clusters are formed such that the similarity for each pair of cluster members is greater than a threshold (the experiments used 0.75). As will be explained in Section V-D, this approach yielded better results than the EM algorithm.

V. RESULTS AND DISCUSSIONS

This section presents the results of experiments conducted to evaluate the performance of various components of the proposed system. The data used for the experiments consists of a randomly chosen subset (of size 30) of a total of 340 service contracts in one service area (networking communications services). This particular service area has

a total of 50 templates. Manual inspection of the service contracts that passed the heuristics-based filter (as explained in Section IV-B) was used to determine the ground truth of the best template that matched each contract, or to determine that no template matched the contract sufficiently.

A. Segment analyzer

As explained in Section IV-A, the segment analyzer achieves document tree structure reconstruction in two steps: 1) detection of segments (sections, sub-sections, and non-section headings such as Activity and Task), and 2) fitting the segments into a tree. Table I gives the segment detection performance in terms of the relative contribution of the various sub-components used, namely the statistical language model, heuristics, and tree fitting (explained in Section IV-A). An interesting observation from the table is that the tree fitting step improves the overall segmentation performance significantly when applied to the candidate list of segments obtained using statistical language models and heuristics. In addition, the combined use of heuristics and the language model performs better than the use of either one alone (mainly in precision). Parameters are set to favor precision over recall, since the tree fitting stage can later reliably improve the recall further. A fall in precision leads to a greater chance of tree fitting failure than a fall in recall. For example, an itemized list early in a contract may be mistaken for section headings with section numbers.

Table I. Segment detection performance.

Approach               Recall, %   Precision, %
Language Model (LM)    65.3        65.7
H+LM                   58.2        85.2
H+LM+Tree fitting      81.7        86.9

Interestingly, a closer look at the segmentation outputs of individual contracts shows that for almost all the contracts the per-contract segmentation accuracy is either 100% or 0%. In cases where segmentation fails, the main reason is severe OCR errors, as such errors can remove important evidence for segment detection. The performance of the tree derivation algorithm depends heavily on the segmentation accuracy. Evaluation shows that the accuracy of tree derivation is 100% whenever the segmentation accuracy is 100%. This is mainly because, as explained in Section IV-A, segmentation and tree fitting are not independent components; they work in concert to achieve their goals.

B. Content-based similarity filter

Experimental evaluation of the content-based similarity filter module performing full document matching, as explained in Section IV-C, shows that selecting the top-K (with K = 3) candidate templates using the cosine similarity measure without LSI performs best. No false negatives were observed, i.e., the correct template was always present among the top-K matching candidates. In

this evaluation, we manually verified the presence of the correct template among the top-ranked candidates. However, the top-K approach imposes a fixed computational load (matching against a fixed K templates) on the subsequent, computationally expensive tree matching module. From this perspective, a threshold-based approach could potentially be more efficient, reducing the candidate list by choosing fewer template candidates. However, initial analysis shows that full document-level similarity computation is affected in many contracts by the presence of extraneous content such as signature pages, very long lists of the equipment and parts required for executing the service, etc. Hence, an alternative approach to improve full document-level similarity computation is to compute similarity by considering only the standard sections of the contracts and templates (such as front matter including the title, Scope of Work, Service Provider Responsibilities, etc.) and discarding problematic sections such as equipment/parts lists. Planned future work in this direction is as follows: since the input to the content-based similarity filter is also the inferred document tree (from the segment analyzer module), it is possible to prune specific sections (for example, using statistical modeling of relevant section headings) before the full document matching, to improve the similarity computation.

C. Document tree matcher

The final decision on whether a service contract matches a given template is made by the tree matching algorithm, which computes the deviations and an overall matching score, as described in Section IV-D. A manual verification of the results of experiments performed to measure the effectiveness of the algorithm shows that the final matching score obtained after tree alignment is able to distinguish between matching and non-matching templates based on a threshold, and is able to identify the segment level deviations.
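As an illustration of how a matching score and segment level deviations can be derived from a segment similarity matrix (Section IV-D), the sketch below performs a greedy one-to-one alignment. It is a simplification and not the authors' algorithm: cells below the threshold are discarded, the remaining cells are taken best-first, and this greedy pass stands in for the final best-path search of Algorithm 2. The function name, the threshold value and the example matrix are assumptions. Any cell that is the strict maximum of both its row and its column is always selected by this procedure, which mirrors the first pruning heuristic described in the paper.

```python
def align_segments(score, threshold=0.3):
    """Greedy one-to-one alignment over a similarity matrix.

    score[i][j] is the similarity between contract segment i and template
    segment j. Returns (aligned pairs, unmatched contract segments, final
    score), where the final score is the accumulated similarity divided by
    the number of contract segments.
    """
    n_rows = len(score)
    n_cols = len(score[0]) if score else 0
    # Candidate cells at or above the threshold, highest similarity first.
    cells = sorted(
        ((score[i][j], i, j)
         for i in range(n_rows) for j in range(n_cols)
         if score[i][j] >= threshold),
        reverse=True,
    )
    used_rows, used_cols = set(), set()
    path, total = [], 0.0
    for value, i, j in cells:
        if i in used_rows or j in used_cols:
            continue  # row or column already consumed by a better pair
        path.append((i, j))
        total += value
        used_rows.add(i)
        used_cols.add(j)
    unmatched = [i for i in range(n_rows) if i not in used_rows]
    final_score = total / n_rows if n_rows else 0.0
    return sorted(path), unmatched, final_score


# Hypothetical 3x3 matrix: contract segment 1 matches template segment 2
# (a reordered section) and contract segment 2 matches nothing well.
example = [
    [0.90, 0.20, 0.10],
    [0.30, 0.10, 0.80],
    [0.10, 0.05, 0.20],
]
pairs, deviations, final = align_segments(example, threshold=0.3)
```

In this example the unmatched contract segment is reported as a deviation, and because the accumulated score is divided by the number of contract segments, it also lowers the final score.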
A formal evaluation metric to measure the accuracy of the detected deviations is a work in progress. For service contracts with matching templates, initial experimental runs on a small fraction of the data yielded similarity scores of ≥ 0.85 for all the contract-template pairs tested.

D. Clustering of non-matching contracts

In the experimental evaluation of the clustering module, explained in Section IV-F, using pairwise document similarities and a threshold to determine cluster membership, with the documents represented as term vectors in the SVD concept-term matrix, many clusters were formed. Some small clusters correctly identified similar contracts about nonstandard service types. However, contracts about the most common type of nonstandard service were split across multiple clusters. An initial analysis identified the cause as the clustering algorithm being sensitive to all textual differences. Hence, planned future work is to assign relatively high (and separate) weights to the title and to the Scope of Work sections. However, we found that extracting the title of a contract from OCR text is itself a nontrivial problem. The statistical language model approach, as used

for identifying section headings, is less effective because there is no predictable pattern of contract titles. Heuristics based on text patterns which occur before and after titles in the contracts will have to be developed for an improved solution. VI. R ELATED WORK Previous work in the area of standardization of service contracts are either based on restrictive editing tools or adopt semi-automatic (involving manual efforts) means to measure the compliance. Also, to the best of our knowledge, the scope of any previous work in this area is not as broad as that of our system involving detailed deviation analysis to evolve the standards (templates) themselves over time. Simon and Rischbeck [2] discuss generation of service contracts from templates in the context of SOA based web services. Lamparter et al. [3] describe a semi-automated method for creating service contracts for web services. There is prior work on compliance and compatibility checking of service contracts, especially for web services. Nepal et al. [4] describe a method for checking compatibility of contracts. Bhuiyan et al. [5] present a tool for checking compliance between business processes and Web services contracts. Governatori et al. [6] discuss mechanisms for checking compliance of business processes with business contracts. Chieu et al [1] have proposed a service-oriented, electronic contract system with facilities for managing, tracking and storing contracts in a common repository with workflow and dynamic routing services. Authoring of contracts based on standard templates could form a natural element of such workflows. Zhang et al. [11] and Shasha et al [10] present algorithms for matching trees where the order among siblings is unimportant and hence are not as general as the tree matching algorithm proposed in this paper. VII. C ONCLUSIONS In this paper, we described a system for checking the standards compliance of service contracts. 
Such a system is critical for ensuring effective adherence to an asset-based contract development procedure. In large IT service organizations, an asset-based approach to contract development is essential to achieve higher standardization, quality, and cost reduction. We proposed a document tree matching algorithm that aligns the tree representations (sections and subsections) of a contract and a template, to ensure robust measurement of compliance in the presence of moderate structural deviations (such as some sections and subsections organized in a different order) and valid content deviations (resulting from necessary customization for each customer). Another key distinguishing feature of our system is its ability to evolve the templates themselves to keep them up-to-date with evolving business requirements and customer needs. Contracts that comply with a template but show moderate deviations are analyzed together to discover consistent patterns of deviations, so that such deviations can be considered for inclusion in the templates. Contracts that do not comply with any template are analyzed to discover consistent similarity patterns across them, so that the generation of new templates can be considered. We reported experimental results and analysis of an initial prototype of various components of the system.

REFERENCES
[1] Trieu C. Chieu, Thao Nguyen, Sridhar Maradugu, Thomas Kwok, An Enterprise Electronic Contract Management System Based on Service-Oriented Architecture, SCC, IEEE International Conference on Services Computing, 613-620, 2007.
[2] Arnaud Simon, Thomas Rischbeck, Service Contract Template, SCC, IEEE International Conference on Services Computing, 2006.
[3] Steffen Lamparter, Stefan Luckner, Sybille Mutschler, Formal Specification of Web Service Contracts for Automated Contracting and Monitoring, HICSS, 40th Annual Hawaii International Conference on System Sciences, 2007.
[4] Surya Nepal, John Zic, Thi Chau, Compatibility of Service Contracts in Service-Oriented Ap, SCC, IEEE International Conference on Services Computing, 28-35, 2006.
[5] Jenny Bhuiyan, Surya Nepal, John Zic, Checking Conformance between Business Processes and Web Service Contract in Service Oriented Applications, ASWEC, Australian Software Engineering Conference, 80-89, 2006.
[6] Guido Governatori, Zoran Milosevic, Shazia Sadiq, Compliance checking between business processes and business contracts, EDOC, 10th IEEE International Enterprise Distributed Object Computing Conference, 221-232, 2006.
[7] W. Frakes and R. Baeza-Yates, Information Retrieval: Data Structures and Algorithms, 1992.
[8] G. Salton, A. Wong, and C. S. Yang, A Vector Space Model for Automatic Indexing, Communications of the ACM, 18(11):613-620, 1975.
[9] S. Deerwester, Susan Dumais, G. W. Furnas, T. K. Landauer, R. Harshman, Indexing by Latent Semantic Analysis, Journal of the American Society for Information Science, 41(6):391-407, 1990.
[10] D. Shasha, J. T. L. Wang, K. Zhang and F. Y. Shih, Exact and Approximate Algorithms for Unordered Tree Matching, IEEE Transactions on Systems, Man and Cybernetics, 24(4):668-678, 1994.
[11] K. Zhang, R. Statman, and D. Shasha, On the editing distance between unordered labeled trees, Information Processing Letters, 42:133-139, 1992.
[12] J. M. Ponte and W. B. Croft, A Language Modeling Approach to Information Retrieval, Research and Development in Information Retrieval, 275-281, 1998.
[13] G. Navarro, A guided tour to approximate string matching, ACM Computing Surveys, 33:31-88, 2001.
[14] D. Volitich, IBM Cognos 8 Business Intelligence: The Official Guide, McGraw-Hill Osborne Media, 2008.
[15] J. A. Hartigan, Clustering Algorithms, New York: John Wiley, 1975.
[16] A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38, 1977.
[17] Asad Sayeed, Soumitra Sarkar, Yu Deng, Rafah Hosn, Ruchi Mahindru, and Nithya Rajamani, Characteristics of document similarity measures for compliance analysis, Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), 1207-1216, 2009.
