Requirement Validation Tool via Natural Language Processing and Expert Based Rule Systems

Mrs M. Hima Bindu, Aanchal Arora et al.

Abstract
Software requirements are precise statements of need intended to convey an understanding of a desired result. They have to be derived, captured on paper and interpreted by developers. However, such specifications are large, English-like documents that are often ambiguous, imprecise and incomplete. This paper discusses how to automate part of this process by analyzing and validating user requirements, allowing clients to list their requirements and to specify certain parameters with regard to different domains. A comprehensive approach based on expert rule systems is provided for further reference.

1. Introduction
In software or systems development, one key role is that of the Analyst, which is further broken into distinct areas: Database Designer, Systems Engineer, and Business Analyst. Beyond this, several requirements management tools are widely available in the market. A review of the current literature shows that none of them addresses the basic problems that cause ambiguities in requirements. Customers are often themselves unaware of what they exactly want; the project may change over time, or problems may arise from inadequate review and feedback from the customer. Because requirement analysis is a long and tedious process, an automated tool could reduce the time required during the analysis phase as well as improve the process itself. The question, however, is how this can be achieved. Before answering it, it is necessary to become acquainted with some of the processes used to reach the solution. A virtual-reality-based information requirement analysis tool (VR-RA tool) by Bal, Manesh and Hashemipour (2008) was developed for an existing methodology that can be used by system analysts for determining the information requirements of computer integrated manufacturing (CIM). A set of rules and a knowledge base is appended to the virtual environment to remove any inconsistency that could arise between the material and the information flows during requirement analysis. However, these rules are pre-defined and are not processed while reviewing. The goal-based requirement analysis tool by Anton, Liang and Rodenstein (1996) is a typical goal-driven application that identifies, elaborates, refines and organizes goals in order to validate the requirement specifications or simply to make the customer specify them. It has a web-based GUI that allows multiple users to drag and drop goals and define relationships between them intuitively in order to visualize the whole scenario. However, this is also not an automated process.

2. Background
Software development usually begins with recognition of the user's requirements, followed by implementation of a software system that satisfies them. These requirements are defined through dialogue between users and system analysts, a process influenced by many unstated expectations and assumptions held by both sides. These include perceptions about the feasibility of a software solution, the behavior of the future system, and the dynamics of the environment before and after the system is in place. Requirement specifications written in natural language can cause miscommunication among developers depending on how they are understood. Without removing ambiguities, we cannot construct systems that satisfy the need for software safety, and this removal should be performed in the early steps of development. An application that performs comprehensive automated analysis does not exist; existing tools provide only limited guidance or automated assistance in identifying potentially problematic requirements. IBM Rational, with its new open beta program, a leading project management and requirement analysis application, recognizes the many different talents required to make a project successful.




2.1 The requirements
Requirements are basically a specification of what should be implemented. They can be descriptions of how the system should behave, of a system property or attribute, or constraints on the development process of the system. The methodology needs to address three levels of requirements, which come from different sources at different project stages:
a) Business Requirements
b) User Requirements
c) Functional Requirements

3. Main Focus (with methodology)
The problem lies in finding the relevant ambiguities correctly with minimal false positives. The basic need is still the analysis of different English statements and their processing for validation. How can an optimal solution be achieved with an efficient algorithm? The issues surrounding this question are discussed below.

3.1 Tagging parts of speech in a sentence
The use of tagging is easy to understand from a simple example. Consider the sentences:
"We gave the monkeys the bananas because they were hungry."
"We gave the monkeys the bananas because they were over-ripe."
These two sentences have the same surface grammatical structure, yet in one of them the word "they" refers to the monkeys and in the other it refers to the bananas. The sentences cannot be understood properly without knowledge of the properties and behavior of monkeys and bananas. Now consider an actual example taken from a sample SRS document: "Appropriate system standards shall be used where necessary." The following questions remain unanswered in this sample requirement:
1) Appropriate: What is appropriate?
2) Standards: Specifically which standards are required?
3) Where necessary: Under what conditions will these standards be applied?
Various POS taggers are available to help resolve such ambiguity by analyzing the parts of speech used in the language; for each word they retain one part of speech and discard the remaining candidates. The open source code of the "bidirectional inference with the easiest-first strategy" POS tagger by Tsuruoka and Tsujii (2005) has been used. It takes a sentence (or multiple sentences) as input and tags each word, i.e. assigns to each word a particular part of speech such as noun, pronoun, adverb, adjective, conjunction, preposition, etc. The output is the sentence with each word tagged.

Sample input file:
A passenger plane has crashed shortly after take-off from Kyrgyzstan's capital, Bishkek, killing a large number of those on board. The head of Kyrgyzstan's civil aviation authority said that out of about 90 passengers and crew, only about 20 people have survived.

Tagged output file:
A_DT passenger_NN plane_NN has_VBZ crashed_VBN shortly_RB after_IN take-off_NN from_IN Kyrgyzstan_NNP 's_POS capital_NN ,_, Bishkek_NNP ,_, killing_VBG a_DT large_JJ number_NN of_IN those_DT on_IN board_NN ._. The_DT head_NN of_IN Kyrgyzstan_NNP 's_POS civil_JJ aviation_NN authority_NN said_VBD that_IN out_IN of_IN about_RB 90_CD passengers_NNS and_CC crew_NN ,_, only_RB about_RB 20_CD people_NNS have_VBP survived_VBN ._.

Table 1: Tagging of a source file

This tagger labels sequence data by finding the sequence of tags t1, ..., tN that maximizes the probability P(t1, ..., tN | o) given the observation sequence o = o1, ..., oN.
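Written out, this decoding objective is simply the maximum-probability tag sequence; in LaTeX notation (a restatement of the objective just described, not a formula reproduced from the tagger's paper):

\[
(\hat{t}_1, \ldots, \hat{t}_N) = \operatorname*{arg\,max}_{t_1, \ldots, t_N} \; P(t_1, \ldots, t_N \mid o_1, \ldots, o_N)
\]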

3.2 Applying the backward chaining Horn clause mechanism
After tagging, the main aim is to generate questions for the ambiguous words. To achieve the desired output, a trigram model is used, consisting for each word of three items: the word's POS (part of speech), the previous word (with respect to the word under consideration) and the next word. The questions in the examples above are intelligent and pertain to a particular domain, so instead of plain if-then-else logic (which cannot serve the purpose), an expert system built on Prolog's backward chaining Horn clause mechanism is used. For this, the tagged file generated earlier is first saved with each tagged word (as discussed in the section above) and its POS as a token in a separate file.
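As an illustration of this step, a minimal SWI-Prolog sketch is given below; the predicate names (tagged_pairs/2, split_token/2, assert_trigrams/1, token/4) are hypothetical and only illustrate how the tagger's word_TAG output could be turned into the per-word trigram facts of the kind shown in Figure 1.

    :- dynamic token/4.                      % token(Word, Tag, PrevWord, NextWord)

    % Split a line such as "A_DT passenger_NN plane_NN ..." into Word-Tag pairs
    % (tokens are assumed to be separated by single spaces).
    tagged_pairs(Line, Pairs) :-
        split_string(Line, " ", "", Tokens),
        maplist(split_token, Tokens, Pairs).

    split_token(Token, Word-Tag) :-
        split_string(Token, "_", "", [W, T]),
        string_lower(W, WL), string_lower(T, TL),
        atom_string(Word, WL), atom_string(Tag, TL).

    % Assert one token/4 fact per word; the sentence is padded so that the
    % first and last words also get a previous/next word.
    assert_trigrams(Pairs) :-
        append([start-start|Pairs], [stop-stop], Padded),
        forall(append(_, [Prev-_, Word-Tag, Next-_|_], Padded),
               assertz(token(Word, Tag, Prev, Next))).

For the sample sentence of Table 1 this sketch would, for instance, produce the fact token(passenger, nn, a, plane).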

Figure 1: Intermediate files with the token type, previous word and next word of the trigram model

Backward chaining is a process that works backwards from a list of goals, trying to validate each goal with data that allows it to be reached. An inference engine using backward chaining first searches through its inference rules to find one whose then-clause matches the desired goal and then checks the corresponding if-clause for validity. If the if-clause itself is not yet known to hold, it becomes a new goal in turn, is added to the list of goals, and is searched for among the inference rules. Consider the rules:
(1) If Person A is above 25 years of age then Person A is married.
(2) If Person A is married then Person A is the mother.
In this example, the goal is to find out whether Person A is the mother. The method is called goal driven because the list of goals determines which inference rules are selected. The inference rules are searched and rule (2) is selected because "Person A is the mother" is its then-clause, which is our desired goal. Its if-clause is then checked. However, it is not yet known whether Person A is married, so "Person A is married" becomes the next goal and is appended to the list of goals. Rule (1) is then chosen because the then-clause of this rule is our new goal. Finally, if it is known that Person A is above 25 years of age, the original goal that Person A is the mother is concluded.
Horn clause logic can be stated simply by an example. The fact relation("A", "B") says that the predicate relation holds between its two arguments, with the second standing in the specified relation to the first. The statement "X is the grandfather of Z if X is the father of Y and Y is the father of Z", where X, Y and Z are persons, can be formalized in Horn clause logic as the rule:
grandFather(Person, GrandFather) :- father(Person, Father), father(Father, GrandFather).
What remains, then, is for the system to create a separate file that defines the Horn clauses, i.e. facts consisting of the token type, the previous word and the next word, enabling the trigram model. The clauses are prepared for every word in the input SRS using Prolog. Finally, using the Prolog inference engine, this file can be searched for a particular combination of token type, previous word and next word of a given ambiguous word (a word having several possible meanings in the SRS unless clarified); if a match is found, a certain number of questions to clarify it are put to the user. The user input is taken and stored in another file for further processing.
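The married/mother example above can be written directly as Prolog Horn clauses; a minimal sketch (with illustrative predicate names, not taken from the tool) is:

    % Rule (1): a person above 25 years of age is (in this toy example) married.
    married(Person) :- above_25(Person).

    % Rule (2): a married person is the mother.
    mother(Person) :- married(Person).

    % Known fact about Person A.
    above_25(person_a).

    % Query:  ?- mother(person_a).
    % Backward chaining selects rule (2), adds married(person_a) as a new goal,
    % reduces it via rule (1) to above_25(person_a), which is a known fact, so
    % the original goal succeeds.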

Figure 2: Requirement Validation Tool. The first part of the figure shows the tagged file with the various parts of speech; the second shows the output generated in the form of questions for the ambiguous parts of the source.

In the example shown, the word under consideration is "passenger", its next word is "plane", its previous word is "A", and its token type is NN. The rule is therefore defined as: if the next word of "passenger" (the current word) is "plane", its previous word is "A" and its token type is NN, and this combination is found among the clauses of the combined.txt file, then the question "Please tell what type of passenger plane it is" is generated for the user; otherwise the next rule is tried. The user has to type a particular answer whenever such a question is generated.
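A minimal Prolog sketch of such a rule is shown below. The fact format token(Word, Tag, PrevWord, NextWord) matches the trigram facts sketched in Section 3.2; the predicate names clarify/2, ask_user/1 and answer/2 are hypothetical and only illustrate the if-match-then-question behaviour described above.

    :- dynamic answer/2.

    % Hypothetical question rule: fires when the trigram stored for "passenger"
    % matches the combination described in the text (tag NN, previous word "a",
    % next word "plane").
    clarify(passenger, 'Please tell what type of passenger plane it is.') :-
        token(passenger, nn, a, plane).

    % Put every generated question to the user and store the typed answer.
    ask_user(Word) :-
        forall(clarify(Word, Question),
               ( format("~w~n", [Question]),
                 read_line_to_string(user_input, Answer),
                 assertz(answer(Word, Answer)) )).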


4. Future Trends
Some problems remain with the approach specified above. It is not yet sufficiently reliable, and more resources are needed to upgrade the application to achieve better accuracy. The domain-specific questions put to the user are hard coded: if a particular trigram combination is found, certain pre-decided questions are put forward, and the system is unable to formulate questions on its own by looking at the sentence. Specific methods exist to address these problems, but they are beyond the scope of this article.

5. Conclusion
The Requirement Analysis Tool demonstrates the validation of requirements specified by the user and the specification of certain domain-related parameters, and thus helps analyze requirements by automating the process of finding inconsistencies and ambiguities. It also allows users to list and verify their requirements and to insert appropriate comments wherever vague statements occur, before the requirements take their final form. The proposed algorithm uses the trigram model on tokenized words, which is then exploited by an expert system.

6. Acknowledgement
The author wishes to place on record sincere thanks to Yoshimasa Tsuruoka and Jun'ichi Tsujii for providing the open source code of their POS (part of speech) tagger, based on their work "Bidirectional inference with the easiest-first strategy for tagging sequence data" (2005).

7. References
[1] Tsuruoka Y., Tsujii J. (2005). Bidirectional inference with the easiest-first strategy for tagging sequence data, pp. 467-474, University of Tokyo, Tokyo, Japan.
[2] Rosenberg L. H., Hyatt L. E. (1997). Automated analysis of requirements, pp. 161-171.
[3] Noh S., Gadia S. K. (2003). Requirement Analysis Support Tool based on Linguistic Information, Department of Computer Science, Iowa State University, Ames, IA.
[4] Barnes A., Gray A. (2000). Software Process Management: An Exploration of Software Engineering Tool Development, p. 221, Australian Software Engineering Conference.
[5] Bal M., Manesh H. F., Hashemipour M. (2008). Virtual-reality-based information requirements analysis tool for CIM system implementation: a case study in die-casting industry, Vol. 21, Issue 3, pp. 231-244, Eastern Mediterranean University, Famagusta, TRNC, Turkey.
[6] Anton A. I., Liang E., Rodenstein R. A. (1996). A Web-Based Requirements Analysis Tool, Proceedings on Enabling Technologies: Infrastructure for Collaborative Enterprises.
[7] Nuseibeh B., Easterbrook S. (2000). Requirements Engineering: A Roadmap, Proceedings of the Conference on The Future of Software Engineering, pp. 35-46.
[8] Burns C. (1994). PROTO: A Software Requirements Specification, Analysis and Validation Tool.
[9] Ryan K. (1993). The Role of Natural Language in Requirements Engineering, San Diego, California.
[10] Georgiades M. G., Andreou A. S., Pattichis C. S. (2002). A Requirements Engineering Methodology Based on Natural Language Syntax and Semantics, University of Cyprus.
[11] Fabbrini F., Fusani M., Gnesi S., Lami G. Automated Quality Analysis of Natural Language Requirement Specifications, Istituto di Elaborazione dell'Informazione del C.N.R., Pisa, Italy.
[12] Kof L. (2004). Natural Language Processing for Requirements Engineering, Technische Universität München.
[13] Kof L. (2004). An Application of Natural Language Processing to Domain Modelling, Technische Universität München.

