IJRIT International Journal of Research in Information Technology, Volume 2, Issue 6, June 2014, Pg: 575-581

International Journal of Research in Information Technology (IJRIT) www.ijrit.com

ISSN 2001-5569

Implementing Query Expansion for Improvement of Prior Art Search in Patent Retrieval

Ms. Priti D. Dhope1, Ms. M. A. Potey2

1 PG Student, Department of Computer Engineering, D. Y. Patil COE, Pune, Maharashtra, India [email protected]

2 Head of the Department, Department of Computer Engineering, University of Pune, Pune, Maharashtra, India [email protected]

Abstract Prior art search is a very important task in patent retrieval. Its objective is to identify all relevant documents that may invalidate the originality of a claim in a patent application. Patent search therefore places more importance on recall than on precision. Some patents are difficult to find with prior art queries, or cannot be retrieved by any query. Prior art queries can be expanded using various query expansion methods to improve retrievability. We investigate the SynSet method for query expansion and compare it with a Pseudo Relevance Feedback (PRF) based approach; in our experiments SynSet improves retrieval performance.

Keywords: Patent Retrieval, Query Expansion, PRF, SynSet.

1. Introduction

Patent retrieval is a recall-oriented information retrieval application, where not missing a relevant patent is considered more important than retrieving only relevant patents at the top ranks [1]. A patent is a set of exclusive legal rights for the use and exploitation of an invention, granted in exchange for its public disclosure. The exclusive rights are given by a governing authority and are limited in time [18]. An example of a United States Patent and Trademark Office (USPTO) patent is given in Fig. 1.

Fig. 1 Example of USPTO Patent

One of the most common tasks at a patent office is prior art search, as it helps to invalidate a patent application quickly. Interest in patent information retrieval is growing, and there is a need to better understand the context of patent users and their needs. Patent search is a challenging task and requires the patent examiner to spend a substantial amount of time. Some tasks related to patent search are [8]:

1. Ad hoc search: a number of topics are used to search a patent collection, with the objective of retrieving a ranked list of patents relevant to each topic.
2. Invalidity search: the objective is to find all patents relevant to a given claim in order to determine whether the claim is novel.
3. Passage search: passages in the documents retrieved by the invalidity search task are sorted according to their relevance to the claim topic.
4. Prior-art search: the objective is to examine all patents relevant to a patent application that can invalidate its novelty, or that at least describe prior work in the area of the application.

The patent examiner is presented with a mix of relevant and non-relevant patents, which must be checked manually. Searching for prior art patents is an essential step for the patent examiner in validating or invalidating a patent application. Therefore patents are transformed into prior art queries [16]. In this paper we focus on prior art search and use the SynSet (Synonym Set) method to increase the retrievability of patents.

The paper is organized as follows. Section 2 covers related work in prior art patent retrieval. Section 3 explains the implementation details, including the architecture of the system. Section 4 contains the results and discussion of the work done so far. Finally, Section 5 concludes with possible extensions.

2. Related Work

The goal of searching a patent database for prior art is to find all previously published patents on a given topic [2][3]. Word mismatch makes information retrieval more difficult. Query expansion supplements a base query with additional words in an attempt to improve search results; it can be manual or automatic. Query expansion methods provide supplementary terms for the user's original query, which is typically short in most IR applications. Different approaches have been proposed for selecting these additional (expansion) terms. Query expansion methods fall into two major classes: global methods and local methods. Global methods [3] expand or reformulate query terms independently of the query and of the results returned for it, so that changes in the query wording cause the new query to match other semantically similar terms.

Global methods:
1. Query expansion/reformulation with a thesaurus or WordNet
2. Query expansion via automatic thesaurus generation
3. Techniques like spelling correction

Local methods adjust a query relative to the documents that initially appear to match the query. Local methods:
1. Relevance feedback
2. Pseudo relevance feedback (blind relevance feedback)
3. Indirect relevance feedback

Expansion can be per term, as when using WordNet [2], or per query, as in relevance feedback, where the expansion terms are selected through the feedback process [4].

In [5], PRF was used to improve patent retrievability in patent search rather than to improve retrieval effectiveness directly. The problem addressed was that some patents have a low chance of being retrieved, or cannot be retrieved by any query. The objective was to enrich the patent queries with additional terms using PRF in order to improve the retrievability score of patents in the collection. The authors significantly improved the Gini coefficient, which is used to measure retrievability [5]. However, they did not test how this would affect retrieval effectiveness for a patent search task.

PRF performs query expansion by assuming that the top-ranked documents retrieved in an initial search run are relevant. In [6], a novel PRF mechanism was introduced and compared with the standard Rocchio method. Kishida developed a term selection formula for terms from the top retrieved documents, based on the Taylor formula of linear search functions. The main feature that distinguishes the Taylor formula from other term selection formulae is that it uses the document retrieval scores to give higher weights to terms extracted from documents at higher ranks [6]. Experiments were carried out on the NTCIR-3 patent retrieval task, but none of the feedback techniques introduced led to any significant improvement in the retrieval results.
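To illustrate how a Gini coefficient can summarize retrievability, the following Python sketch assumes that a document's retrievability score is the number of queries that return it within a rank cutoff. This is a common definition, but the exact variant used in [5] is not stated here, and the names `retrievability_scores`, `gini` and the `cutoff` parameter are hypothetical.

```python
def retrievability_scores(all_docs, ranked_lists, cutoff):
    """Count, for every document, how many queries return it in the top `cutoff` ranks.

    Assumption: this simple counting definition stands in for the
    retrievability measure; documents never retrieved keep a score of 0.
    """
    scores = {doc: 0 for doc in all_docs}
    for ranking in ranked_lists:
        for doc in ranking[:cutoff]:
            if doc in scores:
                scores[doc] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative values.

    0 means every document is equally retrievable; values close to 1 mean
    that retrievability is concentrated in a few documents.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula over values sorted in ascending order.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy data (hypothetical): two queries over a four-patent collection.
docs = ["p1", "p2", "p3", "p4"]
runs = [["p1", "p2", "p3"], ["p1", "p3", "p4"]]
scores = retrievability_scores(docs, runs, cutoff=2)
bias = gini(scores.values())
```

A lower Gini coefficient after query expansion would indicate that patents are retrieved more evenly across the collection.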

3. Implementation Detail

Query expansion is a widely used technique that attempts to increase the likelihood of a match between the query and relevant documents by adding semantically related terms (called expansion terms) to a user's query. The expanded query is expected to retrieve more relevant documents, improving overall performance. We use the SynSet method, which generates synonyms to expand the query, and compare it with Pseudo Relevance Feedback (PRF).

3.1 System Architecture

The main idea is to generate synonym sets from word translations. For a word f in one language that has possible translations to a set of words {e1, e2, …, en} in another language, this set of words can be considered synonyms, or at least related to each other. The probability that e1 is a synonym of e2 can be computed using Eq. 1.

$$P(e_1 \mid e_2) = \sum_{i=1}^{n} p(f_i \mid e_2)\, p(e_1 \mid f_i) \qquad (1)$$
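A minimal Python sketch of Eq. 1, assuming the translation probabilities are available as nested dictionaries. The function names, the table layout and the toy probabilities are hypothetical and are not taken from the implemented system.

```python
def synonym_probability(e1, e2, p_f_given_e, p_e_given_f):
    """P(e1 | e2) = sum over translations f_i of p(f_i | e2) * p(e1 | f_i)."""
    return sum(
        p_fi * p_e_given_f.get(f_i, {}).get(e1, 0.0)
        for f_i, p_fi in p_f_given_e.get(e2, {}).items()
    )


def build_synset(e2, p_f_given_e, p_e_given_f):
    """Collect every word reachable through a translation of e2 with its probability."""
    candidates = set()
    for f_i in p_f_given_e.get(e2, {}):
        candidates.update(p_e_given_f.get(f_i, {}))
    return {
        e1: synonym_probability(e1, e2, p_f_given_e, p_e_given_f)
        for e1 in candidates
    }


# Toy translation tables (hypothetical values, for illustration only).
P_F_GIVEN_E = {"motor": {"moteur": 0.8, "machine": 0.2}}
P_E_GIVEN_F = {
    "moteur": {"motor": 0.75, "engine": 0.25},
    "machine": {"machine": 0.5, "engine": 0.5},
}

synset = build_synset("motor", P_F_GIVEN_E, P_E_GIVEN_F)
# With these toy tables, "engine" is reached through both translations:
# 0.8 * 0.25 + 0.2 * 0.5 = 0.3
```

Because each translation table row sums to 1 in this toy example, the resulting synonym probabilities for "motor" also sum to 1.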

Fig. 2 shows the architecture of the system. The user submits a search query with selected criteria such as patent number or claim. The query is then preprocessed and expanded using the SynSet or PRF method to obtain more effective retrieval results.

Fig. 2 System Architecture

In Eq. 1, p(e1 | e2) is the probability that e1 is a synonym of e2, {f1, f2, …, fn} are the possible translations of word e2, p(fi | e2) is the probability that fi is a translation of e2, and p(e1 | fi) is the probability that e1 is a translation of fi.

Steps for automatic SynSet creation [1]:
1. The title and claim sections of English patents were extracted and aligned by sentence. Long claims were split at punctuation points to produce shorter aligned sentences.
2. Stop word removal was applied.
3. Words in both languages were stemmed using Snowball.
4. GIZA++ was used for cross-language word alignment.
5. Equation 1 was used to produce the SynSet for English terms.

The SynSet contains a set of synonyms (related terms) for each term, including the original term. Subjective analysis showed the SynSet to be reasonable, although it contained some noisy (not exactly correct) terms with low probabilities. To reduce the number of noisy synonyms, pruning was applied: all terms with low probability (less than 0.1) were removed and their probabilities were added to that of the original term (Equation 2). This step was found to improve retrieval effectiveness when using the SynSet for QE.

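$$P(e_x \mid e_x)\big|_{\text{pruned}} = P(e_x \mid e_x)\big|_{\text{original}} + \sum_{i \neq x,\; P(e_i \mid e_x) < 0.1} P(e_i \mid e_x) \qquad (2)$$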
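The pruning step of Eq. 2 can be sketched in Python as follows, assuming a SynSet entry is a dictionary that maps each synonym to its probability. The function name `prune_synset` and the toy values are hypothetical; only the 0.1 threshold comes from the text.

```python
def prune_synset(term, synset, threshold=0.1):
    """Drop synonyms below `threshold` and move their probability to `term`.

    `synset` maps each candidate synonym of `term` (including `term`
    itself) to its probability. The original term is always kept.
    """
    kept = {t: p for t, p in synset.items() if t == term or p >= threshold}
    removed_mass = sum(
        p for t, p in synset.items() if t != term and p < threshold
    )
    kept[term] = synset.get(term, 0.0) + removed_mass
    return kept


# Toy entry (hypothetical probabilities): "machine" falls below the threshold.
entry = {"motor": 0.6, "engine": 0.35, "machine": 0.05}
pruned = prune_synset("motor", entry)
# "machine" is removed and its 0.05 is added to "motor".
```

The total probability mass of the entry is unchanged by pruning, so the pruned entry can be used in the same way as the original one.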

Applying Eq. 2 left many terms with no synonyms other than themselves (i.e. p(ex | ex) = 1), which means that no expansion terms are added when these terms appear in a query. A further pruning step removed the SynSet entries of all terms that appeared fewer than 20 times in the 8M-sentence training set, since such terms do not have enough training instances to produce a reliable SynSet. The generated SynSet was then used to expand the queries.

Example of SynSet [9]:
• Motor {motor, engine}
• Cloth {fabric, cloth, garment, tissue}
• Area {area, zone, region, surface}
• doghouse {dog, porch, crawling, beside, downstairs}
• makeup {repellent, lotion, glossy, sunscreen, skin, gel}

The above example shows the kind of synonym sets that are generated. After the SynSet is generated, it is applied to patent retrieval.
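As an illustration of how a generated SynSet might be applied to a query, the sketch below replaces each query term with its weighted synonyms. The synonym sets for "motor" and "area" follow the example above, but the probabilities, the function name `expand_query` and the weighted-term output format are assumptions and do not describe the implemented system.

```python
def expand_query(query_terms, synsets):
    """Return (term, weight) pairs for the expanded query.

    Terms without a SynSet entry are kept unchanged with weight 1.0.
    Synonyms of each term are listed from most to least probable.
    """
    expanded = []
    for term in query_terms:
        entry = synsets.get(term, {term: 1.0})
        ordered = sorted(entry.items(), key=lambda kv: (-kv[1], kv[0]))
        expanded.extend(ordered)
    return expanded


# Hypothetical probabilities attached to SynSet entries from the example.
SYNSETS = {
    "motor": {"motor": 0.7, "engine": 0.3},
    "area": {"area": 0.4, "zone": 0.3, "region": 0.2, "surface": 0.1},
}

expanded = expand_query(["motor", "housing"], SYNSETS)
# "motor" contributes "motor" and "engine"; "housing" has no entry and is kept as is.
```

The weights could be passed to a retrieval engine that supports weighted query terms, or ignored if only the added terms are needed.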

Expanding queries using Query Expansion (QE) with Pseudo Relevance Feedback (PRF): Prior-art queries extracted from query patents may not contain all relevant terms. The missing terms can therefore be extracted from PRF documents. The relevant patents for PRF are identified based on their similarity to the query patent via specific terms. For example, terms that appear close to the terms of the prior-art query in the same claim, paragraph, sentence or phrase can identify better patents for PRF than using all terms of the query patent. This term selection problem can be treated as a term classification problem.
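The basic PRF idea can be sketched in Python as follows. This is a deliberately simplified frequency-based variant, not the Rocchio method and not the term classification approach described above: it assumes an initial ranking is already available as a list of tokenized documents, and the names `prf_expand`, `top_k` and `num_terms` are hypothetical.

```python
from collections import Counter


def prf_expand(query_terms, ranked_docs, top_k=2, num_terms=2, stopwords=frozenset()):
    """Expand a query with frequent terms from the top-ranked documents.

    The top `top_k` documents of the initial run are assumed to be
    relevant (the pseudo relevance assumption). The `num_terms` most
    frequent new terms in those documents are appended to the query.
    """
    query_set = set(query_terms)
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(
            t for t in doc if t not in query_set and t not in stopwords
        )
    # Sort by descending frequency, then alphabetically for stable output.
    ranked_terms = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [t for t, _ in ranked_terms[:num_terms]]


# Toy initial ranking (hypothetical documents, already tokenized).
ranking = [
    ["electric", "motor", "rotor", "stator"],
    ["motor", "rotor", "winding"],
    ["dog", "house"],
]
expanded = prf_expand(["electric", "motor"], ranking)
```

Only the top two documents contribute expansion terms here, so terms from the third document never enter the query.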

4. Results

4.1 Data set

Experiments were carried out on the USPTO dataset, from which 100 patents were selected manually and stored in a database on which performance was evaluated.

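4.2 Result Set

The selected patents are used as input for both the SynSet and PRF methods. Fig. 3 shows the improvement in recall for PRF versus SynSet over the 100 input patents. Fig. 4 shows the improvement in precision for PRF versus SynSet over the 100 input patents. Precision (also called positive predictive value) is the fraction of retrieved instances that are relevant:

$$\text{Precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$$

Recall (also known as sensitivity) is the fraction of relevant instances that are retrieved:

$$\text{Recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$$

Both precision and recall are therefore based on an understanding and measure of relevance.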
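For a single query, the two measures can be computed as in the following Python sketch. The function name `precision_recall` and the patent identifiers are hypothetical, and returning 0.0 for an empty set is a convention chosen here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    `retrieved` is the list of returned patent ids and `relevant` is the
    set of ids judged relevant. Empty sets yield 0.0 by convention.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Toy example: 4 patents retrieved, 3 relevant, 2 in common.
p, r = precision_recall(["p1", "p2", "p3", "p4"], ["p2", "p4", "p7"])
# p is 2/4 and r is 2/3.
```

Averaging these values over all query patents gives one summary figure per method.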

Fig. 3 Result for Recall

Fig. 4 Result for Precision

5. Conclusions

The goal of the prior art search task is to find existing relevant patents. Query expansion with the SynSet method helps to improve patent retrieval. We compared our approach with the Pseudo Relevance Feedback (PRF) method for query expansion in patent retrieval, and SynSet gave better performance. It provides an automatically generated synonym set to improve patent retrieval, and using SynSet for query expansion gives better recall. Our work was carried out on a limited dataset of patents and can be extended to investigate whether using SynSet for prior art search achieves a significant improvement on a large patent dataset.

References

[1] W. Magdy and G. J. Jones, “A study on query expansion methods for patent retrieval,” pp. 19–24, 2011.
[2] T. T. K. Konishi, “Invalidity patent search system at ntt data,” 2004.
[3] P. F., “Retrieval experiments in the intellectual property domain task,” In Proceedings of the CLEF 2010, 2010.
[4] G. Cao, J.-Y. Nie, J. Gao, and S. Robertson, “Selecting good expansion term for pseudo-relevance feedback,” pp. 243–250, 2008.
[5] A. R. Bashir S., “Improving retrievability of patents in prior-art search,” Proceedings of ECIR, 2010.
[6] K. K., “Experiments on pseudo relevance feedback method using taylor formula at ntcir-3 patent retrieval task,” In: Proc. of NTCIR 2003: NTCIR-3, 2003.
[7] I. H., “Ntcir-4 patent retrieval experiments at ricoh,” In Proceedings of NTCIR 4, 2004.
[8] Walid Magdy, “Toward Higher Effectiveness for Recall-Oriented Information Retrieval: A Patent Retrieval Case Study,” January 2012.
[9] H. S. D. Manning, P. Raghavan, “An introduction to information retrieval,” 2009.
[10] K. Konishi, “Query terms extraction from patent document for invalidity search,” In: Proc. of NTCIR 2005: NTCIR-5 Workshop Meeting, 2005.
[11] L. Larkey, “A patent search and classification system,” In: Proc. of 4th ACM Conference on Digital Libraries, Berkeley, 1999.
[12] A. Fujii, “Enhancing patent retrieval by citation analysis,” SIGIR07, July 2007.
[13] N. K. Atsushi Fujii, Makoto Iwayama, “Patent retrieval task at ntcir-5,” Proceedings of NTCIR Workshop Meeting, Dec 6-9.
[14] D. Pal, M. Mitra, and K. Datta, “Improving query expansion using wordnet,” CoRR, vol. abs/1309.4938, 2013.
[15] S.-H. N. J.-H. L. Jungi Kim, Yeha Lee, “Postech at ntcir-6 English patent retrieval subtask,” Proceedings of NTCIR-6 Workshop Meeting, 2006.
[16] W. B. C. Xiaobing Xue, “Transforming patents into prior-art search,” SIGIR 2009, 2009.
[17] C. W. B. Xue X., “Automatic query generation for patent search,” In Proceeding of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009), 2009.
[18] Florina Piroi, “CLEF-IP 2010: Retrieval Experiments in the Intellectual Property Domain,” The Information Retrieval Facility (IRF), Vienna, Austria.
