IJRIT International Journal of Research in Information Technology, Volume 3, Issue 3, March 2015, Pg. 234-238
International Journal of Research in Information Technology (IJRIT) www.ijrit.com
ISSN 2001-5569
Prediction of Hard Keyword Queries
Deema Liyakath1, Deepika Ravichandran2, Divya Angappan3, Priyadharsini Murugesan4 and Mohankumar Bharathiyar5
1 UG Scholar, Department of Information Technology, Sri Ramakrishna Engineering College, Coimbatore, TamilNadu, India, [email protected]
2 UG Scholar, Department of Information Technology, Sri Ramakrishna Engineering College, Coimbatore, TamilNadu, India, [email protected]
3 UG Scholar, Department of Information Technology, Sri Ramakrishna Engineering College, Coimbatore, TamilNadu, India, [email protected]
4 Assistant Professor, Department of Information Technology, Sri Ramakrishna Engineering College, Coimbatore, TamilNadu, India, [email protected]
5 Assistant Professor, Department of Information Technology, Sri Ramakrishna Engineering College, Coimbatore, TamilNadu, India, [email protected]
Abstract
Keyword queries provide easy access to data in databases, but they often suffer from low ranking quality. Using benchmarks, we identify the queries that are likely to have low ranking quality. For such hard queries, the system may suggest alternative queries to the user. This paper analyzes the characteristics of hard queries and measures the degree of difficulty of a query over a database. Hard queries are predicted at three levels: attribute value, attribute and entity set. Hard queries on databases have the following properties: low specificity, attribute-level ambiguity and entity-set-level ambiguity. The expected outcome is effective prediction of hard queries over structured data, with improved performance and reduced time consumption.
Keywords: Keyword Query Interfaces (KQIs), Noise Generation, Ranking Robustness Principle, Structured Robustness (SR) Score, Approximation Algorithm.
1. Introduction
Keyword query interfaces (KQIs) provide flexibility and ease of use in searching and exploring the data in a database. Because a keyword query has many possible answers, a KQI must identify the information need behind the query and rank the answers. Databases contain entities, and entities contain attributes that take attribute values. We predict hard keyword queries based on the entity sets, attributes and attribute values in the database. Answering a keyword query can be difficult. For instance, the query Q1: King Kong on the IMDB database (http://www.imdb.com) does not specify whether the user is interested in movies whose title is King Kong or in movies distributed by the King Kong Cinema Industry. The KQI should recognize such queries and suggest alternatives to the user. In this paper, we use two databases, the original and a corrupted one. Noise is generated in the original database to obtain the corrupted database. The query is run on both databases, and the results are ranked as top-k result lists. The SR score measures the difficulty of a query based on the difference between these rankings. Approximation algorithms are used to estimate the SR score and to evaluate the performance of the query.
2. Related Work
In past years, hard queries have been predicted over unstructured text documents. The prediction methods for unstructured text documents fall into two groups: pre-retrieval and post-retrieval methods. Pre-retrieval methods predict the difficulty of a query without computing its results. Post-retrieval methods predict the difficulty of a query by computing its results. Post-retrieval methods fall into three categories: clarity-score based, ranking-score based and robustness based. The clarity score predicts the difficulty of a query more accurately than pre-retrieval methods on text documents, but it predicts poorly on databases. Ranking-score-based methods use the top-k results to predict the difficulty of a query efficiently. Compared with the clarity-score-based and ranking-score-based methods, the robustness-based methods give better query prediction.
3. Existing System
Today it is difficult to predict hard queries over databases, because most existing methods are only applicable to unstructured data. The existing system works with two databases, the original and the corrupted database. Both databases contain entity sets, attributes and attribute values. Here, the noise that produces the corrupted database is generated only at the attribute-value level.
Fig. 1 Existing System Structure (components shown: Query, Databases, Ranking Robustness Principle, Structured Robustness Algorithm, Approximation Algorithm, Quality result)
Fig. 1 shows the architecture of the existing system, in which the search quality and the reliability rate are low. To overcome these drawbacks, we perform noise generation at three levels.
4. Proposed System
We propose a novel framework to measure the degree of difficulty of a keyword query over a database. Hard queries are predicted at three levels: attribute value, attribute and entity set. The properties of hard queries on databases are low specificity, attribute-level ambiguity and entity-set-level ambiguity.
Fig. 2 Proposed System Structure
4.1 Noise Generation in Databases
Noise is generated at three levels: attribute values, attributes and entity sets. At the attribute and entity-set levels, the noise changes the attribute or the entity set to which an attribute value belongs in the corrupted database. The query is then run on both the original and the corrupted database to obtain the results.
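The three noise levels can be illustrated with a small sketch. It is not the paper's noise model: the database layout (a list of entities, each with an entity set and a dictionary of attribute term lists), the function name corrupt_database and the probabilities p_value, p_attr and p_entity are all assumptions made for illustration.

```python
import copy
import random

def corrupt_database(db, rng, p_value=0.2, p_attr=0.2, p_entity=0.2):
    """Return a noisy copy of db; the original is left untouched.

    db is assumed to be a list of entities of the form
    {"entity_set": str, "attributes": {attribute_name: [terms]}}.
    """
    noisy = copy.deepcopy(db)
    entity_sets = sorted({e["entity_set"] for e in db})
    for entity in noisy:
        attrs = entity["attributes"]
        names = sorted(attrs)
        # Attribute-value level: drop each term with probability p_value.
        for name in names:
            attrs[name] = [t for t in attrs[name] if rng.random() >= p_value]
        # Attribute level: swap the values of two attributes, so that a
        # value ends up under a different attribute.
        if len(names) >= 2 and rng.random() < p_attr:
            a, b = rng.sample(names, 2)
            attrs[a], attrs[b] = attrs[b], attrs[a]
        # Entity-set level: move the entity to a different entity set.
        if len(entity_sets) >= 2 and rng.random() < p_entity:
            others = [s for s in entity_sets if s != entity["entity_set"]]
            entity["entity_set"] = rng.choice(others)
    return noisy

# Hypothetical two-entity database used only for illustration.
example_db = [
    {"entity_set": "movie",
     "attributes": {"title": ["king", "kong"], "plot": ["giant", "ape"]}},
    {"entity_set": "distributor",
     "attributes": {"name": ["king", "kong", "cinema"], "city": ["mumbai"]}},
]
corrupted_db = corrupt_database(example_db, random.Random(0))
```

Passing a seeded random.Random keeps the corruption reproducible, which helps when the same query has to be compared over several corrupted copies.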
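4.2 Ranking in Original and Corrupted Database
After the query results are obtained from the original and corrupted databases, the cosine similarity value is computed and the results are ranked as a top-k result list. The PRMS (Probabilistic Retrieval Model for Semistructured Data) ranking method is used to rank the results, and the Spearman rank correlation is used to compare the two rankings. Cosine similarity is computed with the dot product as follows, where A and B are the two n-dimensional term vectors being compared:
\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}} \quad (1)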
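The cosine similarity described in this section can be sketched in a few lines. This is a minimal illustration that assumes both the query and a result are represented as lists of terms and compared through their term-frequency vectors; the function name cosine_similarity and that representation are assumptions, not details given in the paper.

```python
import math
from collections import Counter

def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between the term-frequency vectors of two term lists."""
    a = Counter(terms_a)
    b = Counter(terms_b)
    # Dot product over the terms of a; a Counter returns 0 for missing terms.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)

# Hypothetical query and result terms used only for illustration.
similarity = cosine_similarity(["king", "kong"], ["king", "kong", "movie"])
```

Returning 0.0 for an empty vector is a design choice of this sketch; it avoids a division by zero when noise removes every term from a result.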
4.3 Structured Robustness Algorithm
The Structured Robustness (SR) score measures the difficulty of a query based on the differences between the rankings of the same query over the original and noisy (corrupted) versions of the same database.
Algorithm 1 CorruptTopResults(Q, L, M, I, N)
Input: Query Q, top-k result list L of Q by ranking function g, metadata M, inverted indexes I, number of corruption iterations N.
Output: SR score for Q.
SR ← 0; C ← {}; // C caches λT, λS for keywords in Q
FOR i = 1 → N DO
    I′ ← I; M′ ← M; L′ ← L; // Corrupted copy of I, M and L
    FOR each result R in L DO
        FOR each attribute value A in R DO
            A′ ← A; // Corrupted version of A
            FOR each keyword w in Q DO
                Compute # of w in A′ by Equation 10; // If λT,w, λS,w are needed but not in C, calculate and cache them
                IF # of w varies in A′ and A THEN
                    Update A′, M′ and entry of w in I′;
            Add A′ to R′;
        Add R′ to L′;
    Rank L′ using g, which returns L, based on I′, M′;
    SR += Sim(L, L′); // Sim computes Spearman correlation
RETURN SR ← SR / N; // AVG score over N rounds
Algorithm 1: Structured Robustness Algorithm
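The core loop of the SR computation (corrupt the top results, re-rank, compare with Spearman correlation, average over N rounds) can be sketched as follows. This is a simplified illustration under stated assumptions and not an implementation of Algorithm 1: it omits the metadata, the inverted indexes and the cached λ statistics, and the scoring function term_count_score and the noise function drop_terms are hypothetical stand-ins for the ranking function g and the noise model.

```python
import random

def spearman(order_a, order_b):
    """Spearman rank correlation of two orderings of the same items (no ties)."""
    n = len(order_a)
    if n < 2:
        return 1.0
    position_b = {item: i for i, item in enumerate(order_b)}
    d2 = sum((i - position_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def term_count_score(query, terms):
    # Hypothetical ranking function: number of query keyword occurrences.
    return sum(terms.count(w) for w in query)

def drop_terms(terms, rng, p=0.3):
    # Hypothetical noise: drop each term independently with probability p.
    return [t for t in terms if rng.random() >= p]

def sr_score(query, results, score, corrupt, n_rounds=50, seed=0):
    """Average Spearman correlation between the original ranking and the
    rankings obtained after corrupting the results, over n_rounds rounds.

    results is assumed to map a result id to its list of terms.
    """
    rng = random.Random(seed)
    original = sorted(results, key=lambda r: score(query, results[r]), reverse=True)
    total = 0.0
    for _ in range(n_rounds):
        noisy = {rid: corrupt(terms, rng) for rid, terms in results.items()}
        reranked = sorted(noisy, key=lambda r: score(query, noisy[r]), reverse=True)
        total += spearman(original, reranked)
    return total / n_rounds

# Hypothetical top-k results for the query "king kong".
example_results = {
    "r1": ["king", "kong", "king", "kong"],
    "r2": ["king", "kong", "cinema"],
    "r3": ["kong", "island"],
    "r4": ["queen"],
}
example_sr = sr_score(["king", "kong"], example_results, term_count_score, drop_terms)
```

A score close to 1 means the ranking barely changes under noise, which suggests an easier query; a lower score suggests a harder one.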
4.4 Approximation Algorithms
Approximation algorithms are used to estimate the SR score and to evaluate the performance of the query. Two techniques are used: Query-specific Attribute values Only Approximation (QAO-Approx) and Static Global Stats Approximation (SGS-Approx).
5. Conclusions
The existing work analyzes the characteristics of hard queries and proposes a framework to measure the degree of difficulty of a keyword query over a database, considering the structure and the content of the database and the query results. However, that system leaves a number of issues to address: its search quality is lower than that of other systems, and its reliability rate is low. To overcome these drawbacks, we perform noise generation at three levels in the database: attribute, attribute value and entity set. The proposed system improves the reliability rate of difficult query prediction. The experimental results indicate that the proposed system is more effective than the existing system in terms of accuracy rate and quality of results.