Hub, Authority and Relevance Scores in Multi-Relational Data for Query Search

Xutao Li¹, Michael Ng², Yunming Ye¹

¹ Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, China

² Department of Mathematics, Hong Kong Baptist University, Hong Kong

SIAM International Conference on Data Mining, 2012

X.T. Li, et al.

Outline

• Motivation
• Related Work
• HAR (Idea + Theory + Algorithm)
• Experimental Results
• Concluding Remarks


Motivation

• Link analysis algorithms are critical to information retrieval tasks, especially Web-related retrieval applications.
– Web content contains much noise and low-quality information.
– The link (hyperlink) structure is helpful, e.g., in Google.

• There are many applications where the links/hyperlinks can be characterized into different types.


Motivation - Examples of multi-relational data

[Figure: four example networks. (a) Multi-relational citation network (edges: citation through keyword). (b) Multi-semantic hyperlink network. (c) Multi-channel communication network (edges: Phone, MSN, Email). (d) Multi-conditional gene interaction network (edges: co-expression interaction under condition, physical interaction under condition).]

How to exploit such multi-relational link structures to facilitate query search is an important and open research problem.


Related Work

• The hyperlink structure is exploited by three of the most frequently cited Web IR methods: HITS (Hypertext Induced Topic Search), PageRank and SALSA.
• HITS was developed in 1997 by Jon Kleinberg. Soon after, Sergey Brin and Larry Page developed their now famous PageRank method. SALSA was developed in 2000 in reaction to the pros and cons of HITS and PageRank. [See the survey by A. Langville and C. Meyer, A Survey of Eigenvector Methods for Web Information Retrieval, SIAM Review, 2005.]
• In 2006, Tamara Kolda and Brett Bader proposed the TOPHITS method to analyze multi-relational link structures by using tensor decomposition.


New Challenge

• PageRank: L. Page, S. Brin, R. Motwani and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1998.
• HITS: J. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, 46: 604-632, 1999.
• SALSA: R. Lempel and S. Moran. The Stochastic Approach for Link-structure Analysis (SALSA) and the TKC effect. The Ninth International WWW Conference, 2000.
– single-type relation (hyperlink)
• TOPHITS: T. Kolda and B. Bader. The TOPHITS Model for Higher-Order Web Link Analysis. Workshop on Link Analysis, Counterterrorism and Security, 2006.
– The decomposition may not be unique.
– Negative hub and authority scores can be produced.


The Idea

• In order to differentiate relations, we introduce a relevance score for each relation, in addition to the hub and authority scores for objects.

[Figure: diagram relating the relevance score, the authority score and the hub score.]

• The hub, authority and relevance scores have a mutually reinforcing relationship.
• Represent the data with a tensor → construct transition probability tensors w.r.t. hubs, authorities and relations → set up tensor equations based on a random walk → solve the tensor equations to obtain the hub, authority and relevance scores.

The Representation

Example: five objects and three relations (R1: green, R2: blue, R3: red) among them.

[Figure: (a) the example graph on objects 1 to 5, with edges colored by relation; (b) the corresponding tensor representation, with unit entries marking the links of R1, R2 and R3.]

In the following, we assume that there are m objects and n relations in the multi-relational data. The data are represented as a tensor $\mathcal{T} = (t_{i_1,i_2,j_1})$, where $i_1$ and $i_2$ are the indices for objects and $j_1$ is the index for relations.
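As a small illustration of this representation, the following Python sketch builds a 0/1 tensor from a list of (source object, target object, relation) triples. The link list, the 0-based indexing and the helper name `build_tensor` are assumptions made for illustration only; they are not the links of the slide's example.

```python
import numpy as np

def build_tensor(triples, m, n):
    """Return the m x m x n tensor T with T[i1, i2, j1] = 1 for every listed link."""
    T = np.zeros((m, m, n))
    for i1, i2, j1 in triples:
        T[i1, i2, j1] = 1.0
    return T

# Hypothetical links among 5 objects under 3 relations (0-based indices).
triples = [(0, 1, 0), (0, 2, 0), (1, 2, 1), (3, 2, 1), (4, 0, 2), (2, 4, 2)]
T = build_tensor(triples, m=5, n=3)

print(T.shape)       # (5, 5, 3)
print(int(T.sum()))  # 6
```

A dense array is only practical for small examples; large collections would need a sparse coordinate format instead.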

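Transition Probability Tensors

We construct three transition probability tensors $\mathcal{H} = (h_{i_1,i_2,j_1})$, $\mathcal{A} = (a_{i_1,i_2,j_1})$ and $\mathcal{R} = (r_{i_1,i_2,j_1})$ with respect to hubs, authorities and relations by normalizing the entries of $\mathcal{T}$ as follows:

\[ h_{i_1,i_2,j_1} = \frac{t_{i_1,i_2,j_1}}{\sum_{i_1=1}^{m} t_{i_1,i_2,j_1}}, \quad i_1 = 1, 2, \dots, m, \]

\[ a_{i_1,i_2,j_1} = \frac{t_{i_1,i_2,j_1}}{\sum_{i_2=1}^{m} t_{i_1,i_2,j_1}}, \quad i_2 = 1, 2, \dots, m, \]

\[ r_{i_1,i_2,j_1} = \frac{t_{i_1,i_2,j_1}}{\sum_{j_1=1}^{n} t_{i_1,i_2,j_1}}, \quad j_1 = 1, 2, \dots, n. \]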
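The three normalizations differ only in the index that is summed over, so they can be sketched with one helper applied along different axes. This is a minimal NumPy sketch, not the authors' implementation. The slide does not say what to do when a denominator is zero; filling such fibers uniformly is an assumption of this sketch, as are the function names.

```python
import numpy as np

def normalize_along(T, axis):
    """Divide every fiber of T along `axis` by its sum.

    Fibers whose sum is zero are filled uniformly (assumption of this sketch).
    """
    sums = T.sum(axis=axis, keepdims=True)
    safe = np.where(sums > 0, sums, 1.0)              # avoid division by zero
    uniform = np.full_like(T, 1.0 / T.shape[axis], dtype=float)
    return np.where(sums > 0, T / safe, uniform)

def transition_tensors(T):
    H = normalize_along(T, axis=0)  # sums to 1 over i1
    A = normalize_along(T, axis=1)  # sums to 1 over i2
    R = normalize_along(T, axis=2)  # sums to 1 over j1
    return H, A, R

# Tiny hypothetical tensor: 3 objects, 2 relations.
T = np.zeros((3, 3, 2))
T[0, 1, 0] = T[2, 1, 0] = T[1, 2, 1] = 1.0
H, A, R = transition_tensors(T)

print(H[:, 1, 0])  # [0.5 0.  0.5]
```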


Transition Probability Tensors

These numbers give estimates of the following conditional probabilities:

\[ h_{i_1,i_2,j_1} = \mathrm{Prob}[X_t = i_1 \mid Y_t = i_2, Z_t = j_1], \]

\[ a_{i_1,i_2,j_1} = \mathrm{Prob}[Y_t = i_2 \mid X_t = i_1, Z_t = j_1], \]

\[ r_{i_1,i_2,j_1} = \mathrm{Prob}[Z_t = j_1 \mid Y_t = i_2, X_t = i_1], \]

where $X_t$ and $Y_t$ are random variables for visiting a particular object as a hub and as an authority, respectively, and $Z_t$ is the random variable for using a particular relation, all at time $t$. Here $t$ is the time step of the random walk.


HAR - Tensor Equations

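hub score: $\bar{x}$, authority score: $\bar{y}$, relevance score: $\bar{z}$

\[ \mathcal{H}\bar{y}\bar{z} = \bar{x}, \quad \mathcal{A}\bar{x}\bar{z} = \bar{y}, \quad \mathcal{R}\bar{x}\bar{y} = \bar{z}, \]

with

\[ \sum_{i_1=1}^{m} \bar{x}_{i_1} = 1, \quad \sum_{i_2=1}^{m} \bar{y}_{i_2} = 1, \quad \sum_{j_1=1}^{n} \bar{z}_{j_1} = 1. \]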

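HAR - Tensor Equations

hub score: $\bar{x}$, authority score: $\bar{y}$, relevance score: $\bar{z}$

\[ \sum_{i_2=1}^{m} \sum_{j_1=1}^{n} h_{i_1,i_2,j_1} \bar{y}_{i_2} \bar{z}_{j_1} = \bar{x}_{i_1}, \quad 1 \le i_1 \le m, \]

\[ \sum_{i_1=1}^{m} \sum_{j_1=1}^{n} a_{i_1,i_2,j_1} \bar{x}_{i_1} \bar{z}_{j_1} = \bar{y}_{i_2}, \quad 1 \le i_2 \le m, \]

\[ \sum_{i_1=1}^{m} \sum_{i_2=1}^{m} r_{i_1,i_2,j_1} \bar{x}_{i_1} \bar{y}_{i_2} = \bar{z}_{j_1}, \quad 1 \le j_1 \le n, \]

with

\[ \sum_{i_1=1}^{m} \bar{x}_{i_1} = 1, \quad \sum_{i_2=1}^{m} \bar{y}_{i_2} = 1, \quad \sum_{j_1=1}^{n} \bar{z}_{j_1} = 1. \]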
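The left-hand sides of the three componentwise equations are tensor-vector contractions, which can be sketched with `numpy.einsum`. The function names and the random toy tensor below are assumptions for illustration; the index letters `a`, `b`, `j` stand for i1, i2, j1.

```python
import numpy as np

def hub_product(H, y, z):
    """x[i1] = sum over i2, j1 of H[i1, i2, j1] * y[i2] * z[j1]."""
    return np.einsum("abj,b,j->a", H, y, z)

def authority_product(A, x, z):
    """y[i2] = sum over i1, j1 of A[i1, i2, j1] * x[i1] * z[j1]."""
    return np.einsum("abj,a,j->b", A, x, z)

def relevance_product(R, x, y):
    """z[j1] = sum over i1, i2 of R[i1, i2, j1] * x[i1] * y[i2]."""
    return np.einsum("abj,a,b->j", R, x, y)

# Hypothetical strictly positive tensor with 4 objects and 3 relations.
rng = np.random.default_rng(0)
T = rng.random((4, 4, 3)) + 0.1
H = T / T.sum(axis=0, keepdims=True)
A = T / T.sum(axis=1, keepdims=True)
R = T / T.sum(axis=2, keepdims=True)

x = np.full(4, 0.25)
y = np.full(4, 0.25)
z = np.full(3, 1.0 / 3.0)

x_new = hub_product(H, y, z)
y_new = authority_product(A, x, z)
z_new = relevance_product(R, x, y)
```

Because each transition tensor sums to one along its normalized index, each product maps probability vectors to a probability vector, which is what makes the normalization constraints consistent with the equations.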

Generalization

When we consider a single relation type, we can set $\bar{z}$ to be the vector $\mathbf{1}/n$, where $\mathbf{1}$ is the vector of all ones, and thus we obtain two matrix equations

\[ \mathcal{H}\bar{y}\,\mathbf{1}/n = \bar{x}, \quad \mathcal{A}\bar{x}\,\mathbf{1}/n = \bar{y}. \]

We remark that $\mathcal{A}$ can be viewed as the transpose of $\mathcal{H}$. These are exactly the equations that are solved for the singular vectors to obtain the hub and authority scoring vectors in SALSA. In summary, the proposed framework HAR is a generalization of SALSA to multi-relational data.


HAR - Query Search

To deal with query processing, we need to compute hub and authority scores of objects and relevance scores of relations with respect to a query input (as in topic-sensitive PageRank):

\[ (1 - \alpha)\mathcal{H}\bar{y}\bar{z} + \alpha o = \bar{x}, \]

\[ (1 - \beta)\mathcal{A}\bar{x}\bar{z} + \beta o = \bar{y}, \]

\[ (1 - \gamma)\mathcal{R}\bar{x}\bar{y} + \gamma r = \bar{z}, \]

where $o$ and $r$ are two assigned probability distributions that are constructed from the query input, and $0 \le \alpha, \beta, \gamma < 1$ are three parameters.
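The slides do not spell out how $o$ and $r$ are built from a query. One simple possibility, shown here purely as an assumption, is to spread the probability mass uniformly over the relations (or objects) that match the query and to fall back to the uniform distribution when there is no preference. The helper name `uniform_on` and the index sets are hypothetical.

```python
import numpy as np

def uniform_on(indices, size):
    """Probability vector that is uniform on `indices` and zero elsewhere."""
    idx = sorted(set(indices))
    v = np.zeros(size)
    v[idx] = 1.0 / len(idx)
    return v

m, n = 5, 6                      # hypothetical numbers of objects and relations
r = uniform_on([1, 4], n)        # relations assumed to match the query terms
o = np.full(m, 1.0 / m)          # no preference among objects

print(r)  # [0.  0.5 0.  0.  0.5 0. ]
```

Setting α = β = 0 then corresponds to querying through relations only, since $o$ drops out of the equations.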


HAR - Theory

Let

\[ \Omega_m = \Big\{ u = (u_1, u_2, \dots, u_m) \in \mathbb{R}^m \;\Big|\; u_i \ge 0,\ 1 \le i \le m,\ \sum_{i=1}^{m} u_i = 1 \Big\} \]

and

\[ \Omega_n = \Big\{ w = (w_1, w_2, \dots, w_n) \in \mathbb{R}^n \;\Big|\; w_j \ge 0,\ 1 \le j \le n,\ \sum_{j=1}^{n} w_j = 1 \Big\}. \]

Clearly, the solution of HAR is in a convex set. We then derived the following two theorems based on the Brouwer Fixed Point Theorem.


HAR - Theory

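Theorem 1. Suppose $\mathcal{H}$, $\mathcal{A}$ and $\mathcal{R}$ are constructed, $0 \le \alpha, \beta, \gamma < 1$, and $o \in \Omega_m$ and $r \in \Omega_n$ are given. If $\mathcal{T}$ is irreducible, then there exist $\bar{x} > 0$, $\bar{y} > 0$ and $\bar{z} > 0$ such that $(1 - \alpha)\mathcal{H}\bar{y}\bar{z} + \alpha o = \bar{x}$, $(1 - \beta)\mathcal{A}\bar{x}\bar{z} + \beta o = \bar{y}$, and $(1 - \gamma)\mathcal{R}\bar{x}\bar{y} + \gamma r = \bar{z}$, with $\bar{x}, \bar{y} \in \Omega_m$ and $\bar{z} \in \Omega_n$.

Theorem 2. Suppose $\mathcal{T}$ is irreducible, $\mathcal{H}$, $\mathcal{A}$ and $\mathcal{R}$ are constructed, $0 \le \alpha, \beta, \gamma < 1$, and $o \in \Omega_m$ and $r \in \Omega_n$ are given. If 1 is not an eigenvalue of the Jacobian matrix of the mapping from the tensor, then the solution vectors $\bar{x}$, $\bar{y}$ and $\bar{z}$ are unique.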


The HAR Algorithm

Input: Three tensors $\mathcal{H}$, $\mathcal{A}$ and $\mathcal{R}$; two initial probability distributions $y_0$ and $z_0$ (with $\sum_{i=1}^{m} [y_0]_i = 1$ and $\sum_{j=1}^{n} [z_0]_j = 1$); the assigned probability distributions of objects and/or relations $o$ and $r$ (with $\sum_{i=1}^{m} [o]_i = 1$ and $\sum_{j=1}^{n} [r]_j = 1$); three weighting parameters $0 \le \alpha, \beta, \gamma < 1$; and the tolerance $\epsilon$.

Output: Three stationary probability distributions $\bar{x}$ (hub scores), $\bar{y}$ (authority scores) and $\bar{z}$ (relevance values).

Procedure:
1: Set $t = 1$;
2: Compute $x_t = (1 - \alpha)\mathcal{H} y_{t-1} z_{t-1} + \alpha o$;
3: Compute $y_t = (1 - \beta)\mathcal{A} x_t z_{t-1} + \beta o$;
4: Compute $z_t = (1 - \gamma)\mathcal{R} x_t y_t + \gamma r$;
5: If $\|x_t - x_{t-1}\| + \|y_t - y_{t-1}\| + \|z_t - z_{t-1}\| < \epsilon$, then stop; otherwise set $t = t + 1$ and go to Step 2.
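The procedure above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the norm is taken to be the 1-norm, the starting vectors (including an $x_0$ that is only needed for the first convergence check) are uniform, a `max_iter` cap is added, and the toy tensor is random and strictly positive.

```python
import numpy as np

def har(H, A, R, o, r, alpha, beta, gamma, tol=1e-10, max_iter=1000):
    """Iterate the HAR updates until the total change drops below `tol`."""
    m, n = H.shape[0], H.shape[2]
    x = np.full(m, 1.0 / m)  # x_0 (assumed; used only in the first check)
    y = np.full(m, 1.0 / m)  # y_0 (assumed uniform)
    z = np.full(n, 1.0 / n)  # z_0 (assumed uniform)
    for t in range(1, max_iter + 1):
        # Steps 2-4: each update uses the newest available vectors.
        x_new = (1 - alpha) * np.einsum("abj,b,j->a", H, y, z) + alpha * o
        y_new = (1 - beta) * np.einsum("abj,a,j->b", A, x_new, z) + beta * o
        z_new = (1 - gamma) * np.einsum("abj,a,b->j", R, x_new, y_new) + gamma * r
        # Step 5: sum of 1-norm changes (choice of norm is an assumption).
        change = (np.abs(x_new - x).sum() + np.abs(y_new - y).sum()
                  + np.abs(z_new - z).sum())
        x, y, z = x_new, y_new, z_new
        if change < tol:
            break
    return x, y, z, t

# Hypothetical toy problem: 4 objects, 3 relations, query on relation 0.
rng = np.random.default_rng(0)
T = rng.random((4, 4, 3)) + 0.1
H = T / T.sum(axis=0, keepdims=True)
A = T / T.sum(axis=1, keepdims=True)
R = T / T.sum(axis=2, keepdims=True)
o = np.full(4, 0.25)
r = np.array([1.0, 0.0, 0.0])

x, y, z, steps = har(H, A, R, o, r, alpha=0.5, beta=0.5, gamma=0.9)
print(steps, x.sum(), y.sum(), z.sum())
```

Objects can then be ranked by sorting `y` (authority) or `x` (hub) in decreasing order, for example with `np.argsort(-y)`.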


Evaluation metrics

• P@k: Given a particular query q, we compute the precision at position k as follows:

\[ \mathrm{P@}k = \frac{\#\{\text{relevant documents in top } k \text{ results}\}}{k} \]

• NDCG@k: NDCG@k is a normalized version of the DCG@k metric.
• MAP: Given a query, the average precision is calculated by averaging the precision scores at each position in the search results where a relevant document is found. MAP is the mean of the average precision over all queries.
• R-prec: Given a query, R-prec is the precision score after R documents are retrieved, i.e., R-prec = P@R, where R is the total number of relevant documents for that query.
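The metrics above can be sketched in plain Python for a single query. Several details are assumptions, since the slides do not fix them: average precision is divided by the number of relevant documents found in the list (following the wording above, whereas a common alternative divides by the total number of relevant documents), and NDCG uses binary gains with a log2 position discount. The document identifiers are hypothetical.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of P@pos over the positions where a relevant document appears."""
    hits, total = 0, 0.0
    for pos, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / pos
    return total / hits if hits else 0.0

def r_precision(ranked, relevant):
    """P@R with R = total number of relevant documents for the query."""
    R = len(relevant)
    return precision_at_k(ranked, relevant, R) if R else 0.0

def ndcg_at_k(ranked, relevant, k):
    """DCG@k / ideal DCG@k with binary gains and log2 discount (assumed form)."""
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(pos + 1)
                for pos in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0

ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}

print(precision_at_k(ranked, relevant, 2))  # 0.5
print(average_precision(ranked, relevant))  # 0.5
```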


Experiment 1

• 100,000 webpages from the .GOV Web collection of TREC 2002, with the 50 topic distillation topics of the TREC 2003 Web track as queries
• Links among webpages via different anchor texts
• 39,255 anchor terms (multiple relations), and 479,122 links with these anchor terms among the 100,000 webpages
• If the $i_1$th webpage links to the $i_2$th webpage via the $j_1$th anchor term, we set the entry $t_{i_1,i_2,j_1}$ of $\mathcal{T}$ to be one. The size of $\mathcal{T}$ is 100,000 × 100,000 × 39,255.
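A tensor of this size has far more entries than links, so a dense array is not practical. One possible approach, given here as an assumption rather than the authors' implementation, stores only the nonzero entries as coordinate arrays and computes the hub product with `numpy.bincount`. The tiny link list is hypothetical, and fibers with no links are simply left empty in this sketch (so the result need not sum to one).

```python
import numpy as np

def hub_weights(i2, j1, t_vals, m, n):
    """h = t / (sum of t over i1), computed per (i2, j1) fiber of stored links."""
    key = i2 * n + j1                       # one integer id per (i2, j1) pair
    sums = np.bincount(key, weights=t_vals, minlength=m * n)
    return t_vals / sums[key]

def sparse_hub_product(i1, i2, j1, h_vals, y, z, m):
    """x[i1] = sum over stored links of h * y[i2] * z[j1]."""
    return np.bincount(i1, weights=h_vals * y[i2] * z[j1], minlength=m)

# Hypothetical links: 4 objects, 2 relations, 6 nonzero entries.
m, n = 4, 2
i1 = np.array([0, 0, 1, 2, 3, 3])
i2 = np.array([1, 2, 2, 3, 0, 2])
j1 = np.array([0, 0, 1, 1, 0, 1])
t_vals = np.ones(len(i1))

y = np.full(m, 0.25)
z = np.full(n, 0.5)
h_vals = hub_weights(i2, j1, t_vals, m, n)
x_sparse = sparse_hub_product(i1, i2, j1, h_vals, y, z, m)

# Dense reference for comparison (only feasible at toy scale).
T = np.zeros((m, m, n))
T[i1, i2, j1] = 1.0
col = T.sum(axis=0, keepdims=True)
H = np.divide(T, col, out=np.zeros_like(T), where=col > 0)
x_dense = np.einsum("abj,b,j->a", H, y, z)
```

The authority and relevance products follow the same pattern with the roles of the index arrays exchanged.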


Method                      P@10    P@20    NDCG@10  NDCG@20  MAP     R-prec
HITS                        0.0000  0.0000  0.0000   0.0000   0.0041  0.0000
SALSA                       0.0160  0.0140  0.0157   0.0203   0.0114  0.0084
TOPHITS (500-rank)          0.0020  0.0010  0.0044   0.0028   0.0008  0.0002
TOPHITS (1000-rank)         0.0040  0.0020  0.0088   0.0057   0.0016  0.0010
TOPHITS (1500-rank)         0.0040  0.0030  0.0063   0.0049   0.0011  0.0018
BM25+DepInOut               0.0280  0.0180  0.0419   0.0479   0.0370  0.0370
HAR (rel. query)            0.0560  0.0410  0.0659   0.0747   0.0330  0.0552
HAR (rel. and obj. query)   0.1100  0.0800  0.1545   0.1765   0.1035  0.1051

The results of all comparison algorithms on the TREC data set.


Parameters

[Figure: performance (P@10, NDCG@10, MAP and R-prec; vertical axis from 0 to 0.07) plotted against γ from 0 to 1, with α = β = 0.]

The parameter tuning test: tuning γ with α = β = 0.

Parameters

[Figure: performance (P@10, NDCG@10, MAP and R-prec; vertical axis from 0.06 to 0.16) plotted against α = β from 0 to 1, with γ = 0.9.]

The parameter tuning test: tuning α and β with γ = 0.9.

Experiment 2

• Five conferences (SIGKDD, WWW, SIGIR, SIGMOD, CIKM)
• Publication information includes title, authors, reference list, and the classification categories associated with the publication
• 6848 publications and 617 different categories
• 100 category concepts as query inputs to retrieve the relevant publications
• Tensor: 6848 × 6848 × 617. If the $i_1$th publication cites the $i_2$th publication and the $i_2$th publication has the $j_1$th category concept, then we set the entry $t_{i_1,i_2,j_1}$ of $\mathcal{T}$ to be one; otherwise we set it to be zero.


Method               P@10    P@20    NDCG@10  NDCG@20  MAP     R-prec
HITS                 0.2260  0.1815  0.3789   0.3792   0.2522  0.2751
SALSA                0.4100  0.3105  0.5606   0.5352   0.3462  0.3929
TOPHITS (50-rank)    0.1360  0.1145  0.1684   0.1557   0.0566  0.0617
TOPHITS (100-rank)   0.1640  0.1340  0.2012   0.1857   0.0646  0.0732
TOPHITS (150-rank)   0.1920  0.1410  0.2315   0.1998   0.0732  0.0765
BM25+DepInOut        0.0170  0.0145  0.0147   0.0138   0.0162  0.0109
HAR (rel. query)     0.5880  0.4155  0.7472   0.6760   0.4731  0.4683

The results of all comparison algorithms on the DBLP data set.


Concluding Remarks

• Our framework is a general paradigm, and it can be further extended to data with higher-order tensors for potential applications in the semantic web, image retrieval and community discovery.
• For example, we can consider the query search problem in the semantic web using a (1, 1, 1, 1)th order rectangular tensor to represent the subject, object, predicate and context relationship. After constructing four transition probability tensors $\mathcal{S}$, $\mathcal{O}$, $\mathcal{P}$ and $\mathcal{R}$ for subject, object, predicate and context relationship respectively, based on the proposed framework, we expect to solve the following set of tensor equations:

\[ \mathcal{S}opr = s, \quad \mathcal{O}spr = o, \quad \mathcal{P}sor = p, \quad \mathcal{R}sop = r. \]


Thank you!
