On Computing Top-t Most Influential Spatial Sites

Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du

Northeastern University Boston, USA

9/2/2005

VLDB 2005, Trondheim, Norway


Outline

- Problem Definition
- Related Work
- The New Metric: minExistDNN
- Data Structures and Algorithm
- Experimental Results
- Conclusions


Problem Definition

- Given:
  - a set of sites S
  - a set of weighted objects O
  - a spatial region Q
  - an integer t
- Top-t most influential sites query: find the t sites in Q with the largest influences.
- The influence of a site s is the total weight of the objects that consider s as their nearest site.
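As a point of reference, the definition can be evaluated directly by brute force. The sketch below is not the algorithm presented in this talk. It assumes in-memory point data, Euclidean distance, an axis-aligned rectangle for Q, and that each object's nearest site is chosen among all sites, not only those inside Q. The function name `top_t_influential` and the tuple layouts are illustrative.

```python
import math
from collections import defaultdict

def top_t_influential(sites, objects, q, t):
    """sites: {name: (x, y)}; objects: [(x, y, weight)]; q: (x1, y1, x2, y2)."""
    influence = defaultdict(float)
    for ox, oy, w in objects:
        # Each object credits its weight to its nearest site among ALL sites.
        nearest = min(sites, key=lambda s: math.dist(sites[s], (ox, oy)))
        influence[nearest] += w
    x1, y1, x2, y2 = q
    in_q = [s for s, (x, y) in sites.items() if x1 <= x <= x2 and y1 <= y <= y2]
    # Rank only the sites inside Q; ties are broken by name for determinism.
    return sorted(in_q, key=lambda s: (-influence[s], s))[:t]

sites = {"s1": (0, 0), "s2": (10, 0)}
objects = [(1, 0, 1), (2, 1, 1), (9, 0, 1)]
print(top_t_influential(sites, objects, (-1, -1, 11, 1), 1))  # ['s1']
```

This costs one pass over all sites per object, which is the kind of exhaustive work the pruning techniques later in the talk aim to avoid.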


Motivation

- Which supermarket in Boston is the most influential among residential buildings?
  - Sites: supermarkets
  - Objects: residential buildings
  - Weight: number of people in a building
  - Query region: Boston
- Which wireless station in Boston is the most influential among mobile users?


Example

[Figure: sites s1 to s4 and objects o1 to o6 in the plane.]

- Suppose all objects have weight = 1, Q is the whole space, and t = 1.
- The most influential site is s1, with influence = 3.


Example

[Figure: sites s1 to s4 and objects o1 to o6, with Q drawn as a shaded rectangle.]

- Now suppose that Q is the shaded rectangle and t = 2.
- Top-2 most influential sites: s4 and s2.



Related Work

- Bi-chromatic RNN query: considers two datasets, sites and objects.
- The RNNs of a site s ∈ S are the objects that consider s as their nearest site.

[Figure: sites s1 to s4 and objects o1 to o6.]

Related Work

- Solutions to the RNN query based on precomputation [KM00, YL01].

[Figure: sites s1 to s4 and objects o1 to o6.]

Related Work

- Solution to the RNN query based on the Voronoi diagram [SRAE01].
  - Compute the Voronoi cell of s: the region enclosing the locations closer to s than to any other site.
  - Query the object R-tree using the Voronoi cell.


Related Work [SRAE01]

[Figure: sites s1 to s4 and objects o1 to o6.]

Our Problem vs. RNN Query

- RNN query:
  - A single site as an input.
  - Interested in the actual set of RNNs.
- Top-t most influential sites query:
  - A spatial region as an input.
  - Interested in the aggregate weight of the RNNs.


Straightforward Solution 1

- For each site, pre-compute its influence.
- At query time, find the sites in Q and return the t sites with the largest influences.
- Drawback 1: costly maintenance upon updates.
- Drawback 2: binds a set of sites closely to a set of objects.


Straightforward Solution 2

An extension of the Voronoi diagram based solution to the RNN query.

1. Find all sites in Q.
2. For each such site, find its RNNs by using the Voronoi cell, and compute its influence.
3. Return the t sites with the largest influences.

Straightforward Solution 2

- Drawback 1: all sites in Q need to be retrieved from the leaf nodes.
- Drawback 2: the object R-tree and the site R-tree are browsed multiple times.
  - For each site in Q, browse the site R-tree to compute the Voronoi cell.
  - For each such Voronoi cell, browse the object R-tree to compute the influence.


Features of Our Solution

- Systematically browse both trees once.
- Pruning techniques are provided based on a new metric, minExistDNN.
- No need to compute the influences of all sites in Q, or even to locate all sites in Q.



Motivation

- Intuitively, if some object in Oi may consider some site in Sj as its NN, then Oi affects Sj.
- To estimate the influences of all sites in a site MBR Sj, we need to know whether an object MBR Oi will affect Sj.

[Figure: object MBRs O1 and O2, site MBRs S1 and S2.]

O1 only affects S1, while O2 affects both S1 and S2.

maxDist – A Loose Estimation

- If maxDist(O1, S1) < minDist(O1, S2), O1 does not affect S2.
- Why is this not good enough?

[Figure: object MBR O1 and site MBRs S1 and S2, with maxDist(O1, S1) = 10 and minDist(O1, S2) = 8.]
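The maxDist test can be sketched for axis-aligned rectangles as follows. This is a minimal illustration assuming 2-D MBRs stored as `(x1, y1, x2, y2)` tuples and Euclidean distance; the function names are illustrative and do not come from the talk.

```python
import math

def min_dist(a, b):
    """Smallest distance between any point of rectangle a and any point of b."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    dx = max(bx1 - ax2, ax1 - bx2, 0.0)  # horizontal gap, 0 if the x-ranges overlap
    dy = max(by1 - ay2, ay1 - by2, 0.0)  # vertical gap, 0 if the y-ranges overlap
    return math.hypot(dx, dy)

def max_dist(a, b):
    """Largest distance between any point of rectangle a and any point of b."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    dx = max(bx2 - ax1, ax2 - bx1)  # widest horizontal separation
    dy = max(by2 - ay1, ay2 - by1)  # widest vertical separation
    return math.hypot(dx, dy)

def max_dist_prunes(o, s_near, s_far):
    """True if the maxDist test shows that o cannot affect s_far."""
    return max_dist(o, s_near) < min_dist(o, s_far)
```

With the values in the figure (maxDist(O1, S1) = 10, minDist(O1, S2) = 8) the test fails, so S2 cannot be ruled out by this bound alone.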

minMaxDist – A Tight Estimation?

[Figure: object o1 and site MBRs S1 and S2, with minMaxDist(o1, S1) = 5 and minDist(o1, S2) = 6.]

- An object o1 does not affect S2 if there exists an S1 such that minMaxDist(o1, S1) < minDist(o1, S2).
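For a single location and a 2-D rectangle, minMaxDist can be sketched as below. The sketch relies on the characterisation used later in this talk, that in two dimensions minMaxDist(l, S1) is the distance from l to the second closest corner of S1. The tuple layout `(x1, y1, x2, y2)` and the function name are assumptions for illustration.

```python
import math

def min_max_dist(p, rect):
    """2-D minMaxDist: distance from point p to the second-closest corner of rect."""
    x, y = p
    x1, y1, x2, y2 = rect
    corner_dists = sorted(
        math.hypot(x - cx, y - cy) for cx in (x1, x2) for cy in (y1, y2)
    )
    return corner_dists[1]

print(min_max_dist((0, 0), (1, 1, 3, 2)))  # sqrt(5), the distance to corner (1, 2)
```

Because S1 is a minimum bounding rectangle, every side of it touches at least one site, which is why this distance is an upper bound on the distance from the location to its nearest site inside S1.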


minMaxDist – A Tight Estimation?

[Figure: object MBR O1 and site MBRs S1 and S2, with object o1 and sites s1 and s2 marked; minMaxDist(O1, S1) = 5 and minDist(O1, S2) = 6; two further distances are labeled 7 and 6.]

- Not true for an object MBR O1.

A Tight Estimation?

A metric m(O1, S1) should:

1) guarantee that each location in O1 is within m(O1, S1) of a site in S1, and
2) be the smallest distance with this property.

New Metric – minExistDNN_S1(O1)

- Definition: minExistDNN_S1(O1) = max { minMaxDist(l, S1) | location l ∈ O1 }
- O1 does not affect S2 if there exists an S1 such that minExistDNN_S1(O1) < minDist(O1, S2).


Examples of minExistDNN_S1(O1)

[Figure: two example configurations of an object MBR O1 and a site MBR S1.]

- How to calculate it?

Calculating minExistDNN_S1(O1)

- Step 1: Space partitioning.

[Figure: the space around S1, whose corners are labeled a, b, c, d, is divided into eight partitions, each tagged with a corner: P1:b, P2:c, P3:a, P4:d, P5:c, P6:b, P7:d, P8:a.]

Every location l in the same partition is associated with the second closest corner of S1; the distance to that corner is minMaxDist(l, S1)!

Space Partitioning

- O1 is divided into multiple sub-regions, one in each partition.

[Figure: O1 overlapping partitions P1:b and P2:c of S1, whose corners are labeled a, b, c, d.]

Calculating minExistDNN_S1(O1)

- Step 2: Choose up to 8 locations on O1's border and compute their minMaxDist values to S1.
- minExistDNN_S1(O1) is the largest one!

[Figure: O1 spanning partitions P1:b and P2:c of S1 (corners a, b, c, d), with minExistDNN_S1(O1) marked.]
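The two steps can be sketched in code under some stated assumptions. The slides do not spell out which eight border locations are chosen, so the sketch below evaluates a conservative superset of candidates instead: the four corners of O1, the points where the partition boundaries cross O1's border, and the centre of S1 if it lies inside O1. The partition boundaries are taken to be the four lines through the centre of S1 that are perpendicular bisectors of pairs of its corners. Inside one partition minMaxDist is the distance to a fixed corner, which is a convex function, so its maximum over the part of O1 in that partition is reached at a vertex of that part. Rectangles are `(x1, y1, x2, y2)` tuples; all names are illustrative.

```python
import math

def min_max_dist(p, rect):
    """2-D minMaxDist: distance from p to the second-closest corner of rect."""
    x, y = p
    x1, y1, x2, y2 = rect
    return sorted(math.hypot(x - cx, y - cy)
                  for cx in (x1, x2) for cy in (y1, y2))[1]

def min_exist_dnn(obj, site):
    """Maximum of min_max_dist(l, site) over all locations l in rectangle obj."""
    ox1, oy1, ox2, oy2 = obj
    sx1, sy1, sx2, sy2 = site
    cx, cy = (sx1 + sx2) / 2, (sy1 + sy2) / 2
    w, h = sx2 - sx1, sy2 - sy1
    # The corners of obj are always candidates.
    cands = [(x, y) for x in (ox1, ox2) for y in (oy1, oy2)]
    # The centre of site is a vertex of every partition; keep it if inside obj.
    if ox1 <= cx <= ox2 and oy1 <= cy <= oy2:
        cands.append((cx, cy))
    # Partition boundaries: four lines through the centre, given by direction.
    for dx, dy in ((1, 0), (0, 1), (h, w), (h, -w)):
        if dx != 0:  # crossings with the vertical edges of obj
            for x in (ox1, ox2):
                y = cy + (x - cx) * dy / dx
                if oy1 <= y <= oy2:
                    cands.append((x, y))
        if dy != 0:  # crossings with the horizontal edges of obj
            for y in (oy1, oy2):
                x = cx + (y - cy) * dx / dy
                if ox1 <= x <= ox2:
                    cands.append((x, y))
    return max(min_max_dist(p, site) for p in cands)

print(min_exist_dnn((4, 0, 6, 2), (0, 0, 2, 2)))  # sqrt(20), reached at corner (6, 0)
```

An object MBR O1 can then be declared not to affect S2 whenever `min_exist_dnn(O1, S1)` is smaller than the minimum distance between O1 and S2, which is the pruning rule stated with the definition of the metric.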


Data Structure

- Two R-trees: S of sites, O of objects.
- Three queues:
  - queueSIN: entries of S inside Q.
  - queueSOUT: entries of S outside Q.
  - queueO: entries of O.


Data Structure

[Figure: query region Q, site entries S1 to S4, and object entries O1 to O4.]

- queueSIN: S1, S2
- queueO: O1
- queueSOUT: S3


maxInfluence and minInfluence

- For each entry Sj in queueSIN:
  - maxInfluence: total weight of entries in queueO that affect Sj.
  - minInfluence: total weight of entries in queueO that ONLY affect Sj, divided by the number of objects in Sj.
- queueSIN is sorted in decreasing order of maxInfluence.
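The two bounds can be illustrated with a small sketch. It assumes the "affects" relation has already been computed and is passed in as a dictionary. The `size` argument stands for the count that minInfluence is divided by, exactly as stated on the slide. The function name and data layout are hypothetical.

```python
def influence_bounds(affects, weight, size):
    """affects: {object_entry: set of site entries it affects}
    weight:  {object_entry: total weight of that entry}
    size:    {site_entry: divisor used for minInfluence}"""
    max_inf = {s: 0.0 for s in size}
    exclusive = {s: 0.0 for s in size}
    for o, sites in affects.items():
        for s in sites:
            max_inf[s] += weight[o]      # o may contribute to s
        if len(sites) == 1:
            (s,) = sites
            exclusive[s] += weight[o]    # o can only contribute to s
    min_inf = {s: exclusive[s] / size[s] for s in size}
    # queueSIN order: decreasing maxInfluence.
    order = sorted(size, key=lambda s: -max_inf[s])
    return max_inf, min_inf, order

# O1 only affects S1, while O2 affects both S1 and S2.
affects = {"O1": {"S1"}, "O2": {"S1", "S2"}}
weight = {"O1": 4, "O2": 6}
size = {"S1": 2, "S2": 3}
print(influence_bounds(affects, weight, size))
```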


Algorithm Overview

- Expand an entry from one of the three queues:
  - Remove the entry from the queue.
  - Retrieve the referenced node, and insert the (unpruned) entries into the same queue.
  - Update maxInfluence and minInfluence if necessary.
- If the top-t entries in queueSIN are sites whose minInfluences are ≥ the maxInfluences of all remaining entries, return.
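The stopping condition can be written as a short check. This is a sketch under assumptions: queueSIN is modelled as a Python list already sorted by decreasing maxInfluence, and the `Entry` class with its field names is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    is_site: bool          # True for an actual site, False for an R-tree node entry
    min_influence: float
    max_influence: float

def can_stop(queue_sin, t):
    """queue_sin must be sorted by max_influence, largest first."""
    top, rest = queue_sin[:t], queue_sin[t:]
    if not all(e.is_site for e in top):
        return False       # some top entry still needs to be expanded
    # No remaining entry may be able to overtake any of the top-t sites.
    bar = max((e.max_influence for e in rest), default=float("-inf"))
    return all(e.min_influence >= bar for e in top)

queue = [Entry("s1", True, 5, 5), Entry("s2", True, 4, 4), Entry("S3", False, 0, 3)]
print(can_stop(queue, 2))  # True: S3 can reach at most 3
```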


Example

[Figure: query region Q, site entries S1, S3, S5, S6, S7, S8, S9, and object entries O1, O5, O6.]

- Before expansion: queueSIN: S1; queueO: O1; queueSOUT: S3
- After expansion: queueSIN: S5, S7; queueO: O6; queueSOUT: S9
- S6 is not affected by O1, prune S6.
- O5 does not affect S5 and S7, prune O5.


A Pruning Case

[Figure: object MBR O1 and site MBRs S1, S2, S3, S4, with the step "Expand S1"; minExistDNN_S1(O1) = 7, minExistDNN_S3(O1) = 4, minDist(S2, O1) = 5.]

- S2 is pruned because minExistDNN_S3(O1) < minDist(S2, O1).


Choosing an Entry to Expand

- Expand the top entries in queueSIN.
- Expand the most important Oi.
  - Importance: |Oi| * #affected entries * area(Oi)
- Expand the Sj that contains the most important Oi.


Choosing an Entry to Expand

- Estimate the probability of pruning Oi using some Sj in queueSOUT.

[Figure: two panels showing Q, S1, O1 and S2, with minDist(S1, O1) = 5 and minExistDNN_S2(O1) = 6; the second panel shows S'2.]

- After expanding S2, O1 is likely not to affect S1.


Experimental Setup

- Data sets:
  - 24,493 populated places in North America
  - 9,203 cultural landmarks in North America
- R-tree page size: 1 KB
- LRU buffer: 128 disk pages
- t = 4
- Comparing to the solution using the Voronoi diagram.

Selected Experimental Results

#sites : #objects = 1 : 2.5


Selected Experimental Results

#sites : #objects = 2.5 : 1



Conclusions

- We addressed a new problem: the top-t most influential sites query.
- We proposed a new metric: minExistDNN. It can be used to prune the search space in NN/RNN related problems.
- We carefully designed an algorithm which systematically browses both R-trees once.
- Experiments showed more than an order of magnitude improvement over the Voronoi diagram based solution.


Q&A
