On Computing Top-t Most Influential Spatial Sites Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du
Northeastern University Boston, USA
9/2/2005
VLDB 2005, Trondheim, Norway
1
Outline Problem
Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
2
Problem Definition Given:
a set of sites S a set of weighted objects O a spatial region Q an integer t.
Top-t
most influential sites query:
find t sites in Q with the largest influences.
influence
of a site s = total weight of objects that consider s as the nearest site.
9/2/2005
VLDB 2005, Trondheim, Norway
3
Motivation Which
supermarket in Boston is the most influential among residential buildings? Sites: supermarkets; Objects: residential buildings; Weight: # people in a building; Query region: Boston;
Which
wireless station in Boston is the most influential among mobile users?
9/2/2005
VLDB 2005, Trondheim, Norway
4
Example o2
o1 s1
s2
o4
s3
o5 s4
o3
o6
Suppose
all objects have weight = 1, Q is the whole space, and t = 1. The most influential site is s1, with influence = 3. 9/2/2005
VLDB 2005, Trondheim, Norway
5
Example o2
o1 s1
s2
o4
s3
o5 s4
o3
o6
Now
that Q is the shadowed rectangle and t = 2. Top-2 most influential sites: s4 and s2.
9/2/2005
VLDB 2005, Trondheim, Norway
6
Outline Problem
Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
7
Related Work Bi-chromatic
RNN query: considers two datasets, sites and objects. The RNNs of a site s ∈ S are the objects that consider s as the nearest site. o2
o1 s1 9/2/2005
s2
o4
s3
o5 s4
o3 VLDB 2005, Trondheim, Norway
o6 8
Related Work Solutions
to the RNN query based on precomputation [KM00, YL01].
o2
o1 s1
9/2/2005
s2
o4
s3
o5 s4
o3
VLDB 2005, Trondheim, Norway
o6
9
Related Work Solution
to RNN query based on Voronoi diagram [SRAE01]. Compute the Voronoi cell of s: a region enclosing the locations closer to s than to any other sites. Querying the object R-tree using the Voronoi cell.
9/2/2005
VLDB 2005, Trondheim, Norway
10
Related Work [SRAE01]
o2
o1 s1
9/2/2005
s2
o4
s3
o5 s4
o3
VLDB 2005, Trondheim, Norway
o6
11
Our Problem vs. RNN Query RNN
query:
A single site as an input. Interested in the actual set of the RNNs.
Top-t
most influential sites query:
A spatial region as an input. Interested in the aggregate weight of RNNs.
9/2/2005
VLDB 2005, Trondheim, Norway
12
Straightforward Solution 1 For
each site, pre-compute its influence. At query time, find the sites in Q and return the t sites with max influences. Drawback
1: Costly maintenance upon
updates. Drawback 2: binding a set of sites closely with a set of objects.
9/2/2005
VLDB 2005, Trondheim, Norway
13
Straightforward Solution 2 An extension of the Voronoi diagram based solution to the RNN query.
1. 2. 3.
9/2/2005
Find all sites in Q. For each such site, find its RNNs by using the Voronoi cell, and compute its influence. Return the t sites with max influences.
VLDB 2005, Trondheim, Norway
14
Straightforward Solution 2 Drawback
1: All sites in Q need to be retrieved from the leaf nodes.
Drawback
2: The object R-tree and the site R-tree are browsed multiple times. For each site in Q, browse the site R-tree to compute the Voronoi Cell. For each such Voronoi Cell, browse the object R-tree to compute the influence.
9/2/2005
VLDB 2005, Trondheim, Norway
15
Features of Our Solution Systematically
browse both trees once. Pruning techniques are provided based on a new metric, minExistDNN. No need to compute the influences for all sites in Q, or even to locate all sites in Q.
9/2/2005
VLDB 2005, Trondheim, Norway
16
Outline Problem
Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
17
Motivation
Intuitively, if some object in Oi may consider some site in Sj as an NN, Oi affects Sj. To estimate the influences of all sites in a site MBR Sj, we need to know whether an object MBR Oi will affect Sj. O1
O2 S1
S2
O1 only affects S1, while O2 affects both S1 and S2. 9/2/2005
VLDB 2005, Trondheim, Norway
18
maxDist – A Loose Estimation If
maxDist(O1, S1) < minDist(O1, S2), O1 does not affect S2. Why not good enough? minDist(O1,S2)=8
S2
O1 S1 9/2/2005
maxDist(O1,S1)=10
VLDB 2005, Trondheim, Norway
19
minMaxDist – A Tight Estimation? minDist(o1,S2) = 6
S1
o1
S2
minMaxDist(o1, S1) = 5
An
object o does not affect S2, if there exists S1 such that minMaxDist(o1, S1) < minDist(o1, S2)
9/2/2005
VLDB 2005, Trondheim, Norway
20
minMaxDist – A Tight Estimation? minDist(O1,S2) = 6
s1 S1
S2
O1 7
6
o1
s2
minMaxDist(O1, S1) = 5
Not
9/2/2005
true for an object MBR O1. VLDB 2005, Trondheim, Norway
21
A Tight Estimation? A metric m(O1, S1) should:
1) 2)
9/2/2005
guarantee that, each location in O1 is within m(O1, S1) of a site in S1, and be the smallest distance with this property.
VLDB 2005, Trondheim, Norway
22
New Metric – minExistDNNS1(O1) Definition:
minExistDNNS1(O1) = max {minMaxDist(l, S1) | ∀ location l∈ O1}
O1
does not affect S2, if there exists S1, s.t. minExistDNNS1(O1) < minDist(O1, S2).
9/2/2005
VLDB 2005, Trondheim, Norway
23
Examples of minExistDNNS1(O1) O1
O1 S1
How 9/2/2005
S1
to calculate it? VLDB 2005, Trondheim, Norway
24
Calculating minExistDNNS1(O1)
Step 1: Space partitioning P1:b
P2:c P3:a
a
P4:d c
S1 b P8:a
9/2/2005
d P7:d P6:b
P5:c
Every location l in the same partition is associated with the second closest corner of S1 – the distance is minMaxDist(l, S1)!
VLDB 2005, Trondheim, Norway
25
Space Partitioning O1
is divided into multiple sub-regions, one in each partition. P1:b
P2:c O1
a
c S1
b
9/2/2005
d
VLDB 2005, Trondheim, Norway
26
Calculating minExistDNNS1(O1)
Step 2: Choose up-to 8 locations on O1’ border and compute the minMaxDist’s to S1. minExistDNN is the largest one! P1:b
P2:c O1
minExistDNNS1(O1)
a
c S1
b
9/2/2005
VLDB 2005, Trondheim, Norway
d
27
Outline Problem
Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
28
Data Structure Two
R-trees: S of sites, O of objects. Three queues: queueSIN: entries of S inside Q. queueSOUT: entries of S outside Q. queueO: entries of O.
9/2/2005
VLDB 2005, Trondheim, Norway
29
Data Structure S3
O2
O1 S1
Q S4 S2
O3
O4
queueSIN: S1 S2 queueO: O1 queueSOUT: S3
9/2/2005
VLDB 2005, Trondheim, Norway
30
maxInfluence and minInfluence For
each entry Sj in queueSIN,
maxInfluence: total weight of entries in queueO that affect Sj. minInfluence: total weight of entries in queueO that ONLY affect Sj, divided by the number of objects in Sj.
queueSIN
is sorted in decreasing order of maxInfluence.
9/2/2005
VLDB 2005, Trondheim, Norway
31
Algorithm Overview Expand
queues.
an entry from one of the three
Remove the entry from the queue. Retrieve the referenced node, and insert the (unpruned) entries into the same queue. Update maxInfluence and minInfluence if necessary.
If
top-t entries in queueSIN are sites, with minInfluences ≥ maxInfluences of all remaining entries, return.
9/2/2005
VLDB 2005, Trondheim, Norway
32
Example S3 O5
S8
S9
O6
O1
S5
S1
Q
S6
S7
queueSIN: S1 queueO: O1 queueSOUT: S3 queueSIN: S5, S7 queueO: O6 queueSOUT: S9
S6 is not affected by O1, prune S6. O5 does not affect S5 and S7, prune O5.
9/2/2005
VLDB 2005, Trondheim, Norway
33
A Pruning Case minExistDNNS3(O1)=4 minDist(S2, O1)=5
Expand S1
S1 S4
S3
S2 O1
minExistDNNS1(O1)=7
S2
is pruned because of minExistDNNS3 (O1) < minDist(S2, O1)
9/2/2005
VLDB 2005, Trondheim, Norway
34
Choosing an Entry to Expand Expand
top entries in queueSIN.
Expand
the most important Oi.
Importance: |Oi| * #affected entries * area(Oi)
Expand
Sj that contains the most important Oi.
9/2/2005
VLDB 2005, Trondheim, Norway
35
Choosing an Entry to Expand Estimate the probability of pruning Oi using some Sj in queueSOUT.
Q
Q S1
minDist(S1, O1)=5 minExistDNNS2(O1)=6
9/2/2005
S1 minDist(S1, O1)=5
O1
O1
S2
minExistDNNS2(O1)=6
S’2
After expanding S2, O1 is likely not to affect S1. VLDB 2005, Trondheim, Norway
36
Outline Problem
Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
37
Experimental Setup Data
sets:
24,493 populated places in North America 9,203 cultural landmarks in North America
R-tree
page size: 1 KB LRU buffer: 128 disk pages. t = 4. Comparing
diagram.
9/2/2005
to the solution using Voronoi VLDB 2005, Trondheim, Norway
38
Selected Experimental Results #sites : #objects = 1 : 2.5
9/2/2005
VLDB 2005, Trondheim, Norway
39
Selected Experimental Results #sites : #objects = 2.5 : 1
9/2/2005
VLDB 2005, Trondheim, Norway
40
Outline Problem
Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
41
Conclusions We
addressed a new problem: Top-t most influential sites query. We proposed a new metric: minExistDNN. It can be used to prune search space in NN/RNN related problems. We carefully designed an algorithm which systematically browses both R-trees once. Experiments showed more than an order of magnitude improvement. 9/2/2005
VLDB 2005, Trondheim, Norway
42
Q&A
9/2/2005
VLDB 2005, Trondheim, Norway
43