Enabling Efficient Content Location and Retrieval in Peer-to-Peer Systems by Exploiting Locality in Interests
Kunwadee Sripanidkulchai, Bruce Maggs, Hui Zhang
Carnegie Mellon University
Challenges posed by peer-to-peer
• Want scalable and high-performance content location and peer selection. Existing solutions provide scalable location, but have not addressed peer selection.
• Retrieval performance between end-hosts is highly variable and dynamic.
[Figure labels: Gnutella overlay; peer list overlay; content]
May 1, 2001
[Figure: ping time (ms) versus time of day]
High variability (σ ≈ 1 sec) in ping times over a 24-hour period to a random end-host on the Internet (typical for 1/3 of the 2400 end-hosts pinged in our experiments).
3) The list evolves as more content is retrieved and more peers are discovered.
Content location
Queries for content are sent to peers in the list.
• Need to use up-to-date performance state to select a peer
• For scalability, cannot maintain up-to-date state for all peers
• Which peers should we maintain state for?
  - Peers that have locality in interests
Locality in interests
Observation: people share common interests. Can we exploit this to improve content location and retrieval?
[Figure: peers labeled with their content and the fraction of content shared with a peer holding A, B, C: A, B, C, D (3/3); A, C, D, E (2/3); D, E, F (0/3); F, G, H (0/3)]
Peer selection and content retrieval
Fine-grained dynamic performance state can be maintained for peers on the list.
Potential benefits and overhead
• Use Boeing corporate web proxy traces to drive the request stream for the simulations
• Treat a request for a new document (a compulsory cache miss for a web cache) as a publish in the peer-to-peer system
• Ran simulations over periods ranging from 5 minutes to 3 hours
• Content location algorithms are based on
  - Asking random peers
  - Asking peers with same interests (1 hop)
  - Asking peers of peers with same interests (2 hops)
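The three content location strategies compared in the simulations can be sketched as a small trace-driven replay. This is a minimal illustration under stated assumptions, not the authors' simulator: the function `simulate`, the `fanout` parameter, the unbounded peer lists, and the global fallback lookup on a miss are all hypothetical choices made for the example. The first request for a document is treated as a publish, and a miss is counted whenever none of the peers asked holds the document.

```python
import random
from collections import defaultdict


def simulate(requests, strategy, fanout=5, seed=0):
    """Replay (peer, doc) requests; return the miss rate for one strategy.

    strategy is "random", "1hop", or "2hops". All names here are
    illustrative assumptions, not taken from the poster.
    """
    rng = random.Random(seed)
    holders = defaultdict(set)     # doc -> peers currently holding it
    peer_list = defaultdict(list)  # peer -> peers assumed to share its interests
    seen = []                      # all peers seen so far, in arrival order
    hits = misses = 0

    for peer, doc in requests:
        if peer not in seen:
            seen.append(peer)
        if not holders[doc]:
            holders[doc].add(peer)  # first request for a document = publish
            continue
        if peer in holders[doc]:
            continue                # already held locally, nothing to locate

        if strategy == "random":
            others = [p for p in seen if p != peer]
            asked = rng.sample(others, min(fanout, len(others)))
        else:
            asked = list(peer_list[peer])                 # 1 hop
            if strategy == "2hops":
                for p in peer_list[peer]:                 # peers of peers
                    asked.extend(q for q in peer_list[p] if q != peer)

        found = [p for p in asked if p in holders[doc]]
        if found:
            hits += 1
            source = found[0]
        else:
            misses += 1
            # Assumed fallback: a global lookup standing in for the
            # bootstrap protocol (for example Gnutella).
            source = min(holders[doc])
        if strategy != "random" and source not in peer_list[peer]:
            # Heuristic from the poster: a peer that had the content
            # is assumed to share the requester's interests.
            peer_list[peer].append(source)
        holders[doc].add(peer)

    total = hits + misses
    return misses / total if total else 0.0


# Tiny synthetic trace (peer ids and document names are made up).
trace = [(0, "a"), (0, "b"), (1, "a"), (1, "x"), (2, "x"), (2, "b")]
for s in ("random", "1hop", "2hops"):
    print(s, simulate(trace, s))
```

On a trace this small the numbers mean little; the sketch only shows where the 1-hop and 2-hop lookups differ. Peer identifiers must be mutually comparable (all integers or all strings) because the fallback picks the smallest holder to stay deterministic.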
[Plot legend: random, content-based 1 hop, content-based 2 hops; min, max, and average over 16 runs]
Proposed solution
A distributed algorithm for peers to self-organize into clusters based on interests (peer list).
Why is it easier to incorporate dynamic performance state when using locality in interests to locate and retrieve content?
- Only need to keep performance state for peers that are likely to provide the content one is looking for.
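One way to picture performance state kept only for peers on the list is a per-peer running estimate of round-trip time. The sketch below is an assumption for illustration: the poster lists the technique for incorporating dynamic performance state as still under development, so the exponentially weighted moving average, the class `PerformanceState`, and the parameter `alpha` are hypothetical and not the authors' method.

```python
class PerformanceState:
    """Smoothed round-trip estimates for peers on the peer list only.

    Hypothetical sketch: the smoothing rule (an exponentially weighted
    moving average) is an assumption, not described in the poster.
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha   # weight given to the newest sample
        self.rtt_ms = {}     # peer -> smoothed round-trip time in ms

    def observe(self, peer, sample_ms):
        """Fold a new measurement into the estimate for one peer."""
        old = self.rtt_ms.get(peer)
        if old is None:
            self.rtt_ms[peer] = float(sample_ms)
        else:
            self.rtt_ms[peer] = (1 - self.alpha) * old + self.alpha * sample_ms

    def forget(self, peer):
        """Drop state for a peer that has left the peer list."""
        self.rtt_ms.pop(peer, None)

    def best(self, candidates):
        """Return the candidate with the lowest estimate, or None."""
        known = [p for p in candidates if p in self.rtt_ms]
        return min(known, key=self.rtt_ms.__getitem__) if known else None
```

Because state exists only for listed peers, its size is bounded by the length of the peer list rather than by the size of the system.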
Peer list
1) Each peer maintains a list of peers who share the same interests.
2) Peer lists are initially bootstrapped using existing protocols, such as Gnutella, Tapestry, Pastry, CAN, or Chord. We use the following heuristic: peers that have the content you are looking for have the same interests.
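The peer list can be sketched as a small per-peer data structure. This is a minimal illustration, assuming a hypothetical `fallback_lookup` callable that stands in for the bootstrap protocol, a hypothetical `max_list` bound, and a most-recently-useful-first ordering; none of these details come from the poster.

```python
class Peer:
    """A peer that queries its peer list before the bootstrap protocol.

    Hypothetical sketch: fallback_lookup, max_list, and the
    most-recently-useful-first ordering are assumptions.
    """

    def __init__(self, name, fallback_lookup, max_list=10):
        self.name = name
        self.content = set()
        self.peer_list = []                     # peers with shared interests
        self.fallback_lookup = fallback_lookup  # doc -> Peer or None
        self.max_list = max_list

    def locate(self, doc):
        """Return (holder or None, True if found through the peer list)."""
        for p in self.peer_list:
            if doc in p.content:
                return p, True
        return self.fallback_lookup(doc), False

    def retrieve(self, doc):
        """Fetch doc, then remember its source as sharing our interests."""
        source, _ = self.locate(doc)
        if source is None:
            return None
        self.content.add(doc)
        # Heuristic from the poster: a peer that had the content we were
        # looking for is treated as having the same interests.
        self.peer_list = [source] + [p for p in self.peer_list if p is not source]
        del self.peer_list[self.max_list:]      # keep the list small
        return source
```

The first retrieval from a given source goes through the fallback; later requests for that source's other content are answered from the list.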
[Figure: miss rate (%) versus simulation length (s)]
Locating content amongst peers with locality in interests results in low miss rates
[Figure: number of peers versus simulation length (s)]
Maintaining a small list of peers who share the same interests provides good hit rate
Implementation status
• Refining algorithm by ranking peers in one's list to select peers that are more likely to have content
• Exploring alternative mechanisms to bootstrap peer lists
• Developing techniques for incorporating dynamic performance state into algorithm
• Implementing our solution using Gnutella to bootstrap peer lists
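As an illustration of what ranking peers in a list could look like, the sketch below orders peers by how often they previously supplied content. The poster does not specify a ranking criterion, so the hit-count rule and the function `rank_peers` are assumptions made only for this example.

```python
from collections import Counter


def rank_peers(peer_list, hit_counts):
    """Order peers by past successful retrievals, most successful first.

    hit_counts is a Counter mapping peer -> number of past hits; peers
    without an entry count as zero. sorted() is stable, so peers with
    equal counts keep their existing list order.
    """
    return sorted(peer_list, key=lambda p: -hit_counts[p])


# Peer names and counts below are made up for the example.
hits = Counter({"p2": 3, "p1": 1})
print(rank_peers(["p1", "p2", "p3"], hits))
```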