The Effect of Caching in Sustainability of Large Wireless Networks

G. S. Paschos, S. Gitzenis, and L. Tassiulas
Abstract—We study the scalability of multihop wireless communications, a major concern in networking, for the case that users access content cached across the nodes. In contrast to the standard paradigm of randomly selected communicating pairs, content replication is efficient for certain regimes of content volume and popularity, cache size and network size. Assuming the Zipf popularity law, and investigating the relative ways in which the number of files, the cache size and the number of network nodes can jointly scale to infinity, we derive asymptotic laws on the required link capacity, which range from $O(\sqrt{N})$ down to $O(1)$, and identify regimes of network operation.
I. INTRODUCTION

The proliferation of video applications and the advent of new paradigms like multiview and 3D video, as well as other demanding applications, push the operation of networks to their physical limits. To overcome these challenges, the networking community devises new technologies and architectures, like the Peer-to-Peer (P2P) communication paradigm and Content Centric Networking, in an effort to improve the scalability and the efficiency of the Internet of the future.

In this landscape, wireless networks are considered to hold an important role, supporting mobility of users, extending network connectivity and promoting ubiquitous computing. According to [1], traffic from wireless devices will exceed traffic from wired ones by 2015. Despite their worldwide deployment, wireless networks are mostly confined to one-hop access from the wired backhaul. Multihop operation of wireless networks is limited to specific applications like sensor networks, where the supported communication rates are low, or to engineered fixed point-to-point links with directional antennas. In the seminal work of Gupta and Kumar [2], the traffic-carrying capacity of a planar wireless network was shown to be $O(1/\sqrt{N})$, where $N$ is the number of nodes. This implies that large multihop networks cannot sustain throughput among random pairs, due to the increasing number of hops between the source and the destination.

There is anecdotal evidence that the average P2P video file travels back and forth on the same optical link multiple times, with detrimental effects on network efficiency. This effect is substantial if one considers that video data account for almost 50% of the overall network traffic today [1]. In this context, network caching has a key role, as it can mitigate these inefficiencies by storing the data close to the customer and avoiding excess traffic, thereby significantly increasing the efficiency of the network.
In the wireless networks of the future, one can envisage the direct participation of a myriad of computing devices with variable cache capabilities. These devices are expected to form new paradigms of networks based on multihop wireless connectivity, and to deliver high quality services such as the ones already described. Moreover, due to the ongoing research on memory and storage technologies, the size of these caches is expected to increase geometrically over time¹. The question we study is whether the trend of increasing cache size is sufficient to bring a measurable improvement in the operation of wireless networks and, in particular, to change the asymptotic law of the wireless network capacity.

In this work, we depart from the random-pairs communication paradigm of [2] and important follow-ups [4]–[8], by defining a content-based communication paradigm, where nodes request content replicated inside the network, in the caches of other nodes. This paradigm gives rise to the joint problem of replication and routing. Assuming the symmetric topology of the square grid (a well-accepted model for various planar wireless networks), we set up a replication problem whose optimal solution is of the same order as that of the complex combinatorial joint problem. Then, we use this solution to identify the asymptotic laws of the required link capacity that can sustain the associated network traffic. In contrast to the perspective of our prior works [9], [10], the cache size is assumed to increase to infinity.

In such a study, the statistics of the applications are quite important. Therefore, in this work, we assume that the requested messages have a popularity described by the Zipf law with parameter $\tau$, a well-known approach for modeling file popularity. Due to space constraints, the proofs are omitted from this version of the paper, and will appear in an upcoming extended version (to assist the reviewer, we keep the appendix in this document). Table I provides the definitions of the asymptotic notation that we use throughout this work.

¹ After a period of doubling of the areal density of hard disks per year, the growth rate has dropped to doubling every three years. Similarly, DRAM capacity quadruples every three years [3].

II. BASIC DEFINITIONS AND THE DENSITY PROBLEM

Assume a square lattice with $N$ nodes, with $N$ being the square of an integer; the set of nodes is indexed by $n \in \mathcal{N} \triangleq \{1, 2, \dots, N\}$. Each node is connected with undirected links to its four neighbors that lie next to it on the same row or column. By keeping the node density fixed and increasing the network size $N$, we obtain a scaling network similar to [2]. Moreover, for simplicity, we consider a toroidal structure as in [11] to avoid boundary effects.
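As an illustration of the lattice model above, the following minimal sketch computes hop distances on a toroidal square grid with four-neighbor links and shows that the average hop count between a node and a uniformly chosen destination grows linearly with the side of the grid, i.e., as $\sqrt{N}$. The function names `torus_hops` and `mean_hops` are hypothetical and serve only this example.

```python
from itertools import product

def torus_hops(a, b, side):
    """Hop distance between nodes a=(row, col) and b=(row, col) on a
    side x side torus with 4-neighbor links (wrapped Manhattan distance)."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, side - dr) + min(dc, side - dc)

def mean_hops(side):
    """Average hop count from node (0, 0) to a uniformly chosen node.
    By the symmetry of the torus, the value is the same for every source."""
    total = sum(torus_hops((0, 0), (r, c), side)
                for r, c in product(range(side), repeat=2))
    return total / (side * side)

if __name__ == "__main__":
    for side in (4, 8, 16, 32):
        n_nodes = side * side
        # The last column stays constant, i.e. the mean hop count is Theta(sqrt(N)).
        print(n_nodes, mean_hops(side), mean_hops(side) / n_nodes ** 0.5)
```

The growth of the hop count with $\sqrt{N}$ is what makes traffic among random pairs unsustainable in large networks, and what replication tries to counter by bringing content closer to the requester.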
TABLE I
DEFINITION OF ASYMPTOTIC NOTATION ($f$ and $g$ are positive functions)

$f = o(g)$: For any $k > 0$, there exists $\hat{x}$: $x \ge \hat{x} \Rightarrow f(x) \le k g(x)$
$f = O(g)$: There exists $k > 0$ and $\hat{x}$: $x \ge \hat{x} \Rightarrow f(x) \le k g(x)$
$f = \omega(g)$: For any $k > 0$, there exists $\hat{x}$: $x \ge \hat{x} \Rightarrow f(x) \ge k g(x)$
$f = \Omega(g)$: There exists $k > 0$ and $\hat{x}$: $x \ge \hat{x} \Rightarrow f(x) \ge k g(x)$
$f = \Theta(g)$: Iff $f = O(g)$ and $f = \Omega(g)$
$f \overset{\lim}{<} k_0 g$: There exists $\hat{x}$: $x \ge \hat{x} \Rightarrow f(x) < k_0 g(x)$
$f \overset{\lim}{\le} k_0 g$: For any $k > k_0$, there exists $\hat{x}$: $x \ge \hat{x} \Rightarrow f(x) \le k g(x)$
$f \overset{\lim}{\ge} k_0 g$: For any $0 < k < k_0$, there exists $\hat{x}$: $x \ge \hat{x} \Rightarrow f(x) \ge k g(x)$
$f \overset{\lim}{>} k_0 g$: There exists $\hat{x}$: $x \ge \hat{x} \Rightarrow f(x) > k_0 g(x)$

Nodes (or users located therein) generate requests to access files/data, indexed by $m \in \mathcal{M} \triangleq \{1, 2, \dots, M\}$. Each node $n$ is equipped with a cache/buffer, whose contents are denoted by the set $B_n$, a subset of $\mathcal{M}$. If a request at node $n$ regards a file $m$ that lies in $B_n$, then it is served locally. Due to the limited buffer capacity, $m$ will often not be available locally; thus, node $n$ will have to retrieve $m$ over the network from some other node $w$ that keeps $m$ in its cache. Hence, for each pair $(n, m)$, a route (or set of routes) $R_{n,m}$ should be decided to specify the path(s) followed from $n$ to $w$ in accessing $m$.

Let, moreover, $K$ be the storage capacity of each node's cache, measured in the number of files it can store. This means that all $M$ files are of the same (unit) size, placing a constraint on the cardinality of the cache contents, $|B_n| \le K$. The generalization to variable-sized files can still be captured in this framework by splitting each large file into multiple unit segments, and then treating its segments as separate, independent files. For the problem of replication not to be trivial, it should be

$$K < M, \tag{1}$$

which implies that each node has to select the files to buffer in its cache. Moreover, for the network to have sufficient memory to store each file at least once, it should be

$$KN \ge M. \tag{2}$$

Last, let each node $n \in \mathcal{N}$ generate requests for data at rate $\lambda_n$. In this work, we focus on the symmetric node request rate, that is, $\lambda_n = \lambda = 1$. Each request regards a particular file $m \in \mathcal{M}$, depending on the popularity $p_m$ of file $m$. In essence, $[p_m]$ is a probability distribution, i.e., it sets the probability of a request for a given file. Clearly, replication should be governed by the popularity: storing the popular files densely will maximize the gain of caching on the network traffic.

A. General Replication-Delivery Problem

Assuming a replication $[B_n]$ and delivery routes $[R_{n,m}]$, it is easy to compute the traffic load $C_\ell$ at each link $\ell$ of the network. The associated replication-delivery problem regards minimizing the worst (or, in a relaxed form, the average) traffic $C_\ell$, as in [9], [10], given the constraints of (i) node capacity and (ii) storing at least one copy of each file in the network. The resulting $C_\ell$, then, sets the minimum capacity of each link so that the network operates properly (i.e., is stable).

Obviously, the entanglement between the positions $[B_n]$ where each file is stored and the delivery paths $[R_{n,m}]$ calls for a joint optimization. However, this is clearly a problem of combinatorial complexity, and thus not amenable to an easy-to-compute solution. Therefore, we should seek suitable simplifications and approximations to derive a suboptimal but efficient solution. For the needs of our study, this translates to an order-optimal solution, i.e., the proposed suboptimal solution lies within a constant factor of the optimal (but hard to compute) solution.

B. Replication Density-based Problem

Assuming a solution of the general problem, we can define a particularly important quantity, the frequency of occurrence of each file $m$ in the caches, or replication density $d_m$, as the fraction of nodes that store file $m$ in the network:

$$d_m = \frac{1}{N} \sum_{n \in \mathcal{N}} 1_{\{m \in B_n\}}. \tag{3}$$

Based on this metric, we can define a much simpler problem based on the file densities:

PROBLEM 1: Minimize

$$C \triangleq \sum_{m \in \mathcal{M}} \left( \frac{1}{\sqrt{d_m}} - 1 \right) p_m,$$

subject to
1) for any $m \in \mathcal{M}$, $\frac{1}{N} \le d_m \le 1$,
2) $\sum_{m \in \mathcal{M}} d_m \le K$.

In the above, the optimization variables are the densities $d_m$, which express the fraction of caches containing file $m$. In the objective, $d_m^{-1/2} - 1$ approximates the average hop count from a random node to a cache containing $m$. Weighted by the probability $p_m$ of requests for $m$, the summation expresses the average link load per request.

This optimization is shown in [10] to be a relaxed version of the actual general problem whose optimal solution $C$ is, moreover, of the same order as the solution of the original problem; in particular, [10] presents an algorithm that assigns the node cache contents $[B_n]$ from the densities $d_m$ and uses shortest path routing for the delivery paths $[R_{n,m}]$. Thus, the asymptotic laws of the original problem and of $C$ coincide, and, therefore, it suffices to study the scaling of $C$.

It should be noted that a similar optimization is formulated in [12], without, however, the $\frac{1}{N} \le d_m \le 1$ constraints. As seen next, these inequalities have a major impact on the solution and, consequently, on the asymptotics.

C. Density Problem Solution

As explained in [9], [10], and easily seen from the functional form, the density problem admits a unique solution using the Karush-Kuhn-Tucker (KKT) conditions, and a computationally efficient algorithm finds the solution in polynomial time. For each $m$, either the lower or the upper bound on $d_m$ can hold with equality, or neither. This causes the partition of $\mathcal{M}$ into three subsets: one containing the files of unit replication density (i.e., stored at every node), $\mathcal{M}_\bullet = \{m : d_m = 1\}$; one containing the files stored in just one node, $\mathcal{M}_\circ = \{m : d_m = \frac{1}{N}\}$; and the complementary $\mathcal{M}_\odot = \mathcal{M} \setminus (\mathcal{M}_\bullet \cup \mathcal{M}_\circ)$. When $p_m$ is in decreasing order, these sets become ordered, too; then, we use variables $l$ and $r$ to identify the boundaries of these sets, as follows: $\mathcal{M}_\bullet = \{1, 2, \dots, l-1\}$, $\mathcal{M}_\odot = \{l, l+1, \dots, r-1\}$, and $\mathcal{M}_\circ = \{r, r+1, \dots, M\}$, where $l$ and $r$ are integers with $1 \le l \le r \le M+1$. Given these, the solution $d_m$ takes the form

$$d_m = \begin{cases} 1, & m \in \mathcal{M}_\bullet, & \text{(4a)} \\[4pt] \dfrac{K - l + 1 - \frac{M-r+1}{N}}{\sum_{j=l}^{r-1} p_j^{2/3}} \, p_m^{2/3}, & m \in \mathcal{M}_\odot, & \text{(4b)} \\[4pt] \dfrac{1}{N}, & m \in \mathcal{M}_\circ. & \text{(4c)} \end{cases}$$

III. ASYMPTOTIC LAWS FOR ZIPF POPULARITY

To derive concrete asymptotics of the link rate, we consider the Zipf law, a distribution well known for modeling Internet traffic.

A. Zipf Law and Approximations

The Zipf distribution is defined as follows:

$$p_m = \frac{1}{H_\tau(M)} m^{-\tau}, \tag{5}$$

where $\tau$ is the power law parameter, indicating the rate of popularity decline as $m$ increases, and $H_\tau(n) \triangleq \sum_{j=1}^{n} j^{-\tau}$ is the truncated (at $n$) zeta function evaluated at $\tau$ (also called the $n$th $\tau$-order generalized harmonic number). The limit $H_\tau \triangleq \lim_{n\to\infty} H_\tau(n)$ is the Riemann zeta function, which converges when $\tau > 1$. We derive an approximation for $H_\tau(n)$ by bounding the sum: for $n \ge m \ge 0$,

$$\int_m^n (x+1)^{-\tau}\,dx \le H_\tau(n) - H_\tau(m) \le 1 + \int_{m+1}^n x^{-\tau}\,dx \;\Rightarrow$$

$$\begin{cases} \dfrac{(n+1)^{1-\tau} - (m+1)^{1-\tau}}{1-\tau} \le H_\tau(n) - H_\tau(m) \le \dfrac{n^{1-\tau} - (m+1)^{1-\tau}}{1-\tau} + 1, & \text{if } \tau \ne 1, \\[8pt] \ln\dfrac{n+1}{m+1} \le H_\tau(n) - H_\tau(m) \le \ln\dfrac{n}{m+1} + 1, & \text{if } \tau = 1. \end{cases} \tag{6}$$

For any $m < n$ such that $n \sim m$, we will make use of the following approximation, derived by counting the sum terms:

$$H_\tau(n) - H_\tau(m) = \sum_{j=m+1}^{n} j^{-\tau} \sim n^{-\tau}(n - m). \tag{7}$$

Substituting the solution (4) and plugging the Zipf distribution into $C$, it follows that

$$C \triangleq \sum_{m \in \mathcal{M}} \left( d_m^{-1/2} - 1 \right) p_m = C_\odot + C_\circ - \sum_{j=l}^{M} p_j, \tag{8}$$

where $\sum_{j=l}^{M} p_j = O(1)$ (as it always lies in $[0, 1]$), and

$$C_\odot \triangleq \sum_{m \in \mathcal{M}_\odot} \frac{p_m}{\sqrt{d_m}} \overset{(5)}{=} \frac{\left[ H_{2\tau/3}(r-1) - H_{2\tau/3}(l-1) \right]^{3/2}}{\sqrt{K - l + 1 - \frac{M-r+1}{N}} \; H_\tau(M)}, \tag{9}$$

$$C_\circ \triangleq \sum_{m \in \mathcal{M}_\circ} \frac{p_m}{\sqrt{d_m}} \overset{(5)}{=} \sqrt{N} \, \frac{H_\tau(M) - H_\tau(r-1)}{H_\tau(M)}. \tag{10}$$

B. Estimation of $l$ and $r$

As the indices $l$, $r$ are not provided in closed form, we derive approximations in order to find the scaling of $C$.

1) Estimation of $l$: Note that $l - 1$ represents the number of files cached in all nodes. If $\mathcal{M}_\odot \cup \mathcal{M}_\circ$ is not empty, $d_l < 1$, which by (4b) is equivalent to

$$K - l + 1 - \frac{M-r+1}{N} < l^{2\tau/3} \left[ H_{2\tau/3}(r-1) - H_{2\tau/3}(l-1) \right]. \tag{11}$$

If, moreover, the first set $\mathcal{M}_\bullet$ is not empty, i.e., $l > 1$, then $d_{l-1} = 1$. This means that if we attempted to decrease index $l$ by 1, this would violate the density constraints, and (4b) would result in a number not less than 1 for $d_{l-1}$:

$$K - l + 2 - \frac{M-r+1}{N} \ge (l-1)^{2\tau/3} \left[ H_{2\tau/3}(r-1) - H_{2\tau/3}(l-2) \right]. \tag{12}$$

Thus, provided $l > 1$, it can be uniquely determined as the lowest integer that satisfies (11)-(12). An approximation for $l$ can be computed by treating (11) as an approximate equality (as $d_{l-1} = 1$ and $d_l < 1$):

$$K - l + 1 - \frac{M-r+1}{N} \cong l^{2\tau/3} \left[ H_{2\tau/3}(r-1) - H_{2\tau/3}(l-1) \right]. \tag{13}$$

2) Estimation of $r$: If $\mathcal{M}_\bullet \cup \mathcal{M}_\odot$ is not empty, $d_{r-1} > \frac{1}{N}$, which is equivalent to

$$(K - l + 1)N - M + r - 1 > (r-1)^{2\tau/3} \left[ H_{2\tau/3}(r-1) - H_{2\tau/3}(l-1) \right]. \tag{14}$$

Again, if the set $\mathcal{M}_\circ$ is not empty, i.e., $r \le M$, then $d_r = N^{-1}$. Thus, if we attempted to increase index $r$ by one, (4b) would violate the constraint, resulting in a density not greater than $N^{-1}$:

$$(K - l + 1)N - M + r \le r^{2\tau/3} \left[ H_{2\tau/3}(r) - H_{2\tau/3}(l-1) \right]. \tag{15}$$

As before, (14) is treated as an approximate equality, i.e.,

$$(K - l + 1)N - M + r - 1 \cong (r-1)^{2\tau/3} \left[ H_{2\tau/3}(r-1) - H_{2\tau/3}(l-1) \right]. \tag{16}$$

3) Estimation of $l/r$: For all $l$, $r$, it is $N > \frac{d_l}{d_{r-1}} = \left( \frac{r-1}{l} \right)^{2\tau/3}$. As before, whenever $l$ and $r$ are not equal to the extremes, i.e., $1 < l \le r < M+1$, it holds that $d_{l-1}/d_r = N$. Hence,

$$l \cong r N^{-\frac{3}{2\tau}}. \tag{17}$$

Up to this point, we have summarized the analysis of [9], [10]. Departing from them, we investigate the behavior of $l$ and $r$ as $N$, $M$ and $K$ go to infinity.

C. Almost empty $\mathcal{M}_\circ$

The first case of interest is when the solution results in an almost empty set $\mathcal{M}_\circ$. Formally, $\mathcal{M}_\circ \approx \emptyset$ iff $|\mathcal{M}_\circ| = o(M)$, i.e., the number of elements of the last set over the total number of files is negligible; thus, $\mathcal{M}_\circ = \emptyset$ is a special case of $\mathcal{M}_\circ \approx \emptyset$. For $\mathcal{M}_\circ \approx \emptyset$, $M$ should increase at a slow pace with respect to $N$ and $K$, so that the constraint $d_m \ge N^{-1}$ holds with strict inequality for almost all (i.e., $M - o(M)$) files. Since $|\mathcal{M}_\circ| = M - r + 1$, this condition is equivalent to $M - r = o(M)$.

LEMMA 1 [$l$, $r$ AND CONDITIONS FOR ALMOST EMPTY $\mathcal{M}_\circ$]: If $\mathcal{M}_\circ \approx \emptyset$, then $r \sim M$ and
• for $\tau < 3/2$, it is
  $l \to 1$, if $K \overset{\lim}{<} M^{1-\frac{2\tau}{3}}$,
  $l \sim \left( \frac{3-2\tau}{3} \right)^{\frac{3}{2\tau}} \frac{K^{\frac{3}{2\tau}}}{M^{\frac{3}{2\tau}-1}}$, if $M^{1-\frac{2\tau}{3}} \overset{\lim}{\le} K = o(M)$,
  $l \sim \alpha K$, if $K \sim \beta_{\alpha,\tau} M$,
  where $\alpha \in (0, 1]$, $\beta_{\alpha,\tau} \triangleq \alpha^{\frac{2\tau}{3-2\tau}} \left[ \frac{3 - 2\tau(1-\alpha)}{3} \right]^{-\frac{3}{3-2\tau}}$, and, for the set $\mathcal{M}_\circ$ of files stored at a single node,
  $\mathcal{M}_\circ = \emptyset$, if $\omega(K) = M \overset{\lim}{<} \frac{3-2\tau}{3} KN$,
  $\mathcal{M}_\circ \approx \emptyset$, if $\omega(K) = M \overset{\lim}{\le} \frac{3-2\tau}{3} KN$,
  $\mathcal{M}_\circ = \emptyset$, if $K = \Theta(M)$;

• for $\tau = 3/2$, it is
  $l \to 1$, if $K \overset{\lim}{\le} \ln M$,
  $l \sim \frac{K}{\ln M}$, if $\ln M \overset{\lim}{<} K = o(M)$,
  $l \sim \alpha K$, if $K \sim \gamma_\alpha M$,
  where $\alpha \in (0, 1]$, $\gamma_\alpha \triangleq \frac{1}{\alpha} e^{\frac{\alpha-1}{\alpha}}$, and
  $\mathcal{M}_\circ = \emptyset$, if $M = \omega(K)$ and $M \ln M \overset{\lim}{<} KN$,
  $\mathcal{M}_\circ \approx \emptyset$, if $M = \omega(K)$ and $M \ln M \overset{\lim}{\le} KN$,
  $\mathcal{M}_\circ = \emptyset$, if $K = \Theta(M)$;

• for $\tau > 3/2$, it is
  $l \sim \frac{2\tau-3}{2\tau} K$, if $K = o(M)$,
  $l \sim \alpha K$, if $K \sim \delta_{\alpha,\tau} M$,
  where $\alpha \in \left( \frac{2\tau-3}{2\tau}, 1 \right]$, $\delta_{\alpha,\tau} \triangleq \alpha^{\frac{2\tau}{3-2\tau}} \left[ \frac{3 - 2\tau(1-\alpha)}{3} \right]^{-\frac{3}{3-2\tau}}$, and
  $\mathcal{M}_\circ = \emptyset$, if $\omega(K) = M \overset{\lim}{<} \frac{2\tau-3}{2\tau} K N^{\frac{3}{2\tau}}$,
  $\mathcal{M}_\circ \approx \emptyset$, if $\omega(K) = M \overset{\lim}{\le} \frac{2\tau-3}{2\tau} K N^{\frac{3}{2\tau}}$,
  $\mathcal{M}_\circ = \emptyset$, if $K = \Theta(M)$.

Note that we have $K - l + 1 = \Theta(K)$ if $K \overset{\lim}{<} M$. If, however, $K \sim M$, then $l \sim K \sim M$, in which case the majority of files are stored locally.

D. Non-empty $\mathcal{M}_\circ$

If $\mathcal{M}_\circ$ is non-empty, then $M - r = \Theta(M)$.

LEMMA 2 [$l$ AND $r$ FOR NON-EMPTY $\mathcal{M}_\circ$]: If $\mathcal{M}_\circ \ne \emptyset$ and $KN - M = O(1)$, then $l \to 1$ and $r = \Theta(1)$; in particular,
  $r \approx 1 + \frac{3-2\tau}{2\tau} (KN - M)$, if $\tau < 3/2$,
  $(r-1) \ln(r-1) \approx KN - M$, if $\tau = 3/2$,
  $r \approx 1 + \left( \frac{2\tau-3}{2\tau} \right)^{\frac{3}{2\tau}} (KN - M)^{\frac{3}{2\tau}}$, if $\tau > 3/2$.

Else, if $\mathcal{M}_\circ \ne \emptyset$ and $KN - M = \omega(1)$, then

• for $\tau < 3/2$,
  if $KN - M \overset{\lim}{\le} \frac{2\tau}{3-2\tau} N^{\frac{3}{2\tau}}$, then $l \to 1$, $r \sim \frac{3-2\tau}{2\tau} (KN - M)$,
  if $KN - M \overset{\lim}{>} \frac{2\tau}{3-2\tau} N^{\frac{3}{2\tau}}$, then $l \sim \frac{3-2\tau}{2\tau} \frac{KN - M}{N^{\frac{3}{2\tau}}}$, $r \sim \frac{3-2\tau}{2\tau} (KN - M)$;

• for $\tau = 3/2$,
  if $KN - M \overset{\lim}{\le} N \ln N$, then $l \to 1$, $r \ln r \sim KN - M$,
  if $KN - M \overset{\lim}{>} N \ln N$, then $l \sim \frac{KN - M}{N \ln N}$, $r \sim \frac{KN - M}{\ln N}$;

• for $\tau > 3/2$,
  if $KN - M \overset{\lim}{\le} \frac{2\tau}{2\tau-3} N$, then $l \to 1$, $r \sim \left( \frac{2\tau-3}{2\tau} \right)^{\frac{3}{2\tau}} (KN - M)^{\frac{3}{2\tau}}$,
  if $KN - M \overset{\lim}{>} \frac{2\tau}{2\tau-3} N$, then $l \sim \frac{2\tau-3}{2\tau} \left( K - \frac{M}{N} \right)$, $r \sim \frac{2\tau-3}{2\tau} \frac{KN - M}{N^{1-\frac{3}{2\tau}}}$.

Note that for all the cases we have $K - l + 1 = \Theta(K)$.

E. Capacity scaling

We proceed to the asymptotic behavior of the rate $C$, for the case where the number of nodes $N$, the number of files $M$ and the size of caches $K$ all increase to infinity at various relative rates.

First, we establish the Gupta-Kumar rate [2] $O(\sqrt{N})$ as an upper bound. This is intuitive: if replication is ineffective (e.g., due to a large number of files or a small cache size), then the system and its performance essentially reduce to [2].

LEMMA 3 [UPPER BOUND ON $C$]: $C = O(\sqrt{N})$.

Next, we begin the asymptotic analysis, partitioning the space of the $M$, $N$, $K$ parameters according to whether they produce files replicated only once (non-empty $\mathcal{M}_\circ$) or not. To study the asymptotics of $C$, we use the results for $l$ and $r$ obtained in Lemmas 1 and 2.

The almost empty $\mathcal{M}_\circ$ implies that $M - r = o(M)$ and that there are very few files stored once in the network, in which case the required link capacity law depends on $C_\odot$, the contribution to $C$ of the files with intermediate density, see (9).

THEOREM 4 [CAPACITY FOR ALMOST EMPTY $\mathcal{M}_\circ$]: If $K \sim M$, then $C = o(1)$. Otherwise,
• if $\tau < 1$, $C = \Theta\left( \sqrt{\frac{M}{K}} \right)$,
• if $\tau = 1$, $C = \Theta\left( \frac{\sqrt{M}}{\sqrt{K} \log M} \right)$,
• if $1 < \tau < \frac{3}{2}$, $C = \Theta\left( \frac{M^{3/2-\tau}}{\sqrt{K}} \right)$,
• if $\tau = \frac{3}{2}$ and $K = \Theta(M)$, $C = \Theta\left( \frac{1}{\sqrt{K}} \right)$,
• if $\tau = \frac{3}{2}$ and $K = o(M)$, $C = \Theta\left( \frac{\log^{3/2} M}{\sqrt{K}} \right)$,
• if $\tau > \frac{3}{2}$, $C = \Theta\left( \frac{1}{\sqrt{K}} \right)$.

THEOREM 5 [CAPACITY FOR NON-EMPTY $\mathcal{M}_\circ$]:
• If $\tau < 1$, $C = \Theta\left( \sqrt{N} \right)$,
• if $\tau = 1$ and $M \overset{\lim}{<} KN$, $C = \Theta\left( \frac{\sqrt{N}}{\log M} \right)$,
• if $\tau = 1$ and $M \sim KN$, $C = \Theta\left( \sqrt{N} \right)$,
• if $1 < \tau < \frac{3}{2}$, $C = \Theta\left( \frac{\sqrt{N}}{(KN - M)^{\tau-1}} \right)$,
• if $\tau = \frac{3}{2}$, $C = \Theta\left( \frac{\log^{3/2} r}{\sqrt{K - \frac{M}{N}}} \right)$,
• if $\tau > \frac{3}{2}$ and $KN - M \overset{\lim}{\le} \frac{2\tau}{2\tau-3} N$, $C = \Theta\left( \frac{\sqrt{N}}{\sqrt{NK - M}} \right)$,
• if $\tau > \frac{3}{2}$ and $KN - M \overset{\lim}{>} \frac{2\tau}{2\tau-3} N$, $C = \Theta\left( \frac{N^{\tau-1}}{(NK - M)^{\tau-1}} \right)$.

IV. DISCUSSION ON ASYMPTOTIC LAWS

The main result of the asymptotic laws regards the minimum link rate (of the bottleneck link) required to sustain a constant request rate from each node. As a preliminary comment, the link rates are subject to the limits of information theory, e.g., Shannon's capacity law. Thus, a rate $C$ that scales to infinity should rather be interpreted as the inverse of the sustainable request rate $\lambda$; e.g., a result $C = \Theta(\sqrt{N})$ for $\lambda = 1$ is equivalent to $C = \Theta(1)$ for the Gupta-Kumar law of $\lambda = \Theta\left( \frac{1}{\sqrt{N}} \right)$.

The power law parameter $\tau$ sets two phase transition points, at the values $1$ and $\frac{3}{2}$, leading to distinct asymptotics: the higher $\tau$, the more uneven the popularity of files, and thus, the more advantageous the caching (i.e., the lower the rate $C$). As summarized in Table II, $C = O(1)$ for $\tau > \frac{3}{2}$, i.e., the wireless network is sustainable (this corresponds to a traffic-carrying capacity of $O(1)$ in the Gupta-Kumar setup). In real systems, the Zipf parameter typically ranges from 0.5 [13] to 3 [14] depending on the application: low values are typical in routers, intermediate values in proxies and higher values in mobile applications [15], [16] (see also references therein).

More common are the cases of low and intermediate values of $\tau$ (representative also of the whole file population without any application bias), which flatten the popularity distribution towards the uniform. Replication is then less effective, ending up at the $\Theta(\sqrt{N})$ law for $\tau < 1$, a synonym for the Gupta-Kumar law. When $M \overset{\lim}{\le} \frac{3-2\tau}{3} KN$, only a few files are cached once (the condition $\mathcal{M}_\circ \approx \emptyset$), in which case there is an improvement over [2], see Theorem 4.

Comparing our results to [9], [10], we note the differences due to the node cache capacity going to infinity as well. First, when $\mathcal{M}_\circ \approx \emptyset$, the improvement is significant. More precisely, the term $\frac{1}{\sqrt{K}}$ multiplies the asymptotic law, suggesting that the required link rate of the bottleneck can be partially mitigated by investment in caching. On the other hand, when $\mathcal{M}_\circ \ne \emptyset$, the term $K$ appears only for $\tau > 1$, in the form of $KN - M$. Note, however, that the condition for $\mathcal{M}_\circ \approx \emptyset$ depends itself on $K$, see Lemma 1. In particular, a sufficient increase in $K$ can guarantee this condition. Thus, if $\tau \le 1$, investment in cache size makes sense only if it suffices to guarantee $\mathcal{M}_\circ \approx \emptyset$.

V. CONCLUSIONS & FUTURE WORK

In this work, we investigated the effect of caching on the asymptotic capacity of wireless networks under the paradigm of content replication and delivery. We showed that, depending on the file popularity distribution, there exist regimes of network expansion where caching can be an effective tool in mitigating the sustainability problem of multihop wireless networks. More precisely, in the regime $\mathcal{M}_\circ \approx \emptyset$, where almost no file is stored at a single node only, increasing the cache size brings a $\frac{1}{\sqrt{K}}$ multiplicative term in the required capacity of the bottleneck link. Also, if $\mathcal{M}_\circ \ne \emptyset$, increasing the cache size is helpful if $\tau > 1$.

A future extension of this work will focus on establishing the result for non-symmetric topologies and arrival conditions. Also, we are interested in investigating the effect of in-network caching in medium-sized wireless networks.
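The density problem of Section II-C and the link load $C$ studied above can be explored numerically. The sketch below is only an illustration under stated assumptions: it is not the polynomial-time algorithm of [10], but a plain bisection on the scale factor of the KKT form $d_m \propto p_m^{2/3}$, clipped to $[1/N, 1]$, so that the densities sum to $K$. The function names (`zipf`, `clipped`, `solve_densities`, `link_load`) and the parameter values are hypothetical.

```python
def zipf(M, tau):
    """Zipf popularity p_m = m^(-tau) / H_tau(M) for m = 1..M, as in (5)."""
    weights = [m ** -tau for m in range(1, M + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]

def clipped(c, p, N):
    """Densities c * p_m^(2/3), clipped to the box [1/N, 1] of Problem 1."""
    return [min(1.0, max(1.0 / N, c * pm ** (2.0 / 3.0))) for pm in p]

def solve_densities(p, K, N, iters=200):
    """Bisection on the scale c so that the clipped densities sum to K.
    Assumes M/N <= K < M, i.e. constraints (1) and (2) hold."""
    lo, hi = 0.0, 1.0
    while sum(clipped(hi, p, N)) < K:   # grow hi until the cache budget is met
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(clipped(mid, p, N)) < K:
            lo = mid
        else:
            hi = mid
    return clipped(hi, p, N)

def link_load(p, d):
    """Objective of Problem 1: sum over m of (d_m^(-1/2) - 1) * p_m."""
    return sum((dm ** -0.5 - 1.0) * pm for pm, dm in zip(p, d))

if __name__ == "__main__":
    N, M, K, tau = 400, 200, 10, 0.8      # illustrative values only
    p = zipf(M, tau)
    d = solve_densities(p, K, N)
    l = 1 + sum(1 for x in d if x >= 1.0)       # first file with d_m < 1
    r = 1 + sum(1 for x in d if x > 1.0 / N)    # first file with d_m = 1/N
    print("l =", l, "r =", r, "C =", link_load(p, d))
```

Since $p_m$ is decreasing in $m$, the clipped densities are non-increasing, so the boundaries $l$ and $r$ of the three file sets can be read off directly from the solution, as done in the usage lines. Finite-size values of $C$ obtained this way only hint at the asymptotic laws and should not be read as a verification of them.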
REFERENCES

[1] Cisco, "Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2010-2015," White Paper, Tech. Rep., 2011.
[2] P. Gupta and P. R. Kumar, "The capacity of wireless networks," IEEE Trans. Inf. Theory, vol. 46, pp. 388–404, Mar. 2000.
[3] J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach, 4th ed. San Francisco, CA, USA: Morgan Kaufmann, 2007.
[4] A. Zemlianov and G. de Veciana, "Capacity of ad hoc wireless networks with infrastructure support," IEEE J. Sel. Areas Commun., vol. 23, pp. 657–667, Mar. 2005.
[5] A. Özgür, O. Lévêque, and D. Tse, "Hierarchical cooperation achieves optimal capacity scaling in ad hoc networks," IEEE Trans. Inf. Theory, vol. 53, pp. 3549–3572, Oct. 2007.
[6] M. Franceschetti, O. Dousse, D. Tse, and P. Thiran, "Closing the gap in the capacity of wireless networks via percolation theory," IEEE Trans. Inf. Theory, vol. 53, pp. 1009–1018, Mar. 2007.
[7] S. Toumpis, "Asymptotic capacity bounds for wireless networks with non-uniform traffic patterns," IEEE Trans. Wireless Commun., vol. 7, pp. 2231–2242, Jun. 2008.
[8] M. Franceschetti, M. D. Migliore, and P. Minero, "The capacity of wireless networks: information-theoretic and physical limits," IEEE Trans. Inf. Theory, vol. 55, pp. 3413–3424, Aug. 2009.
[9] S. Gitzenis, G. S. Paschos, and L. Tassiulas, "Asymptotic laws for content replication and delivery in wireless networks," in Proc. of INFOCOM, Orlando, FL, USA, Mar. 2011.
[10] S. Gitzenis, G. S. Paschos, and L. Tassiulas, "Asymptotic Laws for Joint Content Replication and Delivery in Wireless Networks," arXiv:1201.3095v1 [cs.NI], Tech. Rep.
[11] M. Franceschetti and R. Meester, Random Networks for Communication. New York, NY, USA: Cambridge University Press, Series: Cambridge Series in Statistical and Probabilistic Mathematics (No. 24), 2007.
[12] S. Jin and L. Wang, "Content and service replication strategies in multihop wireless mesh networks," in MSWiM '05: Proceedings of the 8th ACM international symposium on Modeling, analysis and simulation of wireless and mobile systems, Montréal, Quebec, Canada, Oct. 2005, pp. 79–86.
[13] J. Chu, K. Labonte, and B. N. Levine, "Availability and popularity measurements of peer-to-peer file systems," in Proceedings of SPIE, Boston, MA, USA, Jul. 2002.
[14] T. Yamakami, "A Zipf-like distribution of popularity and hits in the mobile web pages with short life time," in Proc. of Parallel and Distributed Computing, Applications and Technologies, PDCAT '06, Taipei, ROC, Dec. 2006, pp. 240–243.
[15] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web caching and Zipf-like distributions: Evidence and implications," in Proc. of INFOCOM, New York, NY, USA, Mar. 1999, pp. 126–134.
[16] C. R. Cunha, A. Bestavros, and M. E. Crovella, "Characteristics of WWW Client-based Traces," in View on NCSTRL, Boston University, MA, USA, Jul. 1995.
TABLE II
Columns: regime of $M \div K \div N$; $l$; $r$; state of the set of files stored at a single node (empty, almost empty or non-empty); $C$.
(a) The cases of $\tau < 1$, $\tau = 1$ and $1 < \tau < \frac{3}{2}$.
(b) The case of $\tau = \frac{3}{2}$.
(c) The case of $\tau > \frac{3}{2}$.
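As a rough companion to the regimes tabulated above, the following sketch encodes the Lemma 1 conditions (case $M = \omega(K)$) and the Theorem 4 laws as plain functions. It is a finite-size reading of asymptotic statements, so the comparison inside `almost_empty_condition` is a heuristic assumption and not part of the analysis; both function names are hypothetical.

```python
import math

def almost_empty_condition(N, M, K, tau):
    """Finite-size reading of the Lemma 1 condition (case M = omega(K))
    under which only a vanishing fraction of files is stored just once."""
    if tau < 1.5:
        return M <= (3 - 2 * tau) / 3 * K * N
    if tau == 1.5:
        return M * math.log(M) <= K * N
    return M <= (2 * tau - 3) / (2 * tau) * K * N ** (3 / (2 * tau))

def theorem4_law(tau, k_theta_m=False):
    """Order of the required link rate C from Theorem 4 (case K not ~ M).
    k_theta_m selects the K = Theta(M) sub-case, relevant for tau = 3/2."""
    if tau < 1:
        return "sqrt(M/K)"
    if tau == 1:
        return "sqrt(M)/(sqrt(K) log M)"
    if tau < 1.5:
        return "M^(3/2 - tau)/sqrt(K)"
    if tau == 1.5 and not k_theta_m:
        return "log^(3/2)(M)/sqrt(K)"
    return "1/sqrt(K)"

if __name__ == "__main__":
    N, M, K = 400, 200, 10                 # illustrative values only
    for tau in (0.8, 1, 1.2, 1.5, 2.0):
        print(tau, almost_empty_condition(N, M, K, tau), theorem4_law(tau))
```

When the condition fails, the files stored at a single node are not negligible and the laws of Theorem 5 apply instead.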
2τ
2τ
1− 3 2τ M M −r+1 −(l−1)1− 3 K −l+1− ∼l3 . (18) N 1 − 2τ 3 lim
M 1−
2τ 3
2τ 3
2τ
−(l−1)1− 3 . 1 − 2τ 3 (19)
In the asymptotic derivations that follow, we will use the underbracket notation to denote the significant quantities that carry on to the next steps, e.g. x +y signifies that only x will carry on, as y = o(x). Proof of Lemma 1: To compute l, we use (13); to find the conditions for M ≈ ∅ ⇔ M − r = o(M ) ⇔ r ∼ M , we use (14) when l = o(K), or (17) when l = Θ(K). If M = ∅, then (14) must be true for r = M + 1. Case τ < 32 : Using (6) and r ∼ M in (13) and (14), it follows that
(K −l+1)N −(M −r)−1 ≥ M
M = ∅. However, as we assumed l = o(K), we have to add the constraint of K = o(M ) or equivalently M = ω(K). These complete the first two subcases of the conditions for almost empty M . Case τ = 32 : As before, (6) and M − r = o(M ) in (13) and (14) lead to
A PPENDIX
M M −r+1 ∼ l ln (20) N l−1 lim M (K −l+1)N − (M −r)−1 ≥ M ln (21) l−1 Assuming l ∼ K, (20) becomes M <1 M o(K) +1 − o ∼ K (ln M − ln K) KN ⇒ N ln M ∼ ln K + o(1) ⇒ M ∼ K, K −l+1−
which corresponds to the third subcase for l for α = 1. Assuming M 6= ∅ as before, we can use (17) to get r ∼ KN , which is a contradiction, provided that N = ω(1). Thus, M = ∅. This completes the third subcase M . Assuming l ∼ αK, with α ∈ (0, 1), (20) becomes M M (1 − α)K +1 + o(K) − o ∼ αK ln ⇒ N αK
Assuming l ∼ K, and using M − r = o(M ), (18) leads to
2τ 3
where in the last step we used 2τ
M K
2τ 3
N
= K 1−
2τ 3
M KN
<
1−α α
,
using (2) which implies that o = 0, which completes the third subcase for l. Using (21) and substituting l and K with M , we get lim
that e α N ≥ 1, which is a strict inequality (provided N = ω(1)), and thus the third subcase for M is complete. For the other two subcases of l and M , it is l = o(K) = o(M ), and given that K = O(M ) from (1), (20) leads to
2τ
⇒
2τ 3
M KN
α−1
1− 2τ 3
−(αK)
2τ o(M ) ∼ (αK) 3 N 1− 3 3−2τ 3 − 2τ (1 − α) K, 3
(1 − α)K +1 − M ∼ α− 3−2τ
2τ 3
∼ αKe
M
(2)
K 1− 3 . This is part of the third subcase of l for α = 1. Assuming l ∼ αK, with α ∈ (0, 1), results from (18) to M 1−
M 1−
2τ
2τ
1− 3 2τ M −K 1− 3 o(M ) ∼K 3 ⇒ N 1 − 2τ 3 M 1− 2τ 1− 2τ 3 3 ⇒ M ∼ K, ∼K +o K −o 2τ K3N
o(K) + 1 −
M −r+1 K ∼ l (ln M − ln l) ⇒l ∼ . N ln M The above (second subcase for l) is true, provided that K −l + 1 −
which completes the third subcase of l. Note that α ∈ (0, 1] lim
covers all cases of M ≥ K = Θ(M ). Examining M in the two above cases of l = Θ(K) = not strictly empty), Θ(M ), and assuming r ≤ M (i.e. M is
3
lim
This is correct (second subcase of l) as long as l > 1, 2τ3 3 lim 3 i.e. if 1 − 2τ K 2τ ≥ M 2τ −1 . Otherwise, (13) was not 3 applicable, thus l → 1. Last, for the last two cases of l = o(K), (19) assures M ≈ lim ∅ when M ≤ 1 − 2τ 3 KN ; if the inequality is strict, then
lim
Last, (21) leads to KN ≥ M ln M so as to have M ≈ ∅. Case τ > 32 : (18) and (19) are applicable in this case, too, and they can be rewritten as h i 2τ3 −1 2τ 1 1 3 −1 − M l−1 2τ M −r+1 K −l+1− ∼l3 . (22) 2τ N 3 −1 h i 2τ3 −1 2τ 1 1 3 −1 − M lim l−1 2τ (K −l+1)N −(M −r)−1 ≥ M 3 . 2τ 3 −1 (23)
we can use (17) to get r = Θ KN 2τ , which is ω(M ) (provided that N = ω(1)). This is a contradiction as r ≤ M + 1. Thus, r = M + 1, and M = ∅. This completes the third subcase of the conditions for almost empty M . Turning the attention to the other two subcases, that is, when K = o(M ), it must be l = o(K) since l = Θ(K) leads to the above case. Evidently l = o(M ) and (18) becomes 3 3 1− 2τ 3 2τ M o(M ) 2τ 2τ K 2τ K− ∼l3 ⇒ l ∼ 1 − . 3 N 3 1 − 2τ M 2τ −1 3
lim
ln M < K = o(M ), otherwise l → 1 (first subcase) as (13) is not applicable.
Assuming l ∼ K, the same derivation as in τ < 3/2 leads to K ∼ M . Repeating, moreover, the derivation for l ∼ αK as in τ < 3/2, we have to take care ensuring that 3 − 2τ (1 − α) > 0, or equivalently, α > 2τ2τ−3 , which is the second subcase of l. Note, that in the previous cases this constraint was automatically satisfied.
Repeating the analysis on M as in the two above cases of l = Θ(K) = Θ(M ), and assuming M is non-empty, 3 leads from (17) to r ∼ KN 2τ , which is ω(M ) (provided that N = ω(1)). This is a contradiction, thus M = ∅. This completes the third subcase of the conditions on M . 2τ −3 Note that the range of α ∈ 2τ , 1 covers all the cases of K = Θ(M ). Hence, the last case to consider is K = o(M ): then, l = O(K) = o(M ), thus (18) leads to
Case τ > becomes
K− l+1−
2τ
1− 3 −M 1− 2τ (l − 1) o(M ) K −l− ∼l3 2τ N 3 −1
2τ 3
⇒l∼
2τ − 3 K, 2τ
which is the first subcase of l. Last, with this l, (23) leads to 1− 2τ3 2τ − 3 2τ −M 1− 3 K 2τ lim 2τ 3 KN −(M − r) − 1 ≥ M 3 2τ 2τ 3 −1 lim 2τ − 3 3 ⇒M ≤ KN 2τ , 2τ
which assures M ≈ ∅; if the inequality is strict, then M = ∅. Assuming l = o(K) leads through (13) to l = ( 2τ 3 − 1)K which is a contradiction, thus it is always l = Θ(K).
2τ
Then, invoking back (17), r ∼
3−2τ 2τ
(KN − M ). Clearly, the lim
condition for these to hold true will be l > 1, or equivalently lim
3
2τ KN − M > 3−2τ N 2τ . Otherwise, we conclude l → 1 and calculate the scaling of r from (16): 2τ 2τ 3 KN −M +r−1 ∼ r 3 r1− 3 −1 ⇒ 3−2τ 3 − 2τ r∼ (KN − M ). 2τ
and (17) to r ∼ lim
KN −M ln N ;
for these to be true, it has to be
KN − M > N ln N . Otherwise, l → 1, and r is computed from (16) as r ln r ∼ KN − M .
m
Proof of Theorem 4:
First, we will study the case of
lim
K < M . From Lemma 1, we have K − l + 1 = Θ(K) (this holds for all cases except the case α = 1 which is covered separately at the end) and r = Θ(M ) which yields q √ K − l + 1 − M −r+1 = Θ( K). Thus, the scaling of C N 3
[H 2τ (M )−H 2τ (K)] 2
3 3 is determined from the ratio Hτ (M ) C , we calculate the following term:
. In assessing
(7)
Hτ (M ) − Hτ (r − 1) = Θ(M −τ (M − r)) = o(M 1−τ ), √ 1−τ which combined with (10) yields C = o N HMτ (M ) . This is used on the cases τ < 3/2. Next, we examine each case separately and show that it is always C = O(C ), which implies that C alone determines the asymptotic law. Case τ < 1: First, we study the case where l ∼ αK, where α ∈ (0, 1). ! 3 2τ 3 2τ [H 2τ3 (M ) − H 2τ3 (l)] 2 [M 1− 3 − l1− 3 ] 2 √ √ =Θ KHτ (M ) KM 1−τ r " r ! 1− 2τ3 # 32 M l M =Θ = Θ 1− , K M K since lim Ml < lim Kl = α < 1. If l is given from any of the other two cases of Lemma 1, 3 [H 2τ (M )] 2 √ it is l = o(M ) and H3τ (M ) → M , which from (9), leads q M again to C = Θ K . From Lemma 1, if M = ∅, then C = 0. Else, 3−2τ 3 KN ∼ √ q M M and C = o N =o = o C . K Case τ = 1. The first case of l ∼ αKis covered from the above derivation, except Hτ (M ) in the denominator, which changes from M 1−τ to log M . Thus, ! √ 3 ! [H 2τ3 (M ) − H 2τ3 (l)] 2 M √ C = Θ =Θ √ . KHτ (M ) K log M
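$$K - l + 1 - \frac{M - lN + 1}{N} \sim l \ln \frac{lN}{l} \;\Rightarrow\; l \sim \frac{KN - M}{N \ln N},$$
Proof of Lemma 3: Since $d_m \ge \frac{1}{N}$, we have
$$C \triangleq \sum_{m \in M} \left[\frac{1}{\sqrt{d_m}} - 1\right] p_m < \sum_{m \in M} p_m \sqrt{N} = \sqrt{N}.$$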
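The bound of Lemma 3, $\sum_m [1/\sqrt{d_m} - 1]\, p_m < \sqrt{N}$ whenever every $d_m \ge 1/N$, is easy to probe numerically. The sketch below assumes Zipf popularities $p_m \propto m^{-\tau}$ and draws arbitrary densities in $[1/N, 1]$; the helper names `zipf_popularity` and `link_load` are hypothetical and chosen only for this illustration.

```python
import math
import random


def zipf_popularity(M, tau):
    """Zipf law: p_m proportional to m^(-tau), normalised over m = 1..M."""
    weights = [m ** (-tau) for m in range(1, M + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def link_load(p, d):
    """Sum over files of (1/sqrt(d_m) - 1) * p_m, the quantity bounded in Lemma 3."""
    return sum(pm * (1.0 / math.sqrt(dm) - 1.0) for pm, dm in zip(p, d))


if __name__ == "__main__":
    N, M, tau = 400, 1000, 0.8
    p = zipf_popularity(M, tau)
    rng = random.Random(0)                                   # fixed seed for repeatability
    d_random = [rng.uniform(1.0 / N, 1.0) for _ in range(M)]  # any densities in [1/N, 1]
    d_worst = [1.0 / N] * M                                   # every file at minimum density
    print(link_load(p, d_random), link_load(p, d_worst), math.sqrt(N))
```

With every file at the minimum density $1/N$ the sum equals $\sqrt{N} - 1$, which shows how close to tight the bound is.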
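Otherwise, $l \to 1$, and from (16),
$$KN - M + r - 1 \sim r^{\frac{2\tau}{3}}\,\frac{3}{2\tau-3}\left[1 - r^{1-\frac{2\tau}{3}}\right] \;\Rightarrow\; r \sim \left[\frac{2\tau-3}{2\tau}(KN - M)\right]^{\frac{3}{2\tau}}.$$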
Case $\tau = \frac{3}{2}$: Assuming $l \overset{\lim}{>} 1$, (13) via (6) leads similarly to
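Proof of Lemma 2: First, assume $KN - M = O(1)$; as $KN - M$ are the total cache slots available for replication (beyond the primary copy), $r = \Theta(1)$, and thus $l = \Theta(1)$, too. If $l > 1$, then (17) becomes $r = \Theta(N^{\frac{3}{2\tau}})$, which is a contradiction. Thus $l = 1$. Using (16), we compute $r$. Next, we proceed to $M \neq \emptyset$ with $KN - M = \omega(1)$, and identify the following cases:
Case $\tau < \frac{3}{2}$: Assuming that $l \overset{\lim}{>} 1$, we can invoke (17) to substitute $r$ in (13), and get with the help of (6)
$$K - l + 1 - \frac{M - lN^{\frac{3}{2\tau}} + 1}{N} \sim l^{\frac{2\tau}{3}}\,\frac{3}{3-2\tau}\left[l^{1-\frac{2\tau}{3}} N^{\frac{3}{2\tau}-1} - l^{1-\frac{2\tau}{3}}\right] \;\Rightarrow\; l \sim \frac{3-2\tau}{2\tau}\,\frac{KN - M}{N^{\frac{3}{2\tau}}}.$$
Case $\tau > \frac{3}{2}$: Assuming $l \overset{\lim}{>} 1$, (13) through (6) and (17) yields
$$K - l + 1 - \frac{M - lN^{\frac{3}{2\tau}} + 1}{N} \sim l^{\frac{2\tau}{3}}\,\frac{3}{2\tau-3}\left[l^{1-\frac{2\tau}{3}} - l^{1-\frac{2\tau}{3}} N^{\frac{3}{2\tau}-1}\right] \;\Rightarrow\; l \sim \frac{2\tau-3}{2\tau}\left(K - \frac{M}{N}\right),$$
and thus, $r \sim \frac{2\tau-3}{2\tau}\,\frac{KN - M}{N^{1-\frac{3}{2\tau}}}$, provided that $KN - M \overset{\lim}{>} \frac{2\tau}{2\tau-3} N$.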
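If $l$ is given by any of the other two cases, $l = o(M)$ and $\frac{[H_{2\tau/3}(M)]^{3/2}}{H_\tau(M)} \to \frac{\sqrt{M}}{\log M}$, which from (9) leads to $C = \Theta\left(\frac{\sqrt{M}}{\sqrt{K} \log M}\right)$. As before, if $M = \emptyset$, $C = 0$. Else, $\frac{3-2\tau}{3} KN \sim M$ and $C = o\left(\frac{\sqrt{N}}{\log M}\right) = o\left(\frac{\sqrt{M}}{\sqrt{K} \log M}\right) = o(C)$.
Case $1 < \tau < \frac{3}{2}$: Similarly, if $l \sim \alpha K$ with $\alpha \in (0, 1)$, we use the above derivations to find
$$C = \Theta\left(\frac{[H_{2\tau/3}(M) - H_{2\tau/3}(l)]^{3/2}}{\sqrt{K}\, H_\tau(M)}\right) = \Theta\left(\frac{M^{\frac{3}{2}-\tau}}{\sqrt{K}}\right).$$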
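and $\beta = 1 - \frac{2\tau}{3}$. Then, replacing $M - l$ from the above in $C$,
$$C = \Theta\left(\frac{\left[(M - l)\, M^{-\frac{2\tau}{3}}\right]^{3/2}}{\sqrt{K - l}\; H_\tau(M)}\right) = \Theta\left(\frac{(M - l)^{3/2} M^{-\tau}}{\sqrt{K - l}\; M^{1-\tau}}\right) = \Theta\left(\frac{M - l}{M}\right) = o(1),$$
as $l \sim M$ implies that $M - l = o(M)$. Thus, $C = o(1)$.
Proof of Theorem 5: When $M \neq \emptyset$, for all $\tau$, it is $l = o(M)$; indeed, if $l \not\to 1$, it is $l = rN^{-\frac{3}{2\tau}} < MN^{-\frac{3}{2\tau}} = o(M)$; otherwise, if $l \to 1$, it is $l = o(M)$, too. Moreover, for all $\tau$, it is $M - r = \Theta(M)$, thus
$$\sqrt{K - l + 1 - \frac{M - r + 1}{N}} = \Theta\left(\sqrt{K - \frac{M}{N}}\right).$$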
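If $l$ is given by any of the other two cases, $l = o(M)$ and $\frac{[H_{2\tau/3}(M)]^{3/2}}{H_\tau(M)} \to M^{\frac{3}{2}-\tau}$, which from (9) leads to $C = \Theta\left(\frac{M^{\frac{3}{2}-\tau}}{\sqrt{K}}\right)$. Similarly as above, if $C \neq 0$ then $\frac{3-2\tau}{3} KN \sim M$ and $C = o\left(\sqrt{N}\, M^{1-\tau}\right) = o\left(\frac{M^{\frac{3}{2}-\tau}}{\sqrt{K}}\right) = o(C)$.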
3
K− N log M
Next we discern two cases for the relation between $r$ and $M$ that give different laws for $C$:
• if $M \sim KN$, Lemma 2 yields $r = o(M)$. Thus, $C = \Theta\left(\sqrt{N}\,\frac{H_\tau(M)}{H_\tau(M)}\right) = \Theta(\sqrt{N})$. In total, $C = \Theta(\sqrt{N})$.
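Case $\tau = \frac{3}{2}$: First, let $l \sim \alpha K$ with $\alpha \in (0, 1)$; then,
$$\frac{[H_{2\tau/3}(M) - H_{2\tau/3}(l)]^{3/2}}{\sqrt{K}\, H_\tau(M)} = \Theta\left(\frac{[\ln M - \ln l]^{3/2}}{\sqrt{K}}\right) = \Theta\left(\frac{\left[\ln M - \ln\left(e^{\frac{\alpha-1}{\alpha}} M\right)\right]^{3/2}}{\sqrt{K}}\right) = \Theta\left(\left(\frac{1-\alpha}{\alpha}\right)^{3/2} \frac{1}{\sqrt{K}}\right) = \Theta\left(\frac{1}{\sqrt{K}}\right).$$
In the above case of $l = \Theta(K)$, it is $K = \Theta(M)$, and thus $C = 0$ (as $M = \emptyset$). If $l$ is given from the two other cases of Lemma 1, it is $l = o(M)$ and thus, $\frac{[H_{2\tau/3}(M)]^{3/2}}{H_\tau(M)} \to \log^{\frac{3}{2}} M$, which from (9) leads to $C = \Theta\left(\frac{\log^{3/2} M}{\sqrt{K}}\right)$. Similarly as before, if $C \neq 0$, then $KN \sim M \log M$ and $C = o\left(\frac{\sqrt{N}}{\sqrt{M}}\right) = o\left(\frac{\log^{1/2} M}{\sqrt{K}}\right) = o\left(\frac{\log^{3/2} M}{\sqrt{K}}\right) = o(C)$.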
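Case $\tau < 1$: As $M - r = \Theta(M)$, it has to be $r \overset{\lim}{<} M$; then from (6), it is $C = \sqrt{N}\,\frac{H_\tau(M)}{H_\tau(M)} = \Theta(\sqrt{N})$. In total, $C = \Theta(\sqrt{N})$.
Case $\tau = 1$: First, note that from Lemma 1 we conclude that for $M \neq \emptyset$ we must have $M \overset{\lim}{>} \frac{1}{3} KN$. Also, from Lemma 2, it is $r \sim \frac{1}{2}(KN - M)$. Then, using $l = o(r)$, it is $C = \Theta\left(\frac{\sqrt{r}}{\sqrt{K - \frac{M}{N}}\, \log M}\right) = \Theta\left(\frac{\sqrt{N}}{\log M}\right)$.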
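Moreover, Lemma 2 implies that $r \overset{\lim}{<} M$. Thus, $C = \Theta\left(\sqrt{N}\,[H_\tau(M) - H_\tau(r)]\right) = \Theta\left(\frac{\sqrt{N}}{r^{\tau-1}}\right) = \Theta\left(\frac{\sqrt{N}}{(KN - M)^{\tau-1}}\right)$. In total, $C = \Theta\left(\frac{\sqrt{N}}{(KN - M)^{\tau-1}}\right)$.
Case $\tau = \frac{3}{2}$: As $l = o(r)$, it is $C = \Theta\left(\frac{\log^{3/2} r}{\sqrt{K - \frac{M}{N}}}\right)$. Moreover, $C = \Theta\left(\sqrt{N}\left[\frac{1}{\sqrt{r}} - \frac{1}{\sqrt{M}}\right]\right) = O\left(\sqrt{\frac{N}{r}}\right)$. Now, we examine the asymptotic law of $r$ from Lemma 2: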
the above cases of $K \overset{\lim}{<} M$ set an upper bound on M, i.e. moving to a regime of larger caches $K \sim M$ cannot increase the link rate $C$. In all the above cases for $\tau \ge 1$, if we set $K \sim M$, we get that $C = o(1)$. Thus, it remains to study what happens on $\tau < 1$. From Lemma 1, it is $M = \emptyset$, thus $C = 0$. Then, (18) can be rewritten as
$$K - l + 1 \sim \frac{l}{1 - \frac{2\tau}{3}}\left[\left(\frac{M}{l}\right)^{1-\frac{2\tau}{3}} - 1\right] \approx l\left(\frac{M}{l} - 1\right) = M - l.$$
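Since it is $\frac{1}{3} KN \overset{\lim}{<} M$ for $M \not\approx \emptyset$, there is no other case.
Case $1 < \tau < \frac{3}{2}$: As before, it is $r \sim \frac{3-2\tau}{2\tau}(KN - M)$. First note that $C = \Theta\left(\frac{r^{\frac{3}{2}-\tau}}{\sqrt{K - \frac{M}{N}}}\right) = \Theta\left(\frac{\sqrt{N}}{(KN - M)^{\tau-1}}\right)$.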
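Case $\tau > \frac{3}{2}$: Now $H_{2\tau/3}(M)$, $H_{2\tau/3}(l)$, $H_\tau(M)$ are all constants, which from (9) leads to $C = \Theta\left(1/\sqrt{K}\right)$. Similarly as above, if $C \neq 0$, then $\frac{2\tau-3}{2\tau} K N^{\frac{3}{2\tau}} \sim M$ and $C = o\left(\sqrt{N}\, M^{1-\tau}\right) = o\left(\frac{1}{K^{\frac{\tau}{3}} M^{\frac{2\tau}{3}-1}}\right) = o\left(\frac{1}{\sqrt{K}}\right) = o(C)$.
Now, we consider the case of $K \sim M$. It is easy to see that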
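• if $KN - M \overset{\lim}{\le} N \ln N$, then $r \ln r \sim KN - M$. In this case, $C = O\left(\sqrt{\frac{N}{r}}\right) = O\left(\sqrt{\frac{N \log r}{KN - M}}\right) = O\left(\frac{\log^{1/2} r}{\sqrt{K - \frac{M}{N}}}\right) = o(C)$, thus $C = \Theta\left(\frac{\log^{3/2} r}{\sqrt{K - \frac{M}{N}}}\right)$.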
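• if $KN - M \overset{\lim}{>} N \ln N$, then $r \sim \frac{KN - M}{\ln N}$. In this case, it is $C = O\left(\sqrt{\frac{N}{r}}\right) = O\left(\frac{\log^{1/2} N}{\sqrt{K - \frac{M}{N}}}\right)$. Observe now that the condition of $KN - M \ge N \ln N$ implies that $r \ge N$, thus $C = \Theta\left(\frac{\log^{3/2} r}{\sqrt{K - \frac{M}{N}}}\right)$.
where the approximation comes from the Taylor expansion of function $x^\beta - 1 \approx \beta(x - 1)$ for $x = \frac{M}{l} \to 1$ (as $l \sim K \sim M$),
• if $M \overset{\lim}{<} KN$, then $r \sim \beta M$ with $0 < \beta \le 1/3$, and thus, $C = \Theta\left(\sqrt{N}\,\frac{\log \frac{M}{r}}{\log M}\right) = \Theta\left(\frac{\sqrt{N}}{\log M}\right)$. In total, $C = \Theta\left(\frac{\sqrt{N}}{\log M}\right)$.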
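Case $\tau > \frac{3}{2}$: We treat the two cases of Lemma 2 separately:
• if $KN - M \overset{\lim}{\le} \frac{2\tau}{2\tau-3} N$, then $l \to 1$, $r \sim \left[\frac{2\tau-3}{2\tau}(KN - M)\right]^{\frac{3}{2\tau}}$, thus, $C = \Theta\left(\frac{\sqrt{N}}{\sqrt{KN - M}}\right)$.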
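From Lemma 1, for $M \neq \emptyset$, it has to be $M \overset{\lim}{>} \frac{2\tau-3}{2\tau} K N^{\frac{3}{2\tau}}$. Thus, $r \le (KN)^{\frac{3}{2\tau}} \overset{\lim}{<} M$, with the last step coming from $\frac{2\tau-3}{2\tau} < 1$. As in the previous case, $C = \Theta\left(\frac{\sqrt{N}}{r^{\tau-1}}\right) = \Theta\left(\frac{\sqrt{N}}{(KN - M)^{\frac{3}{2}-\frac{3}{2\tau}}}\right) \overset{\tau > 3/2}{=} o\left(\frac{\sqrt{N}}{\sqrt{KN - M}}\right)$. In total, $C = \Theta\left(\frac{\sqrt{N}}{\sqrt{KN - M}}\right)$.
• if $KN - M \overset{\lim}{>} \frac{2\tau}{2\tau-3} N$, then $l = \Theta\left(K - \frac{M}{N}\right)$ and $r = \Theta\left((KN - M)\, N^{\frac{3}{2\tau}-1}\right)$. Thus, $C = \Theta\left(\frac{1}{l^{\tau-\frac{3}{2}} \sqrt{K - \frac{M}{N}}}\right) = \Theta\left(\frac{N^{\tau-1}}{(KN - M)^{\tau-1}}\right)$. Moreover, $C = \Theta\left(\frac{\sqrt{N}}{r^{\tau-1}}\right) = \Theta\left(\frac{\sqrt{N}\, N^{\left(1-\frac{3}{2\tau}\right)(\tau-1)}}{(KN - M)^{\tau-1}}\right) = \Theta\left(\frac{N^{\tau-1}}{(KN - M)^{\tau-1}}\cdot\frac{1}{N^{1-\frac{3}{2\tau}}}\right) \overset{\tau > 3/2}{=} o(C)$. In total, $C = \Theta\left(\frac{N^{\tau-1}}{(KN - M)^{\tau-1}}\right)$.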
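The exponent bookkeeping behind the last comparison for $\tau > 3/2$ can be spelled out. The lines below are a worked restatement that assumes only the relation $r = \Theta\left((KN - M)N^{\frac{3}{2\tau}-1}\right)$ quoted in that case; they add no new result.

```latex
\begin{align*}
\frac{\sqrt{N}}{r^{\tau-1}}
  &= \Theta\!\left(\frac{N^{\frac{1}{2}}\,
       N^{\left(1-\frac{3}{2\tau}\right)(\tau-1)}}{(KN-M)^{\tau-1}}\right),\\[2pt]
\tfrac{1}{2} + \left(1-\tfrac{3}{2\tau}\right)(\tau-1)
  &= \tau - 2 + \tfrac{3}{2\tau}
   = (\tau-1) - \left(1-\tfrac{3}{2\tau}\right),\\[2pt]
\frac{\sqrt{N}}{r^{\tau-1}}
  &= \Theta\!\left(\frac{N^{\tau-1}}{(KN-M)^{\tau-1}}
       \cdot \frac{1}{N^{1-\frac{3}{2\tau}}}\right).
\end{align*}
```

Since $1 - \frac{3}{2\tau} > 0$ exactly when $\tau > \frac{3}{2}$, the extra factor $N^{-(1-\frac{3}{2\tau})}$ vanishes as $N \to \infty$, which is why this term is of lower order than $\frac{N^{\tau-1}}{(KN-M)^{\tau-1}}$.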