Using Spikes to Deal with Elephants
Dinil Mon Divakaran
School of Computing and Electrical Engineering, IIT Mandi
E-mail: [email protected]

Abstract

Among the various strategies proposed for reducing or eliminating the bias against small flows (in the presence of large flows), most require identifying and distinguishing between small and large flows, besides having to track the ongoing sizes of all flows. Though these solutions do improve the response times of small flows (with negligible effect on the response times of large flows), they do not scale with increasing traffic. In this context, we propose a new spike-detecting AQM that exploits a TCP property to detect large 'spikes', and hence large flows, from which packets are dropped, and importantly, only at times of congestion. We discuss two such AQM policies using spike-detection for improving the performance of small flows: one that drops packets deterministically, and the other that drops packets randomly. We show, using simulations and comparing a number of metrics, that these new policies, in particular the one that drops packets randomly, outperform not only the traditional drop-tail buffer with an FCFS server, but also the RED policy as well as a size-based scheduler proposed specifically for improving the response time of small flows. The improvement in performance becomes more revealing in a scenario where the router buffer is small (less than one-tenth of the bandwidth-delay product).

1 Introduction

Internet flow-size distribution exhibits strong heavy-tail behaviour. This means that a small percentage of flows contributes a large percentage of the Internet's traffic volume [22]. It is also known as the 80-20 rule, as 20% of the flows carry 80% of the bytes. It has become customary to call the large number of small flows mice flows (or just mice), and the small number of large flows elephant flows (or just elephants). Examples of mice flows include tweets, chats, web queries, HTTP requests, etc., for which users expect very short response times¹; elephant flows, on the other hand, are usually downloads that run in the background (say, a kernel image or a movie, involving megabytes or more of data), the response times for which are expected to be higher than those of mice flows by orders of magnitude.

¹ We often use 'completion time' to refer to 'response time'.

The current Internet architecture has an FCFS server and a drop-tail buffer at most of its nodes. This, along with the fact that most of the flows in the Internet are carried by TCP [13], hurts the response times of mice flows adversely. Specifically, some of the important reasons for the bias against small TCP flows are the following:

• As mice flows do not have much data, they almost always complete in the slow-start phase, never reaching the congestion-avoidance phase, and thus typically achieve only a small throughput.

• A packet loss to a small flow most often results in a time-out due to the small congestion-window (cwnd) size; and time-outs increase the completion time of a small flow manifold. An elephant flow, on the other hand, is most probably in the congestion-avoidance phase, and therefore packet losses are usually detected using duplicate ACKs.

• The increase in round-trip-time (RTT) due to large queueing delays hurts small flows more than large flows. For large flows, the large cwnd makes up for the increase in RTT, whereas for small flows the increase is quite perceivable.

The bias against small flows is even more relevant today: recent studies show an increase in the mice-elephant phenomenon, with a stronger shift towards a 90-10 rule [5]. Most solutions to this problem can be grouped into a class of priority-scheduling mechanisms that schedule packets based on the ongoing sizes of the flows they belong to. These mechanisms, which we hereafter refer to as size-based schedulers, give priority to 'potential' small flows over large flows, thereby improving the response time of small flows. The different size-based scheduling mechanisms need to identify flows and distinguish between small and large flows. They have multiple queues with different priorities, and use the information of the ongoing flow-sizes to decide where to queue an incoming packet. We observe that most of the

works dealing with giving preferential treatment based on size (or age) assume that the router keeps per-flow size information in a flow-table. This assumption (of tracking per-flow sizes) is challenged by the scalability factor, since tracking flow-size information requires a flow-table update for every arriving packet. Given that the action involves lookup, memory access and update, this requires fast access as well as high power. Besides, all flows need to be tracked, and as the number of flows in progress can grow to a large value under high load, this too becomes an overhead. Hence, most existing solutions face a roadblock when it comes to implementation.

The spike-detecting AQM proposed here is inspired by the TLPS/SD (two-level-processor-sharing with spike-detection) system proposed in [6], where a flow is served in a high-priority queue until it is detected as 'large', i.e., when its cwnd is large enough (> 2^η) to 'cause' congestion (buffer length > β) in the link, for pre-determined values of η and β. Such detected large flows are de-prioritized by serving them in a low-priority queue thereafter.

In this paper, we present the spike-detecting AQM, or SD-AQM in short. The major difference between SD-AQM and the existing works that improve the response times of small flows (in comparison to the drop-tail buffer with FCFS scheduler) is that SD-AQM does not need to identify small and large flows. Second, it does not need to track the sizes of flows. Third, it uses a single queue, removing the need to have two or more virtual queues. Based on the core idea of detecting spikes, we present two policies: (i) SD-AQM, which drops packets deterministically, and (ii) SDI-AQM, which drops packets randomly. We show using simulations that, with all the properties mentioned above, the performance attained by small flows under the proposed new policies is better than that attained under the drop-tail buffer with FCFS scheduler. In fact, we compare the performance of SD-AQM not only with drop-tail, but also with RED and with the PS+PS scheduler, a size-based scheduling strategy developed specifically to improve the response times of small flows. We perform studies using various metrics and find that the spike-detecting AQM policies, specifically the SDI-AQM policy, outperform RED and PS+PS in improving the performance of small and even medium-size flows in the network. We also show that the performance is not affected in the scenario where the router buffer-size is small.

The next section discusses previous works on mitigating the bias against small flows. In Section 3, we describe two spike-detecting AQM policies for improving the performance of small flows. In Section 4, we give the goals, settings, and scenarios of the simulations. We then evaluate the performance of the proposed policies in Section 5, before concluding in Section 6.

2 Related works: Mitigating the bias

A general approach to solving the problems due to these biases, and thereby improving the performance (most often, the completion time, among other metrics) of small flows, is to prioritize small flows. Priority can be given in either or both of two dimensions: space and time. While scheduling algorithms can give priority in time, buffer management policies (and even routing, which we do not discuss here) can give priority in space.

2.1 Prioritization in space

Active queue management (AQM) schemes are used to send congestion signals from intermediate routers to the end-hosts. One such AQM policy is random early detection, or RED [9]. Introduced as a mechanism to de-synchronize concurrent TCP flows for better link utilization, RED uses an (exponentially-averaged) queue-length to mark or drop packets once the average queue-length crosses a minimum threshold, min_thresh. Simply put, the marking probability increases with the average queue length (a sketch of this computation is given at the end of this subsection). Though, to our knowledge, there has been no study showing how small flows would perform under RED, RED-based approaches have been proposed to control large bandwidth-consuming flows, for example [19]. The authors of [11] considered prioritizing small flows by isolating packets of small flows inside a bottleneck queue that implements RIO (RED with In and Out) [4]. By configuring different parameters, RIO can be set to drop packets of large flows at a much higher rate than packets of small flows. This requires assigning different drop functions to small and large flows. To facilitate this, an architecture was proposed where edge routers mark packets as belonging to a small or large flow, using a threshold-based classification.
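To make the dependence on the averaged queue length concrete, the following Python sketch computes a RED-style marking probability. It is only an illustration of the standard RED and gentle-RED ramps, not the exact implementation used in any simulator; the parameter names (min_th, max_th, max_p) are generic.

    def red_mark_probability(avg_q, min_th, max_th, max_p, gentle=True):
        """Illustrative RED marking probability as a function of the
        exponentially-averaged queue length avg_q (same units as the thresholds)."""
        if avg_q < min_th:
            return 0.0                               # below min_thresh: no early marking
        if avg_q < max_th:
            # linear ramp from 0 to max_p between min_th and max_th
            return max_p * (avg_q - min_th) / (max_th - min_th)
        if gentle and avg_q < 2 * max_th:
            # 'gentle' RED: ramp from max_p to 1 between max_th and 2*max_th
            return max_p + (1.0 - max_p) * (avg_q - max_th) / max_th
        return 1.0                                   # beyond this, every packet is marked/dropped

    # example: average queue of 250 packets with min_th=200, max_th=600, max_p=0.1
    print(red_mark_probability(250, 200, 600, 0.1))  # 0.0125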

2.2 Prioritization in time: Scheduling

Size-based scheduling strategies can be classified into two groups, based on whether or not they have knowledge of the flow size (flow service demand) in advance.

2.2.1 Anticipating strategies

Anticipating strategies assume knowledge of the job² size on its arrival to the system. One such policy is the shortest-remaining-processing-time (SRPT) policy, which always serves the job in the system that needs the shortest remaining service time. SRPT is known to be optimal among all policies with respect to the mean response time [18]. The improvement in response time brought by SRPT over the processor-sharing discipline (or PS, an approximation of bandwidth-sharing in the Internet at the flow level under some assumptions [10]) becomes striking with increasing variability in the service-time distribution. Therefore, it finds use in Internet scenarios, where the file-size distribution is known to have high variability [12]. The disadvantage of the policy comes from the need to anticipate the job size. While this information is available in web servers, routers do not have knowledge of the size of a newly arriving flow.

² A job can be a process in an operating system, a file stored on a machine, or a flow in the Internet.

2.2.2 Non-anticipating strategies

Non-anticipating policies instead use the ongoing size, or age, of a flow to take scheduling decisions. The use of the age of flows is particularly interesting in scenarios where flows that have been served for a long time are likely much larger, and thus have a larger remaining size. This brings in the notion of hazard rate. If F denotes the cumulative distribution of flow sizes, then the hazard rate is h(x) = F'(x) / (1 − F(x)). A distribution has a decreasing hazard rate (DHR) if h(x) is non-increasing for all x ≥ 0. If the flow-size distribution comes from the class of DHR distributions, then, intuitively, flows with a larger ongoing size have a smaller hazard rate and are thus less likely to complete soon (a worked Pareto example is given at the end of this subsection). As many heavy-tailed distributions, like the Pareto distribution, fall under the DHR class, non-anticipating strategies have been a focus of interesting research in the area of flow-scheduling. We briefly describe some important non-anticipating scheduling strategies below.

FB or LAS scheduling: The foreground-background policy serves the flow that has the minimum ongoing size among all competing flows. The FB policy is shown to be optimal with respect to the mean response time for DHR distributions [21]. This scheduling policy has been studied for flows at a bottleneck queue under the name LAS (least-attained-service) [17], where the flow to be served next is the one that has attained the least service so far. The policy not only decreases the delay and loss rate of small flows compared to an FCFS packet scheduler with a drop-tail buffer, but also causes only a negligible increase in delay for large flows. In a TCP/IP network, implementation of LAS requires knowledge of the running packet count of each flow, so as to find the youngest ongoing flow. This, along with other drawbacks such as unfairness and scalability issues, has motivated researchers to explore other means of giving priority to small flows [3].

PS+PS scheduling: The PS+PS scheduler [3], as the name indicates, uses two processor-sharing queues, with priority between them. The first θ packets of every flow are served in the high-priority queue, say Q1, and the remaining packets (if any) are served in the low-priority queue, say Q2. Hence all flows of size less than or equal to θ get served entirely in the high-priority queue. The service discipline is such that Q2 is served only when Q1 is empty. Observe that, for all flows with size x > θ, the first θ packets are also served in Q1. It is proved that the PS+PS model reduces the overall mean response time (E[T]) in comparison to PS, for the DHR class of distributions. In addition, the authors of [3] proposed an implementation of this model; but it relies on TCP sequence numbers, requiring them to start from a set of possible initial numbers. This not only makes the scheme TCP-dependent, but also reduces the randomness of the initial sequence numbers that TCP flows can have.

MLPS discipline: The PS+PS model can be seen as a specific case of the multi-level-processor-sharing (MLPS) discipline [14]. In the context of prioritizing small flows, [1] demonstrates that the mean delay of a two-level MLPS can be close to that of FB in the case of Pareto and hyper-exponential distributions, which belong to the DHR class.

Sampling and scheduling: An intuitive way to track large flows is to use (real-time) sampling to detect large flows (thus classifying them), and to use this information to perform size-based scheduling. Since the requirement here is only to differentiate between small and large flows, the sampling strategy need not track the exact flow size. A simple way to achieve this is to probabilistically sample every arriving packet, and store the information of the sampled flows along with the sampled packets of each flow. SIFT, proposed in [16], uses such a sampling scheme along with the PS+PS scheduler. A flow is considered 'small' as long as it is not sampled. All such undetected flows go to the higher-priority queue until they are sampled. The authors analyzed the system using the 'average delay' (the average of the delays of all small flows, and of all large flows) for varying load as a performance metric. Though it is an important metric, it does not reveal the worst-case behaviour in the presence of sampling. This is important here, as the sampling strategy has a disadvantage: there can be false positives; i.e., small flows, if sampled, will be sent to the lower-priority queue. Deviating from this simple strategy, [7] proposed to use a threshold-based sampling, derived from the well-known 'Sample and Hold' strategy [8], along with PS+PS scheduling. In this policy, the size of a sampled flow is tracked until it crosses a threshold. This threshold can be the same as the one used in PS+PS scheduling, to ensure that there are no false positives, but only false negatives.

TLPS/SD: Another method of prioritizing small flows uses two-level-processor-sharing scheduling along with spike-detection, where packets are assumed to arrive as 'spikes' [6]. With TCP (as explained below), large spikes belong to large flows. Hence, to detect large flows, it is only required to detect large spikes, and that too only at times of congestion. Detected large flows are sent to the low-priority queue Q2 until completion, while other flows continue to be served in the high-priority queue Q1.
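As a worked illustration of the DHR property (added for clarity; the specific Pareto form is used only as an example): for a Pareto distribution with shape α > 0 and minimum value x_m, F(x) = 1 − (x_m/x)^α and F'(x) = α x_m^α / x^(α+1) for x ≥ x_m, so h(x) = F'(x) / (1 − F(x)) = α/x. Since α/x is strictly decreasing in x, the Pareto distribution is DHR: a flow that has already received twice as much service has half the hazard rate, i.e., it is less likely to complete per unit of additional service, which is exactly the property that non-anticipating schedulers such as FB/LAS and PS+PS exploit.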

3 Spike-detecting AQM

The spike-detecting AQM we propose functions on the buffer of an outgoing link at a router. We refer to the cwnd, the congestion-window of a TCP flow, as a 'spike'. A TCP flow in the slow-start phase with a spike-size of 2^η has a size of at least Σ_{i=0}^{η} 2^i = 2^(η+1) − 1 packets (assuming an initial window of size one packet). If the flow is in the congestion-avoidance phase, its size will be larger than 2^(η+1) − 1 packets, depending on when the flow switched from the slow-start phase to the congestion-avoidance phase. The basic idea is to detect spikes of large sizes during times of congestion, and drop an arriving packet if it belongs to a 'large' spike. We define a large flow as a flow that has a spike greater than 2^η packets; that is, any flow with size greater than or equal to 2^(η+1) packets is a large flow. Note that this definition of elephant flows is similar to that found in the literature; i.e., flows with sizes beyond a pre-defined threshold are elephant flows. With the above definition, a spike is large if its size is greater than 2^η packets. Since large spikes belong to large flows, this strategy drops packets from large flows at times of congestion.

The next question is how to quantify congestion of a link. For this, as in [6], we observe the length of the buffer. Whenever the buffer-length exceeds β packets, we assume the link is congested, and the spike-detection mechanism is triggered. The spike-detecting mechanism classifies the packets in the buffer as belonging to different spikes, and then finds the size of each spike. If an arriving packet belonging to a large spike finds the buffer-length greater than β, it is dropped; otherwise it is queued. We assume that β < M, where M is the size of the buffer.

Assumptions: We assume a TCP sender sends an entire cwnd in one go (thus forming a spike at a buffer). Since each spike is essentially a cwnd of a TCP flow, it can be identified using the common five-tuple (source and destination IP addresses, source and destination ports, and protocol) used to identify flows. Observe that, as a TCP flow sends only one cwnd of packets in one round (RTT), no two spikes present at the same time can belong to the same flow. On the parameters η and β, the values are such that 0 < 2^η < β.

Algorithm 1 lists the function for enqueueing an incoming packet at the AQM buffer (dequeueing is from the head, like in a FIFO, and hence not listed). The function enque-fifo enqueues a packet at the end of the FIFO buffer only if there is space for the packet, or else the packet is dropped. The variable Q denotes the physical buffer, and P the incoming packet at the router. We refer to this AQM policy as SD in short (to differentiate it from the improvement discussed below).

Algorithm 1 Function: Enqueue(Packet P)
 1: if (size(P) + size(Q)) > β then
 2:   S ← set of identified spikes
 3:   find size of each spike s ∈ S
 4:   for s ∈ S do
 5:     if (size(s) > 2^η) then
 6:       large[s] ← 1
 7:     end if
 8:   end for
 9:   if large[spike(P)] == 1 then
10:     drop(P)
11:   else
12:     enque-fifo(P)
13:   end if
14: else
15:   enque-fifo(P)
16: end if
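As a minimal illustrative sketch (not from the paper; the names flow_key, MAX_BUF and the packet dictionary layout are assumptions), the SD enqueue logic of Algorithm 1 can be written as follows, counting sizes in packets as in the simulations:

    from collections import deque, defaultdict

    ETA, BETA, MAX_BUF = 4, 200, 1000        # illustrative values of η, β and buffer size M (in packets)
    queue = deque()                          # FIFO buffer; dequeueing is from the left (head)

    def flow_key(pkt):
        """Five-tuple identifying the flow (and hence the spike) a packet belongs to."""
        return (pkt['src'], pkt['dst'], pkt['sport'], pkt['dport'], pkt['proto'])

    def enqueue_sd(pkt):
        """SD enqueue: when congested, drop an arriving packet that belongs to a large spike."""
        if len(queue) + 1 > BETA:                        # size(P) + size(Q) > β: link congested
            spike_size = defaultdict(int)                # classify buffered packets into spikes
            for p in queue:
                spike_size[flow_key(p)] += 1
            if spike_size[flow_key(pkt)] > 2 ** ETA:     # arriving packet belongs to a large spike
                return False                             # drop(P)
        if len(queue) < MAX_BUF:                         # enque-fifo: tail-drop only if buffer is full
            queue.append(pkt)
            return True
        return False

Note that spike sizes are recomputed from the buffer contents only when the queue-length exceeds β, which is what keeps the policy free of per-flow state between congestion episodes.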

Observe that, in the above algorithm, whenever an arriving packet belonging to a large spike finds the buffer-length greater than β, it is dropped. As we assume that packets arrive in spikes, it might happen that a burst of packets belonging to a single spike (or even an entire spike) gets dropped. Such dropping of bursts not only makes the network inefficient (as the packets in the dropped burst have to be resent) but may also lead to timeouts; whereas our requirement is only to slow down a large flow temporarily, by informing the TCP sender. This slowing-down can be achieved by dropping just one packet from a flow, as the sending TCP cuts down its congestion-window by half as soon as it receives three duplicate ACKs from the receiver (conveying that it has not received a packet in between). To reduce the dropping of bursts of packets, and thereby the number of packet losses, a simple strategy is to drop packets probabilistically. We do the following: for each spike s that is large, compute the drop probability for a packet of the spike as

    p(s) = min( size(s) / (2^(η+φ) + 1), 1.0 ),        (1)

where φ ≥ 0. If φ = 0, then every arriving packet belonging to a large spike will be dropped (at times the link is congested), since size(s) ≥ 2^η + 1 for a large spike. We call this improved AQM policy SDI-AQM, or SDI in short. With the above computation of probabilities in place, when an arriving packet belonging to a large spike, say s′, finds the buffer-length greater than β, the following is done instead of line 10 in Algorithm 1: a coin is tossed with probability p(s′) for heads; the packet is dropped if it gets a heads, or else it is enqueued. Observe that the probability is computed only for a large spike; hence the minimum size of a large spike (in Eq. 1) is 2^η + 1 packets.
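For the probabilistic variant, a self-contained sketch of the drop decision of Eq. (1) could look as follows (illustrative only; the function name and parameter values are assumptions, and it would take the place of the deterministic drop in the earlier sketch):

    import random

    ETA, PHI = 4, 4                                  # illustrative values of η and φ

    def sdi_drop(spike_size_pkts):
        """Return True if a packet of a spike of the given size (in packets) should be
        dropped, using p(s) = min(size(s) / (2^(η+φ) + 1), 1.0) from Eq. (1)."""
        if spike_size_pkts <= 2 ** ETA:              # not a large spike: never dropped here
            return False
        p_drop = min(spike_size_pkts / (2 ** (ETA + PHI) + 1), 1.0)
        return random.random() < p_drop              # coin toss with probability p(s) for heads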

4 Simulations: Goals and Settings

4.1 Goals

The goal of the simulations is mainly to evaluate the performance of the spike-detecting AQM policies, both SD and SDI, and compare them with three other policies:

1. DT: A router today usually has an FCFS scheduler serving packets arriving at a drop-tail buffer. To keep it simple, we use 'DT' to denote this system.

2. PS+PS: This policy uses a threshold θ to differentially serve large and small flows (as discussed in Section 2).

3. RED: We study how RED would treat small flows.

We consider the following metrics for our study: (i) conditional mean completion-time (CT) of small flows, (ii) number of time-outs encountered by small flows and by all flows, (iii) number of times the congestion-windows are reduced (congestion cuts) for small flows and for all flows, (iv) mean CT for ranges of flow sizes, (v) mean CT for small flows, large flows and all flows, and (vi) maximum CT of small flows.

4.2 Settings

Simulations were performed in NS-2 [15]. A dumbbell topology, as seen in Fig. 1, was used throughout. The bottleneck link capacity was set to 1 Gbps, and the capacities of the source nodes were all set to 100 Mbps. The delays on the links were set such that the base RTT (only propagation delays) on an end-to-end path equals 100 ms. There were 100 node pairs, with the source nodes generating flows according to a Poisson process. The flow arrival rate was adapted to give a packet loss rate of around 1.0% in the DT scenario (with the traditional BDP buffer size, defined as in Scenario 1 below). Note that using the ratio of the sum of source capacities to the bottleneck link capacity as the load is not meaningful in a closed system. Flow-sizes were taken from a mix of Exponential and Pareto distributions. More precisely, 85% of the flows were generated using an Exponential distribution with mean 20 KB; the remaining 15% are contributed by large flows using a Pareto distribution with shape α = 1.1 and mean flow size set to 1 MB. 20,000 flows were generated during each run, all carried by the TCP SACK version. The packet size was kept constant and equal to 1000 B. For post-simulation analysis, we define a 'small flow' as a flow with size less than or equal to 20 KB, and the remaining as 'large flows'. Here the flow-size is the size of the data generated by the application, not including any header or TCP/IP information. Also note that a small flow of 20 KB can take more than 25 packets to transfer the data, as the count includes control packets (like SYN, FIN, etc.) and retransmitted packets.
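For reproducibility of the workload shape, the flow sizes and Poisson arrival instants can be sketched as below. This is not the NS-2 script used in the paper; the arrival rate and random seed are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)                   # illustrative seed
    N_FLOWS      = 20000
    ARRIVAL_RATE = 200.0                             # flows/s (hypothetical; the paper tunes it for ~1% loss)
    MEAN_SMALL   = 20e3                              # 20 KB mean of the Exponential component
    MEAN_LARGE   = 1e6                               # 1 MB mean of the Pareto component
    ALPHA        = 1.1                               # Pareto shape
    X_M          = MEAN_LARGE * (ALPHA - 1) / ALPHA  # Pareto scale giving the desired mean

    small = rng.random(N_FLOWS) < 0.85               # 85% small, 15% large
    sizes = np.where(small,
                     rng.exponential(MEAN_SMALL, N_FLOWS),
                     X_M * (1.0 + rng.pareto(ALPHA, N_FLOWS)))          # classical Pareto via numpy's Lomax
    arrivals = np.cumsum(rng.exponential(1.0 / ARRIVAL_RATE, N_FLOWS))  # Poisson arrival instants

    print(f"mean size: {sizes.mean() / 1e3:.1f} KB, "
          f"fraction <= 20 KB: {(sizes <= 20e3).mean():.2%}")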

Figure 1. Topology used for simulations (a dumbbell: source nodes src 1, ..., src n, with access capacities C1, ..., Cn, connected through two routers and a common bottleneck link to destination nodes dst 1, ..., dst n)

4.3 Parameters

Spike-detecting AQM policies: For both the SD and SDI policies, we set η to 4 and β to 200. This means that only a flow of size greater than or equal to 2^(η+1) − 1 = 2^5 − 1 = 31 packets can face packet drops, and this can happen only when the buffer-length exceeds 200 packets (under the assumption that the queue rarely gets full enough to experience a tail-drop). For the SDI policy, the value of φ is set to 4. The values of η and β are motivated from [6].

RED: We use the Gentle version, as it is known to be more robust to the settings of the various RED parameters³. The value of min_thresh, the buffer-size threshold beyond which packets start getting dropped, is set to 200 packets.

PS+PS: The threshold θ used to differentiate between small and large flows in this policy is set to 31 packets.

³ Recommendation: http://www.icir.org/floyd/red/gentle.html
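For illustration (the specific spike sizes below are chosen here as examples, not taken from the original text): with η = 4 and φ = 4, the denominator of Eq. (1) is 2^8 + 1 = 257, so a just-detected large spike of 17 packets has a per-packet drop probability of 17/257 ≈ 0.066, a spike of 128 packets has 128/257 ≈ 0.50, and any spike of 257 packets or more has its arriving packets dropped with probability 1 while the congestion persists.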

4.4 Scenarios considered

• Scenario 1: The size of the bottleneck queue was set to the bandwidth-delay product (BDP) for a 100 ms base RTT; that is, M = BDP = 12500 packets.

• Scenario 2: Motivated by the need to experiment with small buffers in routers [20], here we set the size of the bottleneck queue to 1000 packets, i.e., less than one-tenth of the BDP used in Scenario 1; M = 1000.

5 Performance Evaluation

In this section we analyze the performance of the two spike-detecting AQM policies, SD and SDI, described earlier.

5.1 Scenario 1: M = BDP

As said before, the bottleneck queue-size here is M = 12500 packets. Fig. 2 gives the mean completion-times (CT) of flows with sizes not greater than 200 packets. All policies are seen to give lower mean completion times for small and medium-size flows than DT. Observe that for small flows (size ≤ 20 KB), PS+PS and the three AQM policies give almost the same mean completion times. Once the threshold θ is crossed, PS+PS approaches DT quickly.

Figure 2. Conditional mean CT for flow-sizes ≤ 200 (mean completion time in seconds vs. flow size in packets of 1000 B, for DT, PS+PS, RED, SD and SDI)

Figure 3. Mean CT for ranges of flow sizes (mean completion time in seconds for flow-size ranges, in packets of 1000 B: <20, 20-200, 200-2000, >2000; for DT, PS+PS, RED, SD and SDI)

Figure 4. Maximum completion time (maximum completion time in seconds vs. flow size in packets of 1000 B): (a) DT, SD and SDI; (b) PS+PS, RED and SDI

The AQM policies, RED, SD and SDI, give much lower response times for flows with sizes greater than the threshold, with the SD and SDI algorithms giving better performance than RED. The improved performance of RED over PS+PS is due to the fact that in RED flows are punished only at times of congestion (queue-length > β), whereas in PS+PS all flows with sizes (even slightly) greater than θ are sent to the low-priority queue and hence get served only when the high-priority queue is empty.

We do not plot the mean completion times of large flows due to space limitation; but in Fig. 3, we plot the mean values for different ranges of flow sizes. We see that large flows are only slightly affected under the PS+PS policy. The gains for small and medium-size flows under the policies other than DT, and in particular under the AQM policies, are evident. While PS+PS shows the worst performance on this metric for flows with sizes greater than 200 packets, the AQM policies are seen to give the best performance. This happens as flows face fewer time-outs under the AQM policies than under DT and PS+PS (as we will see in Table 1 later).

Next we analyze the worst completion time of flows for a given size. Fig. 4 plots this metric for flow-sizes less than or equal to 200 packets. For clarity, two sub-figures are given: the first one, Fig. 4(a), compares DT and the two spike-detecting AQM policies, and the second one, Fig. 4(b), compares the PS+PS, RED and SDI policies. As expected, both SD and SDI perform better than DT. Of the two, SDI gives a smaller maximum completion time for small and medium-size flows, as SD may drop bursts of packets from large spikes at times of congestion (causing timeouts), whereas SDI drops packets only probabilistically, depending on the size of the large spike to which the packet belongs. The second sub-figure, Fig. 4(b), shows that SDI outperforms not only PS+PS, but also RED. In RED, packets may be dropped (randomly) depending on the congestion level, and hence packets from flows of sizes less than a few hundred packets might also be dropped, causing the TCP sender to slow down and incur longer completion times. In SDI, on the other hand, only packets from large spikes are dropped randomly at times of congestion, thus affecting flows with larger sizes more.

Table 1 lists other performance metrics, supporting the arguments given above. For each policy, the first column lists the number of timeouts faced by small flows (size ≤ 20 KB), and the second column gives the number of congestion-window cuts encountered by small flows. A note on the second metric: small CC (standing for 'congestion cuts') gives the total number of times the small flows reduced their congestion-windows during their lifetimes. The third and fourth columns give the total number of timeouts and congestion-window cuts, respectively, faced by all flows. The mean completion times (indicated by CT) for small, large and all flows are the remaining three metrics, in order. (PS+PS is denoted as PS in the tables.)

PM     small TOs   small CC   sum TO   sum CC   small CT   large CT   all CT
DT        792        1091      2003     5151      0.76       1.75      1.22
PS        234         351      5552     6996      0.38       1.47      0.88
RED       341         891       783     8321      0.40       1.16      0.75
SD          0           0      1813     5495      0.37       1.07      0.69
SDI         0           0       745     5923      0.38       1.05      0.69

Table 1. Comparison of TOs and CC

Between DT and PS+PS, though PS+PS brings down the number of time-outs and congestion-cuts of small flows, it does so by inflicting a higher number of time-outs and congestion-cuts on large flows. This happens as PS+PS gives strict priority to the high-priority queue, where flows with sizes not greater than the threshold are served. Though RED gives a lower number of time-outs for small flows in comparison to DT, the number is still high. Small flows face neither time-outs nor congestion-cuts under the SD and SDI policies in this scenario. This explains why these policies give the best performance in terms of both the mean completion time and the worst completion time of small flows. Note that large flows under SD face more timeouts than under RED (as bursts of packets might be dropped in the former), whereas the SDI policy brings down the total number of timeouts (by randomizing the drop instances). Also note that the total number of congestion-cuts under the SD and SDI policies is comparable to that under DT, while the count is much higher under PS+PS and RED. This bias against large flows under RED was also observed in [2]. For the remaining metrics (small, large and all CT) the SD and SDI AQM policies are seen to give better results than the rest.

5.2 Scenario 2: M = 1000

As the previous scenario has shown that SDI performs better than the SD policy, we exclude SD while simulating Scenario 2. In this scenario, the bottleneck queue-size is 1000 packets. For both the RED and SDI policies the parameters (minimum threshold for RED, and η and β for SDI) were set to the same values as in Scenario 1. Fig. 5 plots the mean completion time for flow-sizes less than or equal to 200 packets. DT gives the highest values and the SDI policy gives the lowest values for this metric, while RED and PS+PS are in between. As the flow-size increases, RED and PS+PS move farther away from SDI. The average values of this metric for different flow-size ranges, plotted in Fig. 6, show that there is negligible effect on large flows. Fig. 6 also shows the improvement in mean completion times attained by flows under the SDI policy.

Figure 5. Conditional mean CT for flow-sizes ≤ 200 (mean completion time in seconds vs. flow size in packets of 1000 B, for DT, PS+PS, RED and SDI)

Figure 6. Mean CT for ranges of flow sizes (mean completion time in seconds for flow-size ranges, in packets of 1000 B: <20, 20-200, 200-2000, >2000; for DT, PS+PS, RED and SDI)

Figure 7 (with sub-figures 7(a) and 7(b)) gives the maximum completion times. The performance of small and medium-size flows is worse in Scenario 2 than in Scenario 1 under the RED and PS+PS policies, whereas under the SDI policy the performance is relatively the same. The values of the other metrics, given in Table 2, also back this argument. Comparing DT and PS+PS in Table 2, we find that, though PS+PS fares better than DT, the improvement is relatively smaller in this scenario. The interesting point to note from the simulations in Scenario 2 is that the SDI policy performs better than the other policies even in a scenario with small buffers, while a policy like PS+PS, developed specifically for improving the performance of small flows, shows a decrease in performance.

PM     small TOs   small CC   sum TO   sum CC   small CT   large CT   all CT
DT       1619        2068      4137    10028      0.56       1.43      0.96
PS       1289        1722      3515     9794      0.43       1.28      0.82
RED       473        1026      1202     8834      0.41       1.20      0.78
SDI        41          72       739     5928      0.38       1.05      0.69

Table 2. Comparison of other metrics

Figure 7. Maximum completion time (maximum completion time in seconds vs. flow size in packets of 1000 B): (a) DT and SDI; (b) PS+PS, RED and SDI

6 Conclusions

In this work, we proposed and developed the SDI-AQM policy for improving the performance of small flows without affecting the performance of large flows. Different from existing works, this new policy uses AQM without needing to track the sizes of flows, besides working with a single queue. The policy is seen to outperform the traditional drop-tail buffer, a size-based scheduling policy and RED when compared using a variety of metrics, besides maintaining its performance in a small-buffer scenario.

A notable disadvantage is that the SD method needs to calculate the size of each active spike whenever the queue-length exceeds β packets. In future work, we plan to quantify this 'cost' and explore ways of reducing it. Besides, we will study this new approach analytically for better understanding, in particular to find optimal values for η and β.

References

[1] S. Aalto and U. Ayesta. Mean delay analysis of multi level processor sharing disciplines. In INFOCOM, Apr. 2006.
[2] E. Altman and T. Jiménez. Simulation analysis of RED with short lived TCP connections. Comput. Netw., 44:631–641, April 2004.
[3] K. Avrachenkov, U. Ayesta, P. Brown, and E. Nyberg. Differentiation between short and long TCP flows: Predictability of the response time. In INFOCOM, volume 2, pages 762–773, 2004.
[4] D. Clark and W. Fang. Explicit allocation of best-effort packet delivery service. IEEE/ACM Transactions on Networking, 6(4):362–373, Aug. 1998.
[5] D. Collange and J.-L. Costeux. Passive estimation of quality of experience. J. UCS, 14(5):625–641, 2008.
[6] D. M. Divakaran, E. Altman, and P. V.-B. Primet. Size-based flow-scheduling using spike-detection. In Proc. ASMTA, pages 331–345, 2011.
[7] D. M. Divakaran, G. Carofiglio, E. Altman, and P. Vicat-Blanc Primet. A flow scheduler architecture. In IFIP/TC6 Networking 2010, pages 122–134, May 2010.
[8] C. Estan and G. Varghese. New directions in traffic measurement and accounting. SIGCOMM CCR, 32(4):323–336, 2002.
[9] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Trans. Netw., 1:397–413, August 1993.
[10] S. B. Fred, T. Bonald, A. Proutiere, G. Régnié, and J. W. Roberts. Statistical bandwidth sharing: a study of congestion at flow level. SIGCOMM CCR, 31(4):111–122, 2001.
[11] L. Guo and I. Matta. The war between mice and elephants. In ICNP '01: Proceedings of the Ninth International Conference on Network Protocols, page 180, 2001.
[12] M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal. Size-based scheduling to improve web performance. ACM Trans. Comput. Syst., 21(2):207–233, 2003.
[13] W. John, S. Tafvelin, and T. Olovsson. Trends and differences in connection-behavior within classes of Internet backbone traffic. In PAM, pages 192–201, 2008.
[14] L. Kleinrock and R. R. Muntz. Processor sharing queueing models of mixed scheduling disciplines for time shared system. J. ACM, 19(3):464–482, 1972.
[15] The Network Simulator NS-2. http://www.isi.edu/nsnam/ns.
[16] K. Psounis, A. Ghosh, B. Prabhakar, and G. Wang. SIFT: A simple algorithm for tracking elephant flows, and taking advantage of power laws. In 43rd Annual Allerton Conf. on Control, Communication and Computing, 2005.
[17] I. A. Rai, G. Urvoy-Keller, M. K. Vernon, and E. W. Biersack. Performance analysis of LAS-based scheduling disciplines in a packet switched network. SIGMETRICS PER, 32(1):106–117, 2004.
[18] L. Schrage. A proof of the optimality of the shortest remaining processing time discipline. Operations Research, 16:687–690, 1968.
[19] A. Smitha and A. Reddy. LRU-RED: An active queue management scheme to contain high bandwidth flows at congested routers. In Proceedings of Globecom, 2001.
[20] A. Vishwanath, V. Sivaraman, and M. Thottan. Perspectives on router buffer sizing: recent results and open problems. SIGCOMM CCR, 39:34–39, March 2009.
[21] S. F. Yashkov. Processor-sharing queues: some progress in analysis. Queueing Syst. Theory Appl., 2(1):1–17, 1987.
[22] Y. Zhang, L. Breslau, V. Paxson, and S. Shenker. On the characteristics and origins of Internet flow rates. In Proc. SIGCOMM '02, pages 309–322. ACM, 2002.
