TIMEFLIP: Using Timestamp-Based TCAM Ranges to Accurately ...


TIMEFLIP: Using Timestamp-based TCAM Ranges to Accurately Schedule Network Updates

Tal Mizrahi, Ori Rottenstreich, Yoram Moses

Abstract: Network configuration and policy updates occur frequently, and must be performed in a way that minimizes transient effects caused by intermediate states of the network. It has been shown that accurate time can be used for coordinating network-wide updates, thereby reducing temporary inconsistencies. However, this approach presents a great challenge; even if network devices have perfectly synchronized clocks, how can we guarantee that updates are performed at the exact time for which they were scheduled? In this paper we present a practical method for implementing accurate time-based updates, using TIMEFLIPs. A TIMEFLIP is a time-based update that is implemented using a timestamp field in a Ternary Content Addressable Memory (TCAM) entry. TIMEFLIPs can be used to implement Atomic Bundle updates, and to coordinate network updates with high accuracy. We analyze the amount of TCAM resources required to encode a TIMEFLIP, and show that if there is enough flexibility in determining the scheduled time, a TIMEFLIP can be encoded by a single TCAM entry, using a single bit to represent the timestamp, while allowing a very high degree of accuracy.

Index Terms: TCAM, SDN, network updates, time, clock synchronization.

I. INTRODUCTION

A. Background

Network updates are a routine necessity; policy changes or traffic-engineered route changes may occur frequently, and often require network devices to be reconfigured. This challenge is especially critical in Software Defined Networks (SDN), where the control plane is managed by a logically centralized controller, and configuration updates occur frequently. Such configuration updates can involve multiple network devices, potentially resulting in temporary anomalies such as forwarding loops or packet loss.

Network devices such as routers and switches use TCAMs for various purposes, e.g., packet classification, Access Control Lists (ACLs), and forwarding decisions. TCAMs are an essential building block in network devices. A typical example of the importance of TCAMs is OpenFlow [2], [3]. An OpenFlow switch performs its functionality using one or more flow tables, most commonly implemented by TCAMs (see, e.g., [4], [5]).

The order of the entries in a TCAM determines their priority. Hence, installing a new TCAM entry often involves rearranging existing entries, yielding high overhead for each TCAM update. It has been shown [6] that the latency of a TCAM rule installation may vary from a few milliseconds to a few seconds.

A recently introduced approach [7]–[10] proposes to use accurate time and local clocks as a means to coordinate network updates. By using synchronized clocks, configuration changes can be scheduled in a way that guarantees a coordinated network-wide update, thereby reducing transient anomalies.

One of the main challenges in this approach is to guarantee that scheduled updates are performed accurately according to the desired schedule. Even if the clocks in the network are perfectly synchronized, performing configuration changes requires a potentially complex procedure that may be completed at an uncertain time.

This manuscript is an extended version of [1], which was presented in IEEE INFOCOM ’15, Hong Kong, April 2015. Tal Mizrahi and Yoram Moses are with the Department of Electrical Engineering, Technion, Haifa 32000, Israel (e-mails: [email protected], [email protected]). Yoram Moses is the Israel Pollak academic chair at Technion. Ori Rottenstreich is with the department of Computer Science, Princeton University, Princeton, NJ 08544, USA (e-mail: [email protected]). This work was supported in part by the ISF grant 1520/11.

B. Introducing TIMEFLIPs

In this paper we present a method that uses TIMEFLIPs to perform accurate time-based network updates. We define a TIMEFLIP to be a scheduled update that is implemented using TCAM ranges to represent the scheduled time of operation.

We analyze TCAM lookups (Fig. 1) that take place in network devices, such as switches and routers. We assume that the device maintains a local clock, and that a timestamp $T$ recording the local arrival time is associated with every packet that is received by the device. Typically, TCAM search keys consist of fields from the packet header, as well as some additional metadata. In our setting, the metadata includes a timestamp $T$. Hence, a TCAM entry can specify a range relative to the timestamp $T$, as a way of implementing time-based decisions. The timestamp $T$ is not integrated into the packet, as it is only used internally in the device, and thus does not compromise the traffic bandwidth of the network device.
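The lookup described above can be modelled in a few lines of Python. This is only an illustrative software sketch, not the paper's implementation or any switch API: the string encoding of ternary entries, the helper names (`ternary_match`, `lookup`, `search_key`) and the 4-bit header and timestamp widths are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

def ternary_match(pattern: str, key: str) -> bool:
    """A pattern bit matches if it is '*' (don't care) or equals the key bit."""
    return len(pattern) == len(key) and all(
        p in ("*", k) for p, k in zip(pattern, key)
    )

def lookup(entries: List[Tuple[str, str]], key: str) -> Optional[str]:
    """First-match semantics: return the action of the first matching entry."""
    for pattern, action in entries:
        if ternary_match(pattern, key):
            return action
    return None

def search_key(header_bits: str, timestamp: int, w: int) -> str:
    """Append a w-bit timestamp (device-internal metadata) to the header bits."""
    return header_bits + format(timestamp, f"0{w}b")

W = 4  # hypothetical timestamp width, chosen only to keep the example small

# Conventional lookup: the key holds header bits only.
conventional = [("1010", "forward")]

# Time-based lookup: the entry is active only for T >= T0 = 8, which in
# 4 bits is the single prefix pattern 1***.
timed = [("1010" + "1***", "forward")]

if __name__ == "__main__":
    print(lookup(conventional, "1010"))
    print(lookup(timed, search_key("1010", 7, W)))
    print(lookup(timed, search_key("1010", 8, W)))
```

In this toy model the timed entry starts matching exactly when the timestamp in the key reaches 8, without any change to the table at that moment.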

[Figure 1: two network-device diagrams. In the conventional lookup the search key $(s_u, \ldots, s_1)$ is presented to the TCAM, which returns an action. In the TIMEFLIP lookup the search key also carries the timestamp $T$, which is matched against the range $T \geq T_0$.]

Fig. 1: TCAM lookup: conventional vs. TIMEFLIP. TIMEFLIP uses a timestamp field, representing the time range $T \geq T_0$.

Using a simple microbenchmark on a commercial switch, we show that TIMEFLIPs can be performed by existing network devices, and analyze the achievable scheduling accuracy of TIMEFLIPs. Using the Precision Time Protocol (PTP), based on the IEEE 1588 standard [11], network devices can typically synchronize with an accuracy on the order of 1 µsec [12]–[14].¹ We show that the error between the time at which a TIMEFLIP is executed and its scheduled time is two orders of magnitude smaller than this, and hence that network-wide updates can be timed with a 1 µsec accuracy using PTP.

TIMEFLIPs enable two important scenarios:

(i) Atomic Bundle. It is sometimes desirable to reconfigure a network device by applying a set of configuration changes as a bundle, i.e., every packet should be processed either before any of the modifications have been applied, or after all have been applied. The Atomic Bundle feature in OpenFlow [3] defines such functionality; the OpenFlow 1.4 specification suggests that Atomic Bundles can be implemented either by temporarily queuing packets during the update, or by using double buffering techniques. Both approaches may incur significant cost in terms of resources. TIMEFLIPs allow a clean and natural way to implement Atomic Bundles; the set of configuration changes can be enabled at all times $T \geq T_0$ for some chosen time $T_0$, and the timestamp $T$ defines when the bundle commands atomically come into effect.

(ii) Network-wide coordinated updates. If network devices use synchronized clocks, then TIMEFLIP can be used for updating different devices at the same time,² or for defining a set of scheduled updates coordinated in a specific order [9].

TIMEFLIPs require every TCAM entry to include a timestamp field. We show that this per-entry overhead is relatively small. Moreover, since TCAMs have fixed entry sizes, it is often the case that a portion of the TCAM entry is unused, and thus can be utilized for the timestamp field.
For example, in many cases TCAM entries are used to store the IPv4 5-tuple, requiring 104 bits, while the smallest TCAM entry that can accommodate the 5-tuple is typically larger, e.g., 144 bits [15] or 160 bits [16], leaving a large number of unused bits that can be used for the timestamp field.

As TCAM resources are scarce and costly, we aim to represent the timestamp field by as few bits as possible, and each TIMEFLIP by as few TCAM entries as possible. Optimal representation of TCAM ranges is a problem that has been widely studied in the literature (e.g., [17], [18]). The problem we address has two unique properties that, to the best of our knowledge, have not been previously analyzed:

• Scheduling tolerance. State-of-the-art TCAM range analysis studies the number of TCAM entries required to represent a given range of values. In the context of this paper, the range values can be chosen in a way that minimizes the number of TCAM entries. If the time $T_0$ for which a time-based network update is scheduled can be selected within a scheduling tolerance (Fig. 2), given by a range of time values $[T_{min}, T_{max}]$, then the number of entries required to represent the range can be reduced. Notably, the scheduling tolerance does not compromise the accuracy of the TIMEFLIP. It only presents some flexibility in choosing $T_0$; we assume that an SDN controller may choose any $T_0$ within the given range. Once $T_0$ is chosen, the TIMEFLIP should occur accurately at $T_0$.

• Periodic ranges. If some of the most significant bits of the timestamp value are known to be constant during the TIMEFLIP, the network device can simply ignore these bits, by placing ‘don’t care’ values in the respective bits in the TCAM. This property is unique to time ranges, and is not applicable to previously analyzed TCAM field ranges, such as TCP port ranges. For example, if the portion of the timestamp that represents the date is known to be constant during a TIMEFLIP, it can be ignored. We refer to time ranges that ignore some of the most significant bits as periodic ranges, and show that the use of periodic ranges allows the time ranges to be represented by fewer TCAM entries.

¹ Accurate clock synchronization has become a common and affordable feature in network devices; typical off-the-shelf switch silicons, including low-end devices, have native support for PTP.

² Subject to the accuracy of the clock synchronization mechanism.
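The effect of the scheduling tolerance can be seen by brute force on a small example. The sketch below is an illustration only: it assumes a hypothetical 8-bit timestamp and a tolerance window of $[100, 140]$, and counts prefix entries with a simple greedy cover (the helper names `prefix_entries` and `best_in_tolerance` are not from the paper).

```python
def prefix_entries(lo: int, hi: int, w: int) -> int:
    """Number of prefix entries in a greedy prefix cover of [lo, hi] on w bits."""
    count = 0
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << w)
        # ... shrunk until it no longer overshoots hi.
        while lo + size - 1 > hi:
            size >>= 1
        lo += size
        count += 1
    return count

def best_in_tolerance(t_min: int, t_max: int, w: int) -> int:
    """Exhaustively pick the T0 in [t_min, t_max] whose range T >= T0 is cheapest."""
    top = (1 << w) - 1
    return min(range(t_min, t_max + 1),
               key=lambda t0: prefix_entries(t0, top, w))

if __name__ == "__main__":
    W = 8  # hypothetical timestamp width
    for t0 in (100, 101, 128, 140):
        print(t0, prefix_entries(t0, (1 << W) - 1, W))
    print("best T0:", best_in_tolerance(100, 140, W))
```

Within this window the cost of encoding $T \geq T_0$ varies between one and several entries depending on the chosen $T_0$, which is the flexibility that the scheduling tolerance exposes. The exhaustive search is used here only for clarity and would not scale to realistic timestamp widths.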

C. Contributions

The main contributions of this paper are:

• We introduce TIMEFLIPs and show how to accurately perform coordinated network updates and Atomic Bundle updates using them.

• Our analysis provides a tight upper bound on the number of TCAM entries required for representing a TIMEFLIP. We show that by correctly choosing the update time, $T_0$, the number of TCAM entries used for representing the timestamp range can be significantly reduced.

• We present an optimal scheduling algorithm; no other scheduling algorithm can produce a timestamp range that requires fewer TCAM entries.

• We analyze the number of bits required for representing the timestamp field in TCAM entries, and show that it is a function of the scheduling tolerance.

• We show that using periodic ranges, the timestamp field can be represented by a single bit in the TCAM entry, and every TIMEFLIP requires a single TCAM entry, provided that the scheduling tolerance is sufficiently relaxed.

• We use a microbenchmark to demonstrate that our approach can be effectively used to schedule accurate time-based updates with existing commercial network devices.

Due to space limits, proofs are presented in [19].

[Figure 2: a timeline on which the scheduled time $T_0$ lies between $T_{min}$ and $T_{max}$.]

Fig. 2: Scheduling tolerance: $T_0 \in [T_{min}, T_{max}]$.

D. Related Work

Consistent network updates have been widely analyzed in the literature. A common approach to avoiding inconsistencies during topology updates in routing protocols is to use a sequence of configuration commands [20], whereby the order of execution guarantees that no anomalies are caused in intermediate states of the procedure. Another recently introduced approach for consistent updates [21] uses configuration version tags to guarantee consistency. Dynamic traffic engineering [6], [22] has been shown to require frequent topology updates that must be applied carefully to optimize the network utilization.

In [21] the authors argued that a simultaneous network update does not guarantee consistency, since packets that are en-route during the update may be processed by a mixture of configurations. Indeed, the use of accurate time by itself does not guarantee consistency, but time can be used to implement update scenarios (e.g., [23]) that are not possible with the consistent updates of [21]. Specifically, accurate time can also be used for implementing timed update procedures that do guarantee consistent updates, as discussed in [24], [25]. Moreover, as shown in [26], by adding a timestamp to the header of all the packets in the network it is possible to perform consistent timestamp-based updates. The current paper presents TIMEFLIP, a method for performing accurately timed updates.

Time and synchronized clocks are used in various distributed applications, from mobile backhaul networks [13] to distributed databases [27]. OpenFlow [3] uses timeouts for expiring old forwarding rules. The controller can define a timeout for a flow rule, causing it to be removed when the timeout expires. However, timeouts are defined with a coarse granularity of 1 second, and thus do not allow delicate coordination. Moreover, since timeouts are by nature relative, they do not allow the accurate coordination that absolute time can provide.

In [28] the authors observed that it would be interesting to explore using time synchronization to reconfigure routers at a specific time.
The Interface to the Routing System (I2RS) working group of the Internet Engineering Task Force (IETF) has recognized the value of time-based state installations [29], but decided not to pursue this concept, as the ability to accurately perform timed installations was not considered viable [30]. In contrast, we show that accurate time-based updates are in fact viable, even when the TCAM rule installation latency is long and non-deterministic; the current paper proposes a novel approach that enables accurate time-based updates in switches and routers, and demonstrates their applicability to existing network devices.

The simplest way to encode a range in TCAMs is known as prefix encoding [17]. In this scheme the set of values that match a rule is presented as a union of prefix TCAM entries (with a sequence of don’t cares as a suffix of the entry), representing disjoint sets of values. For instance, if we denote by $W$ the number of bits used for encoding the range, then for $W = 4$ the range $[4, 14]$ can be encoded by the four TCAM entries (01**), (10**), (110*), (1110) that respectively represent the ranges $[4, 7]$, $[8, 11]$, $[12, 13]$ and $[14, 14]$, whose union is the requested range $[4, 14]$. With this encoding, any range defined on $W$ bits can be encoded with at most $2W - 2$ entries. Moreover, an extremal range of the form $[T_0, 2^W - 1]$ or $[0, T_0]$ can be represented using at most $W$ entries. By using complementary ranges these two bounds were improved to $W$ and $\lceil (W+1)/2 \rceil$, respectively, in [31]. An alternative encoding named SRGE (Short Range Gray Encoding), which relies on the Gray-code representation of ranges, was shown to improve the maximal expansion to $2W - 4$ for general ranges [18]. While the above works focused on the encoding of a single range, a wide literature discusses efficient encodings of classifiers with an ordered list of range rules [32]–[41].

II. UNDERSTANDING TIMEFLIP VIA A SIMPLE EXAMPLE

A. Timestamp Format

In this example we use the 64-bit NTP timestamp format [42]. This time format represents the time elapsed since the base date, which is January 1st, 1900. The time format consists of two 32-bit fields: (i) Time.Sec: the integer part of the number of seconds since the base date, and (ii) Time.Frac: the fractional part of the number of seconds.

We assume that the TCAM lookup key includes the 64-bit timestamp field, $T$, representing the time at which the packet was received by the switch. As we shall see in the example, only a small portion of this 64-bit field is used in practice.

Recall (Sec. I-D) that if $T$ is represented by 64 bits, then every extremal range of the form $T \geq T_0$ requires at most 64 entries. For instance, if $T_0 = 1073741824 = 2^{30}$ sec, then the extremal range $T \geq T_0$ can be represented by two TCAM entries, as depicted in Fig. 3a.

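[Figure 3: two panels showing ternary patterns over the Time.Sec and Time.Frac fields. In panel (a) Time.Sec holds the patterns 01*…* and 1*…*, with Time.Frac fully masked; in panel (b) a single bit of Time.Sec is set to 1 and all other bits are masked.]

(a) The extremal range $T \geq 2^{30}$ sec, represented using the 31st and 32nd bits of Time.Sec. This makes use of two TCAM entries.

(b) The update of Fig. 4 can be implemented by a TIMEFLIP using a single TCAM entry, and a single unmasked bit.

Fig. 3: Time range examples.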
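The prefix encoding recalled in Sec. I-D, and the two-entry range of Fig. 3a, can be reproduced with a short Python sketch. It is a minimal illustration that builds a greedy prefix cover; the function name `prefix_encode` and the string representation of entries are assumptions made for the example, and only the Time.Sec field (32 bits) is modelled for Fig. 3a.

```python
from typing import List

def prefix_encode(lo: int, hi: int, w: int) -> List[str]:
    """Greedy prefix cover of [lo, hi] on w bits, as ternary strings ('*' = don't care)."""
    entries = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo that stays within [lo, hi].
        size = (lo & -lo) if lo else (1 << w)
        while lo + size - 1 > hi:
            size >>= 1
        k = size.bit_length() - 1  # number of masked low-order bits
        fixed = format(lo >> k, f"0{w - k}b") if k < w else ""
        entries.append(fixed + "*" * k)
        lo += size
    return entries

if __name__ == "__main__":
    # The W = 4 example from Sec. I-D.
    print(prefix_encode(4, 14, 4))
    # Fig. 3a: Time.Sec >= 2^30 on the 32-bit seconds field.
    for entry in prefix_encode(2**30, 2**32 - 1, 32):
        print(entry)
```

For the range $[4, 14]$ this yields the four entries listed in Sec. I-D, and for Time.Sec $\geq 2^{30}$ it yields the two patterns of Fig. 3a, one starting with 01 and one starting with 1.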

B. A Path Reroute Scenario

Consider an SDN with five switches (Fig. 4). Two traffic flows, $f_1$ and $f_2$, are forwarded according to the ‘before’ configuration. Now, let us assume that the $S_1 \to S_2$ path needs to be shut down for maintenance. Flow $f_1$ must be rerouted to the ‘after’ path, and thus $f_2$ is also diverted to the ‘after’ path to avoid congestion on the $S_4 \to S_5$ path.

In order to reroute the two flows the SDN controller needs to update the configurations of switches $S_1$ and $S_3$. Note that the maintenance task is not urgent, and thus the controller can tolerate a slight delay of the update; we assume a relaxed TOL = 1 sec. However, the two switches should be updated as close as possible to simultaneously, to avoid temporary congestion on $S_2 \to S_5$ and $S_4 \to S_5$.

[Figure 4: the five-switch topology ($S_1$ to $S_5$) shown twice, with the flows $f_1$ and $f_2$ routed along the ‘before’ paths and along the ‘after’ paths.]

Fig. 4: Flows need to convert from the ‘before’ configuration to the ‘after’ configuration.

We assume that at time $T_{send}$ the controller is notified about the required maintenance task, and sends update messages to $S_1$ and $S_3$. In this example $T_{send} = 1111111110.1234$ sec (see Fig. 5a). We assume that the update messages are guaranteed to be delivered and installed by the switches no later than 0.1 sec after $T_{send}$. Thus, the controller can schedule the update to take place at or after $T_{min} = 1111111110.2234$. Since we assumed that TOL = 1 sec, the controller can tolerate an update at any time between $T_{min}$ and $T_{max} = 1111111111.2234$.

[Figure 5: two timelines. Panel (a) marks $T_{send}$, $T_{min}$, $T_0$ and $T_{max}$, with TOL spanning $T_{min}$ to $T_{max}$. Panel (b) marks the insertion, the scheduled time and the cleanup, at $T_0^-$, $T_0$ and $T_0^+$.]

(a) Scheduling the update.

(b) Installation bounds.

Fig. 5: Scheduling timelines.

If the controller chooses the scheduled time to be the integer $T_0 = 1111111111$, then the 32-bit Time.Frac can be assigned don’t care, and thus the range $T \geq T_0$ effectively uses only the upper 32 bits, requiring up to 32 TCAM entries. Once a switch receives a message from the controller it can further reduce the time range representation; if it is guaranteed that the switch installs the new rule less than one second before $T_0$, and removes the timestamp constraint³ less than a second after $T_0$ (see Fig. 5b), then assigning don’t care in the 31 most significant bits of the timestamp does not affect the range rule, requiring just a single TCAM entry (Fig. 3b).

C. The Intuition Behind the Example

This example combines two key techniques that are unique to TIMEFLIP: (i) the controller chooses a time $T_0$ within the scheduling tolerance that reduces the required number of entries, and (ii) the switch assigns don’t care to the most significant bits that are guaranteed to be constant during the procedure. These two techniques, the former implemented by the controller and the latter by the switch, allow the TIMEFLIP to be implemented using a single TCAM entry, as shown in Fig. 3b.

The scheduling tolerance captures the fact that there is no urgency in the required update, and thus the controller can tolerate some flexibility in the selection of $T_0$. However, once $T_0$ is chosen, the update is performed accurately at time $T_0 \pm \delta$, where $\delta$ is the scheduling error of the switch, typically around 1 µsec. Thus, despite the relaxed scheduling tolerance, the accuracy is not compromised; both switches perform the update within the time window $1111111111 \pm \delta$.

³ After time $T_0$ the switch can assign don’t care in the timestamp field, making the TCAM rule permanent.
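The two techniques of the example can be traced numerically. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: it models the 64-bit timestamp as an integer with Time.Sec in the upper 32 bits, and the helper names (`ntp_timestamp`, `choose_integer_second`, `single_bit_match`) are hypothetical. It uses exact fractions to avoid floating-point rounding.

```python
import math
from fractions import Fraction

FRAC_BITS = 32  # width of Time.Frac in the 64-bit NTP format

def ntp_timestamp(seconds: Fraction) -> int:
    """64-bit integer timestamp: Time.Sec in the upper 32 bits, Time.Frac below."""
    return int(seconds * (1 << FRAC_BITS))

def choose_integer_second(t_min: Fraction, t_max: Fraction) -> int:
    """Controller side: first whole second within [t_min, t_max]."""
    t0 = math.ceil(t_min)
    if t0 > t_max:
        raise ValueError("no whole second inside the scheduling tolerance")
    return t0

def single_bit_match(timestamp: int, t0_sec: int) -> bool:
    """Switch side: one entry whose only unmasked bit is the least significant
    bit of Time.Sec. Valid only while the entry is installed less than one
    second before T0 and cleaned up less than one second after it."""
    return ((timestamp >> FRAC_BITS) & 1) == (t0_sec & 1)

T_SEND = Fraction("1111111110.1234")
T_MIN = T_SEND + Fraction(1, 10)   # messages installed within 0.1 sec
T_MAX = T_MIN + 1                  # TOL = 1 sec
T0 = choose_integer_second(T_MIN, T_MAX)

if __name__ == "__main__":
    print("T0 =", T0)
    for offset in ("-0.5", "-0.000001", "0", "0.000001", "0.5"):
        t = T0 + Fraction(offset)
        print(offset, single_bit_match(ntp_timestamp(t), T0))
```

Within the one-second window on either side of $T_0$, the single-bit test gives the same answer as the full comparison $T \geq T_0$, which is what allows the 31 upper bits of Time.Sec and all of Time.Frac to be masked.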

A periodic time range is a range that is represented using don’t care in some of the most significant bits of the timestamp field. In our example, assigning don’t care to the 31 upper bits defines a periodic range with a 2-second period, effectively reducing the number of entries required to represent the range by 31. This technique is possible in the current example since it is guaranteed that the value of the upper 31 bits does not change during the lifetime of the TIMEFLIP.

III. MODEL AND NOTATIONS

A. TCAM Entries

A TCAM is an associative memory device that allows fast classification. It compares a search string against a table of stored entries, and returns the address of the matching data. Each address is associated with a specific action. Each TCAM bit can have one of three possible values, 0, 1, or $*$, with the latter representing the don’t care value. The order of entries in a TCAM determines their priority. A TCAM search returns the address of the first entry that matches the search key.

Our analysis focuses on a TCAM lookup that is performed by a network device, or a device for short. We assume that the device has access to a clock, and that before a TCAM lookup the device produces a timestamp $T$, which is obtained by capturing the value of the clock at some instant before the TCAM lookup. For example (Fig. 1), the device can capture the timestamp $T$ for each received packet immediately upon its arrival. The timestamp, together with the packet header, will serve as an input to the TCAM lookup.

A TCAM entry is denoted by $S \to a$, where $S = (\sigma_U, \ldots, \sigma_1) \in \{0, 1, *\}^U$. The number of bits in a TCAM entry is denoted by $U$, where 0, 1 are bit values and $*$ stands for don’t care. We denote a sequence of $m$ don’t care bits by $(*^m)$. A bit that is assigned the don’t care value is said to be masked. The set of possible actions is denoted by $A$, where an individual action is denoted by $a \in A$.

Specifically, throughout our analysis $S$ will have the form $(s_u, \ldots, s_1, t_W, \ldots, t_1)$ such that $u + W = U$, and $t_W, \ldots, t_1$ represent the bits corresponding to the timestamp $T$. We denote the $m$ most significant bits of the timestamp $T$ by $T^{m+}$, and the $k$ least significant bits of $T$ by $T^{k-}$. We define a time-based TCAM entry to be an entry in which at least one of the bits $t_W, \ldots, t_1$ has a value in $\{0, 1\}$, whereas in a time-oblivious entry all the bits $t_W, \ldots, t_1$ are $*$.

A time range is defined to be an interval $[T_1, T_2]$, where $T_1$ and $T_2$ are $W$-bit integers. A time range rule is denoted by $(s_u, \ldots, s_1, [T_1, T_2])$, or equivalently, $(s_u, \ldots, s_1, [T_1 \leq T \leq T_2])$. Such a rule can be represented by one or more time-based TCAM entries. The rule expansion of a range $[T_1, T_2]$ is the minimal number of entries that can be used for representing the range. In the context of this paper we focus on prefix-based expansions [17], in which only prefix entries are used. Further discussion about the use of prefix encoding and other encoding schemes is presented in Sec. VII-E.

An extremal range is a range that has one of two possible forms, either a right extremal range, which has the form $[T_1, 2^W - 1]$, also denoted by $T \geq T_1$, or a left extremal range, $[0, T_2]$. We denote by $r(T_1)$ the prefix-based expansion

of a right range, $[T_1, 2^W - 1]$, and by $\ell(T_2)$ the expansion of the left range $[0, T_2 - 1]$. Note that $\ell(0)$ is undefined.

B. TIMEFLIP: Theory of Operation

Consider a coordinated network update that is due to take place at time $T_0$ and requires a TCAM update. The naïve approach to update the TCAM is to schedule the device’s management software⁴ to perform the required modification as close as possible to $T_0$. This approach allows a limited degree of accuracy, which depends on the operating system’s ability to perform real-time operations, and on the load caused by other tasks that run on the CPU.

In our approach the TCAM management software installs the time-based TCAM entries ahead of the activation time, $T_0$, allowing the update to be applied precisely at $T_0$. After time $T_0$ the management software performs cleanup operations, e.g., removing rules that apply only to times $T < T_0$.

We address four main classes of TIMEFLIPs (see Fig. 6):

(i) Installation. A new TCAM rule $S \to a$ is installed, effective from time $T_0$. A timed installation is a rule of the form $(s_u, \ldots, s_1, [T \geq T_0]) \to a$.

(ii) Removal. An existing TCAM rule $S \to a$ is scheduled to be deactivated at time $T_0$, using a rule of the form $(s_u, \ldots, s_1, [T < T_0]) \to a$.

(iii) Rule update. An existing rule $S \to a$ is modified to $S' \to a'$ at $T_0$. A rule update can simply be represented as a pair of rules, one for removal and one for installation.

(iv) Action update. The action of an existing TCAM rule is modified from $a$ to $a'$ at time $T_0$. Hence, prior to $T_0$ the management software installs a rule of the form $(s_u, \ldots, s_1, [T \geq T_0]) \to a'$ that precedes the existing $S \to a$. The first-match behavior of TCAMs implies that if $T \geq T_0$, the search matches the newly installed rule, and $a'$ is performed, whereas at any time before $T_0$ the $S \to a$ rule prevails. After time $T_0$ the TCAM management software removes the excess rules: the rule $S \to a$ is deleted, and $(s_u, \ldots, s_1, [T \geq T_0]) \to a'$ is replaced by $(s_u, \ldots, s_1, *, \ldots, *) \to a'$, requiring a single entry.

[Figure 6: six TCAM panels, (i) to (vi), each listing time range rules and their actions.]

Fig. 6: A timed TCAM Update. Every line in the figure is a time range rule, represented by one or more TCAM entries. (i) Time-oblivious entry. (ii) Installation. (iii) Removal. (iv) Rule update. (v) Action update. (vi) Action update using a complementary timestamp range.

As shown in Fig. 6(vi), an alternative representation of the action update uses a rule that maps $T < T_0$ to $a$, and $S$ to $a'$. Complementary encoding [31], [43], also known as negative encoding, allows in some cases to represent the rule more efficiently, as further discussed in Sec. V.

TIMEFLIP can be used in any of these four scenarios. We believe that the action update is the most interesting of the four, since a scenario that requires changing the behavior of an existing flow under traffic is more likely to require a delicate update procedure, and hence is more likely to require TIMEFLIPs. Related literature [6], [21] also focuses on action update scenarios rather than installation and removal scenarios. For the sake of clarity, we start by discussing the simpler case of timed installations (Sec. IV), and then extend these results to action updates (Sec. V).

⁴ A network device typically runs a software layer that performs various tasks, including TCAM management.

C. Timed Installation: Formal Definition

Let $R$ be a time range. Given a time-oblivious TCAM entry $S \to a$ with $S = (s_u, \ldots, s_1, *, \ldots, *)$, we define a timed installation of $S \to a$ over $R$ to be a TCAM rule $S_R \to a$, such that $S_R := (s_u, \ldots, s_1, R)$. Hence, $a$ is active during the time range $R$. We define the expansion of a timed installation over a time range $R$ to be the expansion of the range $R$.

Since $R$ is a time range, $S_R$ is represented by one or more entries in the TCAM. We note that even if more than one entry is used to represent $S_R$, the excess entries are required only for a brief period of time; we assume that shortly after $S_R$ is activated the TCAM management software performs a cleanup, leaving only a single entry, representing $S \to a$.

IV. OPTIMAL TIME-BASED RULE INSTALLATION

A. Optimal Scheduling

It has been shown [17] that an extremal range of the form $[T_0, 2^W - 1]$ can be represented using at most $W$ entries. However, we observe that a careful selection of the value $T_0$ can significantly reduce the number of entries required for representing this update. The update time $T_0$ may be tuned to an optimal value in the following scenarios:

(i) In Atomic Bundle updates, a network device is required to perform a set of changes atomically, without strict timing constraints, and hence there is flexibility in selecting the time $T_0$ at which these changes are performed.

(ii) In network-wide coordinated updates, optimal scheduling can be enforced by a central entity that determines the update time, e.g., a Network Management Station (NMS) or an SDN controller. The central entity’s goal is to find a value of $T_0$ (within a set of allowed values) that will minimize the timestamp range expansion; the underlying assumption is that all network devices use the same format to represent the timestamp, and the same TCAM range encoding scheme.

As depicted in Fig. 2, we assume that the value of $T_0$ is determined by a scheduling algorithm, subject to the constraint $T_{min} \leq T_0 \leq T_{max}$. We define the scheduling tolerance, denoted by TOL, to be $T_{max} - T_{min} + 1$. This is the number of allowable values for $T_0$. We wish to study how $T_0$ should be selected.

As a first step we learn the expansion of a specific extremal range $[T_0, 2^W - 1]$. Property 1, based on [17], shows that the expansion of this range is given by the number of ‘1’-s in a binary representation of $2^W - T_0$.

Property 1. The expansion $r(T_0)$ of a right range $[T_0, 2^W - 1]$ is given by the number of ‘1’-s in a binary representation of the number of values in the range, $2^W - 1 - T_0 + 1 = 2^W - T_0$.

Proof Outline. The property follows from the definition of the prefix encoding as defined in [17]. The encoding is composed of entries that consider disjoint sets of inputs. The cardinality of each set is a power of 2, and distinct sets have different cardinalities. The sum of the cardinalities equals the number of values in the range.

Example 1. For $W = 4$, the range $[T_0, 2^W - 1] = [9, 15]$ includes $15 - 9 + 1 = 7$ values. The binary representation of 7 is 0111, which has three ‘1’-s and accordingly the range can be encoded using the three entries (1001), (101∗), (11∗∗). Likewise, the range $[11, 15]$ has $15 - 11 + 1 = 5 = 0101$ values and can be encoded by the two entries (1011), (11∗∗). The range $[15, 15]$ has a single value ($1 = 0001$) and can be encoded by the single entry (1111).

By symmetry, we can show that the expansion $\ell(T_0)$ of the range $[0, T_0 - 1]$ is given by the number of ‘1’-s in a binary representation of the number of values in the range, $T_0 - 1 - 0 + 1 = T_0$. The next theorem relates the expansion of a right range $[T_0, 2^W - 1]$ and a left range $[0, T_0 - 1]$.

Theorem 1. For $W \geq 1$ and $T_0 \in [1, 2^W - 1]$ the expansion $r(T_0)$ of the right range $[T_0, 2^W - 1]$ and the expansion $\ell(T_0)$ of the left range $[0, T_0 - 1]$ satisfy $r(T_0) + \ell(T_0) \leq W + 1$.

Proof. Let $B_r = 2^W - 1 - T_0 + 1 = 2^W - T_0$ and $B_\ell = T_0 - 1 + 1 = T_0$ be the number of values in the right and the left ranges, respectively. Clearly, $B_r + B_\ell = 2^W$.
Let $(b_{r,W}, \cdots, b_{r,1})$ and $(b_{\ell,W}, \cdots, b_{\ell,1})$ be the binary representations of $B_r$ and $B_\ell$. Let $n_r = \sum_{i \in [1,W]} b_{r,i}$ and $n_\ell = \sum_{i \in [1,W]} b_{\ell,i}$. By Property 1, we have that $r(T_0) = n_r$ and $\ell(T_0) = n_\ell$. We show the result by induction.

First, if $W = 1$, we have that $T_0 = 1$. The right range $[1, 1]$ and the left range $[0, 0]$ can both be encoded in a single entry and $r(T_0) + \ell(T_0) = 1 + 1 = 2 \leq W + 1$.

For a general $W$, we distinguish between the two following sub-cases. If $b_{r,1} = 1$, i.e., $B_r$ is odd (and accordingly $b_{\ell,1} = 1$, i.e., $B_\ell$ is odd as well since $B_r + B_\ell = 2^W$), we have that $(2^W - 1) - B_r = B_\ell - 1$. Since $(2^W - 1)$ can be represented by a binary vector with $W$ ‘1’-s, the number of ‘1’-s in $(2^W - 1) - B_r$ is $W - n_r$. By the last equality this equals $n_\ell - 1$, the number of ‘1’-s in $B_\ell - 1$. We then have that $W - n_r = n_\ell - 1$ and $r(T_0) + \ell(T_0) = n_r + n_\ell = W + 1$.

In the second sub-case $b_{r,1} = b_{\ell,1} = 0$ and $B_r$, $B_\ell$ are even. Then, $r(T_0) = n_r = \sum_{i \in [1,W]} b_{r,i} = \sum_{i \in [2,W]} b_{r,i}$ and $\ell(T_0) = n_\ell = \sum_{i \in [1,W]} b_{\ell,i} = \sum_{i \in [2,W]} b_{\ell,i}$. We now consider $T_0' = 0.5 \cdot T_0$ with $W' = W - 1$ and examine the expansions $r(T_0')$ and $\ell(T_0')$ within the space $[0, 2^{W'} - 1]$. We have that $2^{W'} - T_0' = 0.5 \cdot (2^W - T_0)$ and the number of values in $[T_0', 2^{W'} - 1]$ is represented by $(b_{r,W}, \cdots, b_{r,2})$ and $r(T_0') = \sum_{i \in [2,W]} b_{r,i} = r(T_0)$. Likewise, since $T_0' = 0.5 \cdot T_0$ the number of values in $[0, T_0' - 1]$ is represented by $(b_{\ell,W}, \cdots, b_{\ell,2})$ and $\ell(T_0') = \sum_{i \in [2,W]} b_{\ell,i} = \ell(T_0)$. Accordingly, $r(T_0) + \ell(T_0)$ (in $W$) equals the sum of the expansions $r(T_0') + \ell(T_0')$ in $W' = W - 1$. By the induction hypothesis we have that $r(T_0') + \ell(T_0') \leq W' + 1 = W - 1 + 1 = W \leq W + 1$ and the result follows.

We can now introduce the SCHEDULE algorithm (Fig. 7), which computes an optimal value, $T_{SCH}$, for a given range $[T_{min}, T_{max}]$. Throughout the paper we use the notation $T_{SCH}$, defined by $T_{SCH} := \text{SCHEDULE}(T_{min}, T_{max}, W)$, and the range $R_{SCH}$, defined by $R_{SCH} := [T_{SCH}, 2^W - 1]$.

Intuitively, SCHEDULE performs a binary search over the range $[0, 2^W - 1]$, and returns the first value that falls within $[T_{min}, T_{max}]$. Notably, we shall see that due to the nature of the binary search, SCHEDULE returns the value $T_{SCH}$ with the fewest ‘1’-s in the binary representation of $2^W - T_{SCH}$, and hence, by Property 1, minimizes $r(T_{SCH})$. In terms of complexity, the number of iterations in SCHEDULE is bounded by $W$, as it is a binary search over a range of $2^W$ values.

SCHEDULE($T_{min}$, $T_{max}$, $W$)
1  $t_0 \leftarrow 0$; $i \leftarrow 0$
2  while $t_i \notin [T_{min}, T_{max}]$
3      $i \leftarrow i + 1$
4      if $t_{i-1} < T_{min}$
5          $t_i \leftarrow t_{i-1} + 2^{W-i}$
6      else
7          $t_i \leftarrow t_{i-1} - 2^{W-i}$
8  $T_{SCH} \leftarrow t_i$
9  return $T_{SCH}$

Fig. 7: Optimal scheduling algorithm; no other scheduling algorithm produces an extremal range with a lower expansion.

The following theorem states that SCHEDULE is optimal, i.e., that no other scheduling algorithm produces an extremal range with a lower expansion.

Theorem 2. If $T_{SCH} = \text{SCHEDULE}(T_{min}, T_{max}, W)$, then $r(T_{SCH}) \leq r(T)$ holds for all $T \in [T_{min}, T_{max}]$.

Proof. Clearly, if $T_{min} = 0$, then $T_{SCH} = 0$, and the range $[T_{SCH}, 2^W - 1]$ can be represented by a single entry, $(*^W)$. For $T_{min} > 0$, without loss of generality, $T_{SCH}$ is determined by SCHEDULE after $m$ iterations, i.e., $i = m$ on line 8 of SCHEDULE. We prove the claim by induction on $m \geq 1$. Denote $2^W - T_{SCH}$ by $B$. By Property 1, $r(T_{SCH})$ is equal to the number of ‘1’-s in the representation of $B$.

For $m = 1$ we have that $T_{SCH} = 2^{W-1}$, and thus $B = 2^{W-1}$. Since the binary representation of $B$ is $(1 0 \ldots 0)$, we have $r(T_{SCH}) = 1$, which is of course optimal.

Now we assume the claim holds for every $T_{SCH}'$ that is computed when SCHEDULE returns after $m$ iterations. Let $T_{SCH}$ be a value returned by SCHEDULE after $m + 1$ iterations. We distinguish between two cases:

(i) $T_{SCH} > 2^{W-1}$: we now ignore the most significant bit of the timestamp field, and reexamine the algorithm’s outcome, where the superscript $(W-1)-$ denotes the $W - 1$ least significant bits of a value. The algorithm SCHEDULE($T_{min}^{(W-1)-}$, $T_{max}^{(W-1)-}$, $W - 1$) returns $T_{SCH}^{(W-1)-}$ after $m$ iterations, and thus by the induction hypothesis $T_{SCH}^{(W-1)-}$ is optimal in $[0, 2^{W-1} - 1]$. Now assume by way of contradiction that there exists a time $T_{min} \leq T' \leq T_{max}$ such that $r(T') < r(T_{SCH})$. Thus, the range $[T', 2^W - 1]$ can be represented by fewer entries than the expansion of $[T_{SCH}, 2^W - 1]$, and by removing the most significant bit of the rule $[T', 2^W - 1]$ we get that $r(T'^{(W-1)-}) < r(T_{SCH}^{(W-1)-})$, contradicting the induction hypothesis.

(ii) $T_{SCH} < 2^{W-1}$: similarly to the first case, by observing the range $[0, 2^{W-1} - 1]$ we deduce that $T_{SCH}^{(W-1)-}$ is obtained after $m$ iterations, and is thus optimal. Assume by way of contradiction that there is a $T' \in [T_{min}, T_{max}]$ such that $r(T') < r(T_{SCH})$. Denote $2^W - T'$ by $B'$. Note that $[T_{min}, T_{max}] \subset [0, 2^{W-1} - 1]$, since otherwise we would have $2^{W-1} \in [T_{min}, T_{max}]$, and SCHEDULE would terminate after one iteration. Thus, $T' < 2^{W-1}$. It follows that the most significant bits satisfy $B'^{1+} = B^{1+} = 1$. Since $r(T') < r(T_{SCH})$ in $[0, 2^{W-1} - 1]$, by Property 1 we have that the number of ‘1’-s in $B'$ is smaller than the number of ‘1’-s in $B$. We conclude that there are fewer ‘1’-s in $B'^{(W-1)-}$ than in $B^{(W-1)-}$, yielding $r(T'^{(W-1)-}) < r(T_{SCH}^{(W-1)-})$, which is in contradiction to the induction hypothesis.

An interesting property of SCHEDULE is presented in Lemma 3: the output of the algorithm, $T_{SCH}$, has a long sequence of least significant ‘0’ bits. This property allows very efficient prefix encoding of extremal ranges of the form $T \geq T_{SCH}$.

Lemma 3. The $\lfloor \log_2(\text{TOL}) \rfloor$ least significant bits of $T_{SCH}$ are all ‘0’.

Proof. We denote $\lfloor \log_2(\text{TOL}) \rfloor$ by $X$. For $T_{SCH} = 0$ we have $W$ bits of ‘0’, and the claim is satisfied. For $T_{SCH} > 0$, we prove this claim by induction on $m$, the number of iterations in SCHEDULE.
For m = 1 we have TS CH = 2W −1 with W − 1 least significant bits of ‘0’, and since TOL < 2W , we have X ≤ W − 1, and thus the claim is satisfied. We assume that the claim holds for m − 1, and prove for m. We consider two distinct cases: (i) TS CH > 2W −1 : in this case [Tmin , Tmax ] ⊂ [2W −1 , 2W − 1]. By considering the (W − 1)-bit shifted range [2W −1 , 2W −1], S CHEDULE produces TS CH (W −1) after m−1 iterations, and thus by the induction hypothesis the X least significant bits are ‘0’, which is true also for TS CH . (ii) TS CH > 2W −1 : this case is similar to (i), except that [Tmin , Tmax ] ⊂ [0, 2W −1 − 1], and thus we can run S CHEDULE on [0, 2W −1 − 1], again concluding from the induction hypothesis that the X least significant bits are ‘0’. The following lemma presents an upper bound on the expansion of the range T ≥ TS CH , as a function of the scheduling tolerance. This captures a tradeoff between the scheduling tolerance and the time-based range expansion; it is possible to reduce the expansion of a timed installation by increasing the scheduling tolerance. Lemma 4. If TOL < 2W then r(TS CH ) ≤ W − blog2 (TOL)c.

Proof. Denote $2^W - T_{SCH}$ by $B$, and $\lfloor \log_2(\mathrm{TOL}) \rfloor$ by $X$. By Lemma 3, the $X$ least significant bits of $T_{SCH}$ are '0', and hence the $X$ least significant bits of $B$ are '0'. Thus, the number of '1'-s in the representation of $B$ does not exceed $W - X$, and by Property 1 we have $r(T_{SCH}) \le W - X$.

B. Average Expansion

In this section we study the influence of the scheduling tolerance on the average expansion. We concentrate on the average expansion of the prefix-based encoding of ranges of the form $[T_0, 2^W - 1]$ for a given $W$. Intuitively, for a larger scheduling tolerance TOL within which $T_0$ should be selected, the flexibility is larger and the expansion for the best of the options is expected to be smaller.

Our model is the following. For a given $W$ and $\mathrm{TOL} \in [1, 2^W]$, we examine the possible $[T_{min}, T_{max}]$ values that enable TOL possible options. These are the $2^W - \mathrm{TOL} + 1$ values $[0, \mathrm{TOL} - 1], [1, \mathrm{TOL}], [2, \mathrm{TOL} + 1], \cdots, [2^W - \mathrm{TOL} - 1, 2^W - 2], [2^W - \mathrm{TOL}, 2^W - 1]$. As we described, for each $[T_{min}, T_{max}]$ we use the SCHEDULE algorithm to calculate a $T_0$ that has an expansion of $\min_{T_0 \in [T_{min}, T_{max}]} r(T_0)$. Based on Property 1, for $[T_{min}, T_{max}]$, $T_{SCH}$ is the value that minimizes the number of '1'-s in a binary representation of $2^W - T_0 \in [2^W - T_{max}, 2^W - T_{min}]$. We calculate the average value of $r(T_{SCH})$ among the $2^W - \mathrm{TOL} + 1$ possible values of $[T_{min}, T_{max}]$. We denote this value by $\rho(W, \mathrm{TOL})$.

Theorem 5. For $W \ge 1$ and $\mathrm{TOL} \in [1, 2^W]$, let $a = \lceil \log_2(\mathrm{TOL}) \rceil$. The average value of the expansion of the easiest-to-encode range according to a set $[T_{min}, T_{max}]$ with TOL possible values is given by

\[
\rho(W, \mathrm{TOL}) = \frac{1}{2^W - \mathrm{TOL} + 1} \cdot \Big( 1 - 2^{W-a} + 2^{W-a-1} \cdot (W - a + 2) \cdot (2^a - \mathrm{TOL} + 1) + \sum_{i=0}^{W-a-1} 2^{i-1} \cdot (i + 2) \cdot (\mathrm{TOL} - 1) \Big).
\]

Proof. Denote $2^W - T_{min}$ and $2^W - T_{max}$ by $B_{min}$ and $B_{max}$, respectively. Likewise, let $B_0$ represent the value of $2^W - T_0$ for some $T_0$. Intuitively, a value of $[T_{min}, T_{max}]$ defines $[B_{max}, B_{min}]$ of which $B_0$ can be selected.
Let $d \in [0, W]$ be the maximal number of most significant bits such that $B_{min}^{d+} = B_{max}^{d+}$. We distinguish between several cases according to the value of $d$. We first assume that $T_{min} \ge 1$ (and accordingly $B_{min} \le 2^W - 1$).

There are $\mathrm{TOL} - 1$ possible values of $[B_{max}, B_{min}]$ with $d = 0$. These are the values that include $2^{W-1} - 1$ and $2^{W-1}$. The $\mathrm{TOL} - 1$ options are $[2^{W-1} - \mathrm{TOL} + 1, 2^{W-1}], \cdots, [2^{W-1} - 1, 2^{W-1} + \mathrm{TOL} - 2]$. In each of these options we can select $B_0 = 2^{W-1}$, $T_0 = 2^W - B_0 = 2^{W-1}$ and encode the range $[T_0, 2^W - 1] = [2^{W-1}, 2^W - 1]$ by a single entry.

There are $2 \cdot (\mathrm{TOL} - 1)$ values of $[B_{max}, B_{min}]$ for which $d = 1$. These are $[x + 2^{W-2} - \mathrm{TOL} + 1, x + 2^{W-2}], \cdots, [x + 2^{W-2} - 1, x + 2^{W-2} + \mathrm{TOL} - 2]$ for $x \in \{0, 2^{W-1}\}$. The $(\mathrm{TOL} - 1)$ options with $x = 0$ can be encoded with a single entry, while the $(\mathrm{TOL} - 1)$ options with $x = 2^{W-1}$ require 2 entries. They require a total number of $1 \cdot (\mathrm{TOL} - 1) + 2 \cdot (\mathrm{TOL} - 1) = 2^d \cdot (\mathrm{TOL} - 1) \cdot (1 + d/2)$ entries.

More generally, for $i \in [0, W - a - 1]$ there are $2^i \cdot (\mathrm{TOL} - 1)$ values of $[B_{max}, B_{min}]$ with $d = i$ that require a total number of $2^i \cdot (\mathrm{TOL} - 1) \cdot (1 + i/2) = 2^{i-1} \cdot (\mathrm{TOL} - 1) \cdot (2 + i)$ entries.

For $d = W - a$, there are $2^{W-a} \cdot (2^a - \mathrm{TOL} + 1)$ values of $[B_{max}, B_{min}]$. In $2^{W-a}$ of them, $B_0$ can be selected such that its $a$ least significant bits are '0'. Thus these $2^{W-a}$ values of $[B_{max}, B_{min}]$ require a total number of $2^{W-a} \cdot ((W - a)/2)$ entries. Similarly to the previous detailed cases, the other $2^{W-a} \cdot (2^a - \mathrm{TOL})$ are encoded each with $((W - a)/2 + 1)$ entries on average, requiring a total number of $2^{W-a} \cdot (2^a - \mathrm{TOL}) \cdot ((W - a)/2 + 1) = 2^{W-a-1} \cdot (2^a - \mathrm{TOL}) \cdot (W - a + 2)$ entries.

By summing these requirements together with the single entry required for the case of $T_{min} = 0$, we deduce the suggested average for the $2^W - \mathrm{TOL} + 1$ possible values of $[T_{min}, T_{max}]$.

Example 2. Again, let $W = 4$. For $\mathrm{TOL} = 2$, we consider the $2^W - \mathrm{TOL} + 1 = 15$ possible ranges $[T_{min}, T_{max}]$: [0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 11], [11, 12], [12, 13], [13, 14], [14, 15]. For $\mathrm{TOL} = 2$, we have $a = \lceil \log_2(\mathrm{TOL}) \rceil = 1$. By Theorem 5, the average expansion here is

\[
\rho(W, \mathrm{TOL}) = \frac{1}{15} \cdot \Big( 1 - 2^3 + 2^2 \cdot (3 + 2) \cdot (2 - 2 + 1) + \sum_{i=0}^{2} 2^{i-1} \cdot (i + 2) \cdot 1 \Big) = \frac{25}{15} = \frac{5}{3}.
\]

Indeed, for the first of these 15 options, we can set $T_0 = 0$ and encode the range [0, 15] by the single entry (∗∗∗∗). For [1, 2], [2, 3] we set $T_0 = 2$ and encode [2, 15] by the three entries (001∗), (01∗∗), (1∗∗∗). Likewise, for [3, 4], [4, 5], [5, 6], [6, 7] two entries are required. For [7, 8], [8, 9] we set $T_0 = 8$ and encode [8, 15] by a single entry. For [9, 10], [10, 11] two entries are required ($T_0 = 10$), while for [11, 12], [12, 13], [13, 14], [14, 15] a single entry is required (for $T_0 = 12$ or $T_0 = 14$). This yields an average number of $\frac{1}{15} \cdot (1 + 2 \cdot 3 + 4 \cdot 2 + 2 \cdot 1 + 2 \cdot 2 + 4 \cdot 1) = \frac{25}{15} = \frac{5}{3} = \rho(W, \mathrm{TOL})$, as suggested by Theorem 5.

C. Installation Bounds and Periodic Ranges

As noted in Sec. III-B, setting up a timed installation rule requires a two-step procedure (see Fig. 8); in the insertion step, the TCAM management software installs the timestamp-dependent TCAM rule representing the configuration that should take place starting at time $T_0$. In the cleanup step, the management software removes the timestamp dependency of the rules representing the new configuration, leaving a single time-oblivious TCAM entry. In this section we assume that the insertion and cleanup operations are performed within well-known installation bounds,⁵ denoted by $\Delta$, i.e., it is guaranteed that the time-based rule is inserted no sooner than $\Delta$ before $T_0$, and is cleaned up by time $T_0 + \Delta - 1$.

We shall show that guaranteed installation bounds significantly reduce the number of TCAM entries required for a TIMEFLIP; rather than defining a range $T \ge T_0$, one can define the range $[T_0, T_1]$ for some $T_1 \ge T_0 + \Delta - 1$, using fewer TCAM entries with effectively the same impact.

Fig. 8: Installation bounds. (Timeline marking the insertion at $T_0 - \Delta$, the scheduled time $T_0$, the cleanup at $T_0 + \Delta$, and $T_1$.)

⁵ In practice, the installation bounds may be high in some network devices. We further discuss how this affects TIMEFLIP in Sec. VII-C.

Two ranges, $R_1$ and $R_2$, are said to be $\Delta$-similar, denoted $R_1 \overset{\Delta}{\sim} R_2$, if there exists a value $T_0$ such that $R_1 \cap R_2 \supseteq [T_0, T_0 + \Delta - 1]$ and $(R_1 \cup R_2) \cap [T_0 - \Delta, T_0 - 1] = \emptyset$. Given an extremal range $R_{T_0} = [T_0, 2^W - 1]$, since $R_{T_0}$ is only observed during the period $[T_0 - \Delta, T_0 + \Delta - 1]$, every range $R$ that is $\Delta$-similar to $R_{T_0}$ produces the same TCAM match results during this period. Hence, every timed installation over $R_{T_0}$ can be represented by an equivalent timed installation over $R$.

We define the $2^V$-periodic continuation of a time range $[T_0, T_1]$, denoted by $[T_0, T_1]^V$, to be the range defined by masking the $W - V$ most significant bits of the timestamp, i.e.,

\[
R_{pc} := \bigcup_{n=0}^{2^{W-V}-1} \big( [T_0^{V-}, T_1^{V-}] + n \cdot 2^V \big).
\]

Moreover, if $T_1^{V-} < T_0^{V-}$, then

\[
R_{pc} = \bigcup_{n=0}^{2^{W-V}-1} \big( ([T_0^{V-}, 2^V - 1] \cup [0, T_1^{V-}]) + n \cdot 2^V \big).
\]

The expansion of a periodic range $R_{pc}$ is the number of entries used for representing the range.

Intuitively, periodic ranges (Fig. 9) allow efficient representation of TIMEFLIPs. A $2^V$-periodic range is encoded with don't care in its $W - V$ most significant bits, and thus the number of bits required to represent such a range is $V$.

Fig. 9: Periodic ranges: the $2^V$-periodic continuation of the range $[T_0, T_1]$; (i) for $T_1^{V-} > T_0^{V-}$, and (ii) for $T_1^{V-} < T_0^{V-}$. Note that in BOUNDEDRANGE $T_1 = T_0 + 2^{V-1} - 1$.

We now introduce the BOUNDEDRANGE algorithm. Given a scheduling time, $T_0$, and an extremal range, $R_{T_0} = [T_0, 2^W - 1]$, the algorithm computes a periodic range $R_{BR} \overset{\Delta}{\sim} R_{T_0}$ that, for a sufficiently small $\Delta$, has a smaller expansion than $R_{T_0}$.

BOUNDEDRANGE($T_0$, $\Delta$, $W$)
1  $V \leftarrow \lceil \log_2(2\Delta) \rceil$
2  return $[T_0, T_0 + 2^{V-1} - 1]^V$

Fig. 10: Determining a range with installation bounds $\Delta$.

Lemma 6. If $R_{BR} = \text{BOUNDEDRANGE}(T_0, \Delta, W)$, then the expansion of every timed installation over $R_{BR}$ is bounded by $\lceil \log_2(2\Delta) \rceil$.

Proof. We denote $\lceil \log_2(2\Delta) \rceil$ by $V$. We analyze the periodic range $[T_0, T_0 + 2^{V-1} - 1]^V$, focusing on a single range of $2^V$ values, and distinguish between two cases:

(i) $T_0^{(W-V)+} = (T_0 + 2^{V-1} - 1)^{(W-V)+}$: in this case (depicted in Fig. 9(i)) we have a shifted $V$-bit range, $[T_0^{V-}, (T_0 + 2^{V-1} - 1)^{V-}]$. We shall show that this range has a worst-case expansion of $V$. We analyze the two sub-ranges $[T_0^{V-}, 2^{V-1} - 1]$ and $[2^{V-1}, (T_0 + 2^{V-1} - 1)^{V-}]$. Both sub-ranges are in fact $(V-1)$-bit shifted extremal ranges. The expansions of these two sub-ranges are $r(T_0^{(V-1)-})$ and $\ell(T_0^{(V-1)-})$, respectively. By Lemma 1, over a $(V-1)$-bit field we have $r(T) + \ell(T) \le V$ for all $T$. It follows that the expansion of the two sub-ranges is at most $V$.

(ii) $T_0^{(W-V)+} \ne (T_0 + 2^{V-1} - 1)^{(W-V)+}$: in this case (depicted in Fig. 9(ii)), by definition of a $2^V$-periodic continuation we have a shifted $V$-bit range $[T_0^{V-}, 2^V - 1] \cup [0, (T_0 + 2^{V-1} - 1)^{V-}]$. As in case (i), we have two $(V-1)$-bit complementary extremal ranges, and thus the worst-case expansion of the two sub-ranges is $V$.

The following theorem states that when the scheduling tolerance, TOL, is sufficiently large, a timed installation can be represented by a single TCAM entry.

Theorem 7. If $\mathrm{TOL} \ge 2^{\lceil \log_2(\Delta) \rceil}$, then there exists a range $R$ such that $R \overset{\Delta}{\sim} R_{SCH}$, and the expansion of every timed installation over $R$ is 1.

Proof. Let $V = \lceil \log_2(2\Delta) \rceil$. Since $\mathrm{TOL} \ge 2^{V-1}$, we have $\lfloor \log_2(\mathrm{TOL}) \rfloor \ge V - 1$. Hence there exists an integer $n$ such that $T_{SCH} = n \cdot 2^{V-1}$, since by Lemma 3 the $\lfloor \log_2(\mathrm{TOL}) \rfloor$ least significant bits of $T_{SCH}$ are '0'. We define $R_{BR} := \text{BOUNDEDRANGE}(T_{SCH}, \Delta, W)$. By definition of BOUNDEDRANGE, $R_{BR} \overset{\Delta}{\sim} R_{SCH}$. Thus, at least one of the following must hold:

• There exists an integer $n$ such that $T_{SCH} = n \cdot 2^V$. Using BOUNDEDRANGE we have $R_{BR} = [0, 2^{V-1} - 1]^V$, which can be encoded by a single entry where the timestamp field has the value $(*^{W-V}, 0, *^{V-1})$, i.e., the only unmasked bit is $t_V = 0$.

• There exists an integer $n$ such that $T_{SCH} = (2n + 1) \cdot 2^{V-1}$. Thus, $T_{SCH}^{V-} = 2^{V-1}$. By using BOUNDEDRANGE we have $R_{BR} = [2^{V-1}, 2^V - 1]^V$, which can be encoded by the single entry $(*^{W-V}, 1, *^{V-1})$, i.e., the only unmasked bit is $t_V = 1$.

The following theorem generalizes the observations about the scheduling tolerance and installation bounds, and provides the worst-case expansion as a function of TOL and $\Delta$.

Theorem 8. If $R_{BR} = \text{BOUNDEDRANGE}(T_{SCH}, \Delta, W)$, and $\mathrm{TOL} < 2^{\lceil \log_2(\Delta) \rceil}$, then the expansion of every timed installation over $R_{BR}$ is bounded by $\lceil \log_2(2\Delta) \rceil - \lfloor \log_2(\mathrm{TOL}) \rfloor$.

Proof Outline. The proof is based on Lemma 4 and Lemma 6.

D. Timestamp Field Size in Bits

In the analysis so far we have been assuming that the timestamp field is a $W$-bit field. This implies that in every TCAM that requires timed installations, $W$ bits of every entry would be "wasted" on the timestamp field. In this section we analyze how the timestamp field can be significantly reduced, depending on the scheduling tolerance and installation bounds.

We show that the size of the timestamp field (Fig. 11) is affected by two factors of the system: (i) If it is well-known that every TIMEFLIP is scheduled with a scheduling tolerance TOL, then the $X = \lfloor \log_2(\mathrm{TOL}) \rfloor$ least significant bits of the timestamp field are always don't care, and thus can be omitted from the timestamp field. (ii) If there are guaranteed installation bounds, $\Delta$, the use of a $2^V$-periodic range, for $V = \lceil \log_2(2\Delta) \rceil$, allows the $W - V$ most significant bits to be omitted.

Fig. 11: Example of 1-bit timestamp, per Theorem 10. (The figure marks, within the $W$-bit timestamp field, the $W - V$ most significant bits, a function of $\Delta$, and the $X$ least significant bits, a function of TOL, as don't care.)

Lemma 9. If $\mathrm{TOL} < 2^W$, then the range $R_{SCH}$ can be represented by a timestamp field of $W - \lfloor \log_2(\mathrm{TOL}) \rfloor$ bits.

Proof. We denote $\lfloor \log_2(\mathrm{TOL}) \rfloor$ by $X$. By Lemma 3 the $X$ least significant bits of $T_{SCH}$ are '0'. Thus, every prefix-based encoding of $R_{SCH}$ has don't care on the $X$ least significant bits. Hence, $R_{SCH}$ can be represented by the $W - X$ most significant bits.

Theorem 10. If $\mathrm{TOL} \ge 2^{\lceil \log_2(\Delta) \rceil}$, then there exists a range $R$ such that $R \overset{\Delta}{\sim} R_{SCH}$, and $R$ can be represented by a timestamp field of a single bit.

Proof. The proof is very similar to the proof of Theorem 7; the range $R_{BR}$ satisfies $R_{BR} \overset{\Delta}{\sim} R_{SCH}$, and can be represented by a single rule, where the only unmasked bit is $t_V$, for $V = \lceil \log_2(2\Delta) \rceil$.

Theorem 11. If $\mathrm{TOL} < 2^W$, the installation bounds are given by $\Delta$, and $\mathrm{TOL} < 2^{\lceil \log_2(\Delta) \rceil}$, then there exists a range $R$ such that $R \overset{\Delta}{\sim} R_{SCH}$, and $R$ can be represented by a timestamp field consisting of $\lceil \log_2(2\Delta) \rceil - \lfloor \log_2(\mathrm{TOL}) \rfloor$ bits.

Proof. The range $R_{BR}$ satisfies $R_{BR} \overset{\Delta}{\sim} R_{SCH}$. By definition of BOUNDEDRANGE, the most significant $W - V$ bits are masked, and can thus be omitted. By Lemma 9 the $X$ least significant bits are masked and can thus be omitted. Hence, we are left with $V - X = \lceil \log_2(2\Delta) \rceil - \lfloor \log_2(\mathrm{TOL}) \rfloor$ bits.

It is well-known [17] that a $W$-bit extremal range has a worst-case expansion $W$, i.e., there is a tight coupling between the expansion of an extremal range and the number of bits used to represent it. Thus, it is not surprising that our results show that this coupling applies to time-based ranges as well, as seen in Theorems 8 and 11. Specifically, in a system that

uses a 1-bit timestamp, per Theorem 10, every TIMEFLIP is represented by a single entry, as shown in Theorem 7.

V. OPTIMAL TIME-BASED ACTION UPDATES

In the previous section we analyzed timed installations (Fig. 6(ii)). In this section we briefly discuss these results in the context of timed action updates (Fig. 6(v)), and show that the number of entries required to represent a time-based action update is, in the worst case, roughly half of the number of entries required to represent a timed installation.

Significantly, timed action updates can be represented either by positive encoding (Fig. 6(v)) or by negative encoding (Fig. 6(vi)). It was shown [31] that a $W$-bit extremal range can be represented by $\lceil \frac{W+1}{2} \rceil$ entries, by choosing the best of the positive or the negative encoding.

Let $R$ be a time range, and define $R^c := [0, 2^W - 1] \setminus R$ to be the complementary range of $R$. Given a time-oblivious TCAM entry $S \to a$ with $S = (s_u, \ldots, s_1, *, \ldots, *)$, we define a timed action update of $S$ over $R$ as a pair of TCAM rules $(S_R \to a_R, S \to a)$, such that $S_R := (s_u, \ldots, s_1, R)$. Hence, $a_R$ is activated during the time range $R$. Note that the order of the rules is of importance, since a TCAM lookup can match $S$ only if it does not match $S_R$. Given a timed action update, $(S_R \to a_R, S \to a)$, we define its negative encoding as $(S_{R^c} \to a, S \to a_R)$, such that $S_{R^c} := (s_u, \ldots, s_1, R^c)$.

We define the expansion of a timed action update over a time range $R$, denoted by $e(R)$, as the expansion of $R$. In this context $e(R)$ is the minimum between the expansion of the positive encoding of $R$ and that of its negative encoding, given by $R^c$. Note that in both cases, positive and negative, the expansion does not include the time-oblivious entry, $S$.

The following theorem defines an upper bound on the expansion of a timed action update over a $W$-bit extremal range.

Theorem 12. If $e(R_{T_0,W})$ is the expansion of a timed action update over $R_{T_0,W} = [T_0, 2^W - 1]$, then $e(R_{T_0,W}) \le \lfloor \frac{W+1}{2} \rfloor$.

Proof Outline. The proof is based on the $\lceil \frac{W+1}{2} \rceil$ result of [31], with the exception that, in contrast to [31], our definition of timed action updates excludes the entry that assigns don't care to the timestamp field. Due to this minor difference, the expansion is $\lfloor \frac{W+1}{2} \rfloor$ rather than $\lceil \frac{W+1}{2} \rceil$.
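The bound of Theorem 12 can be checked numerically for small $W$ when only prefix encodings are considered. The following is a minimal sketch, not the construction of [31]: it assumes, following Property 1 and the proof of Lemma 1, that the prefix expansion of $[T_0, 2^W - 1]$ is the number of '1'-s in $2^W - T_0$ and that the prefix expansion of the complement $[0, T_0 - 1]$ is the number of '1'-s in $T_0$. The function names are hypothetical.

```python
def popcount(x: int) -> int:
    """Number of '1' bits in the binary representation of x."""
    return bin(x).count("1")


def r(t0: int, w: int) -> int:
    """Prefix expansion of the extremal range [t0, 2^w - 1] (Property 1)."""
    return popcount((1 << w) - t0)


def l(t0: int, w: int) -> int:
    """Prefix expansion of the complementary range [0, t0 - 1] (assumed popcount of t0)."""
    return popcount(t0)


def e(t0: int, w: int) -> int:
    """Expansion of a timed action update: best of positive and negative encoding."""
    return min(r(t0, w), l(t0, w))


def worst_case_e(w: int) -> int:
    """Worst case of e over all scheduling times with a non-empty complement."""
    return max(e(t0, w) for t0 in range(1, 1 << w))


if __name__ == "__main__":
    for w in range(1, 11):
        print(w, worst_case_e(w), (w + 1) // 2)
```

Under these assumptions the bound follows directly from Lemma 1: since $r(T) + \ell(T) \le W + 1$, the smaller of the two terms cannot exceed $\lfloor \frac{W+1}{2} \rfloor$.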

The following lemma presents the worst-case expansion of using both positive and negative encoding, as a function of the scheduling tolerance. The result generalizes Theorem 12.

Lemma 13. If $e(R_{SCH})$ is the expansion of $R_{SCH}$, and $\mathrm{TOL} < 2^W$, then $e(R_{SCH}) \le \lfloor \frac{W - \lfloor \log_2(\mathrm{TOL}) \rfloor + 1}{2} \rfloor$.

Proof Outline. The proof is similar to the proof of Lemma 4, but uses the result of Theorem 12 for the worst-case expansion when using both positive and negative encoding.

In the previous section we presented the BOUNDEDRANGE algorithm (Fig. 10), and showed that the expansion of a timed installation using BOUNDEDRANGE is at most $\lceil \log_2(2\Delta) \rceil$. Indeed, BOUNDEDRANGE can be used for timed action updates as well, yielding the same expansion. However, we now present an algorithm that allows a lower expansion for timed action updates.

The REDUCEDRANGE algorithm (Fig. 12) computes a $\Delta$-similar range for a given $[T_0, 2^W - 1]$. As shown in Lemma 14, this algorithm has a worst-case expansion of $\lfloor \frac{\lceil \log_2(4\Delta) \rceil + 1}{2} \rfloor$. If $\Delta$ is high enough, the expansion of the range of REDUCEDRANGE is roughly half of the expansion produced by BOUNDEDRANGE. However, REDUCEDRANGE computes a range over a period of $4\Delta$ values, and hence the number of bits required to represent the timestamp in REDUCEDRANGE is one bit higher than required by BOUNDEDRANGE.

REDUCEDRANGE($T_0$, $\Delta$, $W$)
1  $V' \leftarrow \lceil \log_2(4\Delta) \rceil$
2  $T_R \leftarrow T_0 + \Delta - 1$
3  $T_L \leftarrow T_0 - \Delta$
4  if $T_R^{(W-V')+} = T_0^{(W-V')+}$ and $T_L^{(W-V')+} = T_0^{(W-V')+}$
5      return $[T_0^{V'-}, 2^{V'} - 1]^{V'}$
6  else
7      return $[T_0^{V'-}, 2^{V'-1} - 1]^{V'}$

Fig. 12: Algorithm for finding reduced range with installation bounds.

Lemma 14. If $R_{RE} = \text{REDUCEDRANGE}(T_0, \Delta, W)$, then $e(R_{RE}) \le \lfloor \frac{\lceil \log_2(4\Delta) \rceil + 1}{2} \rfloor$.

Proof. We denote $\lceil \log_2(4\Delta) \rceil$ by $V'$. We distinguish between two cases:

• The condition on line 4 of the algorithm is true (Fig. 13(a)), and REDUCEDRANGE returns on line 5. In this case the timed action update is represented by the $2^{V'}$-periodic range $[T_0^{V'-}, 2^{V'} - 1]^{V'}$, which is encoded by the $V'$-bit range $[T_0^{V'-}, 2^{V'} - 1]$. By Theorem 12 the worst-case expansion in this case is $\lfloor \frac{V' + 1}{2} \rfloor$.

• REDUCEDRANGE returns on line 7. We consider two distinct cases:

(i) $T_R^{(W-V')+} = T_0^{(W-V')+}$: in this case (Fig. 13(b-i)) the range of line 7, $[T_0^{V'-}, 2^{V'-1} - 1]^{V'}$, is encoded by the $V'$-bit range $[T_0^{V'-}, 2^{V'-1} - 1]$. Since $T_0^{V'-} \le 2^{V'-1} - 1$, it follows that $T_0^{V'-} = T_0^{(V'-1)-}$. Thus, the range $[T_0^{V'-}, 2^{V'-1} - 1]$ can in fact be represented by the $(V'-1)$-bit range $[T_0^{(V'-1)-}, 2^{V'-1} - 1]$. By Theorem 12 the expansion of this $(V'-1)$-bit extremal range is bounded by $\lfloor \frac{(V'-1)+1}{2} \rfloor \le \lfloor \frac{V'+1}{2} \rfloor$.

(ii) $T_L^{(W-V')+} = T_0^{(W-V')+}$: in this case (Fig. 13(b-ii)) the range of line 7, $[T_0^{V'-}, 2^{V'-1} - 1]^{V'}$, is encoded by two sub-ranges, $[T_0^{V'-}, 2^{V'} - 1] \cup [0, 2^{V'-1} - 1]$. Note that the sub-range $[T_0^{V'-}, 2^{V'} - 1]$ includes fewer than $\Delta$ values, since $T_R^{(W-V')+} \ne T_0^{(W-V')+}$. Thus, $[T_0^{V'-}, 2^{V'} - 1]$ is in fact a $(V'-2)$-bit shifted extremal range, which by Theorem 12 has a worst-case expansion of $\lfloor \frac{(V'-2)+1}{2} \rfloor$. The second sub-range, $[0, 2^{V'-1} - 1]$, requires a single entry, and thus we have a worst-case expansion of $\lfloor \frac{(V'-2)+1}{2} \rfloor + 1 = \lfloor \frac{V'+1}{2} \rfloor$.

In both cases the worst-case expansion is bounded by $\lfloor \frac{\lceil \log_2(4\Delta) \rceil + 1}{2} \rfloor$.

Fig. 13: REDUCEDRANGE: proof of Lemma 14. (Panels (a), (b-i) and (b-ii) show the positions of $T_L$, $T_0$ and $T_R$ relative to the period boundaries $n \cdot 2^{V'}$ and $(n+1) \cdot 2^{V'}$.)

VI. EXPERIMENTAL EVALUATION

Our evaluation is composed of two parts: (i) A simulation-based analysis was used to evaluate the resources required for representing TIMEFLIPs, and to verify our analytical results from the previous sections. (ii) A microbenchmark using a commercial switch was used to evaluate the accuracy of timed updates using our approach.

A. Simulation-based Evaluation

We implemented the SCHEDULE and BOUNDEDRANGE algorithms, and computed the respective range expansion and the required timestamp bit size in various cases. All of our simulations were performed with $W = 16$.

We evaluated the expansion of an extremal range as a function of the scheduling tolerance, TOL. For each value of TOL we simulated all the possible values of $T_{min}$, and the graphs in Fig. 14 present both the worst-case expansion and the average expansion (as defined in Sec. IV-B). Fig. 14a depicts the results for timed installation, i.e., $r(T_0)$, while Fig. 14b illustrates the results for timed action updates. The figure shows that the expansion of the latter is roughly half of the former, since timed action updates make use of both the positive and the negative encoding.

(a) Timed installation: expansion as a function of TOL. The theoretical max is based on Lemma 4, average is based on Theorem 5.
(b) Timed action update: expansion as a function of TOL using both positive and negative encoding. Theoretical max is based on Lemma 13.
Fig. 14: Expansion as a function of TOL. (Curves: Avg (simulated), Avg (theoretical), Max (simulated), Max (theoretical).)

Fig. 15 depicts the effect of the installation bounds, $\Delta$, on the time range expansion. SCHEDULE was used for computing $T_0$, and BOUNDEDRANGE was used for selecting the time range. Fig. 15a illustrates the expansion for $\mathrm{TOL} = 1$, and includes both the simulated values and the analytical values, based on Lemma 6. Fig. 15b depicts the worst-case expansion for several values of TOL. The star-shaped markers indicate the points where $\mathrm{TOL} = 2^{\lceil \log_2(\Delta) \rceil}$, illustrating that, as stated in Theorem 7, if $\Delta$ is small enough, i.e., $\mathrm{TOL} \ge 2^{\lceil \log_2(\Delta) \rceil}$, the time range can be represented by a single entry.

(a) Expansion as a function of $\Delta$ for $T_{max} = T_{min}$. Theoretical values are based on Lemma 6.
(b) The simulated worst-case expansion as a function of $\Delta$ for various values of TOL (TOL = 1, 65, 4097). The star-shaped markers indicate the points where $\mathrm{TOL} = 2^{\lceil \log_2(\Delta) \rceil}$.
Fig. 15: Expansion as a function of $\Delta$ with BOUNDEDRANGE in a timed installation.

Fig. 16 illustrates the effect of the scheduling tolerance and the installation bounds on the number of bits required to represent the timestamp field. Again, the star-shaped markers indicate the points where $\mathrm{TOL} = 2^{\lceil \log_2(\Delta) \rceil}$, and thus by Theorem 10, if $\Delta$ has a smaller value than the star-shaped marker, the timestamp field requires only a single bit.

Fig. 16: The number of bits as a function of $\Delta$ for various values of TOL (TOL = 1, 65, 4097), using BOUNDEDRANGE in a timed installation. The star-shaped markers indicate the points where $\mathrm{TOL} = 2^{\lceil \log_2(\Delta) \rceil}$.

Fig. 17 compares the BOUNDEDRANGE algorithm and the REDUCEDRANGE algorithm for timed action updates; the latter requires fewer entries (Fig. 17a), whereas the former allows the timestamp field to be represented by fewer bits (Fig. 17b). The simulations confirm our theoretical results, and demonstrate the tradeoff between the two parameters, TOL and $\Delta$, and the TCAM resource consumption.
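The comparison between simulated and theoretical expansion for timed installations (Lemma 4 and Theorem 5) can be reproduced at a small scale. The sketch below is an illustration, not the simulation code used for the figures: `schedule` is reconstructed from the prose description of SCHEDULE as a binary search and is an assumption, as are all function names. It uses $W = 4$ so that the output can be compared against Example 2.

```python
from fractions import Fraction


def r(t: int, w: int) -> int:
    """Prefix expansion of [t, 2^w - 1]: number of '1'-s in 2^w - t (Property 1)."""
    return bin((1 << w) - t).count("1")


def schedule(tmin: int, tmax: int, w: int) -> int:
    """Binary search over [0, 2^w - 1]; returns the first probe inside [tmin, tmax]."""
    if tmin == 0:
        return 0
    base = 0
    for i in range(1, w + 1):
        candidate = base + (1 << (w - i))
        if candidate > tmax:
            continue          # search the lower half
        if candidate >= tmin:
            return candidate  # first probe that falls within the range
        base = candidate      # search the upper half
    raise ValueError("expected 0 <= tmin <= tmax <= 2^w - 1")


def simulate(w: int, tol: int):
    """Average and worst-case r(T_SCH) over all ranges with tol possible values."""
    values = [r(schedule(tmin, tmin + tol - 1, w), w)
              for tmin in range((1 << w) - tol + 1)]
    return Fraction(sum(values), len(values)), max(values)


def rho(w: int, tol: int) -> Fraction:
    """Closed-form average expansion of Theorem 5."""
    a = (tol - 1).bit_length()  # ceil(log2(tol)) for tol >= 1
    two = Fraction(2)
    total = (1 - two ** (w - a)
             + two ** (w - a - 1) * (w - a + 2) * (2 ** a - tol + 1)
             + sum(two ** (i - 1) * (i + 2) * (tol - 1) for i in range(w - a)))
    return total / (2 ** w - tol + 1)


if __name__ == "__main__":
    avg, worst = simulate(4, 2)
    print(avg, worst, rho(4, 2))  # prints: 5/3 3 5/3
```

For $W = 4$ and $\mathrm{TOL} = 2$ the simulated average is 5/3, as in Example 2, and the simulated maximum of 3 matches the bound $W - \lfloor \log_2(\mathrm{TOL}) \rfloor$ of Lemma 4.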

(a) The worst-case expansion as a function of $\Delta$ for $W = 16$ and $T_{max} - T_{min} = 64$.
(b) Number of bits as a function of $\Delta$ for $W = 16$ and $T_{max} - T_{min} = 64$.
Fig. 17: Timed action updates: REDUCEDRANGE vs. BOUNDEDRANGE.
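To make the two range-selection algorithms compared in Fig. 17 concrete, the sketch below implements BOUNDEDRANGE (Fig. 10) and REDUCEDRANGE (Fig. 12) and checks by brute force that the returned periodic range matches every timestamp in $[T_0, T_0 + \Delta - 1]$ and none in $[T_0 - \Delta, T_0 - 1]$, i.e., that it is $\Delta$-similar to $[T_0, 2^W - 1]$. Representing a periodic range as a tuple `(lo, hi, v)` and the helper names are assumptions made for this illustration only.

```python
def ceil_log2(x: int) -> int:
    """ceil(log2(x)) for an integer x >= 1."""
    return (x - 1).bit_length()


def periodic_match(t: int, lo: int, hi: int, v: int) -> bool:
    """True if t lies in the 2^v-periodic range [lo, hi]^v (lo and hi are v-bit values)."""
    low_bits = t & ((1 << v) - 1)  # mask the most significant bits
    if lo <= hi:
        return lo <= low_bits <= hi
    return low_bits >= lo or low_bits <= hi  # range wraps around the period


def bounded_range(t0: int, delta: int):
    """BOUNDEDRANGE: the periodic range [T0, T0 + 2^(V-1) - 1]^V, V = ceil(log2(2*delta))."""
    v = ceil_log2(2 * delta)
    mask = (1 << v) - 1
    return t0 & mask, (t0 + (1 << (v - 1)) - 1) & mask, v


def reduced_range(t0: int, delta: int):
    """REDUCEDRANGE: lines 1-7 of Fig. 12, with V' = ceil(log2(4*delta))."""
    v = ceil_log2(4 * delta)
    t_r, t_l = t0 + delta - 1, t0 - delta
    mask = (1 << v) - 1
    if (t_r >> v) == (t0 >> v) and (t_l >> v) == (t0 >> v):
        return t0 & mask, mask, v             # line 5
    return t0 & mask, (1 << (v - 1)) - 1, v   # line 7


def is_delta_similar(rng, t0: int, delta: int) -> bool:
    """Brute-force check of Delta-similarity to the extremal range starting at t0."""
    lo, hi, v = rng
    matched_after = all(periodic_match(t, lo, hi, v) for t in range(t0, t0 + delta))
    matched_before = any(periodic_match(t, lo, hi, v) for t in range(t0 - delta, t0))
    return matched_after and not matched_before


if __name__ == "__main__":
    t0, delta = 1000, 37
    for algorithm in (bounded_range, reduced_range):
        rng = algorithm(t0, delta)
        print(algorithm.__name__, rng, is_delta_similar(rng, t0, delta))
```

The third element of each returned tuple is the number of timestamp bits that remain unmasked; for REDUCEDRANGE it is always one more than for BOUNDEDRANGE, which is the bit-count side of the tradeoff shown in Fig. 17b.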

B. Microbenchmark

We performed a microbenchmark on a commercial switch in order to verify two key aspects of TIMEFLIP: (i) that the method presented in this paper is applicable to real-life switches, and (ii) that the method can effectively provide a high degree of accuracy.

As mentioned above, when an update is scheduled for time $T_0$, it is performed in practice at some time $t \in [T_0 - \delta, T_0 + \delta]$. The scheduling error, $\delta$, is affected by two factors: the device's clock accuracy, which is the maximal offset between the clock value and the value of an accurate time reference, and the execution accuracy, which is a measure of how accurately the device can perform a timed update, given a clock that is perfectly synchronized to real time. The achievable clock accuracy strongly depends on the network size and topology, and on the clock synchronization method being used. For example, the achievable accuracy using the Precision Time Protocol [11] is typically on the order of 1 µsec [13], [14]. Our microbenchmark is focused on the execution accuracy of time-based TCAM updates.

(a) Experiment setup.
(b) Empirical PDF of the time between TIMEFLIPs (horizontal axis: time between flips [microseconds]).
Fig. 18: Microbenchmark

The experiment was performed using an evaluation board of the Marvell 98DX4251 [44] switch silicon. It is important to emphasize that we used the switch as-is, without modifications or extensions. The reason we used a switch silicon evaluation board is that it provides flexible configuration options compared to an off-the-shelf pizza-box switch. Specifically, the evaluation board allows the flexibility to define the structure of the TCAM key. The TCAM key can be configured to include any of the packet header fields, as well as many metadata fields, including the ingress timestamp. The ingress timestamp is the time at which the packet was received by the switch. This timestamp is measured using the switch's internal clock, which can be synchronized to other clocks using IEEE 1588 [11]. The 98DX4251 measures the ingress time of every packet that is received by the switch. This ingress timestamp is attached to the packet's internal metadata, and thus it does not increase the packet length, and does not reduce the throughput of the switch.

The experiment setup is illustrated in Fig. 18a. We used an IXIA XM12 packet generator, which was connected to ports 0, 1, and 2 of the switch, and was configured to continuously transmit 64B-packets to port 0 of the switch at a full-wire-speed of 10 Gbps. Thus, a packet was transmitted to the switch every 67.2 ns (nanoseconds). The switch was configured to perform a TCAM lookup on all incoming packets, with the following two entries:

• (in port = 0, $T = (* \ldots *, 1, *^{15})$) → out port = 1
• (in port = 0, $T = (* \ldots *)$) → out port = 2

The only unmasked bit in the timestamp field of the first entry was $t_{16} = 1$. $T$ is measured in nanoseconds, and therefore the 16th bit, $t_{16}$, toggles with a period of $2^{16}$ ns. Consequently, the two rules produce periodic behavior where each rule is matched for a duration of $2^{15}$ ns; the first rule is matched for a duration of $2^{15}$ ns, and then the second rule is matched for $2^{15}$ ns, and so on.

In the context of this experiment, every TIMEFLIP between out port=1 and out port=2 is a timed action update. Our analysis focuses on the question of how accurately the timed action updates occur. To answer this question, we measured the time between two consecutive TIMEFLIPs from out port=1 to out port=2. We repeated this measurement 50 times, and the empirical Probability Density Function (PDF) of these measurements is illustrated in Fig. 18b.
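The periodic behavior of the two entries can be illustrated with an idealized software model. The sketch below is not a model of the 98DX4251; it assumes that the ingress timestamp of each packet equals its transmission time truncated to nanoseconds, with no clock offset or jitter, and that packets arrive exactly every 67.2 ns. All names are hypothetical.

```python
PACKET_INTERVAL_TENTHS_NS = 672  # one 64B packet every 67.2 ns, in units of 0.1 ns


def out_port(timestamp_ns: int) -> int:
    """TCAM lookup: the first entry matches when the only unmasked bit is '1'."""
    unmasked_bit = (timestamp_ns >> 15) & 1  # 15 don't-care bits lie below this bit
    return 1 if unmasked_bit else 2


def flip_gaps(num_packets: int):
    """Times (in 0.1 ns) between consecutive flips from out port 1 to out port 2."""
    gaps = []
    last_flip = None
    previous_port = None
    for k in range(num_packets):
        send_time = k * PACKET_INTERVAL_TENTHS_NS
        port = out_port(send_time // 10)
        if previous_port == 1 and port == 2:
            if last_flip is not None:
                gaps.append(send_time - last_flip)
            last_flip = send_time
        previous_port = port
    return gaps


if __name__ == "__main__":
    gaps = flip_gaps(60_000)
    print(len(gaps), sorted(set(g / 10 for g in gaps)))
```

In this idealized model every observed gap is either 975 or 976 packet intervals, so each observation lies within one packet interval of $2^{16}$ ns; a measurement taken from packet arrivals is therefore quantized by the 67.2 ns packet spacing.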
The expected mean time interval between TIMEFLIPs was $2^{16} = 65536$ ns. The precision of our measurements was affected by two factors: (i) the packet generator timestamped the incoming packets with a 10 ns resolution, and (ii) the packet generator transmitted a packet exactly every 67.2 ns. Due to these two factors, the precision of the measurement was on the order of tens of nanoseconds. Notably, since the measurement is performed by the packet generator, as a difference between two TIMEFLIP events, no synchronization is required between the packet generator and the switch.

As shown in Fig. 18b, the timed action updates were all performed within tens of nanoseconds of the expected time, which is well within the margin of error of our measurement method. Hence, the execution accuracy in our experiment was no worse than tens of nanoseconds, which is negligible compared to the clock accuracy in a typical network, on the order of 1 µsec. Thus, the microbenchmark indicates that using the method we present in this paper, updates can be timed in a typical network with microsecond accuracy.

Notes. In this experiment we observed that TIMEFLIP can be implemented by existing switch silicons with a sub-1 µsec accuracy at full-wire-speed. Note that this microbenchmark provides no indication regarding large-scale networks or stress scenarios, and does not necessarily reflect on real-life values of TOL and $\Delta$. We did not evaluate the clock accuracy, but we note that the switch we evaluated has hardware support for IEEE 1588 [11], a mature technology that provides a sub-microsecond clock accuracy in typical network scenarios (e.g., [12]), with typically less than 100 packets per second per port [45], a negligible overhead in high-speed networks.

VII. DISCUSSION

A. Scheduling Accuracy

TIMEFLIP allows accurate scheduling while consuming TCAM resources efficiently. As discussed in Section VI-B, the accuracy of the TIMEFLIP scheduling is affected by two factors, the execution accuracy, and the clock accuracy. Notably, the execution accuracy of TIMEFLIPs is not affected by the scheduling tolerance, TOL, and the installation bound, $\Delta$, whereas the resources required, namely the number of bits per timestamp and the number of entries per TIMEFLIP, are indeed affected by these two parameters.

The execution accuracy in our microbenchmark was on the order of tens of nanoseconds (Sec. VI-B), and since the clock accuracy in large-scale networks has been shown to be on the order of 1 µsec [12], we deduced that the scheduling accuracy is on the order of 1 µsec. However, an accuracy of 1 µsec may not be sufficient in high-speed networks, where the network latency can be as low as tens of µsec. Fortunately, low network latency allows the synchronization protocol to achieve a higher accuracy [46], and thus in low-latency networks we expect the scheduling accuracy to be better than 1 µsec.

B. Timestamp Size in Real-Life

The required size of the timestamp field in the TCAM is a function of TOL and $\Delta$, as per Theorems 8 and 11. Different SDN applications may yield different TOL values, and the value of $\Delta$ may vary according to the switch types. Therefore, the timestamp size should be designed according to the worst-case values of TOL and $\Delta$.
Some switches, such as the 98DX4251, which was used in the experiment of Section VI-B, provide the flexibility to determine the size of the timestamp field in the TCAM entry. Clearly, the timestamp field should be as compact as possible, allowing the timestamp to fit into unused spare bits in the TCAM entry. If the timestamp size exceeds the number of spare bits in the TCAM entry, then using the timestamp requires increasing the size of the TCAM entries, causing a corresponding decrease in the number of TCAM entries. Example 3 provides some intuition as to what the timestamp field size should be in typical systems.

Example 3. If ∆ is 10 seconds, and TOL is 100 milliseconds, then by Theorems 8 and 11, a 9-bit timestamp can be used to represent any extremal time range with at most nine TCAM entries.

We believe that Example 3 presents a pragmatic real-life timestamp size, as even in highly stressed conditions the installation bounds are not expected to exceed 10 seconds, and a TOL of 100 milliseconds should be enough to satisfy urgent update requirements.
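The trade-off between timestamp size and TCAM capacity can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the paper: the TCAM is a fixed pool of bits that can be configured to one of a few supported entry widths, and the narrowest width that holds the lookup key plus the timestamp is chosen. The function name, the key size, and the supported widths are all hypothetical.

```python
def tcam_capacity(total_bits, key_bits, timestamp_bits, supported_widths):
    """Pick the narrowest supported entry width that holds key + timestamp.

    Returns (entry_width, entry_count). All parameters are illustrative
    assumptions, not values of any specific device.
    """
    needed = key_bits + timestamp_bits
    for width in sorted(supported_widths):
        if width >= needed:
            return width, total_bits // width
    raise ValueError("key and timestamp do not fit in any supported width")


# Hypothetical TCAM: 16384 entries of 80 bits, also configurable as 160 or 320.
TOTAL_BITS = 80 * 16384
WIDTHS = (80, 160, 320)

# A 72-bit key leaves 8 spare bits in an 80-bit entry.
print(tcam_capacity(TOTAL_BITS, 72, 8, WIDTHS))  # timestamp fits in the spare bits
print(tcam_capacity(TOTAL_BITS, 72, 9, WIDTHS))  # one bit too many: wider entries
```

Under these assumed numbers, an 8-bit timestamp keeps all 16384 entries, whereas a 9-bit timestamp forces 160-bit entries and halves the capacity to 8192, which is why the timestamp should fit into the spare bits whenever possible.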

C. TCAM Update Performance

Previous work [6] has demonstrated large fluctuations in TCAM rule installation latencies, since a TCAM update often requires multiple TCAM entries to be moved or reordered. These latencies have been shown to vary from a few milliseconds to a few seconds. In the context of our analysis, these high latencies are represented by a high value of ∆. Notably, a high value of ∆ may yield high resource consumption by each TIMEFLIP, but does not compromise the TIMEFLIPs’ execution accuracy.

When the rate of TCAM updates is high, we expect ∆ to have a high value. Thus, a system should be designed assuming a sufficiently high value of ∆, considering the most stressed update scenarios. Based on the analysis of [6], [47], it is safe to assume that ∆ = 10 sec in typical systems (as in Example 3), as the worst-case installation latencies have been shown to be on the order of a few seconds. In some cases ∆ may be lower, allowing a further reduction in the amount of TCAM resources for each TIMEFLIP.

Another aspect of the update performance is the TCAM’s access throughput. Since a TCAM has limited throughput, in some switches TCAM rule installations or updates may temporarily suspend the traffic, causing a slight degradation in the switch’s full-wire-speed performance. Each TIMEFLIP may require a few TCAM entries, which may reduce the TCAM’s throughput compared to untimed update approaches.

D. Timed Updates of Non-TCAM Memories

The concepts presented in this paper can be used for applying timed updates to non-TCAM lookup tables in network devices. We provide an example of performing a timed update in an IP routing table. Assume that at time T0 a set of entries in the routing table should be updated to a new value. As shown in Fig. 19, a time-based TCAM range is used for defining the time range T ≥ T0, and the corresponding action is a version metadata field, indicating whether routing should be performed based on the old version or on the new one.
The version value is then used to access the routing table, along with the destination IP address. This approach bears some resemblance to the version tag approach of [21], although our approach uses the version indication internally in the network device, and it is not added to the packet header as in [21].
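The two-stage lookup described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the switch implementation: the time-based TCAM range is modeled by a simple comparison, the routing table is a Python dictionary keyed by (version, prefix), and the names `select_version`, `route`, `forward`, and the table contents are hypothetical.

```python
import ipaddress


def select_version(timestamp, t0):
    """Model of the time-based TCAM range T >= T0: 1 = new version, 0 = old."""
    return 1 if timestamp >= t0 else 0


def route(table, version, dst_ip):
    """Longest-prefix match restricted to entries of the given version."""
    addr = ipaddress.ip_address(dst_ip)
    best_len, best_hop = -1, None
    for (ver, prefix), next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if ver == version and addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


def forward(table, t0, timestamp, dst_ip):
    """The version chosen by the time range is looked up together with the IP."""
    return route(table, select_version(timestamp, t0), dst_ip)


# Hypothetical table: the 10.0.0.0/8 route changes at T0, the /16 route does not.
TABLE = {
    (0, "10.0.0.0/8"): "port1",
    (1, "10.0.0.0/8"): "port2",
    (0, "10.1.0.0/16"): "port3",
    (1, "10.1.0.0/16"): "port3",
}
T0 = 1000

print(forward(TABLE, T0, 999, "10.2.0.1"))   # before T0: old version
print(forward(TABLE, T0, 1000, "10.2.0.1"))  # at T0: new version
```

In this sketch both versions of every route are stored, so switching versions is a single decision made per packet from its timestamp; the routing table itself is not rewritten at time T0.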

[Figure 19 diagram: incoming packet → TCAM lookup on timestamp T, range T ≥ T0 → (ver, IP) → IP routing lookup → outgoing packet]

Fig. 19: Timed updates in non-TCAM lookups

E. On the TCAM Encoding Scheme

Using Prefix Encoding. Throughout this paper, the analysis assumes that time ranges in TCAMs are represented using prefix encoding [17]. While this is the simplest and most common coding scheme for TCAM ranges, other schemes might be considered. Although alternative schemes might achieve improved expansions for some specific ranges, prefix encoding has an important advantage: it guarantees an upper bound of ⌊(W+1)/2⌋ on the worst-case expansion of complementary extremal ranges [31]. As shown in [31], no encoding scheme can achieve a smaller bound on its worst-case expansion. Accordingly, as discussed in Sec. V, the expansion of a timed action update equals this upper bound, i.e., the bound ⌊(W+1)/2⌋ for the prefix encoding cannot be improved using any alternative scheme.

Alternative Encoding Schemes. Despite the above, alternative schemes might be considered in order to improve the representation of some specific ranges. We briefly discuss two such encodings, Short Range Gray Encoding (SRGE) [18], and Database Independent Range PreEncoding (DIRPE) [48].

SRGE. We show that the analysis and the optimal selection of T0 (Fig. 7) apply also to SRGE, a common encoding scheme that relies on Gray code. This means that, for the time ranges considered here, SRGE cannot improve on the performance of prefix encoding. We rely on the observation that any range of the form [T0, 2^W − 1] has exactly the same expansion in the two schemes. This is summarized in the following lemma.

Lemma 15. A right extremal range [T0, 2^W − 1] has the same expansion in the prefix encoding and in the SRGE encoding scheme.

Proof. The proof follows directly from the algorithm of the SRGE encoding. Generally, SRGE splits a range [s, e] into two disjoint parts [s, p], [p + 1, e] in two disjoint subtrees. It encodes the shorter of them using prefix encoding and uses the selection property of the Gray code to replace a digit in these entries by ‘*’ to cover a subset of the same size from the longer part. Then, if required, it completes the encoding of the longer part using at least one additional entry.
Given a range with a length that is a power of two, both schemes require a single entry. Otherwise, in the case of a right extremal range of the form [T0, 2^W − 1], the right part is longer than the left and takes the whole relevant subtree. Then, both encodings are composed of the entries in a prefix encoding of the left part together with one additional entry for the right part.

Based on Lemma 15, we can deduce that the selection of TSCH based on the SCHEDULE algorithm is optimal also for the SRGE scheme, and that the average expansion is equal in the two schemes.

DIRPE. Much like TIMEFLIP, several encoding schemes such as DIRPE make use of the unused bits that are available in TCAM entries due to the non-flexible width of TCAMs. These bits can be used to further reduce the range expansion. To avoid competition for the same resource, it is preferable not to use TIMEFLIP with these specific schemes. We demonstrate that in practice the observed expansion is often small even when using the prefix encoding or the SRGE encoding, which do not rely on these additional bits.
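As a sanity check on the prefix-encoding bound discussed at the start of this subsection, the following sketch computes the prefix encoding of extremal ranges by brute force for small W. It rests on an assumption about how the bound is counted: the expansion of a pair of complementary extremal ranges is taken to be the smaller of the two prefix expansions, without counting a fall-through entry for the other range. The function names are hypothetical, and SRGE is not implemented here.

```python
def prefix_cover(lo, hi, width):
    """Ternary prefix entries that exactly cover the integer range [lo, hi]."""
    entries = []
    while lo <= hi:
        # Largest aligned power-of-two block that starts at lo ...
        size = lo & -lo if lo else 1 << width
        # ... shrunk until it no longer overshoots hi.
        while size > hi - lo + 1:
            size >>= 1
        free_bits = size.bit_length() - 1
        fixed = width - free_bits
        prefix = format(lo >> free_bits, "b").zfill(fixed) if fixed else ""
        entries.append(prefix + "*" * free_bits)
        lo += size
    return entries


def complementary_expansion(t0, width):
    """Smaller prefix expansion of [0, t0 - 1] and [t0, 2^width - 1]."""
    left = prefix_cover(0, t0 - 1, width)
    right = prefix_cover(t0, (1 << width) - 1, width)
    return min(len(left), len(right))


def worst_case(width):
    """Worst case over every possible update time t0 for a width-bit timestamp."""
    return max(complementary_expansion(t0, width) for t0 in range(1, 1 << width))


print(prefix_cover(5, 7, 3))  # right extremal range [5, 7] with W = 3
for w in range(1, 9):
    print(w, worst_case(w), (w + 1) // 2)
```

Under the counting assumption above, the exhaustive worst case for W = 1, ..., 8 coincides with ⌊(W+1)/2⌋, and for other values of T0 the expansion is smaller, which is the flexibility that a careful choice of the scheduled time exploits.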

VIII. CONCLUSION

We introduced TIMEFLIP, a practical method of implementing accurate time-based network updates and a natural implementation of Atomic Bundles, using time-based TCAM ranges. We have shown that in practical conditions, a small number of timestamp bits are required to accurately perform a TIMEFLIP using a small number of TCAM entries. At the heart of our analysis lie two properties that are unique to time-based TCAM ranges. First, by carefully choosing the scheduled update time, the range values can be selected to minimize the required TCAM resources. Second, if there is a known bound on the installation time of the TCAM entries, then by using periodic time ranges, the expansion of the time range can be significantly reduced. We have shown that TIMEFLIPs work on existing network devices, making accurate time-based updates a viable tool for network management.

REFERENCES

[1] T. Mizrahi, O. Rottenstreich, and Y. Moses, “TimeFlip: Scheduling network updates with timestamp-based TCAM ranges,” in IEEE INFOCOM, 2015.
[2] N. McKeown, T. Anderson, H. Balakrishnan, G. M. Parulkar, L. L. Peterson, J. Rexford, S. Shenker, and J. S. Turner, “Openflow: enabling innovation in campus networks,” ACM Computer Communication Review, vol. 38, no. 2, pp. 69–74, 2008.
[3] Open Networking Foundation, “Openflow switch specification,” Version 1.4.0, 2013.
[4] A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, and S. Banerjee, “DevoFlow: scaling flow management for high-performance networks,” in ACM SIGCOMM, 2011.
[5] J. Naous, D. Erickson, G. A. Covington, G. Appenzeller, and N. McKeown, “Implementing an OpenFlow switch on the NetFPGA platform,” in ACM/IEEE ANCS, 2008.
[6] X. Jin, H. H. Liu, R. Gandhi, S. Kandula, R. Mahajan, J. Rexford, R. Wattenhofer, and M. Zhang, “Dionysus: Dynamic scheduling of network updates,” in ACM SIGCOMM, 2014.
[7] T. Mizrahi and Y. Moses, “Software Defined Networks: It’s about time,” in IEEE INFOCOM, 2016.
[8] T. Mizrahi and Y. Moses, “Time4: Time for SDN,” IEEE Transactions on Network and Service Management (TNSM), 2016.
[9] T. Mizrahi and Y. Moses, “Time-based updates in software defined networks,” in ACM HotSDN, 2013.
[10] T. Mizrahi and Y. Moses, “Time-based Updates in OpenFlow: A Proposed Extension to the OpenFlow Protocol,” technical report, 2013. [Online]. Available: http://tx.technion.ac.il/~dew/OFTimeTR.pdf
[11] IEEE TC 9, “1588 IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems Version 2,” 2008.
[12] H. Li, “IEEE 1588 time synchronization deployment for mobile backhaul in China Mobile,” IEEE ISPCS, keynote presentation, 2014.
[13] ITU-T G.8271/Y.1366, “Time and phase synchronization aspects of packet networks,” 2012.
[14] IEEE Std C37.238, “IEEE Standard Profile for Use of IEEE 1588 Precision Time Protocol in Power System Applications,” 2011.
[15] F. Long, Z. Sun, Z. Zhang, H. Chen, and L. Liao, “Research on TCAM-based OpenFlow switch platform,” in IEEE ICSAI, 2012.
[16] Renesas, “R8A20410BG QUAD-Search TCAM,” datasheet, 2010. [Online]. Available: http://www.renesas.com/media/products/memory/TCAM/r10pf0001eu0100 tcam.pdf
[17] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, “Fast and scalable layer four switching,” in ACM SIGCOMM, 1998.
[18] A. Bremler-Barr and D. Hendler, “Space-efficient TCAM-based classification using gray coding,” IEEE Trans. Computers, vol. 61, no. 1, pp. 18–30, 2012.
[19] T. Mizrahi, O. Rottenstreich, and Y. Moses, “TimeFlip: Using Timestamp-based TCAM Ranges to Accurately Schedule Network Updates,” technical report, arXiv preprint, 2016.

[20] P. François and O. Bonaventure, “Avoiding transient loops during the convergence of link-state routing protocols,” IEEE/ACM Trans. Netw., vol. 15, no. 6, pp. 1280–1292, 2007.
[21] M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, and D. Walker, “Abstractions for network update,” in ACM SIGCOMM, 2012.
[22] C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, and R. Wattenhofer, “Achieving high utilization with software-driven WAN,” in ACM SIGCOMM, 2013.
[23] T. Mizrahi and Y. Moses, “Time4: Time for SDN,” technical report, arXiv preprint arXiv:1505.03421, 2015.
[24] T. Mizrahi, E. Saat, and Y. Moses, “Timed consistent network updates,” in ACM SIGCOMM Symposium on SDN Research (SOSR), 2015.
[25] T. Mizrahi, E. Saat, and Y. Moses, “Timed consistent network updates in software defined networks,” IEEE/ACM Transactions on Networking (ToN), 2016.
[26] T. Mizrahi and Y. Moses, “The case for data plane timestamping in sdn,” in IEEE INFOCOM Workshop on Software-Driven Flexible and Agile Networking (SWFAN), 2016.
[27] J. C. Corbett, J. Dean, M. Epstein et al., “Spanner: Google’s globally-distributed database,” in OSDI, 2012.
[28] A. G. Greenberg, G. Hjálmtýsson, D. A. Maltz, A. Myers, J. Rexford, G. G. Xie, H. Yan, J. Zhan, and H. Zhang, “A clean slate 4D approach to network control and management,” ACM Computer Communication Review, vol. 35, no. 5, pp. 41–54, 2005.
[29] A. Atlas, T. Nadeau, and D. Ward, “Interface to the routing system problem statement,” IETF, draft-atlas-i2rs-problem-statement-01, work in progress, 2013.
[30] “Minutes for IETF87, SDNRG meeting,” IETF, meeting minutes, 2013. [Online]. Available: http://www.ietf.org/mail-archive/web/sdn/current/msg00260.html
[31] O. Rottenstreich, R. Cohen, D. Raz, and I. Keslassy, “Exact worst case TCAM rule expansion,” IEEE Trans. Computers, vol. 62, no. 6, pp. 1127–1140, 2013.
[32] H. Liu, “Efficient mapping of range classifier into ternary-cam,” in IEEE Hot Interconnects, 2002.
[33] A. Bremler-Barr, D. Hay, and D. Hendler, “Layered interval codes for TCAM-based classification,” Computer Networks, vol. 56, no. 13, pp. 3023–3039, 2012.
[34] K. Kogan, S. I. Nikolenko, O. Rottenstreich, W. Culhane, and P. T. Eugster, “Exploiting order independence for scalable and expressive packet classification,” IEEE/ACM Trans. Netw., vol. 24, no. 2, pp. 1251–1264, 2016.
[35] N. Kang, O. Rottenstreich, S. G. Rao, and J. Rexford, “Alpaca: Compact network policies with attribute-carrying addresses,” in ACM CoNext, 2015.
[36] O. Rottenstreich and J. Tapolcai, “Optimal rule caching and lossy compression for longest prefix matching,” IEEE/ACM Trans. Netw., 2017.
[37] O. Rottenstreich et al., “Compressing forwarding tables for datacenter scalability,” IEEE Journal on Selected Areas in Communications (JSAC), vol. 32, no. 1, pp. 138–151, 2014.
[38] K. Kogan, S. I. Nikolenko, P. Eugster, A. Shalimov, and O. Rottenstreich, “FIB Efficiency in Distributed Platforms,” in IEEE ICNP, 2016.
[39] E. Norige, A. X. Liu, and E. Torng, “A ternary unification framework for optimizing TCAM-based packet classification systems,” in ACM/IEEE ANCS, 2013.
[40] C. R. Meiners, A. X. Liu, E. Torng, and J. Patel, “Split: Optimizing space, power, and throughput for TCAM-based classification,” in ACM/IEEE ANCS, 2011.
[41] C. R. Meiners, A. X. Liu, and E. Torng, “Bit Weaving: A non-prefix approach to compressing packet classifiers in TCAMs,” IEEE/ACM Trans. Networking, vol. 20, no. 2, pp. 488–500, 2012.
[42] D. Mills, J. Martin, J. Burbank, and W. Kasch, “RFC 5905: Network time protocol version 4: Protocol and algorithms specification,” IETF, 2010.
[43] O. Rottenstreich, I. Keslassy, A. Hassidim, H. Kaplan, and E. Porat, “Optimal in/out TCAM encodings of ranges,” IEEE/ACM Trans. Netw., vol. 24, no. 1, pp. 555–568, 2016.
[44] “Marvell Prestera 98DX4251 Product Brief,” http://www.marvell.com/switching/assets/Marvell Prestera 98DX4251-02 product brief final2.pdf, 2013.
[45] ITU-T G.8275.1, “Precision time protocol telecom profile for phase/time synchronization with full timing support from the network,” 2014.
[46] D. L. Mills, “Internet time synchronization: the network time protocol,” Communications, IEEE Transactions on, 1991.
[47] C. Rotsos, N. Sarrar, S. Uhlig, R. Sherwood, and A. W. Moore, “OFLOPS: An open framework for openflow switch evaluation,” in Passive and Active Measurement (PAM), 2012.

[48] K. Lakshminarayanan, A. Rangarajan, and S. Venkatachary, “Algorithms for advanced packet classification with ternary cams,” in ACM SIGCOMM, 2005.

Tal Mizrahi is a PhD student at the Technion. He is also a switch architect at Marvell, with over 15 years of experience in networking. Tal is an active participant in the Internet Engineering Task Force (IETF), and in the IEEE 1588 working group. His research interests include network protocols, switch and router architecture, time synchronization, and distributed systems.

Ori Rottenstreich is a Postdoctoral Research Fellow at the Department of Computer Science, Princeton University, working with Prof. Jennifer Rexford. He received the B.S. in Computer Engineering (summa cum laude) and the Ph.D. degree from the Electrical Engineering department of the Technion, Haifa, Israel, in 2008 and 2014, respectively. He is a recipient of the Rothschild Yad-Hanadiv postdoctoral fellowship, the Google Europe PhD Fellowship in Computer Networking, and the Best Paper Runner Up Award at the IEEE Infocom 2013 conference.

Yoram Moses is the Israel Pollak academic chair and a professor of electrical engineering at the Technion. His research focuses on distributed and multi-agent systems, with a focus on fault-tolerance and on applications of knowledge and time in such systems. He is a co-author of the book Reasoning about Knowledge, recipient of the Gödel prize in 1997 and the Dijkstra prize in 2009.
