IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 19, NO. 1, FEB 2011


Live Streaming with Receiver-based Peer-division Multiplexing

Hyunseok Chang, Sugih Jamin, and Wenjie Wang

Abstract—A number of commercial peer-to-peer systems for live streaming have been introduced in recent years. The behavior of these popular systems has been extensively studied in several measurement papers. Due to the proprietary nature of these commercial systems, however, these studies have to rely on a “black-box” approach, where packet traces are collected from a single or a limited number of measurement points, to infer various properties of traffic on the control and data planes. Although such studies are useful for comparing different systems from an end-user’s perspective, it is difficult to intuitively understand the observed properties without fully reverse-engineering the underlying systems. In this paper we describe the network architecture of Zattoo, one of the largest production live streaming providers in Europe at the time of writing, and present a large-scale measurement study of Zattoo using data collected by the provider. To highlight, we found that even when the Zattoo system was heavily loaded with as many as 20,000 concurrent users on a single overlay, the median channel join delay remained less than 2 to 5 seconds, and that, for a majority of users, the streamed signal lags the over-the-air broadcast signal by no more than 3 seconds.

Index Terms—Peer-to-peer system, live streaming, network architecture

H. Chang is with Alcatel-Lucent Bell Labs, Holmdel, NJ 07733 USA (e-mail: [email protected]). S. Jamin is with the EECS Department, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: [email protected]). W. Wang is with IBM Research, CRL, Beijing 100193, China (e-mail: [email protected]). This work was done when authors Chang and Wang were at Zattoo Inc.

I. INTRODUCTION

THERE is an emerging market for IPTV. Numerous commercial systems now offer services over the Internet that are similar to traditional over-the-air, cable, or satellite TV. Live television, time-shifted programming, and content-on-demand are all presently available over the Internet. Increased broadband speed, growth of the broadband subscription base, and improved video compression technologies have contributed to the emergence of these IPTV services.

We draw a distinction between three uses of peer-to-peer (P2P) networks: delay-tolerant file download of archival material, delay-sensitive progressive download (or streaming) of archival material, and real-time live streaming. In the first case, the completion of download is elastic, depending on available bandwidth in the P2P network. The application buffer receives data as it trickles in and informs the user upon the completion of download. The user can then start playing back the file, for viewing in the case of a video file. BitTorrent and its variants are examples of delay-tolerant file download systems. In the second case, video playback starts as soon as the application assesses that it has buffered sufficient data that, given the estimated download rate and the playback rate, it will not deplete the buffer before the end of file. If this assessment is wrong, the application has to either pause playback and rebuffer, or slow down playback. While users would like playback to start as soon as possible, the application has some degree of freedom in trading off playback start time against estimated network capacity. Most video-on-demand systems are examples of delay-sensitive progressive-download applications. The third case, real-time live streaming, has the most stringent delay requirement. While progressive download may tolerate initial buffering of tens of seconds or even minutes, live streaming generally cannot tolerate more than a few seconds of buffering. Taking into account the delay introduced by signal ingest and encoding, and network transmission and propagation, the live streaming system can introduce only a few seconds of buffering time end-to-end and still be considered “live” [1].

The Zattoo peer-to-peer live streaming system was a free-to-use network serving over 3 million registered users in eight European countries at the time of study, with a maximum of over 60,000 concurrent users on a single channel. The system delivers live streams using a receiver-based, peer-division multiplexing scheme as described in Section II. To ensure real-time performance when peer uplink capacity is below requirement, Zattoo subsidizes the network’s bandwidth requirement, as described in Section III. After delving into Zattoo’s architecture in detail, we study in Sections IV and V large-scale measurements collected during the live broadcast of the UEFA European Football Championship, one of the most popular one-time events in Europe, in June 2008 [2]. During the course of the month of June 2008, Zattoo served more than 35 million sessions to more than one million distinct users. Drawing from these measurements, we report on the operational scalability of Zattoo’s live streaming system along several key issues:

1) How does the system scale in terms of overlay size and its effectiveness in utilizing peers’ uplink bandwidth?
2) How responsive is the system during channel switching, for example, when compared to the 3-second channel switch time of satellite TV?
3) How effective is the packet retransmission scheme in allowing a peer to recover from transient congestion?
4) How effective is the receiver-based peer-division multiplexing scheme in delivering synchronized sub-streams?
5) How effective is the global bandwidth subsidy system in provisioning for flash crowd scenarios?


6) Would a peer further away from the stream source experience an adversely long lag compared to a peer closer to the stream source?
7) How effective is the error-correcting code in isolating packet losses on the overlay?

We also discuss in Section VI several challenges in increasing the bandwidth contribution of Zattoo peers. Finally, we describe related work in Section VII and conclude in Section VIII.


II. SYSTEM ARCHITECTURE

The Zattoo system rebroadcasts live TV, captured from satellites, onto the Internet. The system carries each TV channel on a separate peer-to-peer delivery network and is not limited in the number of TV channels it can carry. Although a peer can freely switch from one TV channel to another, thereby departing from and joining different peer-to-peer networks, it can only join one peer-to-peer network at any one time. We henceforth limit our description of the Zattoo delivery network to how it carries one TV channel.

Fig. 1 shows a typical setup of a single TV channel carried on the Zattoo network. The TV signal captured from satellite is encoded into H.264/AAC streams, encrypted, and sent onto the Zattoo network. The encoding server may be physically separated from the server delivering the encoded content onto the Zattoo network. For ease of exposition, we will consider the two as logically co-located on an Encoding Server. Users are required to register themselves at the Zattoo website to download a free copy of the Zattoo player application. To receive the signal of a channel, the user first authenticates itself to the Zattoo Authentication Server. Upon authentication, the user is granted a ticket with a limited lifetime. The user then presents this ticket, along with the identity of the TV channel of interest, to the Zattoo Rendezvous Server. If the ticket specifies that the user is authorized to receive the signal of the said TV channel, the Rendezvous Server returns to the user a list of peers currently joined to the P2P network carrying the channel, together with a signed channel ticket. If the user is the first peer to join a channel, the list of peers it receives contains only the Encoding Server. The user joins the channel by contacting the peers returned by the Rendezvous Server, presenting its channel ticket, and obtaining the live stream of the channel from them (see Section II-A for details).

Each live stream is sent out by the Encoding Server as n logical sub-streams. The signal received from satellite is encoded into a variable-bit rate stream. During periods of source quiescence, no data is generated. During source busy periods, generated data is packetized into a packet stream, with each packet limited to a maximum size. The Encoding Server multiplexes this packet stream onto the Zattoo network as n logical sub-streams. Thus the first packet generated is considered part of the first sub-stream, the second packet that of the second sub-stream, and the n-th packet that of the n-th sub-stream. The (n+1)-th packet cycles back to the first sub-stream, and so on, such that the i-th sub-stream carries the (mn+i)-th packets, where m ≥ 0, 1 ≤ i ≤ n, and n is a user-defined constant. We call a set of n packets with the same index multiplier m

Fig. 1. Zattoo delivery network architecture (Encoding Servers, Demultiplexer, Authentication Server, Rendezvous Server, Feedback Server, and other admin servers).

a “segment.” Thus m serves as the segment index, while i serves as the packet index within a segment. Each segment is of size n packets. Being the packet index, i also serves as the sub-stream index. The number mn + i is carried in each packet as its sequence number.

Zattoo uses the Reed-Solomon (RS) error correcting code (ECC) for forward error correction. The RS code is a systematic code: of the n packets sent per segment, k < n packets carry the live stream data while the remainder carries the redundant data [3, Section 7.3]. Due to the variable-bit rate nature of the data stream, the time period covered by a segment is variable, and a packet may be of size less than the maximum packet size. A packet smaller than the maximum packet size is zero-padded to the maximum packet size for the purposes of computing the (shortened) RS code, but is transmitted in its original size. Once a peer has received k packets per segment, it can reconstruct the remaining n − k packets. We do not differentiate between streaming data and redundant data in our discussion in the remainder of this paper.

When a new peer requests to join an existing peer, it specifies the sub-stream(s) it would like to receive from the existing peer. These sub-streams do not have to be consecutive. Contingent upon availability of bandwidth at existing peers, the receiving peer decides how to multiplex a stream onto its set of neighboring peers, giving rise to our description of the Zattoo live streaming protocol as a receiver-based, peer-division multiplexing protocol. The details of peer-division multiplexing are described in Section II-A, while the details of how a peer manages sub-stream forwarding and stream reconstruction are described in Section II-B. Receiver-based peer-division multiplexing has also been used by the latest version of the CoolStreaming peer-to-peer protocol, though it differs from Zattoo in its stream management (Section II-B) and adaptive behavior (Section II-C) [4].

A. Peer-Division Multiplexing

To minimize per-packet processing time of a stream, the Zattoo protocol sets up a virtual circuit with multiple fan-outs at each peer. When a peer joins a TV channel, it establishes a peer-division multiplexing (PDM) scheme amongst a set of neighboring peers, by building a virtual circuit to each of the


neighboring peers. Barring the departure or performance degradation of a neighbor peer, the virtual circuits are maintained until the joining peer switches to another TV channel. With the virtual circuits set up, each packet is forwarded without further per-packet handshaking between peers. We describe the PDM bootstrapping mechanism in this section and the adaptive PDM mechanism to handle peer departure and performance degradation in Section II-C.

The PDM establishment process consists of two phases: the search phase and the join phase. In the search phase, the new, joining peer determines its set of potential neighbors. In the join phase, the joining peer requests peering relationships with a subset of its potential neighbors. Upon acceptance of a peering relationship request, the peers become neighbors and a virtual circuit is formed between them.

Search phase. To obtain a list of potential neighbors, a joining peer sends out a SEARCH message to a random subset of the existing peers returned by the Rendezvous Server. The SEARCH message contains the sub-stream indices for which this joining peer is looking for peering relationships. The sub-stream indices are usually represented as a bitmask of n bits, where n is the number of sub-streams defined for the TV channel. In the beginning, the joining peer will be looking for peering relationships for all sub-streams and have all the bits in the bitmask turned on. In response to a SEARCH message, an existing peer replies with the number of sub-streams it can forward. From the returning SEARCH replies, the joining peer constructs a set of potential neighbors that covers the full set of sub-streams comprising the live stream of the TV channel. The joining peer continues to wait for SEARCH replies until the set of potential neighbors contains at least a minimum number of peers, or until all SEARCH replies have been received. With each SEARCH reply, the existing peer also returns a random subset of its known peers. If a joining peer cannot form a set of potential neighbors that covers all of the sub-streams of the TV channel, it initiates another SEARCH round, sending SEARCH messages to peers newly learned from the previous round. The joining peer gives up if it cannot obtain the full stream after two SEARCH rounds. To help the joining peer synchronize the sub-streams it receives from multiple peers, each existing peer also indicates for each sub-stream the latest sequence number it has received for that sub-stream, and the existence of any quality problem. The joining peer can then choose sub-streams with good quality that are closely synchronized.

Join phase. Once the set of potential neighbors is established, the joining peer sends JOIN requests to each potential neighbor. The JOIN request lists the sub-streams for which the joining peer would like to construct virtual circuits with the potential neighbor. If a joining peer has l potential neighbors, each willing to forward it the full stream of a TV channel, it would typically choose to have each forward only 1/l-th of the stream, to spread out the load amongst the peers and to speed up error recovery, as described in Section II-C. In selecting which of the potential neighbors to peer with, the joining peer gives highest preference to topologically close-by peers, even if these peers have less capacity or carry lower quality sub-streams. The “topological” location of a peer is defined

Fig. 2. Zattoo peer with IOB.

to be its subnet number, autonomous system (AS) number, and country code, in that order of precedence. A joining peer obtains its own topological location from the Zattoo Authentication Server as part of its authentication process. The list of peers returned by both the Rendezvous Server and potential neighbors all come attached with topological locations. A topology-aware overlay not only allows us to be “ISP-friendly,” by minimizing inter-domain traffic and thus saving on transit bandwidth cost, but also helps reduce the number of physical links and metro hops traversed in the overlay network, potentially resulting in enhanced user-perceived stream quality.

B. Stream Management

We represent a peer as a packet buffer, called the IOB, fed by sub-streams incoming from the PDM constructed as described in Section II-A. (In the case of the Encoding Server, which we also consider a peer on the Zattoo network, the buffer is fed by the encoding process.) The IOB drains to (1) a local media player if one is running, (2) a local file if recording is supported, and (3) potentially other peers. Fig. 2 depicts a Zattoo player application with virtual circuits established to four peers. As packets from each sub-stream arrive at the peer, they are stored in the IOB for reassembly to reconstruct the full stream. Portions of the stream that have been reconstructed are then played back to the user. In addition to providing a reassembly area, the IOB also allows a peer to absorb some variability in available network bandwidth and network delay.

The IOB is referenced by an input pointer, a repair pointer, and one or more output pointers. The input pointer points to the slot in the IOB where the next incoming packet with sequence number higher than the highest sequence number


received so far will be stored. The repair pointer always points one slot beyond the last packet received in order and is used to regulate packet retransmission and adaptive PDM as described later. We assign an output pointer to each forwarding destination. The output pointer of a destination indicates the destination’s current forwarding horizon on the IOB. In accordance with the three types of possible forwarding destinations listed above, we have three types of output pointers: player pointer, file pointer, and peer pointer. One would typically have at most one player pointer and one file pointer, but potentially multiple concurrent peer pointers, referencing an IOB. The Zattoo player application does not currently support recording.

Since we maintain the IOB as a circular buffer, if the incoming packet rate is higher than the forwarding rate of a particular destination, the input pointer will overrun the output pointer of that destination. We could move the output pointer to match the input pointer so that we consistently forward the oldest packet in the IOB to the destination. Doing so, however, requires checking the input pointer against all output pointers on every packet arrival. Instead, we have implemented the IOB as a double buffer. With the double buffer, the positions of the output pointers are checked against that of the input pointer only when the input pointer moves from one sub-buffer to the other. When the input pointer moves from sub-buffer a to sub-buffer b, all the output pointers still pointing to sub-buffer b are moved to the start of sub-buffer a and sub-buffer b is flushed, ready to accept new packets. When a sub-buffer is flushed while there are still output pointers referencing it, packets that have not been forwarded to the destinations associated with those pointers are lost to them, resulting in quality degradation. To minimize packet loss due to sub-buffer flushing, we would like to use large sub-buffers. However, the real-time delay requirement of live streaming limits the usefulness of late-arriving packets and effectively puts a cap on the maximum size of the sub-buffers.

Different peers may request different numbers of, possibly non-consecutive, sub-streams. To accommodate the different forwarding rates and regimes required by the destinations, we associate a packet map and forwarding discipline with each output pointer. Fig. 3 shows the packet map associated with an output peer pointer where the peer has requested sub-streams 1, 4, 9, and 14. Every time a peer pointer is repositioned to the beginning of a sub-buffer of the IOB, all the packet slots of the requested sub-streams are marked NEEDed and all the slots of the sub-streams not requested by the peer are marked SKIP. When a NEEDed packet arrives and is stored in the IOB, its state in the packet map is changed to READY. As the peer pointer moves along its associated packet map, READY packets are forwarded to the peer and their states changed to SENT. A slot marked NEEDed but not READY, such as slot n + 4 in Fig. 3, indicates that the packet is lost or will arrive out of order, and is bypassed. When an out-of-order packet arrives, its slot is changed to READY and the peer pointer is reset to point to this slot. Once the out-of-order packet has been sent to the peer, the peer pointer will move forward, bypassing all SKIP, NEED, and SENT slots until it reaches the next READY slot, where it can resume sending.
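The slot bookkeeping just described amounts to a small state machine per output pointer. The following is a minimal Python sketch of that logic only; the names (PacketMap, Slot, on_packet, drain) are our own illustrative choices and not Zattoo's implementation.

```python
from enum import Enum

class Slot(Enum):
    SKIP = 0    # sub-stream not requested by this destination
    NEED = 1    # requested, not yet received
    READY = 2   # received and stored in the IOB, not yet forwarded
    SENT = 3    # already forwarded

class PacketMap:
    """Packet map for one output (peer) pointer over one sub-buffer of
    m segments, each of n packets; slot j holds sub-stream (j % n) + 1."""
    def __init__(self, n, m, requested_substreams):
        self.slots = [Slot.NEED if (j % n) + 1 in requested_substreams else Slot.SKIP
                      for j in range(n * m)]
        self.ptr = 0  # output pointer position within the sub-buffer

    def on_packet(self, slot_idx):
        """A NEEDed packet arrived; an out-of-order arrival pulls the pointer back."""
        if self.slots[slot_idx] == Slot.NEED:
            self.slots[slot_idx] = Slot.READY
            self.ptr = min(self.ptr, slot_idx)

    def drain(self, send):
        """Forward READY packets in order, bypassing SKIP, NEED, and SENT slots."""
        while self.ptr < len(self.slots):
            if self.slots[self.ptr] == Slot.READY:
                send(self.ptr)
                self.slots[self.ptr] = Slot.SENT
            self.ptr += 1

# Example: the peer of Fig. 3 requesting sub-streams 1, 4, 9, and 14.
pm = PacketMap(n=16, m=8, requested_substreams={1, 4, 9, 14})
pm.on_packet(0)                                   # sub-stream 1 of segment 0 arrives
pm.drain(lambda idx: print("forward slot", idx))  # forwards slot 0, bypasses the rest
```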

Fig. 3. Packet map associated with a peer pointer (slot states: SKIP, NEED, READY, SENT).

Fig. 4. IOB, input/output pointers, and packet maps.

The player pointer behaves the same as a peer pointer except that all packets in its packet map will always start out marked NEEDed. Fig. 4 shows an IOB consisting of a double buffer, with an input pointer, a repair pointer, and an output file pointer, an output player pointer, and two output peer pointers referencing the IOB. Each output pointer has a packet map associated with it. For the scenario depicted in the figure, the player pointer tracks the input pointer and has skipped over some lost packets. Both peer pointers are lagging the input pointer, indicating that the forwarding rates to the peers are bandwidth limited. The file pointer is pointing at the first lost packet. Archiving a live stream to file does not impose a real-time delay bound on packet arrivals. To achieve the best quality recording possible, a recording peer always waits for retransmission of lost packets that cannot be recovered by error correction.

In addition to achieving lossless recording, we use retransmission to let a peer recover from transient network congestion. A peer sends out a retransmission request when the distance between the repair pointer and the input pointer has reached a threshold of R packet slots, usually spanning multiple segments. A retransmission request consists of an R-bit packet mask, with each bit representing a packet, and the sequence number of the packet corresponding to the first bit. Marked bits in the packet mask indicate that the corresponding packets need to be retransmitted. When a packet loss is detected, it could be caused by congestion on the virtual circuits forming the current PDM or congestion on the path beyond the neighboring peers. In either case, current neighbor


peers will not be good sources of retransmitted packets. Hence we send our retransmission requests to r random peers that are not neighbor peers. A peer receiving a retransmission request will honor the request only if the requested packets are still in its IOB and it has sufficient left-over capacity, after serving its current peers, to transmit all the requested packets. Once a retransmission request is accepted, the peer will retransmit all the requested packets to completion.

C. Adaptive PDM

While we rely on packet retransmission to recover from transient congestion, we have two channel capacity adjustment mechanisms to handle longer-term bandwidth fluctuations. The first mechanism allows a forwarding peer to adapt the number of sub-streams it will forward given its current available bandwidth, while the second allows the receiving peer to switch provider at the sub-stream level.

Peers on the Zattoo network can redistribute a highly variable number of sub-streams, reflecting the high variability in uplink bandwidth of different access network technologies. For a full stream consisting of sixteen constant-bit rate sub-streams, our prior study shows that, based on realistic peer characteristics measured from the Zattoo network, half of the peers can support less than half of a stream, 82% of peers can support less than a full stream, and the remainder can support up to ten full streams (peers that can redistribute more than a full stream are conventionally known as supernodes in the literature) [5]. With variable-bit rate streams, the bandwidth carried by each sub-stream is also variable. To increase peer bandwidth usage, without undue degradation of service, we instituted measurement-based admission control at each peer.

In addition to controlling resource commitment, another goal of the measurement-based admission control module is to continually estimate the amount of available uplink bandwidth at a peer. The amount of available uplink bandwidth at a peer is initially estimated by the peer sending a pair of probe packets to Zattoo’s Bandwidth Estimation Server. Once a peer starts forwarding sub-streams to other peers, it will receive from those peers quality-of-service feedback that informs its update of the available uplink bandwidth estimate. A peer sends quality-of-service feedback only if the quality of a sub-stream drops below a certain threshold. (Depending on a peer’s NAT and/or firewall configuration, Zattoo uses either UDP or TCP as the underlying transport protocol. The quality of a sub-stream is measured differently for UDP and TCP. A packet is considered lost under UDP if it doesn’t arrive within a fixed threshold. The quality measure for UDP is computed as a function of both the packet loss rate and the burst error rate, i.e., the number of contiguous packet losses. The quality measure for TCP is defined to be how far behind a peer is, relative to other peers, in serving its sub-streams.) Upon receiving quality feedback from multiple peers, a peer first determines whether the identified sub-streams are themselves arriving at the peer in low quality. If so, the low quality of service may not be caused by a limit on its own available uplink bandwidth, in which case it ignores the low-quality feedback. Otherwise, the peer decrements its estimate of available uplink bandwidth. If the new estimate is below the bandwidth needed to support the existing number of virtual circuits, the peer closes


a virtual circuit. To reduce the instability introduced into the network, a peer first closes the virtual circuit carrying the smallest number of sub-streams. A peer attempts to increase its available uplink bandwidth estimate periodically if it has fully utilized its current estimate of available uplink bandwidth without triggering any bad quality feedback from neighboring peers. A peer doubles the estimated available uplink bandwidth if the current estimate is below a threshold, switching to linear increase above the threshold, similar to how TCP maintains its congestion window size. A peer also increases its estimate of available uplink bandwidth if a neighbor peer departs the network without any bad quality feedback.

When the repair pointer lags behind the input pointer by R packet slots, in addition to initiating a retransmission request, a peer also computes a loss rate over the R packets. If the loss rate is above a threshold, the peer considers the neighbor slow and attempts to reconfigure its PDM. In reconfiguring its PDM, a peer attempts to shift half of the sub-streams currently forwarded by the slow neighbor to other existing neighbors. At the same time, it searches for new peer(s) to forward these sub-streams. If new peer(s) are found, the load will be shifted from existing neighbors to the new peer(s). If sub-streams from the slow neighbor continue to suffer after the reconfiguration of the PDM, the peer will drop the neighbor completely and initiate another reconfiguration of the PDM. When a peer loses a neighbor due to reduced available uplink bandwidth at the neighbor or due to neighbor departure, it also initiates a PDM reconfiguration. A peer may also initiate a PDM reconfiguration to switch to a topologically closer peer. Similar to the PDM establishment process, PDM reconfiguration is accomplished by peers exchanging sub-stream bitmasks in a request/response handshake, with each bit of the bitmask representing a sub-stream. During and after a PDM reconfiguration, slow neighbor detection is disabled for a short period of time to allow the system to stabilize.

III. GLOBAL BANDWIDTH SUBSIDY SYSTEM

Each peer on the Zattoo network is assumed to serve a user through a media player, which means that each peer must receive, and can potentially forward, all n sub-streams of the TV channel the user is watching. The limited redistribution capacity of peers on the Zattoo network means that a typical client can contribute only a fraction of the sub-streams that make up a channel. This shortage of bandwidth leads to a global bandwidth deficit in the peer-to-peer network. Whereas BitTorrent-like delay-tolerant file downloads or the delay-sensitive progressive download of video-on-demand applications can mitigate such a global bandwidth shortage by increasing download time, a live streaming system such as Zattoo’s must subsidize the bandwidth shortfall to provide a real-time delivery guarantee.

Zattoo’s Global Bandwidth Subsidy System (or simply, the Subsidy System) consists of a global bandwidth monitoring subsystem, a global bandwidth forecasting and provisioning subsystem, and a pool of Repeater nodes. The monitoring subsystem continuously monitors the global bandwidth requirement of a channel. The forecasting and provisioning


subsystem projects the global bandwidth requirement based on measured history and allocates Repeater nodes to the channel as needed. The monitoring and provisioning of global bandwidth is complicated by two parameters that vary highly over time, client population size and peak streaming rate, and one parameter that varies over space, available uplink bandwidth, which is network-service-provider dependent. Forecasting of bandwidth requirement is a vast subject in itself. Zattoo adopted a very simple mechanism, described in Section III-B, which has performed adequately in provisioning the network for both daily demand fluctuations and flash crowd scenarios (see Section IV-C).

When a bandwidth shortage is projected for a channel, the Subsidy System assigns one or more Repeater nodes to the channel. Repeater nodes function as bandwidth multipliers, amplifying the amount of available bandwidth in the network. Each Repeater node serves at most one channel at a time; it joins and leaves a given channel at the behest of the Subsidy System. Repeater nodes receive and serve all n sub-streams of the channel they join, run the same PDM protocol, and are treated by actual peers like any other peers on the network; however, as bandwidth amplifiers, they are usually provisioned to contribute more uplink bandwidth than the download bandwidth they consume. The use of Repeater nodes makes the Zattoo network a hybrid P2P and content distribution network. We next describe the bandwidth monitoring subsystem of the Subsidy System, followed by the design of the simple bandwidth projection and Repeater node assignment subsystem.

A. Global Bandwidth Measurement

The capacity metric of a channel is the tuple (D, C), where D is the aggregate download rate required by all users on the channel, and C is the aggregate upload capacity of those users. Usually C < D, and the difference between the two is the global bandwidth deficit of the channel. Since channel viewership changes over time as users join or leave the channel, we need a scalable means to measure and update the capacity metric. We rely on aggregating the capacity metric up the peer-division multiplexing tree. Each peer in the overlay periodically aggregates the capacity metric reported by all its downstream receiver peers, adds its own capacity measure (D, C) to the aggregate, and forwards the resulting capacity metric upstream to its forwarding peers. By the time the capacity metric percolates up to the Encoding Server, it contains the total download and upload rate aggregates of the whole streaming overlay. The Encoding Server then simply forwards the obtained (D, C) to the Subsidy Server.

B. Global Bandwidth Projection and Provisioning

For each channel, the Subsidy Server keeps a history of the capacity metric (D, C) reports received from the channel’s Encoding Server. The channel utilization ratio U is the ratio of D to C. Based on recent movements of the ratio U, we classify the capacity trend of each channel into the following four categories.


• Stable: the ratio U has remained within [S − ε, S + ε] for the past Ts reporting periods.
• Exploding: the ratio U increased by at least E within Te (e.g., Te = 2) reporting periods.
• Increasing: the ratio U has steadily increased by I (I ≪ E) over the past Ti reporting periods.
• Falling: the ratio U has decreased by F over the past Tf reporting periods.

Orthogonal to the capacity-trend based classification above, each channel is further categorized in terms of its capacity utilization ratio as follows.

• Under-utilized: the ratio U is below the low threshold L, e.g., U ≤ 0.5.
• Near Capacity: the ratio U is almost 1.0, e.g., U ≥ 0.9.

If the capacity trend of a channel has been “Exploding,” one or more Repeater nodes will be assigned to it immediately. If the channel’s capacity trend has been “Increasing,” it will be assigned a Repeater node with a smaller capacity. The goal of the Subsidy System is to keep a channel below “Near Capacity.” If a channel’s capacity trend is “Stable” and the channel is “Under-utilized,” the Subsidy System attempts to free Repeater nodes (if any) assigned to the channel. If a channel’s capacity trend is “Falling,” the Subsidy System waits for the utilization to stabilize before reassigning any Repeater nodes.

Each Repeater node periodically sends a keep-alive message to the Subsidy System. The keep-alive message tells the Subsidy System which channel the Repeater node is serving, plus its CPU and capacity utilization ratios. This allows the Subsidy System to monitor the health of Repeater nodes and to increase the stability of the overlay during Repeater reassignment. When reassigning Repeater nodes from a channel, the Subsidy System will start from the Repeater node with the lowest utilization ratio. It will notify the selected Repeater node to stop accepting new peers and then to leave the channel after a specified time.

In addition to Repeater nodes, the Subsidy System may recognize extra capacity from idle peers whose owners are not actively watching a channel. However, our previous study shows that a large number of idle peers are required to make any discernible impact on the global bandwidth deficit of a channel [5]. Our current Subsidy System therefore does not solicit bandwidth contribution from idle peers.
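The trend classification and the resulting provisioning decisions can be sketched as follows. This is an illustration only: the threshold values are placeholders we chose for the example (except Te = 2, L = 0.5, and the 0.9 "Near Capacity" level, which appear above), and using the window mean as the reference level S for the "Stable" test is our own assumption.

```python
def classify_trend(history, eps=0.05, E=0.3, Te=2, I=0.05, Ti=6, F=0.2, Tf=6, Ts=6):
    """Classify a channel's capacity trend from its history of utilization
    ratios U = D/C (oldest first, most recent last)."""
    U = history[-1]
    if len(history) >= Te and U - history[-Te] >= E:
        return "Exploding"
    if len(history) >= Ti and U - history[-Ti] >= I and \
       all(b >= a for a, b in zip(history[-Ti:], history[-Ti + 1:])):
        return "Increasing"
    if len(history) >= Tf and history[-Tf] - U >= F:
        return "Falling"
    window = history[-Ts:]
    S = sum(window) / len(window)          # assumed reference level for "Stable"
    if all(abs(u - S) <= eps for u in window):
        return "Stable"
    return "Unclassified"

def provision(trend, U, repeaters_assigned, L=0.5, NEAR=0.9):
    """Repeater node decision following the policy described above."""
    if trend == "Exploding":
        return "assign one or more Repeater nodes immediately"
    if trend == "Increasing":
        return "assign a smaller-capacity Repeater node"
    if trend == "Stable" and U <= L and repeaters_assigned > 0:
        return "reclaim Repeater nodes from the channel"
    if trend == "Falling":
        return "wait for utilization to stabilize"
    return "no action"
```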

IV. SERVER-SIDE MEASUREMENTS

In the Zattoo system, two separate centralized collector servers collect usage statistics and error reports, which we call the “stats” server and the “user-feedback” server, respectively. The “stats” server periodically collects aggregated player statistics from individual peers, from which full session logs are constructed and entered into a session database. The session database gives a complete picture of all past and present sessions served by the Zattoo system. A given database entry contains statistics about a particular session, including join time, leave time, uplink bytes, download bytes, and the channel name associated with the session. We study the sessions generated on three major TV channels from


TABLE I
SESSION DATABASE (6/1/2008–6/30/2008).

Channel   # sessions    # distinct users
ARD       2,102,638     298,601
Cuatro    1,845,843     268,522
SF2       1,425,285     157,639

TABLE II
FEEDBACK LOGS (6/20/2008–6/29/2008).

Channel   # feedback logs   # sessions
ARD       871               1,253
Cuatro    2,922             4,568
SF2       656               1,140

TABLE III
AVERAGE SHARING RATIO.

          Average sharing ratio
Channel   Off-peak   Peak
ARD       0.335      0.313
Cuatro    0.242      0.224
SF2       0.277      0.222

three different countries (Germany, Spain, and Switzerland), from June 1st to June 30th, 2008. Throughout the paper, we label those channels from Germany, Spain, and Switzerland as ARD, Cuatro, and SF2, respectively. Euro 2008 games were held during this period, and those three channels broadcast a majority of the Euro 2008 games, including the final match. See Table I for information about the collected session data sets.

The “user-feedback” server, on the other hand, collects error logs submitted asynchronously by users. The “user feedback” data here is different from a peer’s quality feedback used in PDM reconfiguration described in Section II-C. The Zattoo player maintains an encrypted log file which contains, for debugging purposes, the detailed behavior of the client-side P2P engine, as well as a history of all the streaming sessions initiated by a user since player startup. When users encounter any error while using the player, such as a log-in error, join failure, or bad streaming quality, they can choose to report the error by clicking a “Submit Feedback” button on the player, which causes the Zattoo player to send the generated log file to the user-feedback server. Since a given feedback log not only reports on a particular error, but also describes “normal” sessions generated prior to the occurrence of the error, we can study users’ viewing experience (e.g., channel join delay) from the feedback logs. Table II describes the feedback logs collected from June 20th to June 29th. A given feedback log can contain multiple sessions (for the same or different channels), depending on the user’s viewing behavior. The second column in the table represents the number of feedback logs which contain at least one session generated on the channel listed in the corresponding entry in the first column. The numbers in the third column indicate how many distinct sessions generated on said channel are present in the feedback logs.

A. Overlay Size and Sharing Ratio

We first study how many concurrent users are supported by the Zattoo system, and how much bandwidth is contributed by them. For this purpose, we use the session database presented in Table I. By using the join/leave timestamps of the collected sessions, we calculate the number of concurrent users on a given channel at time i. Then we calculate the average sharing ratio of the given channel at the same time. The average

sharing ratio is defined as the total users’ uplink rate divided by their download rate on the channel. A sharing ratio of one means users contribute to other peers in the network as much traffic as they download at the time. We calculate the average sharing ratio from the total download/uplink bytes of the collected sessions. We first obtain all the sessions which are active across time i. We call the set of such sessions Si. Then, assuming the uplink/download bytes of each session are spread uniformly throughout the entire session duration, we approximate the average sharing ratio at time i as

\[
\frac{\sum_{s \in S_i} \mathrm{uplink\_bytes}(s) / \mathrm{duration}(s)}
     {\sum_{s \in S_i} \mathrm{download\_bytes}(s) / \mathrm{duration}(s)} .
\]
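The following sketch computes this approximation directly from session records. The field names (join, leave, uplink_bytes, download_bytes) are illustrative stand-ins for the session database fields described above, not the actual schema.

```python
def average_sharing_ratio(sessions, t):
    """Approximate average sharing ratio at time t, per the formula above.
    Each session is a dict with keys 'join', 'leave' (seconds), and
    'uplink_bytes', 'download_bytes' (totals over the session)."""
    active = [s for s in sessions if s['join'] <= t < s['leave']]
    # spread each session's bytes uniformly over its duration
    up = sum(s['uplink_bytes'] / (s['leave'] - s['join']) for s in active)
    down = sum(s['download_bytes'] / (s['leave'] - s['join']) for s in active)
    return up / down if down > 0 else 0.0
```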

Fig. 5 shows the overlay size (i.e., number of concurrent users) and average sharing ratio, superimposed, across the month of June 2008. According to the figure, the overlay size grew to more than 20,000 (e.g., 20,059 on ARD on 6/18 and 22,152 on Cuatro on 6/9). In contrast to the overlay size, the average sharing ratio stays relatively flat throughout the month. Occasional spikes in the sharing ratio all occurred during 2AM to 6AM (GMT) when the channel usage is very low, and therefore may be considered statistically insignificant. By segmenting a 24-hour day into two time periods, e.g., off-peak hours (0AM-6PM) and peak hours (6PM-0AM), Table III shows the average sharing ratio in the two time periods separately. Zattoo’s usage during peak hours typically accounts for about 50% to 70% of the total usage of the day. According to the table, the average sharing ratio during peak hours is slightly lower than, but not very different from, that during off-peak hours. The Cuatro channel in Spain exhibits a relatively lower sharing ratio than the two other channels. One bandwidth test site [6] reports that the average uplink bandwidth in Spain is about 205 kbps, which is much lower than in Germany (582 kbps) and Switzerland (787 kbps). The lower sharing ratio on the Spanish channel may reflect regional differences in residential access network provisioning. The balance of the required bandwidth is provided by Zattoo’s Encoding Server and Repeater nodes.

B. Channel Switching Delay

When a user clicks on a new channel button, it takes some time (a.k.a. channel switching delay) for the user to be able to start watching streamed video on the Zattoo player. The channel switching delay has two components. First, the Zattoo player needs to contact other available peers and retrieve all required sub-streams from them. We call the delay incurred during this stage “join delay.” Once all necessary sub-streams have been negotiated successfully, the player then needs to wait and buffer a minimum amount of stream data (e.g., 3 seconds' worth) before starting to show the video to the user. We call

Fig. 5. Overlay size and sharing ratio: (a) ARD; (b) Cuatro; (c) SF2.

Fig. 6. CDF of user arrival time: (a) ARD; (b) Cuatro; (c) SF2.

the resulting wait time “buffering delay.” The total channel switching delay experienced by users is thus the sum of join delay and buffering delay. PPLive reports channel switching delays of around 20 to 30 seconds, which can be as high as 2 minutes, of which join delay typically accounts for 10 to 15 seconds [7].

We measure the join delay experienced by Zattoo users from the feedback logs described in Table II. Debugging information contained in the feedback logs tells us when the user clicked on a particular channel, and when the player successfully joined the P2P overlay and started to buffer content. One concern in relying on user-submitted feedback logs to infer join delay is the potential sampling bias associated with them. Users typically submit feedback logs when they encounter some kind of error, and that brings up the question of whether the captured sessions are representative samples to study. We attempt to address this concern by comparing the data from feedback logs against those from the session database. The latter captures the complete picture of users’ channel watching behavior, and therefore can serve as a reference. In our analysis, we compare the user arrival time distribution obtained from the two data sets. For a fair comparison, we used a subset of the session database which was generated during the same period when the feedback logs were collected (i.e., from June 20th to 29th).

Fig. 6 plots the CDF of user arrivals per hour obtained from the feedback logs and the session database separately. The steep slope of the distributions during hours 18-20 (6-8PM) indicates the high frequency of user arrivals during those hours. On ARD and Cuatro, the user arrival distributions inferred from feedback logs are almost identical to those from the session database. On the other hand, on SF2, the distribution obtained from feedback logs tends to grow slowly during early

TABLE IV
MEDIAN CHANNEL JOIN DELAY.

          Median join delay       Maximum overlay size
Channel   Off-peak   Peak         Off-peak   Peak
ARD       2.29 sec   1.96 sec     2,313      19,223
Cuatro    3.67 sec   4.48 sec     2,357      8,073
SF2       2.49 sec   2.67 sec     1,126      11,360

hours, which indicates that the feedback submission rate during off-peak hours on SF2 was relatively lower than normal. Later during peak hours, however, the feedback submission rate picks up as expected, closely matching the actual user arrival rate. Based on this observation, we argue that feedback logs can serve as representative samples of daily user activities.

Fig. 7 shows the CDF distributions of channel join delay for the ARD, Cuatro, and SF2 channels. We show the distributions for off-peak hours (0AM-6PM) and peak hours (6PM-0AM) separately. The median channel join delay is also presented in a similar fashion in Table IV. According to the CDF distributions, 80% of users experience less than 4 to 8 seconds of join delay, and 50% of users even less than 2 seconds of join delay. Also, Table IV shows that even a 10-fold increase in the number of concurrent users during peak hours does not unduly lengthen the channel join delay (up to a 22% increase in median join delay).

C. Repeater Node Assignment

As illustrated in Fig. 5, the live coverage of Euro Cup games brought huge flash crowds to the Zattoo system. With typical users contributing only about 25–30% of the average streaming bitrate, Zattoo must subsidize the balance. As described in Section III, Zattoo’s Subsidy System assigns Repeater


Fig. 7. CDF of channel join delay: (a) ARD; (b) Cuatro; (c) SF2.

Fig. 8. Overlay size and channel provisioning: (a) ARD; (b) Cuatro; (c) SF2.

nodes to channels that require more bandwidth than their own globally available aggregate upload bandwidth. Fig. 8 shows the Subsidy System assigning more Repeater nodes to a channel as flash crowds arrived during each Euro Cup game and then gradually reclaiming them as the flash crowd departed. For each of the channels reported, we choose a particular date with the biggest flash crowd on the channel. The dip in overlay sizes on the ARD and Cuatro channels occurred during the half-time break of the games. The Subsidy Server was less aggressive in assigning Repeater nodes to the ARD channel, as compared to the other two channels, because the Repeater nodes in the vicinity of the ARD server have higher capacity than those near the other two.

V. CLIENT-SIDE MEASUREMENTS

To further study the P2P overlay beyond details obtainable from aggregated session-level statistics, we ran several modified Zattoo clients that periodically retrieve the internal states of other participating peers in the network by exchanging SEARCH/JOIN messages with them. After a given probe session is over, the monitoring client archives a log file from which we can analyze the control/data traffic exchanged and detailed protocol behavior. We ran the experiment during Zattoo’s live coverage of Euro 2008 (June 7th to 29th). The monitoring clients tuned in to game channels from one of Zattoo’s data centers located in Switzerland while the games were broadcast live. The data sets presented in this paper were collected during the coverage of the championship final on two separate channels: ARD in Germany and Cuatro in Spain. Soccer teams from Germany and Spain participated in the championship final.

As described in Section II-A, Zattoo’s peer discovery is guided by a peer’s topology information. To minimize potential sampling bias caused by our use of a single vantage point for monitoring, we assigned an “empty” AS number and country code to our monitoring clients, so that their probing is not geared towards those peers located in the same AS and country.

A. Sub-Stream Synchrony

To ensure good viewing quality, a peer should not only obtain all necessary sub-streams (discounting redundant sub-streams), but also have those sub-streams delivered temporally synchronized with each other for proper online decoding. Receiving out-of-sync sub-streams typically results in a pixelated screen on the player. As described in Sections II-A and II-C, Zattoo’s protocol favors sub-streams that are relatively in-sync when constructing the PDM, and continually monitors the sub-streams’ progression over time, replacing those sub-streams that have fallen behind and reconfiguring the PDM when necessary. In this section we measure the effectiveness of Zattoo’s Adaptive PDM in selecting sub-streams that are largely in-sync.

To quantify such inter-sub-stream synchrony, we measure the difference in the latest (i.e., maximum) packet sequence numbers belonging to different incoming sub-streams. When a remote peer responds to a SEARCH query message, it includes in its SEARCH reply the latest sequence numbers that it has received for all sub-streams. If some sub-streams happen to be lossy or stalled at that time, the peer marks such sub-streams in its SEARCH replies. Thus, we can inspect SEARCH replies from existing peers to study their inter-sub-stream synchrony.
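A SEARCH reply therefore carries enough information to compute the two per-peer metrics used below. The following sketch assumes an illustrative in-memory representation of a reply (latest_seq mapping sub-stream index to the latest received sequence number, and bad listing the sub-streams flagged as lossy or stalled); it is not the actual wire format.

```python
def bad_substream_count(search_reply):
    """Number of sub-streams the responding peer flagged as lossy or stalled."""
    return len(search_reply['bad'])

def substream_synchrony(search_reply):
    """Spread (max - min) of the latest sequence numbers over the peer's
    good sub-streams; None if no good sub-stream is reported."""
    good = [seq for idx, seq in search_reply['latest_seq'].items()
            if idx not in search_reply['bad']]
    return max(good) - min(good) if good else None

# Example reply from a peer with 16 sub-streams, two of them flagged bad.
reply = {'latest_seq': {i: 10_000 + i for i in range(1, 17)}, 'bad': {5, 12}}
print(bad_substream_count(reply), substream_synchrony(reply))
```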

Fig. 9. Sub-stream synchrony (Euro 2008 final on ARD and on Cuatro): (a) CDF of the number of bad sub-streams; (b) CDF of sub-stream synchrony (# of packets).

In our experiment, we collected SEARCH replies from 4,420 and 6,530 distinct peers from the ARD and Cuatro channels, respectively, during the 2-hour coverage of the final game. From the collected SEARCH replies, we check how many sub-streams (out of 16) are “bad” (e.g., lossy or missing) for each peer. Fig. 9(a) shows the CDF of the number of bad sub-streams. According to the figure, about 99% (ARD) and 96% (Cuatro) of peers have 3 or fewer bad sub-streams.


The current Zattoo deployment dedicates n − k = 3 sub-streams (out of n = 16) for loss recovery purposes. That is, given a segment of 16 consecutive packets, if a peer has received at least 13 packets, it can reconstruct the remaining 3 packets from the RS error correcting code (see Section II). Thus, if a peer can receive any 13 sub-streams out of 16 reliably, it can decode the full stream properly. The result in Fig. 9(a) suggests that the number of bad sub-streams is low enough not to cause quality issues in the Zattoo network.

After discounting “bad” sub-streams, we then look at the synchrony of the remaining “good” sub-streams in each peer. Fig. 9(b) shows the CDF of the sub-stream synchrony in the two channels. The sub-stream synchrony of a given peer is defined as the difference between the maximum and minimum packet sequence numbers among all sub-streams, which is obtained from the peer’s SEARCH reply. For example, if a peer has a sub-stream synchrony of 100, it means that the peer has one sub-stream that is ahead of another sub-stream by 100 packets. If all the packets are received in order, the sub-stream synchrony of a peer measures at most n − 1. If we received multiple SEARCH replies from the same peer, we average the sub-stream synchrony across all the replies. Given the 500 kbps average channel data rate, 60 consecutive packets roughly correspond to 1 second's worth of streaming. Thus, the figure shows that on the Cuatro channel, 20% of peers have their sub-streams completely in-sync, while more than 90% have their sub-streams lagging each other by at most 5 seconds; on the ARD channel, 30% are in-sync, and more than 90% are within 1.5 seconds. The buffer space of the Zattoo player has been dimensioned sufficiently to accommodate such a low degree of out-of-sync sub-streams.
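For concreteness, the sequence-number bookkeeping behind these statements (Section II) can be sketched as follows, using the deployment values n = 16 with 13 data packets per segment. The function names are ours, chosen for illustration.

```python
N = 16       # packets (sub-streams) per segment
DATA = 13    # data packets per segment; the remaining 3 are RS redundancy

def segment_and_substream(seq):
    """Map a sequence number seq = m*n + i (m >= 0, 1 <= i <= n, Section II)
    to its segment index m and sub-stream index i."""
    m, r = divmod(seq - 1, N)
    return m, r + 1

def segment_recoverable(received_seqs, m):
    """A segment is decodable once any DATA (= 13) of its N packets arrive,
    since the systematic RS code reconstructs the remaining N - DATA packets."""
    got = sum(1 for s in received_seqs if segment_and_substream(s)[0] == m)
    return got >= DATA

# Example: segment 2 with three packets missing is still recoverable.
seqs = [2 * N + i for i in range(1, N + 1) if i not in (4, 9, 14)]
print(segment_recoverable(seqs, 2))   # True
```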

Fig. 10. Peer synchrony (Euro 2008 final on ARD and on Cuatro); x-axis: relative peer synchrony (# of packets).

B. Peer Synchrony

While sub-stream synchrony tells us the stream quality different peers may experience, “peer synchrony” tells us how varied in time peers’ viewing points are. In small-scale P2P networks, all participating peers are likely to watch live streaming roughly synchronized in time. However, as the size of the P2P overlay grows, the viewing point of edge nodes may be delayed significantly compared to those closer to the Encoding Server. In the experiment, we define the viewing point of a peer as the median of the latest sequence numbers across its sub-streams. Then we choose one peer (e.g., a Repeater node directly connected to the Encoding Server) as a reference point, and compare other peers’ viewing points against the reference viewing point.

Fig. 10 shows the CDFs of relative peer synchrony. The relative peer synchrony of peer X is the viewing point of X minus the reference viewing point. A peer synchrony of -60 thus means that a given peer’s viewing point is delayed by 60 packets (roughly 1 second for a 500 kbps stream) compared to the reference viewing point. A positive value means that a given peer’s stream gets ahead of the reference point, which could happen for peers that receive streams directly from the Encoding Server. The figure shows that about 1% of peers on ARD and 4% of peers on Cuatro experienced more than 3 seconds (i.e., 180 packets) of delay in streaming compared to the reference viewing point.
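The peer synchrony computation can be summarized in the following sketch; the 60 packets-per-second conversion follows the 500 kbps figure above, and the function names are illustrative only.

```python
from statistics import median

def viewing_point(latest_seqs):
    """Viewing point of a peer: the median of the latest sequence numbers
    across its sub-streams."""
    return median(latest_seqs)

def relative_synchrony_seconds(peer_latest, ref_latest, packets_per_second=60):
    """Relative peer synchrony of a peer against the reference peer, converted
    to seconds (~60 packets/second for a 500 kbps stream).  Negative values
    mean the peer lags the reference viewing point."""
    diff = viewing_point(peer_latest) - viewing_point(ref_latest)
    return diff / packets_per_second
```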

Fig. 11. Peer synchrony vs. peer-hop from the Encoding Server (peer hops from ES: 2 to 6): (a) ARD; (b) Cuatro.

To better understand to what extent a peer’s position in the overlay affects peer synchrony, we plot in Fig. 11 the CDFs of peer synchrony at different depths, i.e., distances from the Encoding Server. We look at how much delay is introduced for sub-streams traversing i peers from the Encoding Server (i.e., depth i). For this purpose, we associate per-sub-stream viewing point information from a SEARCH reply with per-sub-stream overlay depth information from a JOIN reply, where the SEARCH/JOIN replies were sent from the same peer, close in time (e.g., less than 1 second apart). If the number of peer hops from the Encoding Server has an adverse effect on playback delay and viewing point, we expect peers further away from the Encoding Server to be further offset from the reference viewing point, resulting in CDFs that are shifted to the upper left corner for peers at further distances from the Encoding Server. The figure shows an absence of such a leftward shift in the CDFs. The median delay at depth 6 does not grow by more than 0.5 seconds compared to the median delay at depth 2. The figure shows that having more peers further away from the Encoding Server increases the number of peers that are 0.5 seconds behind the reference viewing point, without increasing the offset in viewing point itself. On the Zattoo network, once the virtual circuits comprising a PDM have been set up, packets are streamed with minimal delay through each peer. Hence each peer-hop from the Encoding Server introduces delay only in the tens-of-milliseconds range, attesting to the suitability of the network architecture to carry live media streaming on large-scale P2P networks.

C. Effectiveness of ECC in Isolating Loss

Now we investigate the effect of overlay size on the performance scalability of the Zattoo system. Here we focus on client-side quality (e.g., loss rate). As described in Section II, Zattoo-broadcast media streams are RS encoded, which allows peers to reconstruct a full stream once they obtain at least k of n sub-streams. Since the ECC-encoded stream reconstruction occurs hop by hop in the overlay, it can mask sporadic packet losses, and thus prevent packet losses from being propagated throughout the overlay at large.

To see if such packet loss containment actually occurs in the production system, we run the following experiment. We let our monitoring client join a game channel, stay tuned for 15 seconds, and then leave. We wait for 10 seconds after the 15-second session is over. We repeat this process throughout the 2-hour game coverage and collect the logs. The log file from each session contains a complete list of packet sequence numbers received from the connected peers during the session, from which we can detect upstream packet losses. We then associate each packet loss with its peer path from the Encoding Server. This gives us a rough sense of whether packets traversing more hops experience a higher loss rate. In our analysis, we discount packets delivered during the first 3 seconds of a session to allow the PDM to stabilize. Fig. 12 shows how the average packet loss rate changes across different overlay depths (in all cases below 1%). On both channels, the packet loss rate does not grow with overlay depth. This result confirms our expectation that ECC helps localize packet losses on the overlay.
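A sketch of the per-depth loss accounting behind Fig. 12, under an assumed log format in which each expected packet is tagged with its arrival time, its peer-hop distance from the Encoding Server, and whether it was lost; the 3-second warm-up discount mirrors the text.

    from collections import defaultdict

    # Hypothetical per-session records: (arrival_time_sec, depth, lost)
    def loss_rate_by_depth(records, warmup=3.0):
        seen = defaultdict(int)
        lost = defaultdict(int)
        for t, depth, is_lost in records:
            if t < warmup:      # discount the first 3 seconds of a session
                continue        # to allow the PDM to stabilize
            seen[depth] += 1
            lost[depth] += int(is_lost)
        return {d: 100.0 * lost[d] / seen[d] for d in seen}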

VI. PEER-TO-PEER SHARING RATIO

The premise of a P2P system's success in providing scalable stream distribution is sufficient bandwidth sharing by participating users [5]. Section IV-A shows that the average sharing ratio of Zattoo users ranges from 0.2 to 0.35, which translates into uplink contributions ranging from 100 kbps to 175 kbps. This is far lower than the numbers reported as typical uplink bandwidth in the countries where Zattoo is available [6]. Aside from the possibility that a user's bandwidth may be shared with other applications, we find that factors such as user behavior, support for variable-bit-rate encoding, and heterogeneous NAT environments contribute to suboptimal sharing performance. Designers of P2P streaming systems must pay attention to these factors to achieve good bandwidth sharing. One must also keep in mind, however, that improving bandwidth sharing should not come at the expense of user viewing experience, e.g., due to more frequent uplink bandwidth saturation.

Fig. 12. Packet loss rate (%) vs. peer hops from the Encoding Server: (a) ARD, (b) Cuatro.

Fig. 13. Receiver-peer churn frequency: average number of receiver-peer churns vs. session length (min).

User churns: It is known that frequent user churns accompanied by short sessions, as well as flash crowd behavior, pose a unique challenge to live streaming systems [7], [8]. User churns can occur in both upstream (e.g., forwarding peers) and downstream (e.g., receiver peers) connectivity on the overlay. While frequent churns of forwarding peers can cause quality problems, frequent changes of receiver peers can lead to under-utilization of a forwarding peer's uplink capacity. To estimate receiver peers' churn behavior, we visualize in Fig. 13 the relationship between session length and the frequency of receiver-peer churns experienced during the sessions, computed as sketched below. We studied receiver-peer churn behavior for the Cuatro channel using Zattoo's feedback logs described in Table II. The y-value at session length x minutes denotes the average number of receiver-peer churns experienced by sessions with length in [x, x + 1). According to the figure, the receiver-peer churn frequency grows roughly linearly with session length, and peers experience approximately one receiver-peer churn every minute.
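The per-bucket averaging behind Fig. 13 can be summarized as follows; the feedback-log record format is an assumption made for illustration.

    from collections import defaultdict

    # Hypothetical feedback-log records: (session_length_min, receiver_peer_churns)
    def avg_churns_by_session_length(records):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for length_min, churns in records:
            bucket = int(length_min)   # sessions with length in [x, x + 1)
            sums[bucket] += churns
            counts[bucket] += 1
        return {x: sums[x] / counts[x] for x in counts}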

Variable-bit rate streams: Variable-bit rate (VBR) encoding is commonly used in production streaming systems, including Zattoo, due to its better quality-to-bandwidth ratio compared to constant-bit rate encoding. However, VBR streams may put additional strain on a peer's bandwidth contribution and quality optimization. As described in Section II-C, the presence of VBR streams requires peers to perform measurement-based admission control when allocating resources to set up a virtual circuit. To avoid degradation of service due to overloading of the uplink bandwidth, the measurement-based admission control module must be conservative both in its reaction to increases in available bandwidth and in its allocation of available bandwidth to newly joining peers. This conservativeness necessarily leads to under-utilization of resources.

NAT reachability: Asymmetric reachability imposed by prevalent NAT boxes adds to the difficulty of fully utilizing users' uplink capacity. The Zattoo delivery system supports six different NAT configurations: open host, full cone, IP-restricted, port-restricted, symmetric, and UDP-disabled, listed in increasing degree of restrictiveness. Not every pairwise communication can occur among different NAT types. For example, peers behind a port-restricted NAT box cannot communicate with those of symmetric NAT type (we documented a NAT reachability matrix in an earlier work [5]). To examine how the varied NAT reachability comes into play as far as sharing performance is concerned, we performed the following two controlled experiments. In both experiments, we run four Zattoo clients, each with a distinct NAT type, tuned to the same channel concurrently for 30 minutes. In the first experiment, we fixed the maximum allowable uplink capacity (denoted "max cp") of those clients to the same constant value. (The experiments were performed from one of Zattoo's data centers in Europe, where there was sufficient uplink bandwidth available to support 4 times max cp.) In the second experiment, we let the max cp of those clients self-adjust over time (the default setting of the Zattoo player). In both experiments, we monitored how the uplink bandwidth utilization ratio (i.e., the ratio between actively used uplink bandwidth and the maximum allowable bandwidth max cp) changes over time. The experiments were repeated three times for reliability. Fig. 14 shows the capacity utilization ratio from these two experiments for the four NAT types. Fig. 14(a) visualizes how fast the capacity utilization ratio converges to 1.0 over time after the client joins the network. Fig. 14(b) plots the running-average utilization ratio, i.e., $\left[\int_0^{t} \mathrm{capacity}(\tau)\,d\tau\right] / \left[\int_0^{t} \mathrm{max\,cp}(\tau)\,d\tau\right]$. In both the constant and adjustable cases, the results clearly exemplify the varied sharing performance across different NAT types. In particular, the "symmetric" NAT type, the most restrictive of the four, shows an inferior sharing ratio compared to the rest. Unfortunately, the "symmetric" NAT type is the second most popular NAT type among the Zattoo population [5], and can therefore seriously affect Zattoo's overall sharing performance. There is a relatively small number of peers behind the "IP-restricted" NAT type, hence its sharing performance has not been fine-tuned at Zattoo.

Fig. 14. Uplink capacity utilization ratio over time since joining: (a) constant maximum capacity, (b) adjustable maximum capacity.
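The two quantities plotted in Fig. 14 can be derived from periodic samples of used and maximum allowable uplink bandwidth; the sampling interface below is an assumption made for illustration, not the Zattoo player's actual instrumentation.

    def utilization_ratios(samples):
        # samples: list of (dt_sec, used_kbps, max_cp_kbps) over a session.
        # Returns the instantaneous ratio series (as in Fig. 14(a)) and the
        # running-average series (as in Fig. 14(b)), where the running average
        # at time t is (integral of used bandwidth) / (integral of max cp).
        used_area = cap_area = 0.0
        current, average = [], []
        for dt, used, max_cp in samples:
            current.append(used / max_cp)
            used_area += used * dt
            cap_area += max_cp * dt
            average.append(used_area / cap_area)
        return current, average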

Reliability of NAT detection: To allow communications among clients behind a NAT gateway (i.e., NATed clients), each Zattoo client performs a NAT detection procedure upon player startup to identify its NAT configuration and advertises it to the rest of the Zattoo network. Zattoo's NAT detection procedure implements the standard UDP-based STUN protocol, which involves communicating with external STUN servers to discover the presence and type of a NAT gateway [9]. The communication occurs over UDP with no reliable transport guarantee. Lost or delayed STUN packets may lead to inaccurate NAT detection, preventing NATed clients from contacting each other, and therefore adversely affect their sharing performance. To understand the reliability and accuracy of the STUN-based NAT detection procedure, we utilize Zattoo's session database (Table I), which stores, among other things, the NAT information, public IP address, and private IP address of each session. We assume that all sessions associated with the same public/private IP address pair are generated under the same NAT configuration. If sessions with the same public/private IP address pair report inconsistent NAT detection results, we consider those sessions as having experienced failed NAT detection, and apply the majority rule to determine the correct result for that configuration. Fig. 15 plots the daily trend of the NAT detection failure rate derived from ARD, Cuatro, and SF2 during the month of June 2008. The NAT detection failure rate at hour x is the number of bogus NAT detection results divided by the total number of NAT detections occurring in that hour. The total number of NAT detections indicates how busy the Zattoo system was throughout the day. The NAT detection failure rate grows from around 1.5% at 2-3AM to a peak of almost 6% at 8PM. This means that at the busiest times, the NAT type of about 6% of clients is not determinable, leading to them not contributing any bandwidth to the P2P network.
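A sketch of the majority-rule classification applied to the session database; the record fields are illustrative, not the schema of Table I.

    from collections import Counter, defaultdict

    # Hypothetical session records: (public_ip, private_ip, reported_nat_type)
    def nat_detection_failure_rate(sessions):
        by_endpoint = defaultdict(list)
        for pub, priv, nat in sessions:
            by_endpoint[(pub, priv)].append(nat)

        failures = total = 0
        for reports in by_endpoint.values():
            # Majority rule: the most common result is taken as correct;
            # every disagreeing report counts as a failed NAT detection.
            correct, _ = Counter(reports).most_common(1)[0]
            failures += sum(1 for r in reports if r != correct)
            total += len(reports)
        return 100.0 * failures / total if total else 0.0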

Fig. 15. NAT detection failure rate (%) and number of NAT detections by hour of day.

VII. RELATED WORKS

Aside from Zattoo, several commercial peer-to-peer systems intended for live streaming have been introduced since 2005, notably PPLive, PPStream, SopCast, TVAnts, and UUSee from China, and Joost, Livestation, Octoshape, and RawFlow from the EU. A large number of measurement studies have been conducted on one or another of these systems [7], [10]-[16]. Many research prototypes and improvements to existing P2P systems have also been proposed and evaluated [4], [17]-[24]. Our study is unique in that we are able to collect network-core data from a large production system with over 3 million registered users, with intimate knowledge of the underlying network architecture and protocol.

P2P systems are usually classified as either tree-based push or mesh-based swarming [25]. In tree-based push schemes, peers organize themselves into multiple distribution trees [26]-[28]. In mesh-based swarming, peers form a randomly connected mesh, and content is swarmed via dynamically constructed directed acyclic paths [21], [29], [30]. Zattoo is not only a hybrid between these two, similar to the latest version of CoolStreaming [4]; its dependence on Repeater nodes also makes it a hybrid of a P2P network and a content-distribution network (CDN), similar to PPLive [7].

VIII. CONCLUSION

We have presented a receiver-based, peer-division multiplexing engine to deliver live streaming content on a peer-to-peer network. The same engine can be used to transparently build a hybrid P2P/CDN delivery network by adding Repeater nodes to the network. By analyzing a large amount of usage data collected on the network during one of the largest viewing events in Europe, we have shown that the resulting network can scale to a large number of users and can take good advantage of the available uplink bandwidth at peers. We have also shown that error-correcting codes and packet retransmission can help improve network stability by isolating packet losses and preventing transient congestion from resulting in PDM reconfigurations. We have further shown that the PDM and adaptive PDM schemes presented have small enough overhead to make our system competitive with digital satellite TV in terms of channel switch time, stream synchronization, and signal lag.

ACKNOWLEDGMENT

The authors would like to thank Adam Goodman for developing the Authentication Server, Alvaro Saurin for the error-correcting code, Zhiheng Wang for an early version of the stream buffer management, Eric Wucherer for the Rendezvous Server, Zach Steindler for the Subsidy System, and the rest of the Zattoo development team for fruitful collaboration.

REFERENCES

[1] R. auf der Maur, "Die Weiterverbreitung von TV- und Radioprogrammen über IP-basierte Netze," in Entertainment Law (f. d. Schweiz), 1st ed. Stämpfli Verlag, 2006.
[2] UEFA, "Euro2008," http://www1.uefa.com/.


[3] S. Lin and D. J. Costello, Jr., Error Control Coding, 2nd ed. Pearson Prentice-Hall, 2004.
[4] S. Xie, B. Li, G. Y. Keung, and X. Zhang, "CoolStreaming: Design, Theory, and Practice," IEEE Trans. on Multimedia, vol. 9, no. 8, December 2007.
[5] K. Shami et al., "Impacts of Peer Characteristics on P2PTV Networks Scalability," in Proc. IEEE INFOCOM Mini Conference, April 2009.
[6] Bandwidth-test.net, "Bandwidth test statistics across different countries," http://www.bandwidth-test.net/stats/country/.
[7] X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross, "Insights into PPLive: A Measurement Study of a Large-Scale P2P IPTV System," in Proc. IPTV Workshop, International World Wide Web Conference, 2006.
[8] B. Li et al., "An Empirical Study of Flash Crowd Dynamics in a P2P-Based Live Video Streaming System," in Proc. IEEE Global Telecommunications Conference, 2008.
[9] J. Rosenberg et al., "STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)," RFC 3489, 2003.
[10] A. Ali, A. Mathur, and H. Zhang, "Measurement of Commercial Peer-to-Peer Live Video Streaming," in Proc. Workshop on Recent Advances in Peer-to-Peer Streaming, August 2006.
[11] L. Vu et al., "Measurement and Modeling of a Large-scale Overlay for Multimedia Streaming," in Proc. Int'l Conf. on Heterogeneous Networking for Quality, Reliability, Security and Robustness, 2007.
[12] T. Silverston and O. Fourmaux, "Measuring P2P IPTV Systems," in Proc. ACM NOSSDAV, November 2008.
[13] C. Wu, B. Li, and S. Zhao, "Magellan: Charting Large-Scale Peer-to-Peer Live Streaming Topologies," in Proc. ICDCS'07, 2007, p. 62.
[14] M. Cha et al., "On Next-Generation Telco-Managed P2P TV Architectures," in Proc. Int'l Workshop on Peer-to-Peer Systems, 2008.
[15] D. Ciullo et al., "Understanding P2P-TV Systems Through Real Measurements," in Proc. IEEE GLOBECOM, November 2008.
[16] E. Alessandria et al., "P2P-TV Systems under Adverse Network Conditions: A Measurement Study," in Proc. IEEE INFOCOM, April 2009.
[17] S. Banerjee et al., "Construction of an Efficient Overlay Multicast Infrastructure for Real-time Applications," in Proc. IEEE INFOCOM, 2003.
[18] D. Tran, K. Hua, and T. Do, "ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Streaming," in Proc. IEEE INFOCOM, 2003.
[19] D. Kostic et al., "Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh," in Proc. ACM SOSP, Bolton Landing, NY, USA, October 2003.
[20] R. Rejaie and S. Stafford, "A Framework for Architecting Peer-to-Peer Receiver-driven Overlays," in Proc. ACM NOSSDAV, 2004.
[21] X. Liao et al., "AnySee: Peer-to-Peer Live Streaming," in Proc. IEEE INFOCOM, April 2006.
[22] F. Pianese et al., "PULSE: An Adaptive, Incentive-Based, Unstructured P2P Live Streaming System," IEEE Trans. on Multimedia, vol. 9, no. 6, 2007.
[23] J. Douceur, J. Lorch, and T. Moscibroda, "Maximizing Total Upload in Latency-Sensitive P2P Applications," in Proc. ACM SPAA, 2007, pp. 270-279.
[24] Y.-W. Sung, M. Bishop, and S. Rao, "Enabling Contribution Awareness in an Overlay Broadcasting System," in Proc. ACM SIGCOMM, 2006.

[25] N. Magharei, R. Rejaie, and Y. Guo, "Mesh or Multiple-Tree: A Comparative Study of Live P2P Streaming Approaches," in Proc. IEEE INFOCOM, May 2007.
[26] V. N. Padmanabhan, H. J. Wang, and P. A. Chou, "Resilient Peer-to-Peer Streaming," in Proc. IEEE ICNP, November 2003.
[27] M. Castro et al., "SplitStream: High-Bandwidth Multicast in Cooperative Environments," in Proc. ACM SOSP, October 2003.
[28] J. Liang and K. Nahrstedt, "DagStream: Locality Aware and Failure Resilient Peer-to-Peer Streaming," in Proc. SPIE Multimedia Computing and Networking, January 2006.
[29] N. Magharei and R. Rejaie, "PRIME: Peer-to-Peer Receiver-drIven MEsh-based Streaming," in Proc. IEEE INFOCOM, May 2007.
[30] M. Hefeeda et al., "PROMISE: Peer-to-Peer Media Streaming Using CollectCast," in Proc. ACM Multimedia, November 2003.

Hyunseok Chang received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 1998, and the M.S.E. and Ph.D. degrees in computer science and engineering from the University of Michigan, Ann Arbor, in 2001 and 2006, respectively. He is currently a Member of Technical Staff with the Network Protocols and Systems Department, Alcatel-Lucent Bell Labs, Holmdel, NJ. Prior to that, he was a Software Architect at Zattoo, Inc., Ann Arbor, MI. His research interests include content distribution networks, mobile applications, and network measurements.

Sugih Jamin received the Ph.D. degree in computer science from the University of Southern California, Los Angeles, in 1996. He is an Associate Professor with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor. He cofounded a peer-to-peer live streaming company, Zattoo, Inc., Ann Arbor, MI, in 2005. Dr. Jamin received the National Science Foundation (NSF) CAREER Award in 1998, the Presidential Early Career Award for Scientists and Engineers (PECASE) in 1999, and the Alfred P. Sloan Research Fellowship in 2001.

Wenjie Wang received the Ph.D. degree in computer science and engineering from the University of Michigan, Ann Arbor, in 2006. He is a Research Staff Member with IBM Research China, Beijing, China. He co-founded a peer-to-peer live streaming company, Zattoo, Inc., Ann Arbor, MI, in 2005, based on his Ph.D. dissertation, and was its CTO. He also worked at Microsoft Research China and Oracle China as a visiting student and consultant intern.
