Load Shedding for Window Joins over Streams

Donghong Han, Chuan Xiao, Rui Zhou, Guoren Wang, Huan Huo, and Xiaoyun Hui

Northeastern University, Shenyang 110004, China
[email protected]

Abstract. We present a novel load shedding technique for sliding window joins. We first construct a dual window architectural model consisting of join-windows and aux-windows. With statistics built on the aux-windows, an effective load shedding strategy is developed to produce maximum-subset join outputs. For streams with high arrival rates, we propose an approach incorporating front-shedding and rear-shedding, and address the problem of how to coordinate these two shedding processes through a series of calculations. Based on extensive experimentation with synthetic and real-life data, we show that our load shedding strategy delivers superb join output performance and dominates the existing strategies.

1 Introduction

Data stream applications such as network monitoring, on-line transaction flow analysis, intrusion detection and sensor networks pose tremendous challenges to traditional database systems. Unbounded continuous input streams require processing techniques different from those for fixed-size stored data sets. For the join, a traditionally important operator, it is not practical to compare every tuple in one infinite stream with every tuple in another; therefore, the sliding window join was put forward [1]. It restricts the tuples that participate in the join to the most recent ones within a bounded-size window, and produces acceptable approximate join outputs. There are mainly two types of windows: time-based windows and tuple-based windows [3]. For a time-based window, the number of tuples in the window is not fixed: the higher the stream arrival rate, the more tuples the window holds. For a tuple-based window, the number of tuples in the window is fixed: the higher the arrival rate, the newer the window's tuples. In this paper we focus primarily on tuple-based windows; time-based windows are reserved for future work.

Note that even with a window predicate, the join operator may lack CPU or memory resources when streams have high arrival rates. Therefore, we need load shedding (dropping some tuples to reduce system load) to facilitate join processing, so as to keep pace with the incoming streams. There are two types of join approximation [5]: max-subset results and sampled results. We take max-subset approximation as the evaluation criterion for shedding strategies. For time-based window joins, there are two kinds of resource limitations, CPU deficiency and memory shortage [4]. For tuple-based joins, the two limitations can be attributed to CPU deficiency exclusively, because the buffer memory that holds tuples will not overflow if the CPU is fast enough. Considering the evaluation process of joins, since probes (checking the opposite window for matching tuples) take up most of the CPU resources, we develop a novel shedding strategy that lets part of the tuples enter the window without performing probes. We "drop" tuples in this way rather than discarding them directly, because future tuples from the other stream may still produce join results with them. Furthermore, we implement a semantic selection of the shedding tuples based on statistics of the aux-windows (Section 2), which shows good performance in producing max-subset outputs and is denoted as the rear-shedding strategy (Section 3).

If stream arrival rates are high, a large percentage of tuples will be dropped, and CPU resources are primarily spent on the operation of entering/leaving windows. Consider an extreme case: the stream speeds are so high that no probes can be performed, and thus no join outputs are obtained. Paradoxically, if we discard part of the tuples beforehand, some CPU resources are saved to perform probes, and a subset of join outputs is gained. We name this shedding strategy front-shedding, and address the problem of how to coordinate the two shedding processes through a series of calculations (Section 4). Experimental results are shown in Section 5. Related work is summarized in Section 6.

2 Dual Window Model

Our goal is to process a sliding window equi-join between two streams A and B, producing a maximum subset of join outputs with load shedding if necessary. We adopt a join process similar to those presented in [1,5]. On each arrival of a new tuple from Stream A, three tasks must be performed:

1. Scan Stream B's window, looking for matching tuples, and propagate them to the result. This task is called probing.
2. Insert the new tuple into Stream A's window.
3. Invalidate the oldest tuple in Stream A's window.

From the above, we conclude that there are two kinds of tasks for the CPU to perform: probes (1) and updates (2, 3). For a tuple-based window, updates (replacing the oldest tuple in the join window with a newly arrived one) can be performed more efficiently than a probe. In cases of high stream speeds, the CPU is unable to perform the whole join process (both probe and updates) for every arriving tuple; therefore we shed load by letting part of the tuples enter the window without performing probes, while the other tuples perform probes as normal. Notice that we do not discard the tuples that skip their probes, for future tuples from the other stream may still produce join results with them. Consequently, the CPU can keep pace with streams whose speeds exceed its processing ability.

Figure 1 shows our model of window joins. We divide the memory into three parts. For each stream, we have:

[Figure 1: for each stream, arriving tuples pass through a queue, an aux-window (with its histogram), and the join-window; front-shedding is applied before the queue, rear-shedding between the aux-window and the join-window, and join results go to the output.]

Fig. 1. Dual Window Model

1. join-window: the window holding the tuples with which a newly arriving tuple from the opposite stream will perform the join.
2. aux-window: an auxiliary window of the same size as the join-window. We also construct a window-histogram based on the aux-window; with the help of its statistics we can implement effective load shedding by dropping those tuples that would produce fewer join results.
3. queue: serves as a buffer. We can detect stream speeds by monitoring the queue size of each stream. When the queue length reaches the threshold at which the buffer is about to overflow, and the stream speeds are still faster than the CPU processing rate, we start load shedding by keeping part of the tuples from performing probes. Hence the CPU can process more tuples per time interval, though some join results are left out. We denote this load shedding process as rear-shedding; its evaluation is executed when a tuple leaves the aux-window and is about to enter the join-window. If the incoming stream speeds increase further, exceeding another threshold (interpreted in the sections below), we start front-shedding to cooperate with rear-shedding to produce max-subset join results (a sketch of these per-stream structures is given below).
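As an illustration only, the following Python sketch shows one possible in-memory layout of the per-stream state in the dual window model; the class and field names are ours, not the authors':

    from collections import deque

    class StreamState:
        """Per-stream state in the dual window model (illustrative sketch)."""

        def __init__(self, window_size, domain_size):
            self.queue = deque()                # buffer for newly arrived tuples
            self.aux_window = deque()           # most recent W tuples, used only for statistics
            self.join_window = deque()          # W tuples that actually participate in probes
            self.histogram = [0] * domain_size  # window-histogram over the aux-window
            self.window_size = window_size

        def enqueue(self, tup):
            """A new tuple arrives from the stream (after front-shedding, if any)."""
            self.queue.append(tup)

        def queue_length(self):
            """Monitored to detect whether the stream is faster than the CPU."""
            return len(self.queue)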

3 Rear-Shedding

For convenience, we introduce in Table 1 the notations for the constants and variables used in this paper. These notations are also used in the following sections. W and D are set according to the specific application, while V_j and V_w are determined by the CPU's processing ability and can be measured experimentally.

Table 1. Constants and Variables

Name   Description
V_s    speed of the stream
V_q    speed of tuples entering the queue
V_j    maximum number of tuple probes per time interval, without considering the cost of entering and leaving windows
V_w    maximum number of tuples entering/leaving a window per time interval, without considering the cost of probes
k_r    rear-shedding rate
k_f    front-shedding rate
W      window size
D      domain of the join attribute

3.1 Determining k_r

We do not need load shedding if the CPU can perform the join for every tuple. In order to keep the queue from overflowing, we need to maintain an approximately constant queue length. Based on this prerequisite, we have the following deduction. The time for one tuple to enter the queue is 1/V_q, and the time for one tuple to leave the aux-window and to join is 1/V_w + 1/V_j. Since the window size is fixed, the time for one tuple to leave the aux-window equals the time for one tuple to enter the aux-window, which in turn equals the time for one tuple to leave the queue. For a constant queue length, the time for one tuple to enter the queue and the time for one tuple to leave the queue are equal, thus we have:

$$\frac{1}{V_q} = \frac{1}{V_w} + \frac{1}{V_j}$$

Likewise, we get the following equation when performing load shedding:

$$\frac{1}{V_q} = \frac{1}{V_w} + \frac{1 - k_r}{V_j}$$

Then k_r is determined as:

$$k_r = 1 - V_j \left( \frac{1}{V_q} - \frac{1}{V_w} \right) \qquad (1)$$

Furthermore, for a constant queue length, we have V_q = V_s. V_s can be detected by the system, so we can find a shedding rate k_r that lets the CPU keep pace with the incoming streams: the faster the streams, the higher the k_r adopted. A small numeric sketch of this calculation is given below.
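As a minimal illustration of equation (1), the following Python snippet computes k_r from measured speeds; the function name and the sample numbers are ours, not values from the paper:

    def rear_shedding_rate(v_s, v_j, v_w):
        """Rear-shedding rate k_r = 1 - V_j * (1/V_q - 1/V_w), with V_q = V_s.

        Returns 0 when no shedding is needed and clamps at 1 when even
        shedding all probes cannot keep up (handled later by front-shedding).
        """
        k_r = 1.0 - v_j * (1.0 / v_s - 1.0 / v_w)
        return min(max(k_r, 0.0), 1.0)

    # Example with made-up speeds (tuples per time interval); updates are cheaper
    # than probes here, so shedding starts once V_s exceeds V_j*V_w/(V_j+V_w) = 75.
    print(rear_shedding_rate(v_s=100, v_j=120, v_w=200))   # ~0.4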

3.2 Determining Which Tuple to Shed

Suppose we perform joins on a tuple attribute Attr, and take integers as the data type for simplicity. For each stream, we build a window-histogram based on its aux-window by mapping the values of Attr into an array of counters. The array size is D. Figure 2 gives an example: there are two 1s, four 2s, one 3, no 4s, and one 5 in aux-window B. The window-histogram is maintained dynamically: when a new tuple enters the aux-window or an old one leaves, the corresponding counter of the tuple's attribute value is increased or decreased. Assume the two streams have the same speed (processing of a different speed ratio is omitted due to space limitations; readers can refer to our technical report [14]). For such a 1:1 speed ratio, we let the aux-windows and join-windows of the two streams have the same size. The CPU alternately takes a tuple out of one of the aux-windows and performs the join with the opposite join-window.

[Figure 2: window-histogram B maps each attribute value (key 1, 2, 3, 4, 5, ...) of aux-window B to a counter (2, 4, 1, 0, 1, ...).]

Fig. 2. Window-histogram

[Figure 3: frequency array over the number of outputs the tuples in the aux-window will produce (counts 0 through 7 occur 4, 3, 6, 8, 4, 2, 1 and 0 times); for a tuple that will produce 4 outputs, n = 4 + 3 + 6 + 8 = 21.]

Fig. 3. Calculating n with Frequency Array
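The following short Python sketch illustrates how such a window-histogram can be kept in sync with its aux-window; the class and method names are ours and only indicative:

    class WindowHistogram:
        """Counts of each join-attribute value currently in the aux-window."""

        def __init__(self, domain_size):
            self.counters = [0] * domain_size

        def on_enter(self, attr_value):
            # a tuple with this attribute value enters the aux-window
            self.counters[attr_value] += 1

        def on_leave(self, attr_value):
            # the same tuple later leaves the aux-window
            self.counters[attr_value] -= 1

        def matches(self, attr_value):
            # number of join outputs a tuple from the opposite stream
            # with this attribute value would currently produce
            return self.counters[attr_value]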

Now we introduce the strategy for determining which tuple should be shed. Take Stream A as an example. When a tuple g is about to enter aux-window A, we check its join attribute value in the window-histogram of the opposite window (window-histogram B), find how many join outputs it will produce, and save this number N. Accordingly, we construct an array C for all the tuples in aux-window A, recording the number of join outputs that each tuple will produce. Moreover, a frequency array is built on the array C, counting how many tuples in aux-window A will produce a specific number (C[i]) of join outputs. Therefore, when g leaves aux-window A, we can count how many tuples in aux-window A will produce fewer join outputs than N; we denote this number as n. Figure 3 provides an example: the tuple being judged will produce 4 join outputs, so we need to count how many tuples in its aux-window will produce fewer than 4 join outputs. Since 4 tuples will produce 0 join outputs, 3 will produce 1, 6 will produce 2, and 8 will produce 3, there are 4+3+6+8=21 (n=21) tuples in all that will produce fewer than 4 outputs. From Algorithm 1, we know that for a tuple g leaving aux-window A, if aux-window A has k_r W or more tuples that will produce fewer than N join outputs, g will not be shed.

Algorithm 1. SheddingAlgorithm()
Function: Judge whether a tuple should be shed or not
1: if n/W < k_r then
2:   update (enter its own join-window without probing the opposite join-window);
3: else
4:   probe; update;
5: end if

Consider the case that many tuples in aux-window A have the same number of join outputs. For example, suppose the window size is 100, and of the 100 tuples in aux-window A, 20 tuples will produce 0 join results, 50 will produce 1 join result, and 30 will produce 2 join results. With a shedding rate k_r = 0.6, we should shed all 20 resultless tuples and also 40 of the 50 tuples that will produce 1 join result. These 40 tuples are chosen randomly. The algorithm is straightforward and omitted here; a sketch of the whole rear-shedding decision is given below.
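The sketch below (in Python, with names of our own choosing) combines the frequency array with Algorithm 1 in one decision routine; the probabilistic tie handling is our reading of the random selection described above, not the authors' implementation:

    import random

    def should_probe(N, freq, W, k_r):
        """Decide whether a tuple leaving the aux-window may probe.

        N    : number of join outputs the tuple would produce
               (looked up in the opposite window-histogram when it entered)
        freq : freq[c] = how many tuples in this aux-window will produce c outputs
        W    : window size
        k_r  : rear-shedding rate
        Returns True if the tuple should probe the opposite join-window,
        False if it should only be inserted (i.e., it is shed).
        """
        n = sum(freq[:N])            # tuples producing strictly fewer outputs than N
        if n / W >= k_r:
            return True              # enough "worse" tuples exist; keep this one
        # Tie handling: among the tuples producing exactly N outputs, only the
        # fraction needed to reach rate k_r is shed, chosen at random.
        ties = freq[N]
        still_to_shed = k_r * W - n
        shed_probability = min(1.0, still_to_shed / ties) if ties else 1.0
        return random.random() >= shed_probability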

4 Front-Shedding

Suppose the stream speeds are extremely high, e.g. V_s > V_w. We have to shed all the tuples (k_r = 1), and CPU resources are entirely spent on performing updates, with no join results produced. Nevertheless, the speed of tuples entering the queue is still higher than the speed of tuples leaving the queue; the queue length will grow without limit and the system will become unstable. However, if we discard part of the incoming tuples before pushing them into the queue, so that V_q < V_w, some CPU resources are saved to perform probes, and some join results are obtained. We therefore introduce front-shedding, which controls V_q < V_w. Figure 4 shows an approximate join output curve without front-shedding. When V_s <= V_j V_w / (V_j + V_w) (obtained from equation (1)), the system needs no load shedding (k_r = 0), and the output increases in proportion to the stream speed. As the stream speeds become higher, the shedding rate increases correspondingly. When the stream speeds reach V_w, the shedding rate k_r is 1 and the output is 0. Since the stream speed and the shedding rate k_r are continuous, there exists a maximum number of join outputs at a certain speed V_opt (opt stands for optimal). Next, we calculate V_opt.

[Figure 4: number of join outputs per time interval versus stream speed, comparing the ideal case, the dual window approach, and rear-shedding only; with rear-shedding only, the output rises until V_q = V_j V_w / (V_j + V_w), peaks at O_max near V_opt, and drops to zero at V_w.]

Fig. 4. Rear-shedding Outputs

Suppose the two incoming streams have the same distribution; take the uniform distribution as an example (other distributions, such as Zipfian, can be handled similarly). For a tuple g with attribute value a, the probability that a given tuple carries value a is 1/D, where D is the value domain of the attribute. The probability that the opposite window contains exactly i tuples with the same attribute value a as tuple g is:

$$P_i = \binom{W}{i} \left( \frac{1}{D} \right)^i \left( 1 - \frac{1}{D} \right)^{W-i}$$


Suppose we shed all the tuples producing fewer than M outputs (M ∈ [0, W]) and a fraction of the tuples producing exactly M outputs; denote this fraction by r, i.e. among the tuples that will produce M outputs, the number of tuples to be shed divided by the total number of such tuples. Thus the shedding rate k_r is determined as:

$$k_r = \sum_{i=0}^{M-1} P_i + r \cdot P_M \qquad (0 \le r \le 1)$$

The number of tuples joined per time interval is V_q (1 - k_r). The average number of join outputs each joined tuple produces is:

$$O_{tuple} = (1 - r) \cdot M \cdot \frac{P_M}{1 - k_r} + \sum_{i=M+1}^{W} i \cdot \frac{P_i}{1 - k_r}$$

Thus the total number of outputs per time interval is:

$$O = V_q (1 - k_r) \cdot O_{tuple} = V_q \left[ M \left( \sum_{i=0}^{M} P_i - k_r \right) + \sum_{i=M+1}^{W} i \cdot P_i \right]$$

Our goal is to achieve the max-subset of join output results, letting O reach its maximum O_max. As is known from equation (1), k_r is a function of V_q. Substituting for k_r in terms of V_q, we obtain:

$$O = V_q \left[ M \left( \sum_{i=0}^{M} P_i + \frac{V_j}{V_q} - \frac{V_j}{V_w} - 1 \right) + \sum_{i=M+1}^{W} i \cdot P_i \right], \quad \sum_{i=0}^{M-1} P_i \le k_r = 1 - V_j \left( \frac{1}{V_q} - \frac{1}{V_w} \right) \le \sum_{i=0}^{M} P_i \qquad (2)$$

Let V_q = V_opt be the speed at which O reaches its maximum O_max, and let λ = V_j / V_w; then V_q = V_j / (λ + 1 - k_r). Substituting V_q in terms of k_r and λ, and letting

$$\alpha = M \sum_{i=0}^{M} P_i + \sum_{i=M+1}^{W} i \cdot P_i, \qquad \beta = \lambda + 1,$$

we get:

$$O = V_j \cdot \frac{\alpha - M k_r}{\beta - k_r} = V_j \cdot \left( M + \frac{\alpha - \beta M}{\beta - k_r} \right) \qquad (3)$$

In equation (3), for a fixed M, α, β and V_j are all constant. Hence O changes monotonically with k_r. As a result, there is no k_r with

$$\sum_{i=0}^{M-1} P_i < k_r < \sum_{i=0}^{M} P_i$$

that produces O_max; O_max is obtained only at the endpoints, i.e. with ratio r = 0. The following can be deduced:

$$k_r = \sum_{i=0}^{M-1} P_i \quad \text{or} \quad k_r = \sum_{i=0}^{M} P_i$$

Equation (2) can therefore be reduced to:

$$O = V_q \sum_{i=M+1}^{W} i \cdot P_i, \qquad \sum_{i=0}^{M} P_i = k_r = 1 - V_j \left( \frac{1}{V_q} - \frac{1}{V_w} \right)$$

M lies in [0, W]. For a given window size and distribution, W and P_i are fixed; only M is variable. Therefore O_max can easily be found through a search of M among W + 1 values, and V_opt and k_r can then be determined from M. Furthermore, we can use a binary search to reduce the search cost remarkably, because the function O has a "Λ" shape: it first increases and then decreases. The proof is omitted due to page limitations. Based on the discussion above, we summarize the application of the front-shedding and rear-shedding strategies as follows:

– If V_s <= V_opt, only rear-shedding is adopted.
– If V_s > V_opt, rear-shedding and front-shedding cooperate: control V_q by front-shedding, letting V_q = V_opt.

The front-shedding rate k_f is determined as k_f = 1 - V_opt / V_s. Semantic information is ignored in front-shedding, because the long queue may impair its efficacy in predicting join outputs. Therefore, we choose a subset of the stream in a random way, which is simple but efficient.
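As a minimal sketch of the calculation above (under the uniform-distribution assumption, with function names and sample parameters of our own), the following Python code searches M over [0, W], derives k_r, V_opt and O_max, and then the front-shedding rate k_f; a plain linear scan is used instead of the binary search mentioned above:

    from math import comb

    def p(i, W, D):
        """P_i: probability that the opposite window holds exactly i matching tuples."""
        return comb(W, i) * (1.0 / D) ** i * (1.0 - 1.0 / D) ** (W - i)

    def optimal_operating_point(W, D, v_j, v_w):
        """Search M in [0, W] for the maximum output rate (reduced equation (2), r = 0)."""
        P = [p(i, W, D) for i in range(W + 1)]
        best = (0.0, 0.0, 0.0)                      # (O_max, V_opt, k_r)
        for M in range(W + 1):
            k_r = sum(P[:M + 1])                    # shed every tuple producing <= M outputs
            v_q = v_j / (v_j / v_w + 1.0 - k_r)     # from k_r = 1 - V_j (1/V_q - 1/V_w)
            O = v_q * sum(i * P[i] for i in range(M + 1, W + 1))
            if O > best[0]:
                best = (O, v_q, k_r)
        return best

    def front_shedding_rate(v_s, v_opt):
        """k_f = 1 - V_opt / V_s, applied only when V_s > V_opt."""
        return max(0.0, 1.0 - v_opt / v_s)

    # Example with made-up parameters:
    O_max, V_opt, k_r = optimal_operating_point(W=400, D=50, v_j=120, v_w=200)
    print(V_opt, k_r, front_shedding_rate(v_s=300, v_opt=V_opt))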

5 Experiments

To assess the practical performance of our model, we perform several sets of experiments on both synthetic and real-life datasets. We compare the performance of our strategies (referred to as DUAL) with two other load shedding strategies: dropping tuples randomly from the join input buffers (referred to as RAND), and a heuristic strategy [4] (referred to as PROB). Additionally, we use an optimal offline strategy [4] (referred to as OPT) to better evaluate the results. All the experiments are performed on a Pentium 4 3.2 GHz machine with 512 MB of memory, running Windows XP. The experiments indicate that our dual-window, histogram-based load shedding strategy works surprisingly well in practice.

5.1 Experiments on Front-Shedding

Our first set of experiments focuses on studying the effect of front-shedding. We compare two strategies: front-shedding combined with rear-shedding (referred to as DUAL), and rear-shedding only (referred to as REAR). We use window size 400, domain size 50, and input data generated from a Zipfian distribution with skew parameter 1. From the measured speed of join probes and of tuples entering/leaving the window, we obtain V_opt = 117.396 by calculation, which accords with our experimental results. V_opt is determined similarly in subsections 5.2 and 5.3, with respect to fixed data distributions and predefined window sizes.

[Figure 5: output tuples per second versus the stream speeds of the two streams (tuples/ms) for DUAL and REAR, with W=400, Zipfian distribution, d=50.]

Fig. 5. Front-shedding and Rear-shedding

Figure 5 shows the comparison between the two strategies. When the stream speeds are lower than V_opt, front-shedding has not been started, so the two strategies have the same results. As the stream speeds increase, the difference is easily seen: the result of DUAL stays approximately constant, because front-shedding controls the tuples entering the queue at a constant speed, and rear-shedding drops tuples at a constant shedding rate.

5.2 Effect of Window Size

Figures 6 and 7 show the number of join outputs for window sizes of 400 and 800, respectively. In this set of experiments, we use input data generated from a Zipfian distribution with skew parameter 1 and domain size 50. Four load shedding strategies are compared: OPT, RAND, PROB, and DUAL.

[Figures 6 and 7: output tuples per second versus the stream speeds of the two streams (tuples/ms) for DUAL, RAND, PROB and OPT, under a Zipfian distribution with d=50.]

Fig. 6. Window Size (W=400)

Fig. 7. Window Size (W=800)

As shown in the figures, DUAL works much better than PROB and RAND, especially when stream speeds are high. The relative performance of the different strategies does not change much as the window size is varied. An increased window size only produces more join outputs at a given stream speed, because a tuple probes more tuples in the opposite window; it does not affect the relative performance of the load shedding strategies.

5.3 Effect of Distribution

Figure 8 shows the performance of the different load shedding strategies for a window size of 400 when both incoming streams have a uniform data distribution over a domain of size 50. The experimental results indicate that for less regular input data, shedding by heuristic information is not a good option, while our strategy retains a significant advantage over both shedding by heuristic information and random selection. Input streams whose tuples have uniformly distributed attribute values affect the different load shedding strategies differently. Since all tuples have the same probability of finding a tuple with an equal attribute value in the opposite window, heuristic information is of little use in judging which tuple will produce more join results; therefore PROB is as poor as RAND. DUAL, however, performs as well as it does on Zipfian-distributed input data. The aux-windows are introduced to predict the number of join outputs that each tuple will produce, and therefore enable selection among the tuples within a window-sized range. Such preferences accumulate over long streams and finally lead to the advantage over the other two strategies.

5.4 Real-Life Dataset Experiments

We use the CO2 data available at [10] as our real-life datasets. We perform a streaming sliding window join using the air temperature at 38.2 meters in two years, 1995 and 1998, as the two datasets, and we set the window size to 1000. After deleting invalid data items and accounting for the warmup phase [4], 15471 tuples are left for the join queries. Such join query results can potentially be used to study the change of ambient CO2 concentration at the same temperature across these years. For the calculation of V_opt, we sample the datasets to obtain an approximate distribution of the input data, so that V_opt can be determined as described in Section 4. Figure 9 shows the results of the different strategies as a percentage of the ideal case, namely the results produced by a sufficiently fast CPU.

Output Vs Speed for w=400 Uniform distribution, d=50

DUAL

Output Vs Speed for w=1000 CO 2 data

RAND

OPT 5.0E+05 4.5E+05 4.0E+05 3.5E+05 3.0E+05 2.5E+05 20

40

60

80

100

120

140

160

180

Stream speeds of two streams (tuples/ms)

Fig. 8. Uniform Distribution

200

Output tuples as a percentage of ideal

5.5E+05

Output tuples per second

RAND

PROB

PROB

100%

OPT 80% 60% 40% 20% 0% 8

12

16

20

24

28

32

36

40

44

Stream speeds of two streams (tuples/ms)

Fig. 9. Real Life Datasets

From the figure, it is observed that our strategy DUAL performs much better than PROB and RAND. The real-life datasets are neither as random as uniformly distributed data nor as regular as Zipfian-distributed data. Therefore, heuristic information can be used to judge which tuple will produce more join outputs, but the judgment may be inaccurate; in other words, the tuples whose attribute values produced more join outputs in the past might not produce more join outputs in the future. DUAL, in contrast, performs well because the judgment is made within one window instead of among all tuples, and is therefore more accurate than selection by heuristic information.

6 Related Work

There has been considerable work on data stream processing. The survey in [11] gives an overview of stream work and summarizes the issues of building a data stream management system. Specialized systems have been built to process streaming data, such as Aurora [6], STREAM [2], NiagaraCQ [7] and TelegraphCQ [9]. The papers [1, 4, 5, 12, 13] focus on performing joins over streaming data. [1] introduces an implementation of the join process and addresses the cost models of nested loop joins and hash joins, adopting the simplest random shedding strategy. [4] provides an architectural model, primarily discusses offline load-shedding strategies, and introduces some heuristic online strategies. [5] puts forward the concepts of sampled results and an age-based model, in addition to the max-subset results and frequency-based model in [1,4]. Our work considers max-subset results and the frequency-based model. We also construct an architectural model and develop an online shedding strategy based on window statistics. In the literature on multi-joins, [12] analyzes the cost of nested loop joins and hash joins, and proposes join ordering heuristics to minimize the processing cost per unit time. [13] provides a symmetric multi-join operator for multiple joined streams to minimize memory usage, as opposed to using multiple binary join operators.

7 Conclusions and Future Work

In this paper, we have addressed a novel load shedding technique for sliding window joins. We propose a dual window architectural model and build statistics based on the aux-windows. Effective semantic load shedding can be implemented, since the number of join outputs can be predicted in advance by the window-histograms. With the cooperation of front-shedding and rear-shedding, we can handle high stream arrival rates and manage to produce max-subset results. A promising direction for future work is to consider time-based window joins in order to serve different kinds of applications.

Acknowledgments. This research was partially supported by the National Natural Science Foundation of China (Grant No. 60273079 and 60573089) and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP).


References

1. J. Kang, J. F. Naughton, and S. D. Viglas. Evaluating Window Joins over Unbounded Streams. In Proc. 2003 Intl. Conf. on Data Engineering, Mar. 2003.
2. The STREAM Group. STREAM: The Stanford Stream Data Manager. IEEE Data Engineering Bulletin, 26(1):19-26, March 2003.
3. A. M. Ayad and J. F. Naughton. Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams. In Proc. ACM SIGMOD Conf., June 2004.
4. A. Das, J. Gehrke, and M. Riedewald. Approximate Join Processing over Data Streams. In Proc. 2003 ACM SIGMOD Conf., June 2003.
5. U. Srivastava and J. Widom. Memory-Limited Execution of Windowed Stream Joins. In Proc. 30th Int. Conf. on Very Large Data Bases, 2004.
6. D. Abadi, D. Carney, et al. Aurora: a new model and architecture for data stream management. VLDB Journal, 12(2):120-139, 2003.
7. J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for Internet databases. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 379-390, 2000.
8. P. M. Fenwick. A New Data Structure for Cumulative Frequency Tables. Software - Practice and Experience, 24(3):327-336, Mar. 1994.
9. J. M. Hellerstein, M. J. Franklin, S. Chandrasekaran, et al. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 23(2):7-18, 2000.
10. D. Baldocchi, K. Wilson, et al. Half-Hourly Measurements of CO2, Water Vapor, and Energy Exchange Using the Eddy Covariance Technique from Walker Branch Watershed, Tennessee, 1995-1998. http://cdiac.esd.ornl.gov/ftp/ameriflux/data/us-sites/walker-branch/
11. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. Principles of Database Systems (PODS), June 2002.
12. L. Golab and M. T. Özsu. Processing Sliding Window Multi-joins in Continuous Queries over Data Streams. In Proc. Conf. on Very Large Databases, Sept. 2003.
13. S. D. Viglas, J. F. Naughton, and J. Burger. Maximizing the Output Rate of Multi-Way Join Queries over Streaming Information Sources. In Proc. Int. Conf. on Very Large Databases (VLDB), Sept. 2003.
14. D. Han, R. Zhou, and C. Xiao. Load Shedding for Window Joins over Data Streams. Technical report, Northeastern University, June 2004. http://mitt.neu.edu.cn/publications/HZX05-Joins.pdf
