Region-Based Coding for Queries over Streamed XML Fragments Xiaoyun Hui, Guoren Wang, Huan Huo, Chuan Xiao, and Rui Zhou Northeastern University, Shenyang, China [email protected]

Abstract. The recently proposed Hole-Filler model is promising for transmitting and evaluating streamed XML fragments. However, when fragments are associated simply by matching filler IDs with hole IDs, all the correlated fragments must be linked to complete a query path, which results in blocking. Taking advantage of a region-based coding scheme, this paper models the query expression as a query tree and proposes a set of techniques to optimize the query plan. It then proposes XFPR (XML Fragment Processor with Region code), which speeds up query processing by skipping the correlation of adjacent fragments. We illustrate the effectiveness of the techniques with a detailed set of experiments.

1 Introduction

As an emerging standard for data representation and exchange on the Web, XML is adopted by more and more applications for describing their information. Recently, several applications of XML stream processing have emerged, and much research focuses on answering queries over streamed XML data, which has to be analyzed in real time and in one pass. To decrease the transmission and evaluation cost, the Hole-Filler model [1] was proposed. Subsequently, Xstreamcast [2], XFrag [3], and other approaches have focused on processing fragmented XML data based on the Hole-Filler model. Figure 1 gives an XML document and its DOM tree, which serve as the running example in this paper.

[Figure 1: an XML book document (values such as "XML", "jane", "john", "2000", "Origins") and its DOM tree with nodes book, title, allauthors, author, year, chapter, head, and section.]

Fig. 1. An XML Document and its DOM Tree

K. Aberer et al. (Eds.): WISE 2006, LNCS 4255, pp. 487–498, 2006. © Springer-Verlag Berlin Heidelberg 2006


Data fragmentation offers various attractive alternatives for organizing and managing data. With fragmentation, an infinite XML stream becomes a sequence of XML fragments, and queries on parts of the XML data require less memory and processing time. Furthermore, changes to the XML data incur less overhead, because only the fragments corresponding to the changes are sent instead of the entire document. However, recently proposed frameworks have not fully exploited the advantages of the Hole-Filler model. XFrag presents a pipelined framework for processing XQueries that aims at processing and memory efficiency: XML fragments are processed as and when they arrive, and only those messages that may affect the query results are kept in the association table. However, the XFrag pipeline still consumes space in maintaining the links in the association tables and time in scheduling the operations for each fragment, and it cannot avoid “redundant” operations when dependence occurs between adjacent operators.

This paper presents a new framework and a set of techniques for efficiently processing XPath queries over streamed XML fragments. We make the following contributions: (i) We adopt the region-coding scheme for XML documents and adapt it to the streamed XML fragment model. This coding scheme simplifies checking structural relationships (such as the parent-child relationship “/” and the ancestor-descendant relationship “//”) between fragments and speeds up complex query operations, especially for nested-loop queries and twig pattern queries. (ii) We propose techniques for transforming an XPath expression into an optimized query plan. We model the query expression as a query tree and enable further analysis and optimization by eliminating “redundant” path evaluations. (iii) Based on the optimized query tree, we map the query plan directly into an XML fragment query processor, named XFPR, which speeds up query processing by skipping the correlation of adjacent fragments.
Note that we assume the query clients cannot reconstruct the entire XML data before processing the queries. The rest of this paper is organized as follows. Section 2 introduces our region-based coding scheme for streamed XML fragments. Section 3 describes our XML fragment processing framework in detail. Section 4 presents experimental results from our implementation and shows the processing efficiency of our framework. Section 5 concludes the paper.

2 Region-Based Coding Scheme on Hole-Filler Model

In our approach, we employ the Hole-Filler model [1] to describe XML fragments. To specify the relationships between fragments, we adopt the region-based coding scheme for XML documents and adapt it to the Hole-Filler model, so that it holds both the data contents and the structural relationships.

2.1 Preliminary Hole-Filler Model

In the Hole-Filler model, a document is pruned into many fragments, which are correlated to each other by fillers and holes. Since fragments may arrive in any order, the definitions of filler and hole maintain the context of the fragments. The main concept of the Hole-Filler model is that every fragment is treated as a “filler” and is associated with a unique ID (denoted fid). When a fragment needs to refer to another fragment in a parent-child relationship, it includes a “hole”, whose ID (hid) matches the fid of the referenced filler fragment.

Another important piece of information transmitted by the server is the Tag Structure. It is transmitted in the stream as a structural summary that describes the structural make-up of the XML data and captures all the valid paths in the data. A tag corresponds to an XML tagged element and is qualified by a unique id (tsid), a name (the element tag name), and a type. For an element of type “Filler”, we prune the element from the document and make it the root of the subtree. An element of the embedded type stays within its parent element, that is, inside the same fragment. This information is useful for expanding wildcard path selections in queries. The DTD and the corresponding tag structure of the XML document in Figure 1 are depicted in Figure 2.

[Figure 2, left: DTD tree with tsid-numbered element nodes book (1), title (2), allauthors (3), author (4), year (5), chapter (6), head (7), section (8), head (9), title (10), figure (11), title (12), and occurrence markers + and *. Right: the tag structure below.]

<stream:structure>
  <tag name="book" id="1" Filler="true">
    <tag name="title" id="2"/>
    <tag name="allauthors" id="3" Filler="true">
      <tag name="author" id="4"/>
    </tag>
    <tag name="year" id="5"/>
    <tag name="chapter" id="6" Filler="true">
      <tag name="head" id="7"/>
      <tag name="section" id="8" Filler="true">
        <tag name="head" id="9"/>
        <tag name="title" id="10"/>
      </tag>
      <tag name="figure" id="11" Filler="true">
        <tag name="title" id="12"/>
      </tag>
    </tag>
  </tag>
</stream:structure>

Fig. 2. Tag Structure of Hole-Filler Model
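As an illustration of how a client might use the tag structure of Fig. 2, the following minimal sketch indexes every tsid by element name, type, and the filler that contains it. The function name `index_tags` and the `xmlns:stream` declaration are assumptions made here so that a standard XML parser accepts the `stream:` prefix; they are not part of the original stream format.

```python
import xml.etree.ElementTree as ET

# Tag structure of Fig. 2. The xmlns:stream URI is a placeholder (assumption)
# added only so that the "stream:" prefix is bound for the parser.
TAG_STRUCTURE = """
<stream:structure xmlns:stream="urn:example:stream">
  <tag name="book" id="1" Filler="true">
    <tag name="title" id="2"/>
    <tag name="allauthors" id="3" Filler="true">
      <tag name="author" id="4"/>
    </tag>
    <tag name="year" id="5"/>
    <tag name="chapter" id="6" Filler="true">
      <tag name="head" id="7"/>
      <tag name="section" id="8" Filler="true">
        <tag name="head" id="9"/>
        <tag name="title" id="10"/>
      </tag>
      <tag name="figure" id="11" Filler="true">
        <tag name="title" id="12"/>
      </tag>
    </tag>
  </tag>
</stream:structure>
""".strip()


def index_tags(xml_text):
    """Map each tsid to (name, is_filler, tsid of the filler that contains it)."""
    root = ET.fromstring(xml_text)
    index = {}

    def visit(tag, enclosing_filler):
        tsid = int(tag.get("id"))
        is_filler = tag.get("Filler") == "true"
        # A "Filler" tag roots its own fragment; an embedded tag stays
        # inside the fragment of its nearest filler ancestor.
        owner = tsid if is_filler else enclosing_filler
        index[tsid] = (tag.get("name"), is_filler, owner)
        for child in tag.findall("tag"):
            visit(child, owner)

    for top in root.findall("tag"):
        visit(top, None)
    return index


TAG_INDEX = index_tags(TAG_STRUCTURE)
print(TAG_INDEX[9])   # ('head', False, 8): this "head" travels inside the "section" filler
```

Knowing the owning filler of an embedded element is what later allows a query that ends in “head” to be answered by handling only “section” fragments.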

2.2 Region-Based Encoding Representation

We can associate fillers with holes by matching fids with hids. However, this does not suffice, since the fid and hid in a fragment alone cannot directly capture the ancestor-descendant structural relationships between fragments. We therefore extend the fid with a widely accepted encoding approach [7], in which the position of an element occurrence is represented as a 4-tuple (DocId, StartPos, EndPos, Level): (i) DocId is the identifier of the document, which can be omitted when only a single document is involved; (ii) StartPos is the number given in a preorder traversal of the document and EndPos is the number given in a post-order traversal of the document; and (iii) Level is the nesting depth of the fragment’s root element (or string value) in the original document, which helps to identify the “parent-child” relationship between fragments. In Figure 2, “section” and “head” are in the same filler, and the level of this filler equals the level of “section” in the original document. We use the 3-tuple of the root element of a fragment as its fid. As for hid, we use the StartPos number. Figure 3 gives three fragments of the document in Figure 1 after coding fid with (StartPos, EndPos, Level).

Fragment 1: XML 2000 ......

Fragment 2: Origins ......

Fragment 3: ... ...

Fig. 3. XML Document Fragments

Taking (StartPos, EndPos, Level) as the fid, a fragment not only retains the link information between correlated fragments but also indicates the descendant fragments that lie within the region from StartPos to EndPos. Given the region codes, the intervals (StartPos, EndPos) of two arbitrary fragments are either inclusive or exclusive, so we can determine the ancestor-descendant relationship between nodes by testing their region codes. Suppose that f1 with region code (S1, E1, L1) and f2 with region code (S2, E2, L2) are fragments.

Ancestor-Descendant: f2 is a descendant of f1 iff S1 < S2 and E2 < E1. For example, Fragment 3 with code (20,29,3) is a descendant of Fragment 1 with code (1,70,1) in Figure 3.

Parent-Child: f2 is a child of f1 iff (1) S1 < S2 and E2 < E1; and (2) L1 + 1 = L2. For example, Fragment 2 with code (16,40,2) is a child of Fragment 1 with code (1,70,1) in Figure 3.
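The two tests above translate directly into code. The following is a minimal sketch; the names `RegionCode`, `is_descendant`, and `is_child` are illustrative and do not come from the paper. It uses the region codes of the three fragments in Figure 3.

```python
from typing import NamedTuple


class RegionCode(NamedTuple):
    """Region code (StartPos, EndPos, Level) used as the fid of a fragment."""
    start: int
    end: int
    level: int


def is_descendant(anc: RegionCode, desc: RegionCode) -> bool:
    # Ancestor-descendant: the descendant's interval lies strictly inside.
    return anc.start < desc.start and desc.end < anc.end


def is_child(parent: RegionCode, child: RegionCode) -> bool:
    # Parent-child: containment plus exactly one level of nesting.
    return is_descendant(parent, child) and parent.level + 1 == child.level


fragment1 = RegionCode(1, 70, 1)
fragment2 = RegionCode(16, 40, 2)
fragment3 = RegionCode(20, 29, 3)

print(is_descendant(fragment1, fragment3))  # True
print(is_child(fragment1, fragment2))       # True
print(is_child(fragment1, fragment3))       # False: two levels apart
```

Both tests are constant-time comparisons on the two codes, so no intermediate fragment has to be consulted.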

3 XFPR Query Handling

Under the Hole-Filler model, an infinite XML stream becomes a sequence of XML fragments, which are the basic processing units of a query. With the simple fid/hid coding, establishing an ancestor-descendant relationship requires establishing a chain of parent-child relationships, so waiting for a fragment to arrive to complete the information necessary for execution results in blocking. Taking advantage of the region coding scheme for fragments, we can skip evaluating the structural relationships between intermediate fragments and thus reduce processing time, especially for nested path expressions.

This section focuses on the techniques based on the region coding scheme. We first introduce the pruning policies on query expressions that eliminate “redundant” path evaluations. Then we present the query plan transformation techniques for efficient query handling with XML fragments.

3.1 Query Plan Generation

Linear Pattern Optimization. Let T be an optimized query tree after dependence pruning and TS a tag structure complying with the DTD. We can transform queries on XML elements into queries on XML fragments. Since T captures all the possible tsids involved in the query according to the tag structure, we only need to locate the corresponding fragments, which are represented by element nodes of type “Filler” in the query expression. Note that T also captures all the valid paths in the query path. For linear pattern optimization, we only need to handle the operators that can output results. For example, the end element of Query 1, “/book/chapter/section/head”, is “head”, whose type is not “Filler”. However, “section”, which is in the same filler as “head”, is of type “Filler”. We therefore only need to handle the “section” fragments in the output operator, by matching tsid 8 and level 3 in the region code, without inquiring about the parent fragments. For a simple query with “/*” or “//”, for example Query 2, “/book//author”, we can capture the relationship between the “book” fillers and the fillers that include the “author” element, since region coding locates the ancestor-descendant relationship quickly and directly without knowing the intermediate fillers. However, this approach cannot be applied directly to queries that include predicates.

Twig Pattern Optimization. Path expressions commonly contain more than the connectors “/” and “//”. A location step can also include one or more predicates to further refine the selected set of nodes. After determining the key nodes in the query tree, we can simplify a path computation into an XML fragment matching operation to speed up the query. In this way, queries involving one or more predicates or twig patterns can also be optimized.

Definition 3.1. Let TS be a tag structure and T an optimized query tree after dependence pruning. For ti ∈ T, if ti has more than one successor or the predicate of ti is not null, then ti is defined as a key node of the query tree, in short KN(T) = ti.

Taking advantage of the region codes of the fragments, we can quickly judge the ancestor-descendant relationship between fragments by comparing the (StartPos, EndPos) of the nodes: if a.StartPos < d.StartPos and d.EndPos < a.EndPos, fragment a is an ancestor of fragment d. Since the content of a fragment is guaranteed by the tag structure, we can prune the intermediate nodes that are not key nodes and keep only the key nodes and the output nodes in the query tree. For nested-loop steps, we can apply the same policy and check the level number against the number of repetition steps.
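The pruning policy of Definition 3.1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the query is given as its main path only, each step records the tsid of the filler that contains it (taken from the tag structure of Fig. 2), and the names `Step`, `is_key_node`, and `fillers_to_handle` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    name: str
    tsid: int
    filler_tsid: int                  # tsid of the filler that contains this element
    predicate: Optional[str] = None   # predicate text, or None if the step has none
    successors: int = 1               # number of successor nodes in the query tree


def is_key_node(step: Step) -> bool:
    # Definition 3.1: more than one successor, or a non-null predicate.
    return step.successors > 1 or step.predicate is not None


def fillers_to_handle(path: List[Step]) -> List[int]:
    """Keep only the fillers of key nodes and of the output node (last step)."""
    kept: List[int] = []
    for i, step in enumerate(path):
        is_output = i == len(path) - 1
        if (is_key_node(step) or is_output) and step.filler_tsid not in kept:
            kept.append(step.filler_tsid)
    return kept


# Query 1: /book/chapter/section/head
query1 = [
    Step("book", 1, 1),
    Step("chapter", 6, 6),
    Step("section", 8, 8),
    Step("head", 9, 8, successors=0),   # "head" is embedded in the "section" filler
]

# /book/chapter[/head]/figure/title: "chapter" carries a predicate
query_with_predicate = [
    Step("book", 1, 1),
    Step("chapter", 6, 6, predicate="head", successors=2),
    Step("figure", 11, 11),
    Step("title", 12, 11, successors=0),  # "title" is embedded in the "figure" filler
]

print(fillers_to_handle(query1))                # [8]
print(fillers_to_handle(query_with_predicate))  # [6, 11]
```

For Query 1 only “section” fragments remain, as described above; for the predicate query the “book” fillers are pruned, and “chapter” and “figure” fragments are related directly through their region codes.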


Consider Query 3: /book/chapter[/head]/figure/title, whose query tree is presented in Figure 4. In (a), all kinds of fragments involved in this query are indicated. In (b), according to Definition 3.1, we keep only the “chapter” and “figure” nodes, since “chapter” has a predicate and “figure” can output the result. In this way, we can compare the ancestor-descendant relationship between “chapter” fillers and “figure” fillers without inquiring about the “book” fillers.

[Figure 4: (a) Original Query Tree: book (tsid=1, Filler), chapter (tsid=6, Filler), head (tsid=7), figure (tsid=11, Filler), title (tsid=12). (b) Optimized Query Tree: chapter (tsid=6), head (tsid=7), figure (tsid=11, Filler), title (tsid=12).]

Fig. 4. Query Tree of Query 3

Nested Pattern Optimization. Given an XPath expression p, we call s a simple subexpression of p if s is equal to the path of the tag nodes along a path <v1, v2, · · ·, vn> in the query tree of p, such that each vi is the parent node of vi+1 (1 ≤ i < n) and the label of each vi (except perhaps v1) is prefixed only by “/”. If every vi shares the same tsid and the same predecessor, we call the path a repetition step. For example, Query 4: /book/chapter/section/section/section/head involves a repetition step; its query tree is shown in Fig. 5 (a).

[Figure 5: (a) Original Query Tree: book (tsid 1), chap (tsid 6), sect (tsid 8), sect (tsid 8), sect (tsid 8), head (tsid 9). (b) Optimized Query Tree: book (tsid 1), chap (tsid 6), (sect)^3 (tsid 8), head (tsid 9).]

Fig. 5. Query Tree of Query 4
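Collapsing a repetition step can be sketched in a few lines. The sketch assumes a path whose steps are all connected by “/”, so that the level of a step equals its depth along the path; the function names are hypothetical. For Query 4 the three “section” steps (tsid 8) collapse into one step with count 3, and the last repeated “section” filler must sit at level 5.

```python
from itertools import groupby
from typing import List, Tuple


def collapse_repetitions(tsids: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of equal tsids on a '/'-only path into (tsid, count) pairs."""
    return [(tsid, len(list(run))) for tsid, run in groupby(tsids)]


def required_level(collapsed: List[Tuple[int, int]], target_tsid: int) -> int:
    """Level of the last repetition of target_tsid, with the root at level 1."""
    depth = 0
    for tsid, count in collapsed:
        depth += count
        if tsid == target_tsid:
            return depth
    raise ValueError(f"tsid {target_tsid} is not on the path")


# Query 4: /book/chapter/section/section/section/head
query4_tsids = [1, 6, 8, 8, 8, 9]
collapsed = collapse_repetitions(query4_tsids)

print(collapsed)                     # [(1, 1), (6, 1), (8, 3), (9, 1)]
print(required_level(collapsed, 8))  # 5
```

With the count recorded, a single comparison of a fragment's tsid and Level replaces the evaluation of each repeated step.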

Query 4 involves three types of fragments, with tsid 1, tsid 6, and tsid 8. Since ‘/section’ is a nested path step, the query tree includes three nodes with the same tsid 8. A repetition step in an XPath expression degrades performance significantly, especially when the repeated path is highly nested. Taking advantage of the level number, we can simplify the evaluation of such a repetition path. Since the parent-child operator ‘/’ relates nodes at adjacent levels, we can optimize the query tree by pruning the repeated steps and recording the number of repetitions for further processing. In Query 4, ‘/section’ occurs three times, so we keep only one occurrence in the query tree, enclose it in “( )”, and mark 3 at its right corner. The query plan for Query 4 after pruning is shown in Figure 5 (b). According to the linear pattern optimization, we only need to handle “section” fillers with level 5.

Consider Query 2: /book//head, which involves fragments with tsid 1, tsid 6, and tsid 8. Since “/section” is a nested path step, there can be many repetition steps between “/book” and “//head”. Taking advantage of region coding, we can simplify such repetition path evaluation by checking only the ancestor-descendant relationship between fragments with tsid 1, tsid 6, and tsid 8. In the optimized query plan, we can enclose “/section” in parentheses and mark it with *, which means that we handle “section” fillers at any level.

3.2 The XFPR Matching Algorithm

XFPR is based on the optimized query plan obtained after pruning the “redundant” operations from the query tree. In this section, we focus on the main query evaluation algorithm of the XFPR framework and then analyze its efficiency in comparison with previous work. The transformation from query tree to query plan is a mapping from the XPath expression and the tag structure to the XFPR processor. For each node in the query tree, the tsid of an element node of type “Filler” corresponds to an entry of the hash table, which is characterized by a predecessor p, a bucket list b, and the tag structure corresponding to the fragment. The predecessor p resolves fragment relationships and predicate criteria. The bucket list b links together the fragments that carry the tsid of the node, and each item is denoted as an f-tuple (fillerid, {holeid}, value), in which fillerid denotes (StartPos, EndPos, Level), holeid denotes (StartPos, tsid), and value can be set to true, false, undecided (⊥), or a result fragment corresponding to predicates. While the former three values are possible in intermediate steps that do not produce a result, the latter is possible in the terminal step of a query tree branch.

Algorithm 1 describes the processing method, which is based on the SAX event-based interface that reports parsing events. When a fragment is processed by XFPR, it first needs to verify whether the predecessor operator has excluded its parent fragment due to either predicate failure or exclusion of its ancestor. If the ancestor fragment has arrived, the value of the f-tuple copies the status of its ancestor’s value; otherwise the value is tagged with “⊥”. The fragment then has to trigger its descendant fragments and pass the status value on to them, since fragments may be waiting on operators to decide on their ancestors’ eligibility.


Algorithm 1 startElement()
if isFragmentStart() == true then
    tsid = getTsid();
    fid = getFid();  // including fid.startPos, fid.endPos, fid.level
    if hashFindOperator(tsid) != null then
        fillInformation();
        if isQueryFragment() == true then
            p = findAncestorOperator(tsid);
            for each f-tuple ft of p do
                if ft.fid.startPos < fid.startPos && ft.fid.endPos > fid.endPos && isSatisfied(ft.fid.level, fid.level) && ft.fid.value != ⊥ then
                    currentValue = ft.value;
                end if
            end for
        else
            q = findDescendantOperator(tsid);
            for each f-tuple ft of q do
                if fid.startPos < ft.fid.startPos && fid.endPos > ft.fid.endPos && isSatisfied(ft.fid.level, fid.level) && fid.value != ⊥ then
                    ft.value = currentValue;
                end if
            end for
        end if
    end if
end if
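The following Python sketch shows one way the processing described in this section could be realized. It is a simplification under stated assumptions rather than the authors' implementation: it follows the prose description (inquire of the ancestor operator, then trigger waiting descendants) instead of the exact if/else structure of Algorithm 1, it omits holeids and predicates, and it assumes that the top operator of the pruned plan is always eligible. The names `FTuple`, `Operator`, `level_gap`, and `on_fragment_start`, and the region codes in the usage lines, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

UNDECIDED = None  # stands in for the "undecided" value (⊥) of an f-tuple


@dataclass
class FTuple:
    start: int
    end: int
    level: int
    value: Optional[bool] = UNDECIDED


@dataclass
class Operator:
    tsid: int
    ancestor: Optional[int] = None     # tsid of the ancestor operator in the pruned plan
    descendant: Optional[int] = None   # tsid of the descendant operator
    level_gap: Optional[int] = None    # required level difference to the ancestor; None = any
    bucket: List[FTuple] = field(default_factory=list)


def contains(anc: FTuple, desc: FTuple) -> bool:
    return anc.start < desc.start and desc.end < anc.end


def level_ok(anc: FTuple, desc: FTuple, gap: Optional[int]) -> bool:
    return gap is None or desc.level - anc.level == gap


def on_fragment_start(plan: Dict[int, Operator], tsid: int,
                      start: int, end: int, level: int) -> Optional[FTuple]:
    op = plan.get(tsid)
    if op is None:
        return None                    # fragment type not used by the query
    ft = FTuple(start, end, level)
    if op.ancestor is None:
        ft.value = True                # assumption: top operator has nothing to wait for
    else:
        # Inquire of the ancestor operator: copy the status of a decided ancestor.
        for anc in plan[op.ancestor].bucket:
            if (contains(anc, ft) and level_ok(anc, ft, op.level_gap)
                    and anc.value is not UNDECIDED):
                ft.value = anc.value
    op.bucket.append(ft)
    # Trigger descendants that arrived earlier and are still undecided.
    if op.descendant is not None and ft.value is not UNDECIDED:
        child_op = plan[op.descendant]
        for d in child_op.bucket:
            if (d.value is UNDECIDED and contains(ft, d)
                    and level_ok(ft, d, child_op.level_gap)):
                d.value = ft.value
    return ft


# Pruned plan relating "chapter" (tsid 6) fillers to "figure" (tsid 11) fillers.
plan = {
    6: Operator(tsid=6, descendant=11),
    11: Operator(tsid=11, ancestor=6),
}
figure = on_fragment_start(plan, 11, start=15, end=26, level=4)
print(figure.value)                 # None: no chapter fragment has arrived yet
chapter = on_fragment_start(plan, 6, start=10, end=40, level=2)
print(chapter.value, figure.value)  # True True
```

The usage lines show an out-of-order arrival: the “figure” fragment waits as undecided until the enclosing “chapter” fragment arrives and triggers it.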

3.3 Algorithm Analysis

In this section, we illustrate the advantage of the region-coding scheme adopted in XFPR with different types of queries and compare its query efficiency with that of the simple fid and hid numbering scheme of previous frameworks [8, 3].

Hierarchical Matching. As an illustrative example, consider Query 5: //chapter/section/section/title, which returns the “title” of each “section” nested in another “section”. In the previous frameworks with the simple fid and hid numbering scheme, when the fragment with fid 2 and hids 4 and 13 arrives, it is accepted by the “chapter” operator by hashing to the corresponding entry of the hash table, and the link information is recorded as the f-tuple (2, {4, 13}, undecided). The value identifies the fragment’s state, which is decided by its parent fragment: if the parent fragment has arrived, the value copies the parent fragment’s value; otherwise the fragment’s value is set to “undecided”. Similarly, three “section” fragments, with (fid 4, hid 6), (fid 6, hid 10), and (fid 13, hid 15), arrive successively. After inquiring of the predecessor operator and triggering the successor operator, the “section” fragments with fid 4 and fid 13 are related to the “chapter” fragment with fid 2, and the fragment with fid 6 is the child of the fragment with fid 4. However, not all the “section” fragments can be output as results, since only those “section” fragments matching the second “/section” step in the query path contribute to the results. As the fragment with fid 6 matches the second location step “/section” in the query path, it is output as a result after “true” is tagged as the value of the fragment’s f-tuple.

Query 5 shows that the simple fid/hid numbering scheme is inefficient for some queries. In the hash table, each entry recording the information of an arrived fragment needs to execute two steps: inquiring of the predecessor and triggering the successor operators. Under this kind of manipulation, it takes too much time to find the related fragments before query processing. Moreover, after many such manipulations, a filler may still not be output because it does not match the particular location step in the query path. With the region-coding scheme, however, we can easily identify the element level and speed up such hierarchical relationship operations. In our framework, we use nested pattern optimization and linear pattern optimization to generate the query plan for Query 5, so we only need to check whether each arriving “section” fragment has tsid “4” and level “4”. When such a “section” fragment arrives with its region code, we directly output its “title” element as the result, since the fragment represents the required element type at the fourth level of the XML document tree.

From the analysis of this query example, we conclude that the simple numbering scheme is not suitable for evaluating hierarchical relationships. In particular, if the path expression queries an element in a nested hierarchical step, the scheme complicates the processing and costs more time. The region-based coding scheme, by contrast, is clearly better suited to hierarchical relationship matching.

Skipping Fragments. Consider Query 6: //Chapter/section[/figure]/*/figure/title, which is a twig pattern expression with “*” involved. The tag structure defines the structure of the data and captures all the valid paths.
We can use it to expand wildcard path selections in queries to specify the query execution. After we specify the “*” node, Query 6 becomes “book/chapter/section[/figure1]/section/figure2/title”. To distinguish the “figure” in the branch expression from the “figure” in the main path expression, we denote the former as figure1 and the latter as figure2. The simple numbering scheme for fid and hid can handle only the parent-child relationship between fragments. It may therefore slow down query evaluation, because waiting for all the correlated fragments to arrive to complete the query path results in blocking. Taking advantage of twig pattern optimization, however, XFPR can skip the intermediate fragments and output the results as soon as possible. For fragments with tsid 4 and tsid 7 satisfying the condition section.StartPos < figure2.StartPos ∧ section.EndPos > figure2.EndPos ∧ figure2.Level = 5 ∧ section.Level = 3, the algorithm outputs the results immediately if a “figure1” fragment also satisfies the following condition: figure1.StartPos > section.StartPos ∧ figure1.EndPos < section.EndPos ∧ figure1.Level = 4.

The twig pattern optimization is processed as follows. Assume the “section” fragment with region code (6, 27, 3) has already arrived and its information is recorded in the association table. When the “figure” fragment with region code (15, 26, 5) arrives, it is compared with the “section” fragment by StartPos and EndPos. Since 15 > 6, 26 < 27, figure.Level = 5, and section.Level = 3, the “figure” fragment is one of the descendant fragments of the “section” fragment. However, the “figure” fragment that maps to the predicate branch of the twig pattern, as a child of the “section”, has not arrived yet, so the “section” fragment cannot verify its value and we cannot output the “figure” fragment as a result. When the “figure” fragment with region code (9, 14, 4) arrives, its f-tuple value is set to “true”, since it satisfies the condition (i.e., Level = 4 ∧ 6 < 9 ∧ 27 > 14). The “section” fragment then triggers its descendant “figure” fragment with region code (15, 26, 5), and the “title” element in the same fragment is output.

From the analysis of this query example, we conclude that, with the region coding scheme, checking an ancestor-descendant structural relationship is as easy as checking a parent-child structural relationship. Hence we can skip intermediate fragments along the path and produce the query results as soon as possible, without waiting for all the correlated fragments to arrive.
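The arrival sequence of this example can be replayed with a small sketch. It is an illustration only: the names `Frag`, `inside`, and `replay` are hypothetical, fragments are distinguished by element name and Level exactly as in the two conditions stated for Query 6, and the intermediate “section” fragment between the two levels is never consulted.

```python
from typing import List, NamedTuple, Tuple


class Frag(NamedTuple):
    name: str
    start: int
    end: int
    level: int


def inside(outer: Frag, inner: Frag) -> bool:
    return outer.start < inner.start and inner.end < outer.end


def replay(arrivals: List[Frag]) -> List[Tuple[int, Frag]]:
    """Return (arrival step, figure2 fragment) pairs in the order they are output."""
    sections: List[Frag] = []    # "section" fragments at level 3
    witnesses: List[Frag] = []   # figure1 candidates for the predicate [/figure], level 4
    pending: List[Frag] = []     # figure2 candidates on the main path, level 5
    emitted: List[Tuple[int, Frag]] = []
    for step, frag in enumerate(arrivals, start=1):
        if frag.name == "section" and frag.level == 3:
            sections.append(frag)
        elif frag.name == "figure" and frag.level == 4:
            witnesses.append(frag)
        elif frag.name == "figure" and frag.level == 5:
            pending.append(frag)
        still_waiting = []
        for fig in pending:
            # Output figure2 once some enclosing section also encloses a figure1.
            ready = any(
                inside(s, fig) and any(inside(s, w) for w in witnesses)
                for s in sections
            )
            if ready:
                emitted.append((step, fig))
            else:
                still_waiting.append(fig)
        pending = still_waiting
    return emitted


arrivals = [
    Frag("section", 6, 27, 3),
    Frag("figure", 15, 26, 5),   # figure2 arrives before the predicate is verified
    Frag("figure", 9, 14, 4),    # figure1 satisfies the predicate of the section
]
result = replay(arrivals)
print([step for step, _ in result])  # [3]
```

The main-path “figure” fragment is held back at step 2 and released at step 3, as soon as the predicate fragment arrives.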

4 Performance Evaluation

In this section, we present a performance evaluation of various algorithms over queries of different types and depths and over documents of different sizes, all on the same platform. All the experiments were run on a PC with a 2.6 GHz CPU and 512 MB of memory. The data sets were generated by the xmlgen program. We fragmented each XML document to produce a stream, based on the tag structure that defines the fragmentation layout. We also implemented a query generator that takes the DTD as input and creates sets of XPath queries of different types and depths. We used 4 queries on the document and compared the results of the following algorithms: (1) XFrag [3], (2) XFPro [8], and (3) XFPR. Figure 6 shows the queries that we used.

NO    Path expression
Q1    book/section/title
Q2    book/section/*/title
Q3    book//section/title
Q4    book/section[/figure/title]/section/title

Fig. 6. Path expression

Figure 7 (a) and (b) compare the three processing strategies over various query types. From the results, we conclude that XFPR outperforms its counterparts for every query type, and that its query performance does not vary much across query types. This is because XFPR only needs to process the fragments that include key nodes or the output results. Furthermore, XFPro outperforms XFrag in time, because it removes the dependent operations. But it is not better than XFPR, since it has to specify the query paths when queries include “*” or “//” and it cannot eliminate intermediate fillers. For XFrag, each fragment needs to be passed through the pipeline and evaluated step by step, so the performance of XFrag is affected by the characteristics of the query. As for memory usage, complex queries increase the number of operators that join in the query processing, which brings more information into the association table and consumes additional space.

Figure 7 (c) shows the time for query depths 3, 5, and 8 with the three methods. As the depth increases, the time of XFrag and XFPro increases due to the additional path steps. With region coding, in contrast, XFPR greatly reduces the evaluation of intermediate path steps, so the time cost of deep queries is almost the same as that of short queries.

[Figure 7: panels (a) and (b) over queries Q1 to Q4, panel (c) over query depths 3, 5, and 8; axes Time(ms) and Memory cost(KB); series XFPR, XFPro, XFrag.]

Fig. 7. Time with Different Queries

[Figure 8: panels (d) to (k) over document sizes 5M, 10M, 15M, and 20M; axes Time(ms) and Memory cost(KB); series XFPR, XFPro, XFrag.]

Fig. 8. Time with Different Documents

In Figure 8 (d), (e), (f), and (g), as the document size increases, XFrag is the most costly, since much time is spent inserting fragments and finding the relationships between them. XFPro, with dependence pruning, omits some of the operators corresponding to the query path. XFPR performs best because a considerable number of intermediate fillers are disregarded and the query path is greatly shortened. Figure 8 (h), (i), (j), and (k) illustrate the memory usage for different document sizes. For XFPR, memory usage is less affected by increasing size, since many intermediate fillers are omitted. For XFPro and XFrag, the opposite holds. However, XFPro performs a bit better than XFrag: XFPro only considers subroot nodes in the tid tree, while XFrag handles all operators in the pipeline. As the document size increases, more operators mean more redundant information, so the space cost becomes large.

5 Conclusions

This paper adopts the region coding scheme for XML documents and adapts it to the streamed XML fragment model. Taking advantage of the region coding scheme, we model query expressions as query trees and propose a set of techniques that enable further analysis and optimization. Based on the optimized query tree, we map the query tree directly into an XML fragment query processor, named XFPR, which speeds up query processing by skipping the correlation of adjacent fragments. Our experimental results over XPath expressions with different properties demonstrate the benefits of our approach.

Acknowledgement. This work is partially supported by the National Natural Science Foundation of China under grant Nos. 60573089 and 60273074 and by the Specialized Research Fund for the Doctoral Program of Higher Education under grant SRFDP 20040145016.

References

1. Fegaras, L., Levine, D., Bose, S., Chaluvadi, V.: Query processing of streamed XML data. In: Eleventh International Conference on Information and Knowledge Management (CIKM 2002), McLean, Virginia, USA (November 4–9, 2002)
2. Bose, S., Fegaras, L., Levine, D., Chaluvadi, V.: A query algebra for fragmented XML stream data. In: Proceedings of the 9th International Conference on Data Base Programming Languages, Potsdam, Germany (September 6–8, 2003)
3. Bose, S., Fegaras, L.: XFrag: A query processing framework for fragmented XML data. In: Eighth International Workshop on the Web and Databases (WebDB 2005), Baltimore, Maryland (June 16–17, 2005)
4. Liu, Y., Liu, X., Xiao, L., Ni, L., Zhang, X.: Location-aware topology matching in P2P systems. In: IEEE INFOCOM, Hong Kong (2004)
5. Chen, L., Ng, R.: On the marriage of Lp-norm and edit distance. In: Proceedings of the 30th International Conference on Very Large Data Bases, Toronto, Canada (August 2004)
6. Chen, L., Özsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. In: Proceedings of the 24th ACM International Conference on Management of Data (SIGMOD’05), Baltimore, MD (June 2005)
7. Zhang, C., Naughton, J., DeWitt, D., Luo, Q., Lohman, G.: On supporting containment queries in relational database management systems. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA (May 2001)
8. Huo, H., Wang, G., Hui, X., Zhou, R., Ning, B., Xiao, C.: Efficient query processing for streamed XML fragments. In: The 11th International Conference on Database Systems for Advanced Applications, Singapore (April 12–15, 2006)


Entity-Relationship Queries over Wikipedia
locations, events, etc. For discovering and .... Some systems [25, 17, 14, 6] explicitly encode entities and their relations ..... 〈Andy Bechtolsheim, Cisco Systems〉.

Processing Probabilistic Range Queries over ...
In recent years, uncertain data management has received considerable attention in the database community. It involves a large variety of real-world applications,.

Exploiting Graphics Processing Units for ... - Springer Link
Then we call the CUDA function. cudaMemcpy to ..... Processing Studies (AFIPS) Conference 30, 483–485. ... download.nvidia.com/compute/cuda/1 1/Website/.

Evidence for Cyclic Spell-Out - Springer Link
Jul 23, 2001 - embedding C0 as argued in Section 2.1, this allows one to test whether object ... descriptively head-final languages but also dominantly head-initial lan- ..... The Phonology-Syntax Connection, University of Chicago Press,.

MAJORIZATION AND ADDITIVITY FOR MULTIMODE ... - Springer Link
where 〈z|ρ|z〉 is the Husimi function, |z〉 are the Glauber coherent vectors, .... Let Φ be a Gaussian gauge-covariant channel and f be a concave function on [0, 1].

Tinospora crispa - Springer Link
naturally free from side effects are still in use by diabetic patients, especially in Third .... For the perifusion studies, data from rat islets are presented as mean absolute .... treated animals showed signs of recovery in body weight gains, reach

Chloraea alpina - Springer Link
Many floral characters influence not only pollen receipt and seed set but also pollen export and the number of seeds sired in the .... inserted by natural agents were not included in the final data set. Data were analysed with a ..... Ashman, T.L. an

GOODMAN'S - Springer Link
relation (evidential support) in “grue” contexts, not a logical relation (the ...... Fitelson, B.: The paradox of confirmation, Philosophy Compass, in B. Weatherson.

Bubo bubo - Springer Link
a local spatial-scale analysis. Joaquın Ortego Æ Pedro J. Cordero. Received: 16 March 2009 / Accepted: 17 August 2009 / Published online: 4 September 2009. Ó Springer Science+Business Media B.V. 2009. Abstract Knowledge of the factors influencing

Quantum Programming - Springer Link
Abstract. In this paper a programming language, qGCL, is presented for the expression of quantum algorithms. It contains the features re- quired to program a 'universal' quantum computer (including initiali- sation and observation), has a formal sema

BMC Bioinformatics - Springer Link
Apr 11, 2008 - Abstract. Background: This paper describes the design of an event ontology being developed for application in the machine understanding of infectious disease-related events reported in natural language text. This event ontology is desi

Isoperimetric inequalities for submanifolds with ... - Springer Link
Jul 23, 2011 - if ωn is the volume of a unit ball in Rn, then. nnωnVol(D)n−1 ≤ Vol(∂D)n and equality holds if and only if D is a ball. As an extension of the above classical isoperimetric inequality, it is conjectured that any n-dimensional c