

Region-Based Coding for Queries over Streamed XML Fragments

Xiaoyun Hui, Guoren Wang, Huan Huo, Chuan Xiao, and Rui Zhou

Northeastern University, Shenyang, China
[email protected]

Abstract. The recently proposed Hole-Filler model is promising for transmitting and evaluating streamed XML fragments. However, when fragments are associated simply by matching filler IDs with hole IDs, all the correlated fragments must be linked to complete a query path, which results in blocking. Taking advantage of a region-based coding scheme, this paper models the query expression as a query tree and proposes a set of techniques to optimize the query plan. It then proposes XFPR (XML Fragment Processor with Region code), which speeds up query processing by skipping the correlation of adjacent fragments. We illustrate the effectiveness of the techniques with a detailed set of experiments.

1 Introduction

As an emerging standard for data representation and exchange on the Web, XML is adopted by more and more applications for information description. Recently, several applications of XML stream processing have emerged, and much research focuses on answering queries over streamed XML data, which has to be analyzed in real time and in a single pass. To decrease the transmission and evaluation cost, the Hole-Filler model [1] was proposed. Subsequently, Xstreamcast [2], XFrag [3], and other approaches have dealt with fragmented XML data based on the Hole-Filler model. Figure 1 gives an XML document and its DOM tree, which serves as the running example of our work.

[Figure 1: an XML document describing a book (title "XML", authors jane and john, year 2000, chapters with a head such as "Origins" and nested sections) and its DOM tree with nodes book, title, allauthors, author, year, chapter, head, and section.]

Fig. 1. An XML Document and its DOM Tree

K. Aberer et al. (Eds.): WISE 2006, LNCS 4255, pp. 487–498, 2006. © Springer-Verlag Berlin Heidelberg 2006


Data fragmentation offers various attractive alternatives for organizing and managing data. With fragmentation, an infinite XML stream becomes a sequence of XML fragments, and queries on parts of the XML data require less memory and processing time. Furthermore, changes to the XML data incur less overhead, since only the fragments corresponding to the changes need to be sent instead of the entire document.

However, recently proposed frameworks have not fully exploited the advantages of the Hole-Filler model. XFrag presents a pipelined framework for processing XQueries with the goal of processing and memory efficiency. XML fragments are processed as and when they arrive, and only those messages that may affect the query results are kept in the association table. However, the XFrag pipeline still consumes space to maintain the links in the association tables and time to schedule the operations for each fragment. It also cannot avoid “redundant” operations when dependence occurs between adjacent operators.

This paper presents a new framework and a set of techniques for efficiently processing XPath queries over streamed XML fragments. We make the following contributions:

(i) We adopt the region-coding scheme for XML documents and adapt it to the streamed XML fragment model. This coding scheme simplifies checking structural relationships (such as the parent-child relationship “/” and the ancestor-descendant relationship “//”) between fragments and speeds up complex query operations, especially for nested-loop queries and twig pattern queries.

(ii) We propose techniques for transforming an XPath expression into an optimized query plan. We model the query expression as a query tree and enable further analysis and optimization by eliminating “redundant” path evaluations.

(iii) Based on the optimized query tree, we map the query plan directly into an XML fragment query processor, named XFPR, which speeds up query processing by skipping the correlation of adjacent fragments.
Note that we assume the query clients cannot reconstruct the entire XML data before processing the queries.

The rest of this paper is organized as follows. Section 2 introduces our region-based coding scheme for streamed XML fragments. Section 3 describes our XML fragment processing framework in detail. Section 4 presents experimental results from our implementation and shows the processing efficiency of our framework. Section 5 concludes the paper.

2 Region-Based Coding Scheme on Hole-Filler Model

In our approach, we employ the hole-filler model [1] to describe XML fragments. To specify the relationships between fragments, we adopt the region-based coding scheme for XML documents and adapt it to the hole-filler model, so that it holds both the data contents and the structural relationships.

2.1 Preliminary Hole-Filler Model

In the Hole-Filler model, a document is pruned into many fragments, which are correlated to each other by fillers and holes. Since fragments may arrive in any order, the definitions of filler and hole maintain the context of the fragments. The main concept of hole-filler is that every fragment is treated as a “filler” and is associated with a unique ID (denoted as fid). When a fragment needs to refer to another fragment in a parent-child relationship, it includes a “hole”, whose ID (hid) matches the fid of the referenced filler fragment.

Another important piece of information transmitted by the server is the tag structure. It is transmitted in the stream as a structural summary that provides the structural make-up of the XML data and captures all the valid paths in the data. A tag corresponds to an XML tagged element and is qualified by a unique id (tsid), a name (the element tag name), and a type. For an element with type “Filler”, we prune the element from the document and make it the root of the subtree. An element of embedded type is embedded within its parent element, that is, it is inside the same fragment. This information is useful when expanding wildcard path selections in queries. The DTD and the corresponding tag structure of the XML document given in Figure 1 are depicted in Figure 2.

[Tag structure tree, nodes labeled by tsid: book (1), title (2), allauthors (3), author (4), year (5), chapter (6), head (7), section (8), head (9), title (10), figure (11), title (12)]

    <stream:structure>
      <tag name="book" id="1" Filler="true">
        <tag name="title" id="2"/>
        <tag name="allauthors" id="3" Filler="true">
          <tag name="author" id="4"/>
        </tag>
        <tag name="year" id="5"/>
        <tag name="chapter" id="6" Filler="true">
          <tag name="head" id="7"/>
          <tag name="section" id="8" Filler="true">
            <tag name="head" id="9"/>
            <tag name="title" id="10"/>
          </tag>
          <tag name="figure" id="11" Filler="true">
            <tag name="title" id="12"/>
          </tag>
        </tag>
      </tag>
    </stream:structure>

Fig. 2. Tag Structure of Hole-Filler Model
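To make the role of the tag structure concrete, the following Python sketch computes, for every tsid in Figure 2, the tsid of the filler fragment that contains it. It is only an illustration under stated assumptions: the `stream:` prefix is dropped so that the standard-library parser accepts the text without a namespace declaration, and the function name `enclosing_fillers` is hypothetical, not part of any framework discussed here.

```python
import xml.etree.ElementTree as ET

# Tag structure of Figure 2; the "stream:" prefix is dropped (assumption)
# so the snippet parses without a namespace declaration.
TAG_STRUCTURE = """
<structure>
  <tag name="book" id="1" Filler="true">
    <tag name="title" id="2"/>
    <tag name="allauthors" id="3" Filler="true">
      <tag name="author" id="4"/>
    </tag>
    <tag name="year" id="5"/>
    <tag name="chapter" id="6" Filler="true">
      <tag name="head" id="7"/>
      <tag name="section" id="8" Filler="true">
        <tag name="head" id="9"/>
        <tag name="title" id="10"/>
      </tag>
      <tag name="figure" id="11" Filler="true">
        <tag name="title" id="12"/>
      </tag>
    </tag>
  </tag>
</structure>
"""


def enclosing_fillers(xml_text):
    """Map each tsid to the tsid of the filler fragment that holds it."""
    result = {}

    def walk(elem, current_filler):
        for tag in elem.findall("tag"):  # direct children only
            tsid = int(tag.get("id"))
            # A filler element starts a new fragment; an embedded
            # element stays in the fragment of its nearest filler ancestor.
            filler = tsid if tag.get("Filler") == "true" else current_filler
            result[tsid] = filler
            walk(tag, filler)

    walk(ET.fromstring(xml_text), None)
    return result


fillers = enclosing_fillers(TAG_STRUCTURE)
print(fillers[9])   # head (tsid 9) is embedded in the section filler (tsid 8)
print(fillers[12])  # title (tsid 12) is embedded in the figure filler (tsid 11)
```

A lookup of this kind is what lets a query on an embedded element such as “head” be redirected to the filler fragment that carries it.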

2.2 Region-Based Encoding Representation

We can associate fillers with holes by matching fids with hids. However, this does not suffice, since the fid and hid of a fragment alone cannot directly capture the ancestor-descendant structural relationships between fragments. We extend the fid with a widely accepted encoding approach [7], where the position of an element occurrence is represented as a 4-tuple (DocId, StartPos, EndPos, Level): (i) DocId is the identifier of the document, which can be omitted when only a single document is involved; (ii) StartPos is the number given in a pre-order traversal of the document and EndPos is the number given in a post-order traversal of the document; and (iii) Level is the nesting depth of the fragment’s root element (or string value) in the original document, which helps to identify the parent-child relationship between fragments. In Figure 2, “section” and “head” are in the same filler, and the level of this filler equals the level of “section” in the original document. We use the 3-tuple of the root element of the fragment as its fid. As for hid, we use StartPos. Figure 3 gives three fragments of the document in Figure 1 after coding fid with (StartPos, EndPos, Level).

Fragment 1: XML 2000 ......

Fragment 2: Origins ......

Fragment 3: ... ...

Fig. 3. XML Document Fragments

Taking (StartPos, EndPos, Level) as fid, a fragment not only retains the link information between correlated fragments, but also indicates that its descendant fragments lie within the region from StartPos to EndPos. Given the region codes, the intervals (StartPos, EndPos) of two arbitrary fragments are either nested or disjoint, so we can determine the ancestor-descendant relationship between nodes by testing their region codes. Suppose f1 with region code (S1, E1, L1) and f2 with region code (S2, E2, L2) are fragments.

Ancestor-Descendant: f2 is a descendant of f1 iff S1 < S2 and E2 < E1. For example, Fragment 3 with code (20,29,3) is a descendant of Fragment 1 with code (1,70,1) in Figure 3.

Parent-Child: f2 is a child of f1 iff (1) S1 < S2 and E2 < E1; and (2) L1 + 1 = L2. For example, Fragment 2 with code (16,40,2) is a child of Fragment 1 with code (1,70,1) in Figure 3.
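The two tests above translate almost literally into code. The Python sketch below is a minimal illustration; the type name `RegionCode` and the function names are our own choices for the example, and the three codes are the ones quoted for Fragments 1 to 3.

```python
from typing import NamedTuple


class RegionCode(NamedTuple):
    """(StartPos, EndPos, Level) of a fragment's root element."""
    start: int
    end: int
    level: int


def is_descendant(anc: RegionCode, desc: RegionCode) -> bool:
    # desc lies strictly inside the region spanned by anc
    return anc.start < desc.start and desc.end < anc.end


def is_child(parent: RegionCode, child: RegionCode) -> bool:
    # containment plus exactly one level of nesting
    return is_descendant(parent, child) and parent.level + 1 == child.level


f1 = RegionCode(1, 70, 1)   # Fragment 1
f2 = RegionCode(16, 40, 2)  # Fragment 2
f3 = RegionCode(20, 29, 3)  # Fragment 3

print(is_descendant(f1, f3))  # True: 1 < 20 and 29 < 70
print(is_child(f1, f2))       # True: contained, and level 1 + 1 == 2
print(is_child(f1, f3))       # False: contained, but two levels apart
```

Both checks need only the two region codes involved, which is why no intermediate fragment has to be present when they are evaluated.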

3 XFPR Query Handling

Based on the hole-filler model, an infinite XML stream becomes a sequence of XML fragments, which are the basic processing units of a query. With the simple fid/hid code, establishing an ancestor-descendant relationship requires verifying a chain of parent-child relationships, so waiting for a fragment to arrive and complete the information necessary for execution results in blocking. Taking advantage of the region coding scheme for fragments, we can skip evaluating the structural relationships between intermediate fragments and reduce processing time, especially for nested path expressions.


This section focuses on the techniques based on the region coding scheme. We first introduce the pruning policies on query expressions that eliminate “redundant” path evaluations. Then we present the query plan transformation techniques for efficient query handling with XML fragments.

3.1 Query Plan Generation

Linear Pattern Optimization. Let T be an optimized query tree after dependence pruning and TS a tag structure complying with the DTD. We can transform queries on XML elements into queries on XML fragments. Since T captures all the tsids that may be involved in the query according to the tag structure, we only need to locate the corresponding fragments, which are represented by element nodes with type “filler” in query expressions. Note that T also captures all the valid paths in the query path. For linear pattern optimization, we only need to handle the operators that can output results. For example, the end element of Query 1, “/book/chapter/section/head”, is “head”, whose type is not “filler”. However, “section”, which is in the same filler as “head”, has type “filler”. We therefore only need to handle “section” fragments in the output operator, by matching tsid 8 and level 3 in the region code, without inquiring the parent fragments. For a simple query with “/*” or “//”, for example Query 2, “/book//author”, we can relate “book” fillers directly to the fillers containing an “author” element, since region coding locates the ancestor-descendant relationship quickly and directly without knowing the intermediate fillers. However, this cannot be applied directly to queries that include predicates.

Twig Pattern Optimization. Path expressions commonly contain more than connectors such as “/” and “//”. A location step can also include one or more predicates to further refine the selected set of nodes. Once the key nodes in the query tree are determined, a path computation simplifies into an XML fragment matching operation, which speeds up the query. In this way, queries involving one or more predicates or twig patterns can also be optimized.

Definition 3.1. Let TS be a tag structure and T an optimized query tree after dependence pruning. For ti ∈ T, if ti has more than one successor, or the predicate of ti is not null, then ti is defined as a key node of the query tree, in short KN(T) = ti.

Taking advantage of the region codes of the fragments, we can quickly judge the ancestor-descendant relationship between fragments by comparing the (StartPos, EndPos) of the nodes: if a.StartPos < d.StartPos and d.EndPos < a.EndPos, fragment a is an ancestor of fragment d. Since the content of a fragment is guaranteed by the tag structure, we can prune off the intermediate nodes that are not key nodes and keep only the key nodes and the output nodes in the query tree. For nested-loop steps, we can apply the same policy and check the level number against the number of repetition steps.


Consider Query 3: /book/chapter[/head]/figure/title, whose query tree is presented in Figure 4. In (a), all kinds of fragments involved in this query are indicated. In (b), according to Definition 3.1, we keep only the “chapter” and “figure” nodes, since “chapter” has a predicate and “figure” can output the result. In this way, we can compare the ancestor-descendant relationship between “chapter” fillers and “figure” fillers without inquiring the “book” fillers.

(a) Original Query Tree
    book, tsid=1, Filler
      chapter, tsid=6, Filler
        head, tsid=7
        figure, tsid=11, Filler
          title, tsid=12

(b) Optimized Query Tree
    chapter, tsid=6
      head, tsid=7
      figure, tsid=11, Filler
        title, tsid=12

Fig. 4. Query Tree of Query 3
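The pruning rule of Definition 3.1 can be sketched in a few lines of Python. This is a simplified illustration rather than the XFPR implementation: the query is modeled as a flat main path in which a predicate is an attribute of its step (so the “more than one successor” case is represented only through that attribute), and the names `Step` and `prune` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    name: str
    tsid: int
    is_filler: bool
    predicate: Optional[str] = None  # e.g. "head" for chapter[/head]


def prune(path: List[Step]) -> List[Step]:
    """Keep only filler steps that are key nodes or carry the output."""
    # The output fragment is the last filler on the path: it contains the
    # result element even when that element itself is embedded.
    output = next((s for s in reversed(path) if s.is_filler), None)
    return [s for s in path
            if s.is_filler and (s.predicate is not None or s is output)]


# Query 3: /book/chapter[/head]/figure/title
query3 = [Step("book", 1, True),
          Step("chapter", 6, True, predicate="head"),
          Step("figure", 11, True),
          Step("title", 12, False)]

# Query 1: /book/chapter/section/head
query1 = [Step("book", 1, True),
          Step("chapter", 6, True),
          Step("section", 8, True),
          Step("head", 9, False)]

kept3 = [s.name for s in prune(query3)]
kept1 = [s.name for s in prune(query1)]
print(kept3)  # ['chapter', 'figure']
print(kept1)  # ['section']
```

For Query 3 the sketch keeps “chapter” and “figure”, as in Figure 4 (b); for Query 1 it keeps only “section”, matching the linear pattern case.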

Nested Pattern Optimization. Given an XPath expression p, we call s a simple subexpression of p if s is equal to the path of the tag nodes along a path <v1, v2, ..., vn> in the query tree of p, such that each vi is the parent node of vi+1 (1 ≤ i < n) and the label of each vi (except perhaps for v1) is prefixed only by “/”. If all vi share the same tsid and the same predecessor, we define the subexpression as a repetition step. For example, Query 4: /book/chapter/section/section/section/head is a query involving a repetition step; its query tree is shown in Fig. 5 (a).

(a) Original Query Tree
    Tsid:  1     6     8     8     8     9
           book  chap  sect  sect  sect  head

(b) Optimized Query Tree
    Tsid:  1     6     8         9
           book  chap  (sect)^3  head

Fig. 5. Query Tree of Query 4
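The collapse from Fig. 5 (a) to Fig. 5 (b) can be illustrated with the short Python sketch below. It is a sketch under assumptions: steps are compared by element name rather than by tsid, the path is assumed to start at the document root and to use only “/”, and the function names are hypothetical. The second function derives the level at which the output filler must sit, which is all the processor then needs to test.

```python
from itertools import groupby
from typing import Iterable, List, Optional, Tuple


def collapse_repetitions(steps: List[str]) -> List[Tuple[str, int]]:
    """Replace each run of identical child steps by (name, repeat count)."""
    return [(name, sum(1 for _ in group)) for name, group in groupby(steps)]


def output_filler_level(collapsed: List[Tuple[str, int]],
                        fillers: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return the last filler step and the document level it must have."""
    fillers = set(fillers)
    level = 0
    target = None
    for name, count in collapsed:
        level += count          # each "/" step descends exactly one level
        if name in fillers:
            target = (name, level)
    return target


# Query 4: /book/chapter/section/section/section/head
query4 = ["book", "chapter", "section", "section", "section", "head"]
plan = collapse_repetitions(query4)
target = output_filler_level(plan, {"book", "chapter", "section"})

print(plan)    # [('book', 1), ('chapter', 1), ('section', 3), ('head', 1)]
print(target)  # ('section', 5)
```

The computed target says that only “section” fillers at level 5 have to be handled, without visiting the two enclosing “section” fragments.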

Query 4 involves three types of fragments, with tsid 1, tsid 6, and tsid 8. Since “/section” is a nested path step, the query tree includes three nodes with the same tsid 8. A repetition step in an XPath expression degrades performance significantly, especially when the repeated path is highly nested. Taking advantage of the level number, we can simplify the evaluation of such a repetition path. Since the parent-child operator “/” relates nodes at adjacent levels, we can optimize the query tree by pruning off the repeated steps and recording the number of repetitions for further processing. In Query 4, “/section” occurs three times, so we keep only one occurrence in the query tree, enclose it in “( )”, and mark 3 at its right corner. The query plan for Query 4 after pruning is shown in Figure 5 (b). According to the linear pattern optimization, we only need to handle “section” fillers with level 5.

Consider Query 2: /book//head, which involves fragments with tsid 1, tsid 6, and tsid 8. Since “/section” is a nested path step, there can be many repetition steps between “/book” and “//head”. Taking advantage of region coding, we can simplify the evaluation of such a repetition path by checking only the ancestor-descendant relationship between fragments with tsid 1, tsid 6, and tsid 8. In the optimized query plan, we enclose “/section” in “( )” and mark it with *, which means that “section” fillers at any level can be handled.

3.2 The XFPR Matching Algorithm

XFPR is based on the optimized query plan obtained after pruning off the “redundant” operations in the query tree. In this section, we focus on the main algorithm of query evaluation in the XFPR framework and then analyze its efficiency compared with previous work.

The transformation from query tree to query plan is a mapping from the XPath expression and the tag structure to the XFPR processor. For each node in the query tree, the tsid of an element node with type “Filler” corresponds to an entry of the hash table, which is characterized by a predecessor p, a bucket list b, and the tag structure corresponding to the fragment. The predecessor p resolves fragment relationships and predicate criteria. The bucket list b links together the fragments that have the tsid of the node, and each item is denoted as an f-tuple (fillerid, {holeid}, value), in which fillerid denotes (StartPos, EndPos, Level), holeid denotes (StartPos, tsid), and value can be set to true, false, undecided (⊥), or a result fragment corresponding to predicates. While the former three values are possible in intermediate steps that do not produce a result, the latter is possible in the terminal step of a query tree branch.

Algorithm 1 describes the processing method, which is based on the SAX event-based interface that reports parsing events. When a fragment is processed by XFPR, it first needs to verify whether the predecessor operator has excluded its parent fragment due to either predicate failure or exclusion of its ancestor. If the ancestor fragment has arrived, the value of the f-tuple copies the status of its ancestor’s value; otherwise the value is tagged with “⊥”. The fragment then triggers its descendant fragments and passes the status value on to them, since fragments may be waiting on operators to decide their ancestors’ eligibility.


Algorithm 1 startElement()

    if isFragmentStart() == true then
        tsid = getTsid();
        fid = getFid();   // including fid.startPos, fid.endPos, fid.level
        if hashFindOperator(tsid) != null then
            fillInformation();
            if isQueryFragment() == true then
                p = findAncestorOperator(tsid);
                for each f-tuple ft of p do
                    if ft.fid.startPos < fid.startPos && ft.fid.endPos > fid.endPos
                       && isSatisfied(ft.fid.level, fid.level) && ft.fid.value != ⊥ then
                        currentValue = ft.value;
                    end if
                end for
            else
                q = findDescendantOperator(tsid);
                for each f-tuple ft of q do
                    if fid.startPos < ft.fid.startPos && fid.endPos > ft.fid.endPos
                       && isSatisfied(ft.fid.level, fid.level) && fid.value != ⊥ then
                        ft.value = currentValue;
                    end if
                end for
            end if
        end if
    end if
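The status propagation that Algorithm 1 performs can be made concrete with a small, heavily simplified Python sketch. It is not the XFPR implementation. The assumptions are: the plan has only two operators connected by “//” (so the level test `isSatisfied` is omitted), the ancestor operator has no predicate and is therefore true on arrival, SAX parsing is replaced by direct calls, and `None` stands for the undecided value ⊥. The class and method names, and the region codes in the usage lines, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FTuple:
    start: int
    end: int
    level: int
    value: Optional[bool] = None  # None plays the role of "undecided"


class TwoOperatorPlan:
    """Hash table keyed by tsid, with one bucket list per operator."""

    def __init__(self, anc_tsid: int, desc_tsid: int):
        self.anc_tsid = anc_tsid
        self.desc_tsid = desc_tsid
        self.buckets: Dict[int, List[FTuple]] = {anc_tsid: [], desc_tsid: []}

    @staticmethod
    def encloses(a: FTuple, d: FTuple) -> bool:
        return a.start < d.start and d.end < a.end

    def on_fragment(self, tsid: int, start: int, end: int,
                    level: int) -> List[FTuple]:
        """Record an arriving fragment; return fragments that became results."""
        if tsid not in self.buckets:
            return []                      # not in the plan: skipped entirely
        ft = FTuple(start, end, level)
        self.buckets[tsid].append(ft)
        if tsid == self.desc_tsid:
            # Inquire the ancestor operator: copy a decided ancestor status.
            for anc in self.buckets[self.anc_tsid]:
                if self.encloses(anc, ft) and anc.value is not None:
                    ft.value = anc.value
            return [ft] if ft.value else []
        # Ancestor fragment: assumed true, then trigger waiting descendants.
        ft.value = True
        released = []
        for desc in self.buckets[self.desc_tsid]:
            if desc.value is None and self.encloses(ft, desc):
                desc.value = ft.value
                released.append(desc)
        return released


# //chapter//figure with tsid 6 (chapter) and tsid 11 (figure)
plan = TwoOperatorPlan(anc_tsid=6, desc_tsid=11)
early = plan.on_fragment(11, 5, 8, 4)       # figure arrives before its chapter
skipped = plan.on_fragment(8, 3, 9, 3)      # intermediate section: ignored
released = plan.on_fragment(6, 2, 20, 2)    # chapter arrives, releases figure
late = plan.on_fragment(11, 10, 12, 3)      # figure arriving after its chapter
outside = plan.on_fragment(11, 30, 32, 3)   # figure outside any known chapter
```

In this run the first figure waits with an undecided value until the chapter arrives, the intermediate section fragment is never inspected, and the last figure stays undecided because no enclosing chapter has been seen.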

3.3 Algorithm Analysis

In this section, we illustrate the advantage of the region-coding scheme adopted in XFPR with different types of queries, and compare its query efficiency with that of the simple fid and hid numbering scheme in previous frameworks [8, 3].

Hierarchical Matching. As an illustrative example, consider Query 5: //chapter/section/section/title, which returns the “title” of each “section” nested in another “section”. In the previous frameworks with the simple fid and hid numbering scheme, when the fragment with fid 2 and hids 4, 13 arrives, it is accepted by the “chapter” operator by hashing to the corresponding entry of the hash table, and the link information is recorded as the f-tuple (2, {4, 13}, undecided). The value identifies the fragment’s state, which is decided by its parent fragment: if the parent fragment has arrived, the value copies the parent fragment’s value; otherwise the value is set to “undecided”. Similarly, three “section” fragments with (fid 4, hid 6), (fid 6, hid 10), and (fid 13, hid 15) arrive successively. After inquiring the predecessor operator and triggering the successive operator, the “section” fragments with fid 4 and fid 13 are related to the “chapter” fragment with fid 2, and the fragment with fid 6 is a child of the fragment with fid 4. However, not all the “section” fragments can be output as results, since only those “section” fragments matching the second “/section” step in the query path contribute to the results. As the fragment with fid 6 matches the second location step “/section” in the query path, it is output as a result after “true” is tagged as the value of the fragment’s f-tuple.

Query 5 shows that the simple fid, hid numbering scheme is not efficient for some queries. In the hash table, each entry recording the information of an arrived fragment needs to execute two steps: inquiring the predecessor, and triggering the successive operators. Finding the related fragments in this way takes considerable time before query processing can proceed. Moreover, even after many such manipulations, a filler may still not be output because it does not match the particular location step in the query path. With the region-coding scheme, however, we can easily identify the element level and speed up such hierarchical relationship operations. In our framework, we use nested pattern optimization and linear pattern optimization to generate the query plan for Query 5, so we only need to check, for each arriving “section” fragment, whether its tsid equals “4” and its level equals “4”. When such a “section” fragment with its region code arrives, we directly output the element “title” as the result, since it is in a “section” fragment of the required element type at the fourth level of the XML document tree.

From the analysis of this query example, we can conclude that the simple numbering scheme is not suitable for hierarchical relationship evaluation. In particular, if the path expression inquires an element in a nested hierarchical step, it complicates the processing and costs more time. The region-based coding scheme is clearly superior for hierarchical relationship matching in this case.

Skipping Fragments. Consider Query 6: //Chapter/section[/figure]/*/figure/title, which is a twig pattern expression with “*” involved. The tag structure defines the structure of the data and captures all the valid paths.
We can use it to expand wildcard path selections in queries to specify query execution. After we specify the “*” node, Query 6 is equivalent to “book/chapter/section[/figure1]/section/figure2/title”. To distinguish the “figure” in the branch expression from the “figure” in the main path expression, we denote the former as figure1 and the latter as figure2. The simple numbering scheme for fid and hid can only handle the parent-child relationship between fragments. It may therefore slow down query evaluation, because waiting for all the correlated fragments to arrive and complete the query path results in blocking. Taking advantage of twig pattern optimization, however, XFPR can skip the intermediate fragments and output the results as soon as possible. For fragments with tsid 4 and tsid 7 satisfying the condition

    section.StartPos < figure2.StartPos ∧ section.EndPos > figure2.EndPos ∧ figure2.Level = 5 ∧ section.Level = 3,

the algorithm outputs the results immediately if a “figure1” fragment also satisfies the condition

    figure1.StartPos > section.StartPos ∧ figure1.EndPos < section.EndPos ∧ figure1.Level = 4.

Twig pattern optimization proceeds as follows. Assume the “section” fragment with region code (6, 27, 3) has already arrived and its information is recorded in the association table. When the “figure” fragment with region code (15, 26, 5) arrives, it is compared with the “section” fragment by StartPos and EndPos. Since 15 > 6, 26 < 27, figure.Level = 5, and section.Level = 3, the “figure” fragment is one of the descendant fragments of the “section” fragment. However, the “figure” fragment that maps to the branch of the twig pattern as a child of “section” has not arrived yet, so the “section” fragment cannot verify its value, and we cannot output the “figure” fragment as a result. When the “figure” fragment with region code (9, 14, 4) arrives, its f-tuple value is set to “true”, since it satisfies the condition (i.e., Level = 4 ∧ 6 < 9 ∧ 27 > 14). The “section” fragment then triggers its descendant “figure” fragment with region code (15, 26, 5), and the “title” element in that fragment is output.

From the analysis of this query example, we can conclude that, with the region coding scheme, checking an ancestor-descendant structural relationship is as easy as checking a parent-child structural relationship. Hence we can skip intermediate fragments along the path and produce the query results as soon as possible, without waiting for all the correlated fragments to arrive.
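The Query 6 walk-through can be replayed with a short Python sketch that applies the two conditions above to fragments as they arrive. It is an illustration only: fragments are identified by element name instead of tsid, the levels 3, 4, and 5 are hard-coded from the conditions, and `twig_match` is a hypothetical helper rather than part of XFPR. Each result is paired with the index of the arrival at which it became outputtable.

```python
from typing import List, NamedTuple, Tuple


class Code(NamedTuple):
    start: int
    end: int
    level: int


def encloses(a: Code, d: Code) -> bool:
    return a.start < d.start and d.end < a.end


def twig_match(arrivals: List[Tuple[str, Code]]) -> List[Tuple[int, Code]]:
    """Return (arrival index, figure2 code) for every output fragment."""
    sections, branch, main = [], [], []
    emitted, results = set(), []
    for i, (name, code) in enumerate(arrivals):
        if name == "section" and code.level == 3:
            sections.append(code)
        elif name == "figure" and code.level == 4:
            branch.append(code)          # figure1: predicate witness
        elif name == "figure" and code.level == 5:
            main.append(code)            # figure2: carries the title
        # Re-check after every arrival, so the order does not matter.
        for s in sections:
            if not any(encloses(s, b) for b in branch):
                continue                 # predicate [/figure] still undecided
            for m in main:
                if m not in emitted and encloses(s, m):
                    emitted.add(m)
                    results.append((i, m))
    return results


arrivals = [("section", Code(6, 27, 3)),
            ("figure", Code(15, 26, 5)),
            ("figure", Code(9, 14, 4))]
results = twig_match(arrivals)
print(results)  # [(2, Code(start=15, end=26, level=5))]
```

The figure2 fragment (15, 26, 5) is released at the third arrival, when the figure1 fragment (9, 14, 4) satisfies the predicate; the intermediate “section” fragment between them is never required.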

4 Performance Evaluation

In this section, we present the results of a performance evaluation of several algorithms over queries of different types and depths and over documents of different sizes, all on the same platform. All the experiments are run on a PC with a 2.6 GHz CPU and 512 MB of memory. Data sets are generated by the xmlgen program. We fragment each XML document to produce a stream, based on the tag structure defining the fragmentation layout. We also implemented a query generator that takes the DTD as input and creates sets of XPath queries of different types and depths. We use 4 queries on the document and compare the results among the following algorithms: (1) XFrag [3], (2) XFPro [8], and (3) XFPR. Figure 6 shows the queries that we used.

NO   Path expression
Q1   book/section/title
Q2   book/section/*/title
Q3   book//section/title
Q4   book/section[/figure/title]/section/title

Fig. 6. Path expression

In Figure 7 (a) and (b), the three processing strategies are tested and compared over various query types. From the results, we can conclude that XFPR outperforms its counterparts for every query type, and that its query performance does not vary much across query types. This is because XFPR only needs to process the fragments that include key nodes or output results. Furthermore, XFPro outperforms XFrag in time, because it eliminates the dependent operations. But it is not better than XFPR, since it has to specify the query paths when queries include “*” or “//” and it cannot eliminate intermediate fillers. For XFrag, each fragment needs to be passed through the pipeline and evaluated step by step, so the performance of XFrag is affected by the characteristics of the query. As for memory usage, complex queries increase the number of operators involved in query processing, which brings more information into the association table and consumes additional space.

Figure 7 (c) shows the time for query depths 3, 5, and 8 with the three methods. As the depth increases, the time of XFrag and XFPro increases due to the additional path steps. With region coding, XFPR greatly reduces the evaluation of intermediate path steps, so the time cost of deep queries is almost the same as that of short queries.

[Figure 7: panels (a) and (b) over queries Q1 to Q4 and panel (c) over depth of query (3, 5, 8); axes Time (ms) and Memory cost (KB); series XFrag, XFPro, and XFPR.]

Fig. 7. Time with Different Queries

[Figure 8: panels (d) to (k) plot Time (ms) or Memory cost (KB) against document size (5M, 10M, 15M, 20M) for XFrag, XFPro, and XFPR.]

Fig. 8. Time with Different Documents

In Figure 8 (d), (e), (f), and (g), as the document size increases, XFrag is the most costly, because much time is spent inserting fragments and finding the relationships between them. XFPro, with dependence pruning, omits some of the operators corresponding to the query path. XFPR performs best because a considerable number of intermediate fillers are disregarded and the query path is greatly shortened. Figure 8 (h), (i), (j), and (k) illustrate the memory usage for different document sizes. For XFPR, memory usage is less affected by increasing size, since many intermediate fillers are omitted. For XFPro and XFrag, the opposite holds. However, XFPro performs slightly better than XFrag: XFPro only considers subroot nodes in the tid tree, while XFrag handles all operators in the pipeline. As the document size increases, more operators mean more redundant information, so the space cost becomes large.

5 Conclusions

This paper adopts the region coding scheme for XML documents and adapts it to the streamed XML fragment model. Taking advantage of the region coding scheme, we model query expressions as query trees and propose a set of techniques that enable further analysis and optimization. Based on the optimized query tree, we map the query tree directly into an XML fragment query processor, named XFPR, which speeds up query processing by skipping the correlation of adjacent fragments. Our experimental results over XPath expressions with different properties demonstrate the benefits of our approach.

Acknowledgement. This work is partially supported by the National Natural Science Foundation of China under grant Nos. 60573089 and 60273074 and by the Specialized Research Fund for the Doctoral Program of Higher Education under grant SRFDP 20040145016.

References

1. Fegaras, L., Levine, D., Bose, S., Chaluvadi, V.: Query processing of streamed XML data. In: Eleventh International Conference on Information and Knowledge Management (CIKM 2002), McLean, Virginia, USA (November 4–9, 2002)
2. Bose, S., Fegaras, L., Levine, D., Chaluvadi, V.: A query algebra for fragmented XML stream data. In: Proceedings of the 9th International Conference on Data Base Programming Languages, Potsdam, Germany (September 6–8, 2003)
3. Bose, S., Fegaras, L.: XFrag: A query processing framework for fragmented XML data. In: Eighth International Workshop on the Web and Databases (WebDB 2005), Baltimore, Maryland (June 16–17, 2005)
4. Liu, Y., Liu, X., Xiao, L., Ni, L., Zhang, X.: Location-aware topology matching in P2P systems. In: IEEE INFOCOM, Hong Kong (2004)
5. Chen, L., Ng, R.: On the marriage of lp-norm and edit distance. In: Proceedings of the 30th International Conference on Very Large Data Bases, Toronto, Canada (August 2004)
6. Chen, L., Özsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. In: Proceedings of the 24th ACM International Conference on Management of Data (SIGMOD’05), Baltimore, MD (June 2005)
7. Zhang, C., Naughton, J., DeWitt, D., Luo, Q., Lohman, G.: On supporting containment queries in relational database management systems. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA (May 2001)
8. Huo, H., Wang, G., Hui, X., Zhou, R., Ning, B., Xiao, C.: Efficient query processing for streamed XML fragments. In: The 11th International Conference on Database Systems for Advanced Applications, Singapore (April 12–15, 2006)
