A Combinatorial Approach to Building Navigation Graphs for Dynamic Web Applications
W. Wang1, Y. Lei1, S. Sampath2, R. Kacker3, R. Kuhn3, J. Lawrence4
1 University of Texas at Arlington
2 University of Maryland, Baltimore County
3 National Institute of Standards and Technology
4 George Mason University
9/24/2009
Outline
• Introduction
  • Basic concepts, Challenges
• Our approach
  • Abstract URL, Pairwise strategy, Algorithm design, Tool
• Experiments
  • Design, Subject applications, Empirical results
• Related Work
• Conclusion
Navigation graph
• A navigation graph represents the navigation structure of a web application.
  • A node represents a web page.
  • An edge represents one transition between two nodes.
• Usage: regression testing, impact analysis
  • Has an expected navigation path been implemented?
  • Has an unexpected navigation path been introduced?
  • Which pages will be affected if one page is changed?
Challenges
• Page explosion problem
  • The number of dynamic web pages can be astronomical, possibly infinite.
  • Example: a web application may dynamically generate greeting pages for different users.
• Navigation structure capture problem
  • Some dynamic web pages cannot be reached unless appropriate requests are supplied.
  • Example: searching for flights on the studentuniverse web site.
Challenges
• Form parameters: departure city, arrival city, departure date, return date.
  • City name: Dallas, Denver, Detroit, Edmonton.
  • Date: Sep. 29, Sep. 30.
• home page -> error page: captured by special combinations of two parameters.
  • The departure city is the same as the arrival city.
  • The return date is before the departure date.
• home page -> searchResults page: captured by the other, ordinary combinations.
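The flight-search example can be made concrete with a small sketch. It assumes a hypothetical `target_page` function that mimics the two error conditions listed above; the real site's server-side logic is not shown in these slides. The sketch enumerates all 4 × 4 × 2 × 2 = 64 combinations of the listed values and counts which page each one would reach.

```python
from itertools import product

CITIES = ["Dallas", "Denver", "Detroit", "Edmonton"]
DATES = [(9, 29), (9, 30)]  # (month, day) for Sep. 29 and Sep. 30


def target_page(departure_city, arrival_city, departure_date, return_date):
    """Hypothetical stand-in for the web site's validation logic."""
    if departure_city == arrival_city:
        return "error"
    if return_date < departure_date:
        return "error"
    return "searchResults"


def count_targets():
    """Count how many exhaustive input combinations reach each page."""
    counts = {"error": 0, "searchResults": 0}
    for combo in product(CITIES, CITIES, DATES, DATES):
        counts[target_page(*combo)] += 1
    return counts


if __name__ == "__main__":
    print(count_targets())
```

Under these assumptions, 28 of the 64 combinations reach the error page and 36 reach the searchResults page. Each error is decided by the values of just two parameters, which is the kind of interaction a pairwise strategy targets.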
Abstract URL
• One abstract URL represents a group of concrete URLs.
  • These concrete URLs have the same base component and the same parameters in the query component.
• Example:
  • u1 = “http://test.com/foo.jsp?x=1&y=2”
  • u2 = “http://test.com/foo.jsp?x=0&y=3”
  • U = “http://test.com/foo.jsp?x&y”
Figure 1: A URL example
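A minimal sketch of the abstraction step, assuming that an abstract URL is obtained by keeping the base component and the query parameter names while dropping the values. The function name `abstract_url` and the choice to sort parameter names (so that parameter order does not matter) are illustrative assumptions, not details taken from the slides.

```python
from urllib.parse import urlsplit, parse_qsl


def abstract_url(concrete_url):
    """Map a concrete URL to its abstract URL (base + parameter names)."""
    parts = urlsplit(concrete_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # Keep only the parameter names; blank values are kept so that
    # parameters such as "x=" are not silently dropped.
    names = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
    if not names:
        return base
    return base + "?" + "&".join(names)


if __name__ == "__main__":
    print(abstract_url("http://test.com/foo.jsp?x=1&y=2"))
    print(abstract_url("http://test.com/foo.jsp?x=0&y=3"))
```

Both u1 and u2 from the example map to the same string, so they would be represented by a single node in the navigation graph.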
Pairwise strategy
• Given any two of the k parameters, every combination of values of those two parameters is covered at least once.
• Our approach generates pairwise input combinations for forms to capture navigation structures behind forms.

p1  p2  p3
0   0   0
0   0   1
0   1   0
0   1   1
1   0   0
1   0   1
1   1   0
1   1   1
Figure 2: Combinations from the exhaustive testing

p1  p2  p3
0   0   0
0   1   1
1   0   1
1   1   0
Figure 3: Combinations from pairwise testing
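The pairwise criterion can be illustrated with a simple greedy generator. This is only a sketch and is not the algorithm used by the tool described in these slides: it repeatedly picks, from the exhaustive candidates, the combination that covers the most still-uncovered value pairs, which is practical only for small forms. The function name `pairwise_combinations` is an assumption.

```python
from itertools import combinations, product


def pairwise_combinations(domains):
    """Greedily build rows so that every value pair of every two parameters is covered.

    `domains` is a list with one list of candidate values per parameter.
    """
    k = len(domains)

    def pairs_of(row):
        # All (parameter, value, parameter, value) pairs that a row covers.
        return {(i, row[i], j, row[j]) for i, j in combinations(range(k), 2)}

    # Every pair that must be covered at least once.
    uncovered = {
        (i, a, j, b)
        for i, j in combinations(range(k), 2)
        for a in domains[i]
        for b in domains[j]
    }
    candidates = list(product(*domains))  # exhaustive set, small inputs only
    rows = []
    while uncovered:
        best = max(candidates, key=lambda row: len(pairs_of(row) & uncovered))
        rows.append(best)
        uncovered -= pairs_of(best)
    return rows


if __name__ == "__main__":
    for row in pairwise_combinations([[0, 1], [0, 1], [0, 1]]):
        print(row)
```

For three binary parameters this yields four rows instead of the eight exhaustive ones, matching the combinations listed for Figure 3.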
Algorithm design
Figure 4: Algorithm flow graph
Tansuo’s architecture
Figure 5: Tansuo’s architecture
Tansuo’s architecture
• Builder:
  • Drives the entire exploration process.
• Fetcher:
  • Fetches a page from the web server.
• Parser:
  • Extracts static links and forms from a page.
• Form Handler:
  • Obtains values for form parameters.
  • Fills forms with combinations.
  • Obtains URLs from form submissions.
• Fireeye:
  • Generates pairwise input combinations.
• State Manager:
  • Resets the database.
  • Re-exercises the path from the starting page to the current page.
• Viewer:
  • Displays the page the approach is currently working on.
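How the components listed above might cooperate can be sketched as a generic worklist loop. The actual algorithm is given only as a flow graph in the slides, so this is not a reproduction of it. The callables `fetch` and `parse` are hypothetical stand-ins for the Fetcher and for the Parser plus Form Handler, the toy `SITE` dictionary is invented for the example, and state restoration is omitted entirely.

```python
from collections import deque


def build_navigation_graph(start, fetch, parse):
    """Explore from `start` and return (nodes, edges) of the navigation graph.

    fetch(request) returns a page; parse(page) returns the requests
    (links and form submissions) reachable from that page.
    """
    nodes = {start}
    edges = set()
    worklist = deque([start])
    while worklist:
        current = worklist.popleft()
        page = fetch(current)
        for target in parse(page):
            edges.add((current, target))
            if target not in nodes:  # each node is explored only once
                nodes.add(target)
                worklist.append(target)
    return nodes, edges


# Toy in-memory "site": each page is represented by its outgoing targets.
SITE = {
    "home": ["search", "login"],
    "search": ["results", "error"],
    "login": ["home"],
    "results": ["home"],
    "error": ["home"],
}

if __name__ == "__main__":
    nodes, edges = build_navigation_graph("home", SITE.__getitem__, list)
    print(len(nodes), "nodes,", len(edges), "edges")
```

In a real crawler the node identities would be abstract URLs rather than page names, so that the loop terminates even when the number of concrete URLs is very large.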
Exploration demo
Figure 6: Exploration demo of Tansuo
Features of Tansuo
• Define exploration scope.
  • Define keywords for exploration scope.
  • Example:
    • Navigation structures for ordinary users.
    • Navigation structures for administrators.
• Semi-automated/automated exploration
  • GUI interaction.
  • Predefined files.
• Extract option values
  • Values of select menus
  • Values of check boxes
  • Values of radio buttons
  • Default values of text fields
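Option-value extraction of this kind can be sketched with Python's standard `html.parser` module. This is an illustrative sketch, not the tool's implementation: the class name `FormValueExtractor` and the sample form are assumptions, and options or fields without an explicit `value` attribute are simply skipped.

```python
from html.parser import HTMLParser


class FormValueExtractor(HTMLParser):
    """Collect candidate values per form parameter name."""

    def __init__(self):
        super().__init__()
        self.values = {}     # parameter name -> list of candidate values
        self._select = None  # name of the <select> currently open, if any

    def _add(self, name, value):
        if name is not None and value is not None:
            bucket = self.values.setdefault(name, [])
            if value not in bucket:
                bucket.append(value)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "select":
            self._select = attributes.get("name")
        elif tag == "option" and self._select:
            self._add(self._select, attributes.get("value"))
        elif tag == "input":
            # Check boxes, radio buttons, and default values of text fields.
            if attributes.get("type", "text") in ("checkbox", "radio", "text"):
                self._add(attributes.get("name"), attributes.get("value"))

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def extract_form_values(html):
    parser = FormValueExtractor()
    parser.feed(html)
    return parser.values


SAMPLE_FORM = """
<form action="/search">
  <select name="from"><option value="Dallas">Dallas</option>
                      <option value="Denver">Denver</option></select>
  <input type="radio" name="trip" value="oneway">
  <input type="radio" name="trip" value="round">
  <input type="checkbox" name="flex" value="yes">
  <input type="text" name="promo" value="NONE">
</form>
"""

if __name__ == "__main__":
    print(extract_form_values(SAMPLE_FORM))
```

The resulting value lists per parameter are the kind of input a pairwise combination generator needs.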
Experiment design
• Environment:
  • Hardware:
    • CPU: 1.66 GHz, RAM: 2 GB, Hard disk: 80 GB.
  • Software:
    • Windows XP SP2, Resin 2.1.8 web server, Apache 2.0.48, MySQL Server 4.1.
• Subject applications:
  • Source: www.gotocode.com
  • Five JSP web applications are used because the Clover tool is used.
  • Source code statistics of the subject applications are obtained with Clover.
  • Clover processes only JSP web applications.
Application statistics

                      Characteristics
Subject Application   NLOC    Classes  Methods  Branches
Bookstore             18385   27       925      4392
BugTrack              8094    13       438      1946
Classifieds           11599   18       618      2730
Links                 8849    13       499      2074
Portal                17621   27       915      4084
Table 1: Source code statistics of subject applications

                      Characteristics
Subject Application   Forms  Actions  Params  APA   AVP
Bookstore             18     63       66      1.05  3.35
BugTrack              8      19       27      1.42  6.15
Classifieds           11     29       27      0.93  5.07
Links                 11     24       26      1.08  5.77
Portal                19     39       95      2.44  3.40
Table 2: Form statistics of subject applications
Results: navigation graph size

                      Characteristics
Subject Application   Nodes  Edges  Conn.
Bookstore             93     484    10.17
BugTrack              43     175    7.85
Classifieds           50     313    12.53
Links                 52     259    9.72
Portal                80     652    17.77
Table 3: Size statistics of generated navigation graphs
Notes: Conn. (Connectivity): the average number of incoming and outgoing edges per node.
Results: performance & cost

Subject Application   Total Time (hours)  State Restoration Time (hours)  Memory Usage (MB)
Bookstore             33.4415             27.7654                         42.6328
BugTrack              0.1321              0.0641                          19.5625
Classifieds           0.2999              0.2123                          39.0078
Links                 0.1275              0.0581                          19.4570
Portal                1.2218              0.9519                          80.3554
Table 4: Time and memory usage
Notes: Bookstore contains a large number of images, which increased exploration time dramatically. For example, a search result page for Bookstore contained 20 images, whereas a search result page for Portal contained no images.
Results: completeness

                      Manual         Tansuo
Subject Application   Nodes  Edges   Nodes  %     Edges  %
Bookstore             97     596     93     95.9  484    81.2
Portal                91     836     80     87.9  652    78.0
Table 5: Completeness result statistics
Notes: Some nodes and edges are missed because some complicated scenarios are not exercised. For example, page flipping is missed because our approach, for efficiency, places just one order in the ShoppingCartRecord page.
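The percentage columns in Table 5 are consistent with dividing the Tansuo counts by the manually obtained counts. A short sketch of that arithmetic follows; the function name `completeness` is an assumption, and the numbers are those reported in the table.

```python
def completeness(found, expected):
    """Percentage of manually identified items that were also found, to one decimal."""
    return round(100.0 * found / expected, 1)


# (manual nodes, manual edges, Tansuo nodes, Tansuo edges) from Table 5
TABLE_5 = {
    "Bookstore": (97, 596, 93, 484),
    "Portal": (91, 836, 80, 652),
}

if __name__ == "__main__":
    for app, (m_nodes, m_edges, t_nodes, t_edges) in TABLE_5.items():
        print(app, completeness(t_nodes, m_nodes), completeness(t_edges, m_edges))
```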
Results: comparison

                      WebSphinx      LCP            Tansuo
Subject Application   Nodes  Edges   Nodes  Edges   Nodes  Edges
Bookstore             11     11      11     11      93     484
BugTrack              7      7       7      7       43     175
Classifieds           15     16      9      9       50     313
Links                 11     12      11     11      52     259
Portal                17     22      17     22      80     652
Table 6: Comparison results
Notes: LCP: Link Checker Pro. VeriWeb is not publicly accessible.
VeriWeb [WWW 02]
• Page explosion problem
  • Solution: sets length limits on navigation paths.
  • Results:
    • Does not actually solve the page explosion problem.
    • May lose navigation structures.
• Navigation structure capture problem
  • Does not consider input combinations for forms.
  • May miss navigation structures behind forms.
WebSphinx [WWW 98]
• Page explosion problem
  • Does not consider the page explosion.
  • Uses concrete URLs as nodes directly.
• Navigation structure capture problem
  • Cannot handle forms.
  • Misses navigation structures behind forms.
Google’s deep-web crawl [VLDB 08]
• Page explosion problem
  • Solution: uses a content discovery strategy to pick the pages with the most information.
  • Example: a “Login” page will be discarded because it contains little information.
  • Results: loses navigation structures.
• Navigation structure capture problem
  • Solution: generates input combinations for forms in a bottom-up fashion.
  • In effect, this solution works like exhaustive testing, which may produce a huge number of test cases.
  • Results: low efficiency.
Conclusion
• Our approach is effective for generating practical navigation graphs.
  • Abstracting URLs controls navigation graph size effectively.
  • Pairwise input combinations of forms help capture most navigation structures.
• Future work:
  • Constraint support.
  • Improve the efficiency of state restoration.
  • Improve the user interface.
References
• [WWW 02] M. Benedikt, J. Freire, and P. Godefroid, “VeriWeb: Automatically Testing Dynamic Web Sites”, Proc. of 11th Int’l Conf. on WWW, 2002.
• [WWW 98] R.C. Miller and K. Bharat, “SPHINX: A Framework for Creating Personal, Site-specific Web Crawlers”, Proc. of 7th Int’l Conf. on WWW, pp. 119-130, 1998.
• [VLDB 08] J. Madhavan, D. Ko, Ł. Kot, V. Ganapathy, A. Rasmussen, and A. Halevy, “Google’s Deep-Web Crawl”, Proc. of the VLDB Endowment, 1 (2): 1241-1252, 2008.
Thanks!