Journal of Internet Business

Issue 6 - 2009

Visualization of Online Marketplaces using Hidden Web Services

Pushpa Kumar University of Texas at Dallas, USA

About the Author
Ms. Pushpa Kumar is currently a PhD student in the Department of Computer Engineering, University of Texas at Dallas, Richardson, Texas, USA, and is a GAANN Fellow. She holds a B.S. in Electronics and Communication Engineering from Delhi College of Engineering, Delhi, India, and an M.S. in Electrical Engineering from the University of Notre Dame, South Bend, Indiana, USA. Her research interests include information visualization, Web services, graph drawing and software engineering. She has more than ten years of industry experience in e-commerce and is a member of the IEEE Computer Society.

Acknowledgment
The author would like to thank the US Department of Education GAANN Fellowship Federal grant for funding this project.

Corresponding Author Mailing Address, Phone and E-mail Pushpa Kumar University of Texas at Dallas 2601 N. Floyd Road Richardson, TX 75080, USA [email protected] Tel: +1-469-878-8026 Fax: +1-972-883 2349


Abstract
Web services are Web-based software applications that exchange data with other Web-based applications. Current Web services are visible to Web applications through their APIs, whereas hidden Web services are designed to serve end users rather than Web applications. Search engine spiders and end users can access only the surface Web through static URLs or links, not the hidden Web services. Web services are becoming increasingly popular in business environments, since they provide convenient access to publicly accessible resources that contain large amounts of valuable information. Online marketplaces capture online transactions between buyers and sellers, and the resulting many-to-many relationships can be modeled as social networks, which help connect friends, business partners or other individuals. In this paper, we aim to visualize online marketplaces using hidden Web services. We categorize hidden Web services using Web service registries, and we take eBay, one of the most popular e-business models, as a case study. By extracting relevant data through hidden Web services, we generate hierarchical structure displays and study the social interaction among sellers and buyers across geographical boundaries.

Keywords - web services, visualization, social network analysis


1. INTRODUCTION
Web services have attracted considerable research effort and made great progress. They provide a standard means of interoperation between software applications running on a variety of platforms and frameworks, and they are widely used. Web services are characterized by high interoperability and extensibility, as well as by machine-processable descriptions, thanks to their use of XML. They can be combined in a loosely coupled way to achieve complex operations. Unlike traditional client-server models such as a Web server or Web page system, Web services do not provide the user with a GUI; instead, they share business logic, data and processes through a programmatic interface across a network. The simplest and most basic definition of a Web service is an application that provides a Web API supporting application-to-application communication and interoperation with other Web services [30]. Web services represent a category of resources on the Web that are designed to be accessed only by Web applications or programs, rather than by human users [4] [27]. These technologies are fundamentally changing the software industry, making the role of enterprise IT organizations more strategic [7] [31].

In recent years, the Web has been “deepened” by online databases that support Web query forms [6]. Apart from the query interfaces supported by databases, many services, such as currency conversion and language translation, are presented through Web pages. The computations in these services occur behind the scenes and are therefore invisible to search engines: search engine spiders access the surface Web and cannot reach the computation logic behind it. In this paper, we refer to these computation-based services as the executable Web. Together with the Deep Web, they make up the hidden Web services, which have no program-oriented interfaces or service descriptions such as WSDL.
Web services give convenient access to publicly accessible resources that contain large amounts of valuable information [11]. They allow easy interoperability between disparate computer systems, and many businesses have added public Web service interfaces to their databases, allowing direct programmatic access to them. The availability of Web services has thus made it easier to retrieve Web information reflecting users’ activities in online marketplaces. Online marketplaces are Web-based trading exchanges where buyers and sellers meet and conduct business. They hold vital information for visualizing and analyzing the behavior of online customers through social networks. Massive quantities of data on large social networks are available from blogs, knowledge-sharing sites, collaborative filtering systems, online gaming, social networking sites, newsgroups, chat rooms, e-businesses and so on [34]. Social networking also refers to a category of Internet applications that help connect friends, business partners or other individuals using a variety of tools. Such applications, like “MySpace” and “LiveJournal”, are becoming increasingly popular. A social network is a social structure made of nodes, which are generally individuals or organizations, and it indicates the ways in which they are connected through various social familiarities, ranging from casual acquaintance to close familial bonds. Recent research has covered fields as varied as hybrid representations of online activities [29], research community mining [18], mining newsgroups [1] and semantic social network analysis [13].

We attempt to apply concepts of social network analysis to the study of networks in online marketplaces, taking eBay, one of the most successful online businesses, as a case study. In this paper, we mine eBay customer data from its production environment through hidden Web services provided by eBay and construct social networks from it. The customer profiles retrieved in this manner for sellers and buyers include the “userid”, geographic location and transaction history. The structure of the social network graph built on the eBay data varies with the rate at which selling and buying occur. We use a novel method to visualize online marketplaces using hidden Web services, combining Deep Web and executable Web technologies. Information about buyers and sellers is usually obtained using Deep Web search, since databases have to be accessed, whereas the representation and analysis of the social networks formed from the buyer and seller data are typically performed using executable Web techniques.
Existing broker models focus on creating markets by bringing buyers and sellers together and facilitating transactions between them. These can be business-to-business (B2B), business-to-consumer (B2C) or consumer-to-consumer (C2C) markets. Auctions, one of the oldest forms of brokering, have been widely used throughout the world to set prices for items such as agricultural commodities, financial instruments, and unique items like fine art and antiquities. The Web has popularized the auction model and broadened its applicability to a wide array of goods and services. Our aim is to gain insight into patterns of customer behavior, so that these brokerage e-business models can be further enhanced.


In summary, this paper makes contributions in two respects: the classification of hidden Web services, and the visual analysis of online marketplaces using Web services. The rest of the paper is organized as follows. Section 2 presents related work, while Section 3 discusses the current status of the services available on the Web. Section 4 categorizes hidden Web services. Section 5 provides a brief overview of Web-based social network analysis. Section 6 presents the methodology used for our eBay case study and details the visualization and analysis of the resulting data. Section 7 gives conclusions and future work.

2. RELATED WORK
Our effort in mining, constructing and visualizing online social networks is related to ongoing research in the fields of Web visualization and data mining. In this section, we highlight some of the papers that motivated our work.

Online communities are extremely popular, yet most of them still rely primarily on text for knowledge creation and communication. A graphical Web-poll prototype, which mixes information, knowledge and social visualization, has been designed and deployed in an online discussion board on herbal antidepressants [20]. The prototype, named the “plot-poll”, allows users to collaboratively construct a sequence of mini histograms that indicate experienced mood change during a ten-week period; its drawback is that more work is required to support these findings. Salient features of social interaction are used to build a legible, interactive visual representation of Usenet [3]. That work describes the approach taken to develop this type of visualization, discusses the theoretical framework and the questions considered in assessing socially salient features, and presents a series of design iterations exploring how to develop a visual language that conveys social meaning. However, it represents work in progress and is limited to Usenet. Semantic social network analysis is facilitated by tools such as “iQuest”, a software system based upon a "grammar" that allows us to understand and visualize patterns of communication [13]. It extends the automatic visualization of social networks mined from communication archives such as e-mail and blogs by also analyzing the contents of those archives. A visualization system has been designed and implemented for playful end-user exploration and navigation of large-scale online social networks [16]. This design builds upon familiar node-link network layouts, contributing customized techniques for exploring connectivity in large graph structures, supporting visual search and analysis, and automatically identifying and visualizing community structures. C-Group is a tool for analyzing dynamic group membership in temporal social networks [21].

Unlike most network visualization tools, which show the group structure within an entire network or the group membership of a single actor, C-Group allows users to focus their analysis on a pair of individuals. It also provides users with a flexible interface for defining groups interactively, and supports two novel visual representations of evolving group memberships. Two new clustering algorithms that can effectively cluster documents, even in the presence of a very high-dimensional feature space, are described in [14]. These clustering techniques are based on generalizations of graph partitioning, do not require pre-specified ad hoc distance functions, and are capable of automatically discovering document similarities or associations. An agent for exploring and categorizing documents on the World Wide Web, based on automatic categorization of a set of documents combined with a process for generating new queries to search for related documents, is presented in [15]. The new clustering algorithms based on graph partitioning provide a significant improvement in performance over traditional clustering algorithms used in information retrieval. Experimental work investigating user profiling within a framework for personal agents is described in [33]; it aimed to discover whether user interests could be automatically classified through the use of several heuristics. To resolve some of the lexical disagreement problems between queries and FAQs, a reliable FAQ retrieval system using query log clustering is described in [23]. The proposed system clusters the logs of users’ queries into predefined FAQ categories; thanks to its cluster-based retrieval technique, it can partially bridge lexical chasms between queries and FAQs, and it outperforms traditional information retrieval systems in FAQ retrieval.
A methodology of knowledge acquisition (KA) from technical texts is proposed in [2], by which the knowledge engineer, assisted by an automatic clustering tool, builds the "conceptual fields" of the domain. The results of the clustering module and their interpretation are provided, with a focus on the methodological aspects of KA from texts.

“Tabletop Community” enables the visualization of social interactions around a table [12]. Through this artwork system, users and participants easily record the state and atmosphere of each interaction, and the system visualizes the state of the entire community as an interactive network visualization. Two approaches to analyzing the evolution of two different types of online communities at the level of subgroups have been proposed. The first method consists of statistical analyses and visualizations that allow for an interactive analysis of subgroup evolution in communities that exhibit a rather stable membership structure. The second method is designed for the detection of communities in an environment with highly fluctuating membership [10]. CI-KNOW is a suite of Web-based tools that implements a network recommendation system incorporating social motivations for why we create, maintain and dissolve our knowledge network ties [17]. The network data is captured by automated harvesting of digital resources using Web crawlers, text miners, tagging tools that automatically generate community-oriented metadata, and scientometric data such as co-authorship and citations. Based on this knowledge network, the CI-KNOW recommender system produces personalized search results in two steps: it identifies matching entities according to their metadata and network statistics, and then selects the best fits according to the requester's perspectives and connections in social networks. Visualization of community structures detected by different methods from human encounter traces is presented in [38], with a focus on extracting information related to levels of clustering, network transitivity and strong community structure. The analysis of large social small-world graphs, such as those formed by human networks, is performed in [28], bringing together ideas from a number of research areas, including graph layout, graph clustering and partitioning, machine learning and user interface design. It helps users explore these networks and develop insights into their members and structure that may be difficult or impossible to discover via traditional means, including existing graph visualization.
A model has been proposed that considers not only a customer's intrinsic value but also the customer's network value: the expected profit from sales to other customers whom the customer may influence to buy, the customers those customers may influence, and so on recursively. Instead of viewing a market as a set of independent entities, the market is viewed as a social network and modeled as a Markov random field [9]. A large reduction in computational cost is achieved when this approach is applied to data from a knowledge-sharing site. The amount of marketing funds spent on each customer is optimized, rather than making only a binary decision on whether to market to that customer [32]. Our approach differs from the above related work in that we mine and construct the social networks ourselves using hidden Web services, instead of utilizing existing social networking sites or graphs. Web-based social network analysis is used for the visualization and analysis of the retrieved data.

3. CURRENT SERVICES ON THE WEB
This section reviews the types of services currently available on the Web. Figure 1 depicts the existing world of Web services. The part of the Web that is served dynamically, "on the fly", is much larger than the set of static documents associated with Web pages.

[Figure 1 diagram labels: Web Applications; Current Search Engines; Hidden Web Services; Web Services; Interoperation; APIs; Surface Web; Executable Web; Deep Web; End Users]

Figure 1. Existing World of Web Services [24].

Current Web services are visible to Web applications through their application program interfaces (APIs), through which they communicate; they cannot be accessed by end users. Hidden Web services, on the other hand, are designed to serve end users rather than Web applications. Search engine spiders and end users can access only the surface Web through static URLs or links, not the hidden Web services. Hidden Web services comprise the Deep Web, which accesses back-end databases, and the executable Web, which performs computations behind the scenes. Promising work has been done on making the Deep Web accessible not only to end users but also to Web applications [5].

3.1 Existing Web Services
Web services represent a category of resources available on the Web that are designed to be accessed only by Web applications or programs, rather than by human users [4]. The simplest and most basic definition of a Web service is an application that provides a Web API to support application-to-application communication and interoperation with other Web services. For example, an affiliate Web site can use the Amazon Web service to search Amazon’s catalog and display the results on its own site, including features such as Amazon reviews and book ratings [27]. Figure 2 shows such a Web API, offered by Amazon for its online catalog, which allows its marketing affiliates to easily incorporate Amazon content and features into their Web sites.

[Figure 2 diagram labels: Affiliate Web Site; QUERY; Amazon’s Web API; STRUCTURED INFORMATION]

Figure 2. Web Services [24].

3.2 Surface Web and Deep Web
A significant amount of valuable information on the Web is generated from databases below the Web surface. It has been estimated that the content of the Deep Web may be roughly 500 times larger than that of the surface Web [19]. The contents of the various databases accessible on the Web are distinct from static surface Web pages, which are essentially documents, possibly with multimedia files, accessed directly by end users and search engines. As an example of the Deep Web, the cars.com query interface Qdw shown in Figure 3 retrieves results from an underlying database of cars.

Figure 3. Qdw: cars.com Interface [24].
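To make the Deep Web pattern concrete, the following minimal Python sketch (not the implementation behind cars.com) maps the fields of a query form onto a parameterized query against a back-end database, so the result page exists only once the form is submitted. The `cars` table, its columns, the sample rows and both function names are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database standing in for a site's hidden back end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (make TEXT, model TEXT, year INTEGER, price INTEGER)")
    conn.executemany(
        "INSERT INTO cars VALUES (?, ?, ?, ?)",
        [
            ("Toyota", "Camry", 2005, 12000),   # hypothetical listings
            ("Toyota", "Corolla", 2007, 14000),
            ("Honda", "Civic", 2006, 13000),
        ],
    )
    return conn


def submit_query_form(conn: sqlite3.Connection, form: dict) -> list:
    """Map query-form fields to a parameterized SQL query and return the matching rows."""
    sql = ("SELECT make, model, year, price FROM cars "
           "WHERE make = ? AND price <= ? ORDER BY price")
    return conn.execute(sql, (form["make"], form["max_price"])).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in submit_query_form(db, {"make": "Toyota", "max_price": 15000}):
        print(row)
```

Because every answer comes from stored rows, the set of possible result pages is bounded by the database contents, which is the key contrast with the executable Web.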

3.3 Executable Web
The computation logic that occurs behind the scenes of Web interface forms is hidden from the user and is also invisible to search engines. Figure 4 illustrates an example of the executable Web: the “bmi calculator” query interface Qew at asbs.com. The computation logic need not be driven by an underlying database. Whereas the Deep Web returns content stored in a database, the executable Web can generate an unbounded number of content pages depending on the users’ inputs. The next section shows that this type of service has a significant presence on the Web.

[Figure 4 diagram label: Computation Logic]

Figure 4. Qew: asbs.com Interface [24].
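By contrast, an executable-Web service computes its answer from the inputs alone. The sketch below shows the kind of computation that could sit behind a BMI form such as the one in Figure 4, assuming the standard definition BMI = mass (kg) / height (m)²; the function name and the input validation are illustrative assumptions, not the actual logic of asbs.com.

```python
POUND_TO_KG = 0.45359237  # exact, by definition of the avoirdupois pound
INCH_TO_M = 0.0254        # exact, by definition of the inch


def bmi_from_imperial(weight_lb: float, height_ft: int, height_in: float) -> float:
    """Return body mass index in kg/m^2 from weight in pounds and height in feet/inches."""
    total_inches = height_ft * 12 + height_in
    if weight_lb <= 0 or total_inches <= 0:
        raise ValueError("weight and height must be positive")
    mass_kg = weight_lb * POUND_TO_KG
    height_m = total_inches * INCH_TO_M
    return mass_kg / height_m ** 2


if __name__ == "__main__":
    # Nothing is looked up: the result is computed from the form inputs alone.
    print(round(bmi_from_imperial(150, 5, 10), 1))  # prints 21.5
```

No database is consulted, and every distinct (weight, height) input yields a different result page, which is why such services cannot be crawled exhaustively.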

4. CATEGORIZING HIDDEN WEB SERVICES
Fundamental to our work on hidden Web services is determining the categories into which these services can be classified. This section presents the methodology used to perform this classification, together with the resulting categories and statistics.

4.1 Methodology
Query forms generally seem to follow a concerted structure and appear to be “modularly” constructed from a small set of building blocks [39]. Such query structures enable us to categorize different types of services. For each category, this observation steers us towards the prediction, classification and, ultimately, discovery of services according to their query forms. Using the categorization information present in Web service registries, we identified keywords to submit to existing search engines in order to find related Web sites. We downloaded the Web pages containing query interfaces and recorded the corresponding title, URL and domain information in an XML file. We illustrate our methodology with the following example.

The Microsoft UDDI registry (http://uddi.microsoft.com) was used to locate the “finance” category, and the UDDI returned the corresponding Web service entries. The Web service “Calculate IT - Web Service to Calculate the Income Tax” was identified for this category. “After identifying the key words like ‘Calculate the Income Tax’, we used these key words as input to Google” [24]. We browsed through the returned results to locate pages that accept user inputs and produce results through computations. The Web page at “http://www.dinkytown.net/java/Tax1040.html” met this criterion. We recorded the service in an XML file, saved the related Web pages, and also recorded its title (Financial calculators) and domain (Finance).

We also determined some categories, such as financial calculators, by their frequent usage.
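The recording step of the methodology can be sketched with Python's standard `xml.etree.ElementTree` module. The element names (`services`, `service`, `title`, `url`, `domain`) and the output file name are assumptions, since the paper does not give the schema of its XML file; the record shown is the income-tax example above.

```python
import xml.etree.ElementTree as ET


def add_service(root: ET.Element, title: str, url: str, domain: str) -> ET.Element:
    """Append one discovered hidden Web service as a <service> record."""
    service = ET.SubElement(root, "service")
    ET.SubElement(service, "title").text = title
    ET.SubElement(service, "url").text = url
    ET.SubElement(service, "domain").text = domain
    return service


root = ET.Element("services")
add_service(root, "Financial calculators",
            "http://www.dinkytown.net/java/Tax1040.html", "Finance")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
# To persist the records (hypothetical file name):
# ET.ElementTree(root).write("hidden_services.xml", encoding="utf-8", xml_declaration=True)
```

Keeping one record per service makes it straightforward to count services per domain later, as done for Tables 1 and 3.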

4.2 Categories
Using the above methodology, we identified several categories of services that perform computations via Web query forms; they are summarized in Table 1, together with the recorded title and URL information. There are a total of 10 categories in Table 1, with about two or three services for each category, covering fields such as mortgage calculators, health, food and finance. We also determined sampling and estimation rates for these categories using the Google search engine, as documented in Table 2. Sampling was performed on the first 100 results, i.e., 10 pages of 10 results each. “The statistics (approximately in the range of 80 to 85%) provides a good validation for considering these categories in hidden Web services for our proposed work” [24]. For these services, the resulting data is computed solely from the information the user provides in the query form and is not obtained by connecting to any database.

Table 1. Categories under the Executable Web
CATEGORY | TITLE | WEB SERVICE URL
Mortgage Calculator | interest.com | http://mortgages.interest.com/content/calculators/additionalpayment.asp
Mortgage Calculator | banksite.com | http://www.banksite.com/calcs/mortgagecalc.html
Mortgage Calculator | hsh.com | http://www.hsh.com/calc-amort.html
Unit Converter | digital dutch | http://www.digitaldutch.com/unitconverter/
Unit Converter | institute of chemistry | http://www.chemie.fu-berlin.de/chemistry/general/units_en.html
Unit Converter | measurement unit converter | http://www.people.virginia.edu/~rmf8a/convert.html
Currency Converter | x-rates.com | http://www.x-rates.com/calculator.html
Currency Converter | xe.com | http://www.xe.com/ucc/
Currency Converter | oanda.com | http://www.oanda.com/convert/classic
Finance | Financial Calculators | http://www.dinkytown.net/java/Tax1040.html
Finance | Missouri Department of Revenue | http://www.dor.mo.gov/tax/calculators/incometax/
Food | Internet Pizza Server Ordering Area | http://www.ecst.csuchico.edu/~pizza/pizzaWeb.html
Bmi Calculator | halls.md | http://www.halls.md/body-mass-index/bmi.htm
Bmi Calculator | preventdisease.com | http://preventdisease.com/healthtools/articles/bmi.html
Life Expectancy | msn | http://moneycentral.msn.com/investor/calcs/n_expect/main.asp

Table 2. Sampling results for the Executable Web
CATEGORY | ESTIMATE (%) | TOTAL RESULTS
Mortgage Calculator | 82 | 7,888,000
Unit Converter | 85 | 6,860,000
Currency Converter | 80 | 21,800,000
Finance | 81 | 1,030,000
Bmi Calculator | 83 | 224,000
Life Expectancy | 81 | 120,000
Insurance | 85 | 7,780,000
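The paper does not state exactly how the ESTIMATE column was computed. One plausible reading, assumed in this sketch, is the percentage of the 100 sampled results (the first 10 pages of 10 results) that were judged to be genuine executable-Web services. Relevance judgments were presumably made by hand; the boolean labels below are hypothetical and do not reproduce any row of Table 2.

```python
def sample_first_results(pages: list[list[bool]], n_pages: int = 10,
                         per_page: int = 10) -> list[bool]:
    """Keep at most per_page results from each of the first n_pages result pages."""
    return [label for page in pages[:n_pages] for label in page[:per_page]]


def estimate_percent(labels: list[bool]) -> float:
    """Percentage of sampled results judged to be genuine executable-Web services."""
    if not labels:
        raise ValueError("no sampled results")
    return 100.0 * sum(labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical relevance labels: 12 result pages, 7 of 10 results qualify on each.
    pages = [[True] * 7 + [False] * 3 for _ in range(12)]
    sample = sample_first_results(pages)
    print(len(sample), estimate_percent(sample))  # prints 100 70.0
```

Only the first 10 pages are counted, so the two extra hypothetical pages do not affect the estimate.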

Table 3. Categories under the Deep Web
CATEGORY | TITLE | WEB SERVICE URL
Books | amazon | http://www.amazon.com/books/
Books | barnes&noble.com | http://www.barnesandnoble.com/index.asp?r=1&popup=0
Travel | travelocity | http://www.travelocity.com
Travel | expedia.com | http://www.expedia.com
Weather | the weather channel | http://www.weather.com
Weather | cnn.com | http://www.cnn.com/WEATHER/
Prescription Refills | kmart | https://pharmacy.kmartcorp.com/KMRx?step=Refill
Prescription Refills | randalls | http://www.randalls.com/RxRefill.asp
Cars | cars.com | http://www.cars.com/go/index.jsp
Cars | startribune.com | http://www.startribune.com/cars/
Pets | yahoo!pets | http://pets.yahoo.com/pets/
Pets | petfinder.com | http://www.petfinder.org/
Stock Quotes | lycos | http://www.quote.com/qc/default.aspx
Stock Quotes | pcquote.com | http://www.pcquote.com/

Categorization results for the Deep Web are summarized in Table 3. There are a total of 7 categories, such as books, travel, pets and license renewal. Tables 1 and 3 give a general idea of the differences between the Deep Web and the executable Web. Consider “amazon.com” under the books category as an example. Given a book-search query for the title “Operating Systems”, it returns a list of books with the complete title, author name, publisher name and price, all retrieved from a database. This example represents the Deep Web. Under the executable Web category, consider the “halls.md” BMI calculator. “In the query form, when a user enters weight in pounds and height in feet/inches, it returns a BMI index number in kg/m2 units in the result field” [24]. This result is produced by computations occurring behind the scenes.

5. WEB-BASED SOCIAL NETWORK ANALYSIS
Web services technology facilitates information retrieval from the databases associated with Web portals. Web-based social network analysis (WSNA) refers to the application of social network analysis to data extracted using Web services technology. WSNA is related to network theory, and social network analysis has emerged as a key technique in modern sociology, anthropology, geography, social psychology, information science and organizational studies, as well as a popular topic of speculation and study [36]. WSNA provides statistical tools for examining relational data rather than merely characterizing the attributes of individual actors; it focuses on describing patterns of relationships among actors and analyzing the structure of these patterns.


An “actor” is an individual, organization, event or collective social entity that links to others in a network and is represented as a “node” [22]. The elements of a social network can be illustrated in a simple “sociogram” [8]. An “adjacency matrix” is a square matrix, usually consisting of zeros and ones, that indicates whether each pair of actors in the network is connected. The three most popular individual centrality measures, defined below, give us insight into the various roles and groupings in a network.

i) Degree Centrality: “Degree for a node is highest when the node has the maximum possible number of direct connections to other nodes. Degree is thus the number of direct ties to other nodes, and measures activity of nodes in the network. Nodes are 'connectors' or 'hubs' in this network” [25].

ii) Closeness Centrality: “Closeness for a node is highest when a node can reach all other nodes in the network. Mathematically, closeness is the graph-theoretic distance of a node to all other nodes. Nodes with high centrality closeness are ones most likely to receive and transmit innovations” [25].

iii) Betweenness Centrality: “Betweenness for a node is highest when nodes connecting to other nodes maximally utilize that node. That is, betweenness measures how many paths pass through a node. A node high on betweenness has a high opportunity to play gatekeeper, liaison, or broker role. It is in excellent position and location to monitor the information flow in the network and has the best visibility into what is happening in the network” [25].

For online market businesses, higher values of these three centrality measures indicate users who are better connected and hence can sell more products than others. A few possible reasons for this are the following:

- Superior publicity techniques for their products, possibly at greater cost
- Selling popular categories of items
- Better placement, for example advertising at the top of the pages for more visibility
- Better visualization techniques, such as using sophisticated cameras to produce higher-resolution, better-quality images for added visual effect

Online market businesses may additionally choose to target advertisements and promotions at these better-positioned users. Through Web-based social network analysis (WSNA) of online marketplaces, we aim to answer the following questions:


- What are the typical seller-buyer relationships?
- What is the buyers’ loyalty (buying the same kind of items from the same seller)?
- What categories of goods are popular?
- What are the correlations among the number of buyers, the number of items for sale and the number of items sold?
- What are the measurements of the networks relating to centralities?
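The three centrality measures can be computed directly from a 0/1 adjacency matrix. The sketch below assumes an undirected, connected seller-buyer network; it uses the common normalization (n - 1) / Σd for closeness and Brandes' algorithm for (unnormalized) betweenness. These are standard formulations, not necessarily the exact variants used by the tools in this study, and the five-node network is hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Number of direct ties of each node (row sums of a 0/1 adjacency matrix)."""
    return [sum(row) for row in adj]


def _bfs(adj, s):
    """Distances, shortest-path counts, predecessors and visit order from source s."""
    n = len(adj)
    dist, sigma = [-1] * n, [0] * n
    preds = [[] for _ in range(n)]
    order = []
    dist[s], sigma[s] = 0, 1
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in range(n):
            if not adj[v][w]:
                continue
            if dist[w] < 0:                  # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def closeness_centrality(adj):
    """(n - 1) divided by the sum of shortest-path distances to all other nodes."""
    n = len(adj)
    scores = []
    for s in range(n):
        total = sum(d for d in _bfs(adj, s)[0] if d > 0)
        scores.append((n - 1) / total if total else 0.0)
    return scores


def betweenness_centrality(adj):
    """Brandes' algorithm: shortest paths between other node pairs passing through each node."""
    n = len(adj)
    cb = [0.0] * n
    for s in range(n):
        _, sigma, preds, order = _bfs(adj, s)
        delta = [0.0] * n
        for w in reversed(order):            # accumulate dependencies farthest-first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return [c / 2 for c in cb]               # each undirected pair was counted twice


if __name__ == "__main__":
    # Hypothetical network: seller 0 sold to buyers 1, 2, 3; buyer 3 also bought from seller 4.
    adj = [
        [0, 1, 1, 1, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    print("degree     ", degree_centrality(adj))
    print("closeness  ", [round(c, 3) for c in closeness_centrality(adj)])
    print("betweenness", betweenness_centrality(adj))
```

In this example, seller 0 scores highest on all three measures, while buyer 3, the only link to seller 4, also acts as a broker.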

6. CASE STUDY – eBAY
This section provides an overview of the tools and concepts utilized for our case study on eBay. The eBay Web Services package provides programmatic access to the eBay marketplace and allows third parties to build custom applications, tools and services that leverage the eBay marketplace [35]. These tools are part of Deep Web technology: they access back-end databases and retrieve buyer and seller data in XML format. We utilize an extensible software framework called ‘Prefuse’ [37], which helps software developers create interactive information visualization applications in the Java programming language. Fundamental to our work on discovering and analyzing social networks on eBay is the mining and extraction of relevant data from eBay using its Web services API. To accomplish this task, we utilized the eBay Web services package and the API test tool provided by eBay [35]. The overall methodology is shown in Figure 5.

[Figure 5 flowchart: INPUT WEB SERVICE XML → EBAY API TEST TOOL → RESULTANT OUTPUT XML → CONVERT XML TO GRAPHML → VISUALIZATION AND ANALYSIS]

Figure 5. Overall Methodology [25].

eBay-related Web service calls, in the form of input XML, are forwarded to the API test tool in a production environment, and the corresponding output XML is obtained. The resulting information-rich XML was converted into GraphML, the format required by the Prefuse toolkit, for visualization using hierarchical and force-directed layouts. We also performed Web-based social network analysis (WSNA) on the retrieved data.


6.1 Implementation
This section discusses the implementation of the methodology. The eBay Web Services package provides programmatic access to the eBay marketplace and enables third-party applications to build custom applications, tools and services that leverage the eBay marketplace in new ways. An eBay-enabled application can present data in custom ways that best meet the users' needs. eBay Web Services support XML over HTTPS and SOAP for the API. The eBay API test tool provides a convenient user interface for sending XML API requests to the Web service and is part of Deep Web technology: it accesses back-end databases to retrieve relevant transaction data. XML for various Web services can be submitted with the click of a button. A valid binary security token and user credentials are required to use the eBay API test tool correctly. Relevant authentication and authorization are performed, and the resulting XML output is displayed in the browser pane. The XML API provided with this tool contains the standard data elements needed to complete a call. With the aid of the API documentation provided on the eBay developer Web site [35], this XML can be customized for our purposes.
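The customization of GetSellerEvents described in this section (adding a `UserID` element and setting `ModTimeFrom` and `ModTimeTo`) can be sketched as follows. The three element names are taken from the text; the `RequesterCredentials`/`eBayAuthToken` wrapper and the namespace URI are assumptions to be checked against the eBay developer documentation [35], and the sketch only builds the request XML rather than sending it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of eBay's XML API schema; verify against the eBay developer documentation.
EBAY_NS = "urn:ebay:apis:eBLBaseComponents"


def _q(name: str) -> str:
    """Qualify an element name with the assumed eBay namespace."""
    return f"{{{EBAY_NS}}}{name}"


def build_get_seller_events_request(token: str, user_id: str,
                                    mod_time_from: str, mod_time_to: str) -> str:
    """Build a GetSellerEvents request for one seller and one modification-time window."""
    ET.register_namespace("", EBAY_NS)  # serialize with a default xmlns, no prefixes
    root = ET.Element(_q("GetSellerEventsRequest"))
    creds = ET.SubElement(root, _q("RequesterCredentials"))  # assumed credential wrapper
    ET.SubElement(creds, _q("eBayAuthToken")).text = token
    ET.SubElement(root, _q("UserID")).text = user_id              # seller of interest
    ET.SubElement(root, _q("ModTimeFrom")).text = mod_time_from   # start of time window
    ET.SubElement(root, _q("ModTimeTo")).text = mod_time_to       # end of time window
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Values from the Figure 6a example; the token is a placeholder.
    print(build_get_seller_events_request(
        "YOUR_AUTH_TOKEN", "svg",
        "2006-09-30T12:00:00.000Z", "2006-10-31T12:00:00.000Z"))
```

Generating the request programmatically, instead of editing it by hand in the test tool, makes it easy to repeat the same call for many sellers or time windows.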

Table 4. Data Summary for GetSellerEvents

Field            Value
ItemID           190025808625
StartTime        2006-08-29T15:45:00.000Z
EndTime          2006-09-05T15:45:00.000Z
BidCount         1
CurrentPrice     3
QuantitySold     1
UserID           mrm0m
UserAnonymized   false
FeedbackScore    123
ListingStatus    Completed
Site             eBayMotors
Title            MOTOR'S AUTO REPAIR MANUAL 1955 hardcover
Currency         USD

We customized the XML request for the “GetSellerEvents” Web service call by adding a “UserID” child element that identifies the seller we are interested in. We also set the “ModTimeFrom” and “ModTimeTo” elements to the time period of interest. The call retrieves a list of the items for which a seller event has occurred. A seller event is an event of interest to an item's seller, such as a change in the current bid price or in the item's ending time. “GetSellerEvents” returns the item data in an “ItemArray” element, which contains multiple “Item” child elements, and the relevant data is stored in the various child elements of each “Item” node. Figure 6a is an example call for a user with userid “svg” over the one-month period from September 30, 2006 to October 31, 2006. Figure 6b displays a screenshot of the resulting XML output. Table 4 summarizes the retrieved data values for one such Item node.

a) [Request XML; recoverable values: a valid authentication token, ReturnAll, en_US, 427, US, 2006-09-30T12:00:00.000Z, 2006-10-31T12:00:00.000Z, svg2331]

b) [Response XML; recoverable values: 2006-12-08T15:31:33.994Z, Success, 489, build e489_core_Bundled_3911286_R1, 2006-11-01T12:00:00.000Z, followed by the Item data summarized in Table 4]

Figure 6. Request (a) and Response (b) XML for GetSellerEvents.

We observe that the data values for “userid” and “Site” can be used to construct and analyze social networks. The “userid” refers to the buyer of the sold item, while “Site” provides the geographical location data. For tree structures, the Prefuse toolkit requires the XML data to be converted into an equivalent format of branches and leaves, depicted in Figure 7a. For social network graphs, we convert the XML data to the GraphML format depicted in Figure 7b.
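The following is a minimal sketch of the extraction step. It assumes a simplified, hypothetical response layout in which each Item carries flat UserID, QuantitySold and Site children. eBay's actual response nests some of these fields, so the element paths would need adjusting. The hypothetical function `extract_buyer_edges` turns each sold item into a (seller, buyer, site) record, which is the raw material for the social network.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified response: flat children under each Item, modelled
# on the fields of Table 4 rather than on eBay's exact schema.
SAMPLE_RESPONSE = """
<GetSellerEventsResponse>
  <ItemArray>
    <Item>
      <ItemID>190025808625</ItemID>
      <QuantitySold>1</QuantitySold>
      <UserID>mrm0m</UserID>
      <Site>eBayMotors</Site>
    </Item>
  </ItemArray>
</GetSellerEventsResponse>
"""


def extract_buyer_edges(seller_id, response_xml):
    """Return (seller, buyer, site) tuples for every Item with at least one sale."""
    root = ET.fromstring(response_xml.strip())
    edges = []
    for item in root.iter("Item"):
        buyer = item.findtext("UserID")
        sold = int(item.findtext("QuantitySold", default="0"))
        if buyer and sold > 0:  # skip unsold items and items without a buyer
            edges.append((seller_id, buyer, item.findtext("Site")))
    return edges


print(extract_buyer_edges("svg2331", SAMPLE_RESPONSE))
```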

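a) [Listing for the tree structure; content not recoverable]

b) [GraphML listing for the social network graph; only the value svg2331 is recoverable]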

Figure 7. GraphML for tree structure (a) and social network graphs (b).
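The GraphML conversion for the social network graph can be sketched with ElementTree as shown below. Each user becomes one node with a `name` data attribute, and each seller-to-buyer sale becomes a directed edge, so repeat purchases appear as parallel edges. The key and attribute layout is our assumption of a minimal GraphML file and does not reproduce Figure 7b. The name `edges_to_graphml` is hypothetical. Prefuse includes a GraphML reader, but we have not checked this exact output against it.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges):
    """Serialize directed (seller, buyer) pairs as a GraphML document string."""
    graphml = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare one string-valued node attribute holding the eBay user id.
    ET.SubElement(graphml, "key", {"id": "name", "for": "node",
                                   "attr.name": "name", "attr.type": "string"})
    graph = ET.SubElement(graphml, "graph", edgedefault="directed")
    node_ids = {}
    for user in (u for pair in edges for u in pair):
        if user not in node_ids:  # one node per distinct user, in first-seen order
            node_ids[user] = f"n{len(node_ids)}"
            node = ET.SubElement(graph, "node", id=node_ids[user])
            ET.SubElement(node, "data", key="name").text = user
    for seller, buyer in edges:  # one edge per sale, so repeats stay visible
        ET.SubElement(graph, "edge", source=node_ids[seller], target=node_ids[buyer])
    return ET.tostring(graphml, encoding="unicode")


print(edges_to_graphml([("svg2331", "mrm0m"), ("svg2331", "isabella")]))
```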


6.2 Data Visualization and Analysis

We visualize the resulting data, converted to the required GraphML structures, using the Prefuse visualization toolkit [37]. Our aim is to study buyer-seller interaction and to understand user habits and popular categories of sold items. This kind of analysis can help enhance business strategy and practice, and can help market researchers better predict trends and market their products over time. The overall trends can be seasonal or geographical, with some categories of products selling better than others for a few months or in certain parts of the world. A screenshot of the resulting hierarchical structure representing social interactions is partially depicted in Figure 8a, where the number in square brackets ([]) appended to a “userid” represents multiple purchases from the same seller.

Figure 8b shows a screenshot of social interactions deeper down the hierarchy. There are a total of 531 users, with one root node “svg2331” and the remaining nodes distributed among four hierarchy levels. The root node has 17 direct child nodes. Cross-linking is present between nodes at various levels of the hierarchy, as shown in Figures 8a and 8b. This indicates that a seller at one level can also be the seller to a user at a different level, which is characteristic of social networks. The hierarchy structure is summarized in Table 5.

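a) [Screenshot: partial view of the hierarchy]

b) [Screenshot: deeper levels of the hierarchy]

Figure 8. Screenshot of the eBay social interaction hierarchy.

Table 5. Summary of eBay hierarchy nodes

Level        Number of nodes
Root node    1
Level 1      17
Level 2      23
Level 3      157
Level 4      333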
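The per-level counts in Table 5 can be tallied with a breadth-first search from the root seller. This sketch rests on an assumption the paper does not state: each user is counted once, at the shallowest level where it is reached. The input `children` is hypothetical and maps each seller to the buyers who purchased from them.

```python
from collections import deque


def nodes_per_level(children, root):
    """Count users at each depth of the buyer hierarchy via breadth-first search.

    A user reached again through a cross-link keeps its first (shallowest) depth.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        for child in children.get(user, []):
            if child not in depth:
                depth[child] = depth[user] + 1
                queue.append(child)
    counts = {}
    for d in depth.values():
        counts[d] = counts.get(d, 0) + 1
    return [counts[d] for d in sorted(counts)]


toy = {"root": ["a", "b"], "a": ["c", "d"], "b": ["d", "e"], "e": ["f"]}
print(nodes_per_level(toy, "root"))  # [1, 2, 3, 1]
```

As a consistency check, the Table 5 counts sum to 1 + 17 + 23 + 157 + 333 = 531, which is the total number of users reported above.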


Node reduction was performed by removing redundant occurrences of the same node across multiple hierarchy levels. Our aim is to eliminate repeated nodes and achieve a more efficient hierarchy structure. Figure 9a depicts the original hierarchy structure, while Figure 9b depicts the improved version for the repeated node ‘A’. There were a total of 52 sellers, 483 buyers and 36 redundant cross-linked nodes across four levels of hierarchy, resulting in an overall node reduction of 7%.

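[Diagram: nodes B, A and C over Levels 1 and 2, with node A repeated at Level 2]

Figure 9a. Original hierarchy structure [25].

[Diagram: nodes B, A (Level 1) and C (Level 2), with node A appearing only once]

Figure 9b. Reduced hierarchy structure [25].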
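The paper does not spell out its reduction rule, so the sketch below shows one plausible version. It walks the hierarchy breadth-first, keeps each node at the first (shallowest) position where it appears, and drops every later occurrence as a redundant cross-link. On a toy hierarchy in which B has children A and C and C also lists A, the repeated A is removed. This need not reproduce the exact layout of Figure 9b. The name `reduce_hierarchy` and its input format are hypothetical.

```python
from collections import deque


def reduce_hierarchy(children, root):
    """Keep each node once, at its shallowest position, and count dropped repeats.

    children maps a parent to its child nodes. Returns (reduced, removed):
    reduced maps each parent to the children it keeps, and removed counts the
    child occurrences dropped as redundant cross-links.
    """
    placed = {root}
    reduced, removed = {}, 0
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, []):
            if child in placed:
                removed += 1  # node already placed at a shallower or earlier position
                continue
            placed.add(child)
            reduced.setdefault(parent, []).append(child)
            queue.append(child)
    return reduced, removed


# Toy version of the situation in Figure 9: A appears under both B and C.
reduced, removed = reduce_hierarchy({"B": ["A", "C"], "C": ["A"]}, "B")
print(reduced, removed)  # {'B': ['A', 'C']} 1
print(f"{36 / (52 + 483):.1%}")  # 6.7%
```

The last line applies the counts reported above. It assumes the percentage is taken over the 52 + 483 = 535 sellers and buyers. On that basis, 36 redundant nodes give about 6.7%, which rounds to the reported 7%.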

We study the correlation between the total number of items and the number of items sold using the “Parvis” visualization tool. The parallel coordinates visualization was generated for 52 sellers with respect to the number of buyers, the total number of items and the number of sold items. As Figure 10 shows, there is a direct positive correlation between the total number of items and the number of items sold. The number of unsold items is calculated as the difference between the total number of items and the number of sold items. To study buyer loyalty, we extracted information for 15 users from the four hierarchy levels for a quarter of the year (Nov ’06 to Feb ’07). Our aim is to determine whether a buyer purchases the same kind of items from the same seller. The results are tabulated in Table 6. We observe that, except for one user (“northernl”), the buyers (14 of 15, or 93%) bought items from the same category from the same seller over this period.
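The positive correlation seen in Figure 10 can be quantified with a Pearson coefficient, and the unsold count is a simple difference. The per-seller numbers below are hypothetical placeholders rather than data for the 52 sellers studied here. The computation is an illustrative addition and not part of the Parvis workflow described above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-seller counts, for illustration only.
total_items = [10, 25, 40, 8, 60]
sold_items = [6, 15, 31, 5, 44]
unsold_items = [t - s for t, s in zip(total_items, sold_items)]

print(unsold_items)  # [4, 10, 9, 3, 16]
print(f"r = {pearson(total_items, sold_items):.3f}")
```

A value of r close to +1 corresponds to the direct positive correlation visible in the parallel coordinates plot.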


Figure 10. Parallel Co-ordinates Visualization for Seller Activity [25].

Table 6. Study of Buyer Loyalty (CATEGORY by month)

BUYER      SELLER     LEVEL   Nov '06        Dec '06        Jan '07
lakev      svg        1       Collectibles   Collectibles   Collectibles
domino     isabella   2       Collectibles   Collectibles   Collectibles
pretty     isabella   2       Collectibles   Collectibles   Collectibles
moess      isabella   2       Collectibles   Collectibles   Collectibles
schmid     isabella   2       Collectibles   Collectibles   Collectibles
northernl  isabella   2       Clothes        Clothes        Books
night      holk       3       Books          Books          Books
zement     ccs        4       Books          Books          Books
esth       ccs        4       Books          Books          Books
inge       ccs        4       Books          Books          Books
fingert    ccs        4       Books          Books          Books
dai        ccs        4       Books          Books          Books
birg       ccs        4       Books          Books          Books
sask       ccs        4       Books          Books          Books
cha        ccs        4       Books          Books          Books
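The loyalty figure can be reproduced directly from Table 6. Each buyer in the table has a single seller, so the check reduces to whether a buyer's category stayed the same across the three months. The dictionary below transcribes the table, and `loyal_buyers` is a hypothetical helper.

```python
# Purchase category per buyer for Nov '06, Dec '06 and Jan '07 (from Table 6).
purchases = {b: ["Collectibles"] * 3
             for b in ["lakev", "domino", "pretty", "moess", "schmid"]}
purchases["northernl"] = ["Clothes", "Clothes", "Books"]
purchases.update({b: ["Books"] * 3
                  for b in ["night", "zement", "esth", "inge", "fingert",
                            "dai", "birg", "sask", "cha"]})


def loyal_buyers(purchases):
    """Buyers whose purchases stayed in one category for the whole period."""
    return [buyer for buyer, cats in purchases.items() if len(set(cats)) == 1]


loyal = loyal_buyers(purchases)
share = len(loyal) / len(purchases)
print(f"{len(loyal)} of {len(purchases)} buyers loyal ({share:.0%})")  # 14 of 15 buyers loyal (93%)
```

Only northernl (Clothes, Clothes, Books) changes category. As a result, 14 of the 15 buyers, or about 93%, count as loyal.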

Table 7. Category information

CATEGORY                              CODE
Books                                 BO
eBay Motors                           ET
Toys & Hobbies                        TH
Crafts                                CR
Home & Garden                         HG
Pottery & Glass                       PG
Collectibles                          CO
DVDs & Movies                         DM
Clothing, Shoes & Accessories         CS
Music                                 MU
Everything Else: Gifts & Occasions    EG
Computers & Networking                CN
Jewelry & Watches                     JW
Cameras & Photo                       CP
Health & Beauty                       HB
Everything Else: Metaphysical         EM
Cell Phones & PDAs                    CP
Sporting Goods                        SG


Figure 11. Pie chart for determining categories.

To identify popular categories of sold items, we used the “GetItem” Web service call, which retrieves the details of an item for a given “itemid”. We assigned a two-letter code to each of the 18 categories obtained for the 531 users and extracted the relevant node information from the resulting XML. Table 7 summarizes the relevant data, and the corresponding pie chart is shown in Figure 11. Figure 12 visualizes the category information as a force-directed layout. Each of the 18 categories is assigned a different color, and nodes at different hierarchy levels are cross-linked. The underlying hierarchy structure is the one depicted in Figure 8b. Because of this hierarchical structure, the layout models clustered directed acyclic graphs (DAGs) [26]. It is interesting to note that some sellers sold completely different items from those they bought, while for some sellers almost all buyers bought items from the same category. We conclude that the three most popular categories of items sold are “Books” (46%), “Collectibles” (18%) and “Clothing, Shoes & Accessories” (12%) for the users in our current study. To better understand the countries of origin of eBay customers, we plotted a Pareto chart of the geographical locations of the users in the data under study, depicted in Figure 13. The chart indicates that the majority of the users (88.9%) were in the US, while Australia and Germany each accounted for 5.6%.
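A Pareto table like the one in Figure 13 sorts the groups by count and adds each one's percentage and cumulative percentage. The minimal sketch below uses the location counts from that figure. The name `pareto_table` is hypothetical, and percentages are rounded to one decimal place as in the chart.

```python
def pareto_table(counts):
    """Rows of (label, count, percent, cumulative percent), largest count first."""
    total = sum(counts.values())
    rows, cumulative = [], 0
    # sorted() is stable, so ties keep their input order even with reverse=True.
    for label, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += count
        rows.append((label, count,
                     round(100 * count / total, 1),
                     round(100 * cumulative / total, 1)))
    return rows


for row in pareto_table({"US": 16, "Australia": 1, "Germany": 1}):
    print(row)
```

With these counts, the rows come out as (US, 16, 88.9, 88.9), (Australia, 1, 5.6, 94.4) and (Germany, 1, 5.6, 100.0), matching the table in Figure 13.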


Figure 12. Force Directed Layout with Cross-Linking for eBay Categories.

[Pareto chart of LOCATION: Count bars with a cumulative Percent line]

LOCATION    Count   Percent   Cum %
US          16      88.9      88.9
Australia   1       5.6       94.4
Germany     1       5.6       100.0

Figure 13. Analysis of Geographical Location [25].

As defined in Section 5, an “actor” is a discrete individual, organization, event or collective social entity that links to others in a network. To study social interaction, we measure the centrality of the ten most active and visible actors in the network. These actors have the highest numbers of distinct (non-repetitive) buyers, and hence the most interactions in our study. We construct a sociogram for these actors, shown in Figure 14, where the arrows represent “direct” connections between the actors. A Web query form, depicted in Figure 15, is used to calculate the various centralities for the sociogram. The form is part of executable Web technology that performs its computations behind the scenes. Using this tool, we calculated three centrality measures for the constructed sociogram: Degree Centrality, Betweenness Centrality and Closeness Centrality.
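For reference, the minimal sketch below implements the textbook normalized definitions of the three measures, with betweenness computed by Brandes' algorithm. It treats the sociogram as an undirected, unweighted graph. These normalizations are an assumption: the query form used here may normalize differently, so the numbers need not match Figures 15 and 16. All function names are hypothetical.

```python
from collections import deque


def shortest_paths_from(graph, source):
    """BFS from source: distances, shortest-path counts, predecessors, visit order."""
    dist, sigma, preds = {source: 0}, {source: 1}, {source: []}
    order, queue = [], deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:  # first time w is reached
                dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def centralities(graph):
    """Normalized degree, closeness and betweenness for an undirected graph.

    graph maps each node to a list of neighbours and must be symmetric.
    Betweenness uses Brandes' accumulation of pair dependencies.
    """
    n = len(graph)
    degree = {v: len(graph[v]) / (n - 1) for v in graph}
    closeness, betweenness = {}, dict.fromkeys(graph, 0.0)
    for s in graph:
        dist, sigma, preds, order = shortest_paths_from(graph, s)
        total_distance = sum(dist.values())
        # Closeness over nodes reachable from s: (reached - 1) / total distance.
        closeness[s] = (len(dist) - 1) / total_distance if total_distance else 0.0
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so dividing by
    # (n - 1)(n - 2) equals dividing pair counts by (n - 1)(n - 2) / 2.
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    betweenness = {v: b * scale for v, b in betweenness.items()}
    return degree, closeness, betweenness


star = {"c": ["a", "b", "d"], "a": ["c"], "b": ["c"], "d": ["c"]}
print(centralities(star))
```

Consider a four-node star. The centre scores 1.0 on all three measures. Each leaf scores 1/3 for degree, 0.6 for closeness and 0 for betweenness.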

[Sociogram of the ten actors: crazyh, gsxrch, svg, mopsx, isabella, moess, vieledp, domino, sahera, ccs]

Figure 14. Constructed Sociogram for Actors [25].

[Query form. Inputs for actor “isabella”: # of direct neighbors = 7; # of in-between nodes = 3; # of shortest paths = 2; # of total nodes = 15 (Submit and Reset buttons). Outputs: Degree Centrality = 0.466; Betweenness Centrality = 0.200; Closeness Centrality = 0.133.]

Figure 15. Query Form for Centrality Calculation.

The statistics obtained using the query form are plotted as a bar chart in Figure 16. We observe that the most active actor (e.g., ccs) is not necessarily the best-connected actor in terms of degree, betweenness and closeness centrality (e.g., isabella). This analysis shows which users occupy the best positions in the network under consideration. “Degree Centrality” (DEG in Figure 16) varies between 0 and 80%, “Betweenness Centrality” (BET) varies between 0 and 70%, and “Closeness Centrality” (CLOSE) varies between 30.3% and 90.9% [25]. Higher centrality values for eBay buyers and sellers can indicate better positions in the social network, which may result in more transactions for these customers.
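Read against its inputs, the query form's output for isabella in Figure 15 is consistent with each measure being the corresponding count divided by the total number of nodes. This is our inference from the numbers, not a formula documented by the form, and it differs from the usual normalizations of these measures:

```latex
C_{D} = \frac{7}{15} \approx 0.467, \qquad
C_{B} = \frac{3}{15} = 0.200, \qquad
C_{C} = \frac{2}{15} \approx 0.133
```

The form displays 0.466 rather than 0.467, which suggests that the value is truncated rather than rounded.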

[Bar chart: centrality values (0 to 1) for DEG, BET and CLOSE; users: ccs, gsxrch, isabella, svg, sahera, vieledp, crazyh, mopsx, moess, domino]

Figure 16. Various Centrality Measurements of the eBay Users [25].

Managerial Implications

Today, managers use visual methods for many operational functions, including exploring data relationships. The techniques presented in this paper can offer a manager simple-to-use tools for gaining insight into social networks. All of these tools are freely available as open-source technologies, so the cost involved is low. Through simple visualization methods, a decision-maker can gain insight into social networks and typical activities. Examples include seller-buyer relationships, buyer loyalty, popular categories of goods, correlation among buyers and sold items, and centrality measurements. These tools can easily be incorporated into a manager's day-to-day responsibilities.


7. CONCLUSIONS AND FUTURE WORK

The use of Web services is becoming increasingly popular in business environments, since they provide convenient access to publicly accessible resources that contain large amounts of extremely valuable information. Online marketplaces hold vital information for visualizing and analyzing the behaviors of online customers through social networks. In this paper, we use hidden Web services and Web-based social network analysis (WSNA) to aid in the construction, visualization and analysis of social networks for online marketplaces. Hidden Web services, comprising the Deep Web and the executable Web, have no program-oriented interfaces or service descriptions. In our case study, the eBay Web services package aids in mining relevant data and deriving social networks. Information about the buyers and sellers is usually obtained through deep Web search using Web services technology. The representation and analysis of the social networks, by contrast, is performed using hidden Web techniques and WSNA. Visualization of hierarchies reveals different interaction patterns. Transactions between buyers and sellers produce cross-linking between nodes at various hierarchy levels. Removing redundant, repeated nodes resulted in a more efficient hierarchy structure.

Force-directed layouts modeling directed acyclic graphs (DAGs) are useful for visualizing important eBay item categories. Web query forms belonging to the executable Web are used to obtain various centrality statistics. Statistics obtained from the eBay data reveal interesting insights into the roles of various actors in the network and their centrality. A buyer of a product can reside in a geographical location completely different from that of the seller, yet be closely connected to the seller through the social network. Also, the most visible actor need not be the one with the best location in the network. User behavior analysis can help us further understand potential trends. Existing brokerage e-business models can be enhanced by insight into customer relationships and behaviors. Our future work will involve a more in-depth analysis of user behavior with the WSNA tool, and the use of clustering mechanisms along with visualization to provide more intelligence [40]. We also aim to move towards a service-oriented Web with common program interfaces. Figure 17 illustrates our vision of the future Web service infrastructure, which we simply call the service-oriented Web.

Current Web services communicate through program interfaces and do not provide access to end users, while hidden Web services serve end users through Web user interfaces. Our aim is to provide common program interfaces to both hidden Web services and existing Web services. Automation of Web analysis, service profiling, and search can then be achieved. A service discovery engine will be able to find desired services through the common program interfaces.

[Diagram: Current Search Engines, Surface Web, APIs, Web Applications, Interoperation, Service Discovery Engines, Service-Oriented Web, End Users]

Figure 17. Our Vision of Service-oriented Web.

8. ACKNOWLEDGMENT

We would like to thank the US Department of Education GAANN Fellowship Federal grant for funding this project.

REFERENCES

1. Agrawal, R.; Rajagopalan, S.; Srikant, R.; and Xu, Y. Mining newsgroups using networks arising from social behavior. Proceedings of the 12th International Conference on World Wide Web, New York: ACM Press, 2003, 529-535.

2. Assadi, H. Knowledge Acquisition from Texts: Using an Automatic Clustering Method Based on Noun-Modifier Relationship. Proceedings of the Eighth Conference of the European Chapter of the Association for Computational Linguistics, New Jersey: Association for Computational Linguistics, 1997, 504-506.

3. Boyd, D.; Lee, H.-Y.; Ramage, D.; and Donath, J. Developing Legible Visualizations for Online Social Spaces. Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS'02), IEEE Computer Society, 2002, 115.

4. Chang, K. C.-C.; He, B.; and Zhang, Z. Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web. Proceedings of the 2nd Conference on Innovative Data Systems Research (CIDR 2005), IEEE Xplore, 2005, 44-55.

5. Chang, K. C.-C.; He, B.; and Zhang, Z. Mining semantics for large scale integration on the Web: evidences, insights, and challenges. ACM SIGKDD Explorations Newsletter, 6, 2 (December 2004), 67-76.

6. Chang, K. C.-C.; He, B.; Li, C.; Patel, M.; and Zhang, Z. Structured Databases on the Web: Observations and Implications. ACM SIGMOD Record, 33, 3 (September 2004), 61-70.

7. Chatterjee, S. and Webber, J. Developing Enterprise Web Services: An Architect's Guide. New Jersey: Prentice Hall PTR, 2004.

8. Churchill, E. F. and Halverson, C. A. Social Networks and Social Networking. IEEE Internet Computing, 9, 5 (2005), 14-19.

9. Domingos, P. and Richardson, M. Mining the Network Value of Customers. Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, New York: ACM Press, 2001, 57-66.

10. Falkowski, T.; Bartelheimer, J.; and Spiliopoulou, M. Mining and Visualizing the Evolution of Subgroups in Social Networks. IEEE/WIC/ACM International Conference on Web Intelligence, 2006, 52-58.

11. Ferreira, J.; Silva, A. R. da; and Delgado, J. Web services for information retrieval. Proceedings of the International Conference on Information Technology: Coding and Computing, Washington, DC: IEEE Computer Society, 2005, 497-502.

12. Fujimura, N.; Fujiyoshi, S.; Hope, T.; and Nishimura, T. Tabletop community: visualization of real world oriented social network. ACM Multimedia 2006, New York: ACM Press, 2006, 1035-1036.

13. Gloor, P. A. and Zhao, Y. Analyzing Actors and Their Discussion Topics by Semantic Social Network Analysis. Proceedings of the Conference on Information Visualization, Washington, DC: IEEE Computer Society, 2006, 130-135.

14. Han, E.-H.; Boley, D.; Gini, M.; Gross, R.; Hastings, K.; Karypis, G.; Kumar, V.; Mobasher, B.; and Moore, J. Partitioning-based clustering for Web document categorization. Decision Support Systems, 27, 3 (December 1999), 329-341.

15. Han, E.-H.; Boley, D.; Gini, M.; Gross, R.; Hastings, K.; Karypis, G.; Kumar, V.; Mobasher, B.; and Moore, J. WebACE: A Web Agent for Document Categorization and Exploration. Proceedings of the 2nd International Conference on Autonomous Agents, New York: ACM Press, 1998, 408-415.

16. Heer, J. and Boyd, D. Vizster: Visualizing Online Social Networks. IEEE Symposium on Information Visualization, IEEE Xplore, 2005, 32-39.

17. Huang, Y.; Contractor, N.; and Yao, Y. CI-KNOW: recommendation based on social networks. Proceedings of the 2008 International Conference on Digital Government, Digital Government Society of North America, 2008, 375-376.

18. Ichise, R.; Takeda, H.; and Muraki, T. Research Community Mining with Topic Identification. Proceedings of the Conference on Information Visualization, Washington, DC: IEEE Computer Society, 2006, 276-281.

19. Ipeirotis, P. G.; Gravano, L.; and Sahami, M. Probe, count, and classify: categorizing hidden Web databases. Proceedings of SIGMOD'01, New York: ACM, May 2001, 67-78.

20. Ivanov, A.; Erickson, T.; and Cyr, D. Plot-polling: Collaborative Knowledge Visualization for Online Discussions. Tenth International Conference on Information Visualisation, Washington, DC: IEEE Computer Society, 2006, 205-210.

21. Kang, H.; Getoor, L.; and Singh, L. Visual analysis of dynamic group membership in temporal social networks. ACM SIGKDD Explorations Newsletter, 9, 2 (2007), 13-21.

22. Kilduff, M. and Tsai, W. Social Networks and Organizations. London: SAGE, 2003.

23. Kim, H. and Seo, J. High-performance FAQ retrieval using an automatic clustering method of query logs. Information Processing & Management, 42, 3 (May 2006), 650-661.

24. Kumar, P.; Song, G. L.; and Zhang, K. Towards A Unified View of Service-Oriented Web. Proceedings of the 2006 IEEE International Conference on Service Operations and Logistics, and Informatics, IEEE Press, 2006, 862-867.

25. Kumar, P. and Zhang, K. Social Network Analysis of Online Marketplaces. Proceedings of IEEE International Conference on e-Business Engineering, IEEE Press, 2007, 363-367.

26. Kumar, P.; Zhang, K.; and Wang, Y. Visualization of Clustered Directed Acyclic Graphs without Node Overlapping. Proceedings of the 12th International Conference on Information Visualization, Washington, DC: IEEE Computer Society, 2008, 38-43.

27. Manes, A. T. Web Services: A Manager's Guide. Boston, MA: Addison-Wesley, June 2003.

28. McPherson, J.; Ma, K.-L.; and Ogawa, M. Discovering parametric clusters in social small-world graphs. Proceedings of the 2005 ACM Symposium on Applied Computing, New York: ACM Press, 2005, 1231-1238.

29. Medynskiy, Y. E.; Ducheneaut, N.; and Farahat, A. Using Hybrid Networks for the Analysis of Online Software Development Communities. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York: ACM Press, 2006, 513-516.

30. Oellermann, W. L. Jr. Architecting Web Services. Berkeley, CA: Apress, 2001.

31. O'Reilly Webservices.xml.com. A Web Services Primer. Accessible at http://Webservices.xml.com, April 04, 2001.

32. Richardson, M. and Domingos, P. Mining knowledge-sharing sites for viral marketing. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York: ACM Press, 2002, 61-70.

33. Soltysiak, S. J. and Crabtree, I. B. Automatic Learning of User Profiles: Towards the Personalisation of Agent Services. BT Technology Journal, 16, 3 (July 1998), 110-117.

34. Staab, S.; Domingos, P.; Mika, P.; Golbeck, J.; Ding, L.; Finin, T.; Joshi, A.; Nowak, A.; and Vallacher, R. R. Social Networks Applied. IEEE Intelligent Systems, 20, 1 (Jan/Feb 2005), 80-93.

35. eBay Developers Program, http://developer.ebay.com/support/docs/, Accessed on October 18, 2008.

36. Social Network Analysis: A Brief Introduction, http://www.orgnet.com/sna.html, Accessed on October 21, 2008.

37. The Prefuse Visualization Toolkit, http://www.prefuse.org/, Accessed on October 14, 2008.

38. Yoneki, E. Visualizing communities and centralities from encounter traces. Proceedings of the Third ACM Workshop on Challenged Networks, New York: ACM Press, 2008, 129-132.

39. Zhang, Z.; He, B.; and Chang, K. C.-C. Understanding Web query interfaces: best-effort parsing with hidden syntax. Proceedings of SIGMOD'04, New York: ACM, June 2004, 107-118.

40. Zhong, N.; Liu, J.; and Yao, Y. In Search of the Wisdom Web. Computer, 35, 11 (November 2002), 27-31.
