US00RE41440E

(19) United States
(12) Reissued Patent
     Briscoe et al.

(10) Patent Number: US RE41,440 E
(45) Date of Reissued Patent: Jul. 13, 2010

(54) GATHERING ENRICHED WEB SERVER ACTIVITY DATA OF CACHED WEB CONTENT

(75) Inventors: Paul Roger Briscoe, Southampton (GB); Cameron Donald Ferstat, St. Leonards (AU); Matthew Robert Ganis, Carmel, NY (US); Stephen Carl Hammer, Marietta, GA (US); Gary Bob Kip Hansen, Saugerties, NY (US); Sean Alan Harp, Atlanta, GA (US); Michael Shannon Nichols, Georgetown, TX (US); Herbert Daniel Pearthree, Cary, NC (US); Paul Reed, Brookline, MA (US); Brian James Snitzer, Lancaster, PA (US)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

(21) Appl. No.: 12/437,581

(22) Filed: May 8, 2009

Related U.S. Patent Documents

Reissue of:

(64) Patent No.: 7,216,149
     Issued:     May 8, 2007
     Appl. No.:  09/641,495
     Filed:      Aug. 18, 2000

(51) Int. Cl.
     G06F 15/16 (2006.01)

(52) U.S. Cl. 709/217; 709/203; 709/219; 711/138

(58) Field of Classification Search: 709/203, 709/217, 218, 219, 223, 224, 225, 226, 227, 709/228, 229; 711/138
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,796,952 A  *   8/1998  Davis et al.        709/224
5,848,396 A  *  12/1998  Gerace              705/10
5,892,917 A  *   4/1999  Myerson             709/224
5,913,041 A  *   6/1999  Ramanathan et al.   709/233
5,935,207 A  *   8/1999  Logue et al.        709/219
5,991,735 A  *  11/1999  Gerace              705/10
6,018,619 A  *   1/2000  Allard et al.       709/224
6,018,763 A  *   1/2000  Hughes et al.       709/213
6,023,726 A  *   2/2000  Saksena             709/219
6,041,355 A  *   3/2000  Toga                709/227
6,085,229 A  *   7/2000  Newman et al.       709/203
6,094,662 A  *   7/2000  Hawes               707/104.1
6,363,418 B1 *   3/2002  Conboy et al.       709/218

(Continued)

OTHER PUBLICATIONS

Computer Knowledge Newsletter, Nov. 1999 Issue.*
PCT/EP01/09308, PCT Preliminary Examination Report, Jul. 3, 2003, European Patent Office.*

Primary Examiner: Lashonda T Jacobs

(57) ABSTRACT

A method and system for gathering enriched web server activity data in a global communications network in which requested information files are cached at a plurality of network devices. With the prevalence of web caching on the Internet, the origin web servers do not serve the majority of requests for web site content. A single pixel clear Graphics Image Format (GIF) request is added to the HyperText Markup Language (HTML) source file for a web page. Appended to the GIF request is a Common Gateway Interface (CGI) string of data that contains enhanced web activity data information, including the number of images ("hits") that have to be retrieved by a client browser to build the web page, and the referring identifier that resulted in access to the web page. The single pixel clear GIF request is not cacheable and results in the request being transmitted to the origin web server when the client browser interprets the HTML file. The enriched data is stored in log files at the origin web server to accumulate an accurate number of hits on the web page.

59 Claims, 15 Drawing Sheets
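The mechanism summarized in the abstract (a single pixel GIF request whose query string carries the image count and the referring identifier) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the query parameter names `hits`, `ref` and `page`, the host `www.example.com`, and the helper names are hypothetical and are not taken from the patent; the file name `uc.gif` follows the label that appears in FIG. 3.

```python
import html
from urllib.parse import urlencode, urlsplit, parse_qs


def pixel_url(origin, image_count, referrer, page):
    """Build the single pixel GIF URL with the enriched data as a query string.

    The parameter names (hits, ref, page) are hypothetical placeholders.
    """
    query = urlencode({"hits": image_count, "ref": referrer, "page": page})
    return f"{origin}/uc.gif?{query}"


def pixel_tag(url):
    """Return the IMG tag to add to the page's HTML source."""
    # Escape the URL so that '&' is valid inside an HTML attribute.
    return f'<img src="{html.escape(url, quote=True)}" width="1" height="1" alt="">'


def parse_pixel_request(request_target):
    """Recover the enriched fields from a logged request for the pixel."""
    query = urlsplit(request_target).query
    fields = {name: values[0] for name, values in parse_qs(query).items()}
    fields["hits"] = int(fields["hits"])
    return fields


if __name__ == "__main__":
    url = pixel_url("http://www.example.com", 12,
                    "http://search.example/q?x=1", "/en/index.html")
    print(pixel_tag(url))
    print(parse_pixel_request(url))
```

Because every value travels in the request line itself, one log record for the pixel is enough for the origin server to recover the page, its image count, and the referrer, even when the page and its images were served from a cache.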

US RE41,440 E
Page 2

U.S. PATENT DOCUMENTS

6,385,642 B1        5/2002  Chlan et al.       709/203
6,393,479 B1        5/2002  Glommen et al.     709/224
6,606,581 B1        8/2003  Nickerson et al.   702/186
6,742,040 B1        5/2004  Toga               709/229
7,003,565 B2        2/2006  Hind et al.        709/224
2002/0004733 A1     1/2002  Addante            705/7
2002/0147772 A1    10/2002  Glommen
2008/0052392 A1     2/2008  Webster et al.     709/224

* cited by examiner

U.S. Patent    Jul. 13, 2010    Sheet 1 of 15    US RE41,440 E

Sheet 3 of 15

FIG. 3: Flowchart of client request processing (reference numerals 302 to 326). Legible box labels: browser requests HTML web page; page cached at client?; page cached at ISP?; transmit request to origin web server; deliver HTML file to client browser to interpret and build page with source or cached images; retrieve cached images and complete build of page; HTML file contains uc.gif call; transmit request for uc.gif to origin web server; enriched web server activity data stored in log files; end request processing.
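The request flow of FIG. 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary-based caches, and the log record fields are all hypothetical, and the sketch assumes that every page is instrumented with the uc.gif call and that each cache stores a page after the first fetch.

```python
def fetch_html(url, client_cache, isp_cache, origin_pages):
    """Return (html, source), checking the client cache, then the ISP cache,
    then the origin web server."""
    if url in client_cache:
        return client_cache[url], "client cache"
    if url in isp_cache:
        client_cache[url] = isp_cache[url]
        return isp_cache[url], "ISP cache"
    page = origin_pages[url]
    isp_cache[url] = page      # assumed: caches keep a copy after the first fetch
    client_cache[url] = page
    return page, "origin"


def request_page(url, referrer, client_cache, isp_cache, origin_pages, origin_log):
    """Fetch a page, then issue the uncacheable uc.gif request to the origin."""
    page, source = fetch_html(url, client_cache, isp_cache, origin_pages)
    # The pixel request bypasses both caches, so the origin logs one record
    # per page view no matter where the HTML itself came from.
    origin_log.append({"page": url, "ref": referrer, "html_from": source})
    return page


if __name__ == "__main__":
    origin_pages = {"/en/index.html": "<html>home</html>"}
    client, isp, log = {}, {}, []
    for _ in range(3):
        request_page("/en/index.html", "search engine", client, isp, origin_pages, log)
    print(len(log), [record["html_from"] for record in log])
```

In this sketch the origin serves the HTML only once, yet its log still receives a record for every page view, which is the property the single pixel request is meant to provide.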

Sheet 4 of 15

FIG. 4: Site Level Analysis report with one column per week ending date. Controls: date range (overall or custom), granularity (month, week, day), download in comma delimited format, Save Report, Run Report Offline. (Tabular values omitted.)

Sheet 5 of 15

FIG. 5: Referral Categories report showing page views per referral category by week ending. Area of analysis options: Traffic Type, Traffic Category, Visitor SubDomain, Visitor Domain, Visitor Referral, Visitor Entry Page, Visitor Exit Page, Visitor Browser, Visitor Platform, Usage Group, Ad Analysis. Measurements include Page Views, Visits, Seconds/Page View and Page Views/Visit. (Tabular values omitted.)

Sheet 6 of 15

FIG. 6: Referral Category: Search Engines and Directories. Display: Referral SubCategories (Engines, Directories, total page views) by week ending. (Tabular values omitted.)

Sheet 7 of 15

FIG. 7: Referral Category: AltaVista. Display: Referrals. Page views by week ending for individual AltaVista referring URLs (for example www.altavista.com/cgi-bin/query), with rows for Other AltaVista Page Views and AltaVista Total Page Views. (Tabular values omitted.)

Sheet 9 of 15

FIG. 9: Content Category: Home Page. Display: Resources. Page views by week ending for individual home page resources, with a Home Page Total Page Views row. Controls: area of analysis, measurements, date range, granularity, minimum page view threshold. (Tabular values omitted.)

Sheet 10 of 15

FIG. 10: Saved Reports menu listing the report types: Site Level, Hourly, Visit Distribution, Traffic, Content, SubDomain, Referral, Domain, Entry Page, Exit Page, Browser, Platform, Usage Group and Ad Reports.

FIG. 11A: Site Level Reports (date range: most recent 5 weeks).

Sheet 11 of 15

FIG. 11B: Visit Distribution Reports.

FIG. 11C: Traffic Reports.

FIG. 11D: Content Reports.

Sheet 12 of 15

FIG. 11E: SubDomain Reports.

FIG. 11F: Domain Reports.

Sheet 13 of 15

FIG. 11G: saved report listing (title not legible).

FIG. 11H: Exit Page Reports. Columns: Name, Usage Cluster, Filters, Traffic Measures, Visit Measures, View, Date Range, Granularity, Shared.

FIG. 11I: Entry Page Reports. Columns: Name, Usage Cluster, Filters, Traffic Measures, Visit Measures, View, Date Range, Granularity, Shared.

Sheet 14 of 15

FIG. 11J: Browser Reports.

FIG. 11K: Platform Reports. Columns: Name, Usage Cluster, Filters, Traffic Measures, Visit Measures, View, Date Range, Granularity, Shared.

Sheet 15 of 15

FIG. 11L: Usage Group Reports. Columns: Name, Usage Cluster, Filters, Traffic Measures, Visit Measures, View, Date Range, Granularity, Shared.

FIG. 11M: Ad Reports.

GATHERING ENRICHED WEB SERVER ACTIVITY DATA OF CACHED WEB CONTENT

Matter enclosed in heavy brackets [ ] appears in the original patent but forms no part of this reissue specification; matter printed in italics indicates the additions made by reissue.

This application is a reissue application of U.S. Pat. No. 7,216,149, issued May 8, 2007 on U.S. Ser. No. 09/641,495 filed Aug. 18, 2000.

BACKGROUND OF THE INVENTION

The present invention relates generally to client-server computer systems and, more specifically, to information access requests to a web site server over a global communications network.

All web pages are written with HyperText Markup Language (HTML). Hypertext and universality are two essential features of HTML. Hypertext means that a programmer can create a link on a web page that leads the visitor to any other web page or to practically anything else on the Internet. Hypertext enables information on the web to be accessed from many different directions. Universality means that because HTML documents are saved as ASCII or text only files, virtually any computer can read a web page. HTML lets the web designer format text, add graphics, sound, and video, and save it all in a text or an American Standard Code for Information Interchange (ASCII) file that any computer can read. The key to HTML is in the tags, which are keywords enclosed between less than (<) and greater than (>) signs, that indicate the type of content coming up next. While practically any computer can display web pages, how those pages actually look depends on the type of computer, the monitor, the speed of the Internet connection, and the browser software used to view the page.

Advanced web designers often use a scripting language called JavaScript and a system of naming parts of the web page called the document object model (DOM), together with HTML to create dynamic content on a page. These effects are sometimes called dynamic HTML, or DHTML.

HTML tags are commands written between angle brackets (< >) that indicate how the browser should display the text. Examples of HTML tags are BASE, FORM, FRAME, IMG and SCRIPT. There are opening and closing versions for many tags and the affected text is contained within the two tags. The opening and closing tags use the same command word; the closing tag carries an initial forward slash (/) symbol. Many tags have special attributes that offer a variety of options for the contained text. The attribute is entered between the command word and the final angle bracket. A series of attributes can be used in a single tag just by writing one after the other, in any order, with a space separating each one. The attributes, in turn, often have values. In some cases, a selection of value is made from a small group of choices. Other attributes are more strict about the type of values they accept. Examples of attributes are HREF, SRC, ACCESSKEY and VALUE.

A web page is nothing more than a text document written with HTML tags. Like any other text document, web pages have a file name that identifies the documents to the web site designer, the web site visitors, and a visitor's web browser.

Uniform Resource Locators (URLs) contain information about where a file is located and what a browser should do with it. Each file on the Internet has a unique URL. The first part of the URL is called the scheme. It tells the browser how to deal with the file that it is about to open. One of the most common schemes to access web pages is HyperText Transfer Protocol (HTTP). The second part of the URL is the name of a server where the file is located followed by the path that leads to the file and the file name. Sometimes, a URL ends in a trailing forward slash with no file name given. In this case, the URL refers to the default file in the last directory in the path (i.e., index.html), which generally corresponds to the home page. For example, consider the web address "census.rolandgarros.org/rc/images/ . . .". The domain name is "census.rolandgarros.org". This is the specific host computer on which corresponding web pages reside. The next segment of the URL is the directory ("rc" and subdirectory "images") on the host computer that contains a specific web site. The last segment of the URL, represented by the ellipsis mark, is the filename of the specific web page being requested.

URLs can be either absolute or relative. An absolute URL shows the entire path to the file, including the scheme, server name, the complete path, and the file name itself. A relative URL describes the location of the desired file with reference to the location of the file that contains the URL itself. The relative URL for a file that is in the same directory as the current file is simply the file name and extension.

To view a single page, the browser running on a client computer may request and download numerous files from a web site server. The number of object access requests ("hits") stored in the web site server's access log will typically exceed the number of distinct client sessions in which clients are accessing information on the web site, reducing the accuracy of the access log.

Data networking is growing at a phenomenal rate. The number of web users is expected to increase by a factor of five over the next few years. The resulting uncontrolled growth of web access requirements is straining all attempts to meet the bandwidth demand. Additionally, although the volume of web traffic on the Internet is staggering, a large percentage of that traffic is redundant, i.e., multiple users at any given site request much of the same content. This means that a significant percentage of the wide area network (WAN) infrastructure carries the identical content and identical requests for accessing it daily. Web caching performs a local storage of web content to serve these redundant user requests more quickly, without sending the requests and the resulting content over the wide area network.

Caching is the technique of keeping frequently accessed information in a location close to the requester. A web cache stores web pages and content on a storage device that is physically or logically closer to the user. This access to stored web content is closer and faster than a web lookup. By reducing the amount of traffic on wide area network links and on already overburdened web servers, caching provides significant benefits to Internet Service Providers (ISPs), enterprise networks, and end users. The two key benefits of web caching are cost savings due to the reduction of WAN bandwidth and improved productivity for end users resulting from quicker access. ISPs can place cache engines at strategic points on their networks to improve response times and lower the bandwidth demand on their backbones. ISPs can station cache engines at strategic WAN access points to serve web requests from local storage, rather than from a distant or overburdened web server. In enterprise networks, the dramatic reduction in bandwidth usage due to web caching allows a lower bandwidth WAN link to service the user base. Alternatively, the organization can add users or add more services that make use of the free bandwidth on the existing WAN link. For the end user, the response of the local web

cache is almost three times faster than the download time for the same content over the wide area network. Therefore, users see dramatic improvements in response times, and the implementation of web caching is completely transparent to them. Web caching offers other benefits including access control, monitoring and operational logging. The cache engine provides network administrators with a simple, secure method to enforce a sitewide access policy through Uniform Resource Locator (URL) filtering. Network administrators can learn which URLs receive hits, the number of hits per second the cache is serving, the percentage of URLs that are served from the cache, along with other related operational statistics.

Web caching starts by an end user accessing a web page over the Internet. While the page is being transmitted to the end user, the caching system saves the page and all of its associated graphics on local storage. The page content is now cached. Another user, or the original user, can then access the web page at a later time, but instead of sending the request over the Internet to the web server, the web cache system delivers the web page from local storage. This process speeds download times for the user, and reduces the bandwidth demand on the WAN link. Updating of the cache data can occur in a number of ways depending upon the design of the web cache system.

Web caching can be a major problem for publishers of web content. For example, a publisher can gather an inaccurate number of hits if some of the visitors access web content already in a caching server. Furthermore, if a caching server doesn't update content promptly, it can return expired or stale content to users.

SUMMARY OF THE INVENTION

Cache engines are becoming pervasive on the World Wide Web. As a result, the origin web servers do not serve or see the majority of the user requests for web site content. Packet sniffers will not see the requests either, as they are satisfied by cache engines elsewhere on the Internet. The technique of using a single pixel clear GIF (which is not cacheable) has been used to ensure that some record is recorded by the origin server for advertisements for some years. However, this solution only logs information about the request for the single pixel GIF file itself.

The single-pixel transparent GIF (Graphic Interchange Format) is the most flexible tool in a web designer's toolbox. The use of a transparent GIF is a way to discreetly control the layout of text and graphics on the web page. No matter where the transparent GIF is placed on the page, it will remain unseen with all background graphics and fills remaining untouched. The single pixel clear GIF has been used before, but the data has not been enriched such that it can be used as a surrogate for the complete set of log records.

The present invention enriches the information recorded in the web logs for the uncacheable single pixel clear GIF by appending additional information to it as Common Gateway Interface (CGI) query string parameters. This enables the log record created by the request for the single pixel clear GIF to function as a "surrogate" for the complete set of log records which would have been created if the page content had not been cached.

DESCRIPTION OF THE DRAWINGS

The invention is better understood by reading the following detailed description of the invention in conjunction with the accompanying drawings, wherein:

FIG. 1 illustrates an implementation of web cache engines over a global communications network.

FIG. 2 illustrates an exemplary implementation of the uncacheable single pixel GIF with CGI query string parameters added to enrich information recorded in web logs.

FIG. 3 illustrates the processing logic for handling client requests for web pages utilizing the single pixel transparent GIF in accordance with a preferred embodiment of the present invention.

FIG. 4 illustrates a site level analysis display that can be generated based on the implementation of the single pixel transparent GIF of the present invention.

FIG. 5 illustrates an exemplary display of referral categories that can be generated based on the implementation of the single pixel transparent GIF of the present invention.

FIG. 6 illustrates an exemplary display of referral category for search engines and directories that can be generated based on the implementation of the single pixel transparent GIF of the present invention.

FIG. 7 illustrates an exemplary display of the referral results for a specific search engine that can be generated based on the implementation of the single pixel transparent GIF of the present invention.

FIG. 8 illustrates exemplary content categories for various web pages that can be generated based on the implementation of the single pixel transparent GIF of the present invention.

FIG. 9 illustrates an exemplary content category for a home page that can be generated based on the implementation of the single pixel transparent GIF of the present invention.

FIG. 10 illustrates an exemplary display of the available saved reports that can be generated based on the implementation of the single pixel transparent GIF of the present invention.

FIGS. 11A-11M illustrate various available saved reports that can be generated based on the implementation of the single pixel transparent GIF of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Web server software typically collects and saves information pertaining to each HTTP request, including date and time, the originating Internet Protocol (IP) address, the object requested, and the completion status of the request. The logs are analyzed on a periodic basis to determine the traffic through the server in terms of hits, the number of pages served, and the level of demand for pages of interest during each period.

Internet browser applications allow an individual user to cache web pages on his local hard disk. A user can configure the amount of disk space devoted to caching. The first time a user views a website, that content is saved as files in a subdirectory on that computer's hard disk. The next time the user points to this website, the browser gets the content from the cache without accessing the network. Certain elements of the page, including buttons, icons and images, appear much more quickly than they did the first time the page was opened.

To limit bandwidth demand caused by the uncontrolled growth of Internet use, software developers have developed applications that extend local caching to the network level. The two current types of network level caching products are proxy servers and network caches. Proxy servers are software applications that run on general-purpose hardware and operating systems. A proxy server is placed on hardware that

US RE41,44O E 5

6

is physically between a web browser client application and a

tion to router 18, routers 26, 46 are also shown connected to ISP server 30. Routers 18, 26, 46 are frequently referred to as Points-of-Presence (POPs). A POP is the location of an access point to the Internet and has a unique Internet IP

web server. The proxy server acts as a gatekeeper that

receives all the packets destined for the web server and examines each packet to determine whether it can ful?ll the request itself. If the proxy cannot ful?ll the request itself, it

address. A POP usually includes routers, digital/analog call aggregators, servers and frequently frame relay or Asynchro

forwards the request to the web server. Proxy servers can be

used to ?lter requests, e.g., to prevent employees from

nous Transfer Mode (ATM) switches. Shown connected to router 46 is cache engine 48. Connected to router 26 is cache engine 28 and router 24. Router 24 is connected to a corpo rate intranet 22. Because the router redirects packets destined for web servers to the cache engine, the cache engine operates trans parently to clients. Clients do not need to con?gure their browsers to be in proxy server mode. In addition, the opera

accessing speci?c websites. The problem with using proxy servers is that they are not optimiZed for caching and can fail under a heavy network load. Tra?ic is slowed to allow the proxy servers to examine each packet, and the failure of the proxy software or hardware causes all users to lose network

access. Furthermore, proxy servers require con?guration of each end-user’s browser, which is an unacceptable option for ISPs and large enterprises. Because of these shortcomings of

become popular. These caching-focused software applica tions are designed to improve performance by enhancing the

tion of the cache engine is transparent to the network. The router operates entirely in its normal role for non-web traf?c. A web object can contain a Hypertext Transfer Protocol

caching software and eliminating the other slow aspects of

(HTTP) header to instruct a browser in a caching server how

proxy servers, applications that create network caches have

proxy server implementations. Because a proxy server is run 20 to cache the web object. For a static image, such as a com

under a general purpose operating system that involves very high per-process context overhead, they are not easily scale able to large numbers of simultaneous processes. Networking product vendors offer cache engines as a single purpose network appliance that stores and retrieves content using caching and retrieval algorithms. Such cache engines are dedicated solely to content management and delivery. Since only web requests are routed to the cache engine, no other user traf?c is affected by the caching pro cess. For non-web tra?ic, the router functions entirely in its

pany logo, the expiration header can be set to “no expira tion” so that caching servers can keep the image in the cache forever. In order to gather the exact number of hits on a

speci?c page, e.g., an advertisement, a small image object 25

caching server will retrieve the object from the original web server, and the web server can then count the exact number 30

traditional role. The communications between a cache

engine and a router is de?ned by a cache control protocol. Under this protocol, the router directs only web requests to the cache engine rather than to the intended server. With a cache engine, a client requests web content in the usual man

35

40

content, it sends the request to the Internet or Intranet in the usual fashion. The content is returned to and stored at the cache engine. The cache engine returns the content to the

de?ne a mechanism to pass data about the request from the server to the script. Each element on a web page form will have a name and

value associated with it. The name identi?es the data being

45

over a global communications network such as the Internet.

A client computer 12, 14, 16 can request web content via a router 18.

The router 18 intercepts TCP Port 80 web tra?ic and routes it to the local cache engine 20. The client 12, 14, 16 is

independent manner. CGI is simply a standardized way for sending information between the server and the script. The CGI script is a program that communicates with the server in a standard way. Currently, the supported information servers are HTTP servers. Each CGI server implementation must

transaction, and no changes to the client or browser are

client. Upon subsequent requests for the same content, the cache engine ful?lls the requests from local storage. FIG. 1 illustrates an implementation of web cache engines

of requests. The Common Gateway Interface (CGI) is a simple inter face (protocol) for running external programs, software or gateways under an information server in a platform

ner. A router running a cache control protocol intercepts Transmission Control Protocol (TCP) port 80 web tra?ic and routes it to the cache engine. The client is not involved in the

required. If the cache engine does not have the requested

can be added to the page with the object set to expire immediately, so the caching server won’t cache the object. Then, every time a user requests that page, the browser or

not involved in this transaction and no changes to the client computer or browser are required. If the cache engine 20 does not have the requested content, it sends the request via router 18 to the Internet to access an Internet content server 40, 42, 44. The content is returned to, and stored at, the cache engine 20. The cache engine 20 then returns the requested content to the client computer 12, 14, 16 via the router 18. Several cache engines 32, 34, 36 can be placed in a cache farm in a hierarchical fashion at an Internet Service Provider (ISP) site 30. Requests from clients 12, 14, 16 directed through router 18 and ISP server 30, are diverted to the cache farm 32, 34, 36 to fulfill the client request from its storage. If the cache engines 32, 34, 36 are unable to fulfill the request from local storage, a normal web request is made via ISP server 30 over the Internet 50 to an appropriate server 40, 42, 44 for the requested Internet content. In addi

sent. The value is the data and can either come from the web page designer or from the visitor who types it in a field. When a visitor clicks the submit button, the name-value pair of each form element is sent to the server. CGI scripts generally have two functions. The first is to take all the name-value pairs and separate them out into individual intelligible pieces. The second is to actually do something with that data, such as printing it out, multiplying fields together, sending an email confirmation, or storing it on a server. The form has three important parts: the form tag, which includes the URL of the CGI script that will process the form; the form elements, such as fields and menus; and the submit button which sends the data to the CGI script on the server.

Scripts are little programs that add interactivity to a web page. Simple scripts can be written to add an alert box or some text to the web page; more complicated scripts can be written that load particular pages according to the visitor’s browser or that change a frame’s background color depending on the visitor’s mouse clicks. Most scripts are written in a scripting language called JavaScript that is supported by most browsers, including Netscape Communicator and Microsoft Internet Explorer.

JavaScript is an object-oriented language, which means that it works by manipulating objects on a web page, such as windows, images and documents. JavaScript commands are
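The two functions of a CGI script described in this section (separating the submitted name-value pairs into individual pieces, then doing something with the data) can be illustrated with a minimal sketch. The field names `item`, `quantity`, and `price`, the helper names `split_pairs` and `process`, and the sample string are assumptions made for illustration; a real CGI script would typically receive the data through the `QUERY_STRING` environment variable or standard input rather than a literal.

```python
from urllib.parse import parse_qs


def split_pairs(form_data: str) -> dict:
    """First function: separate the name-value pairs into individual pieces."""
    # parse_qs returns {name: [value, ...]}; keep the first value per name.
    return {name: values[0] for name, values in parse_qs(form_data).items()}


def process(fields: dict) -> str:
    """Second function: act on the data (here, multiply two fields together)."""
    total = int(fields["quantity"]) * float(fields["price"])
    return f"{fields['item']}: {total:.2f}"


# Hypothetical form submission, URL-encoded as a browser would send it.
submitted = "item=widget&quantity=3&price=2.50"
fields = split_pairs(submitted)
result = process(fields)
```

Keeping the parsing step separate from the processing step mirrors the two-function split in the text and makes each part easy to check on its own.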
