T6 AN EXPLORATION OF THE DYNAMICS OF AUXILIARY NETWORK RESOURCES

Jack Young1 (UCL), Lionel Sacks2 (UCL), Chris Roadknight3 (BT Labs)

1 {[email protected]} Department of Electronic & Electrical Engineering, University College London, Torrington Place, London WC1E 7JE, United Kingdom.
2 {[email protected]} Department of Electronic & Electrical Engineering, University College London, Torrington Place, London WC1E 7JE, United Kingdom.
3 {[email protected]} B54/124, BT Labs, Adastral Park, Martlesham Heath, Ipswich IP5 3RE, United Kingdom.


ABSTRACT

With the coming of active networks, bandwidth will no longer be the only network resource. Programmable components inside the network introduce a computational load that was not previously present. Dimensioning network components properly requires an understanding of the expected demand for these non-bandwidth resources. Present-day World Wide Web caching is a good test bed for active network analysis in that it responds to demand for non-bandwidth resources according to user interest and behaviour. Previous analyses of web cache statistics have found them to be statistically self-similar. One possible source of self-similarity is chaos. This idea, coupled with the notion that network behaviour is likely to be influenced by complex feedback loops, suggests that chaos may be present. The existence of chaos in traffic would give scope for better modelling and control of network resources than stochastic modelling allows. Our research uses the method of ‘Global False Nearest Neighbours’ to detect whether chaos is present in cache traces, by distinguishing between deterministic chaos and random noise. Our results show that chaos is not readily detectable in the traces, despite the given reasons for expecting it. However, there is much scope for further study, in terms of data sources as well as theory and method.

INTRODUCTION

With the coming of active telecommunication networks, bandwidth will no longer be the only network resource. Programmable components in the network will introduce a computational load that previously was not an issue. To dimension active network components properly, we require an understanding of the expected demand for non-bandwidth resources such as this. The present-day World Wide Web, and web caching in particular, is a good test bed for active network analysis in that it responds to people’s demand for non-bandwidth resources (in addition to bandwidth, of course), according to user interest and behaviour.

Previous analyses of web cache traces[1][2], in common with those of other kinds of data traffic (for example, [3]), show them to be very bursty, exhibiting a very high variance over time. Furthermore, their statistics are self-similar (or fractal), meaning that the behaviour seen at small time scales repeats at larger and very much larger time scales. There is also long-range dependence in the data, such that the behaviour seen at one instant appears to depend on what happened an hour, a day, or a week earlier[1].

Chaos is associated with self-similarity and long-range dependence, and as such is worthy of investigation in this context. If the origin of the fractal traffic patterns is indeed chaos, we have a very useful result. We would have a new component in our models for web traffic behaviour, whereby:

• we have a realistic new technique to simulate web behaviour;
• we may be able to characterise traffic into different types;
• the possibility of short-term prediction arises;
• and a further investigation of the system could lead to an identification of the major physical variables of the system, and an insight into the mechanism causing it, giving scope for traffic controllability[4][5].

Whilst much present research uses only periodic and stochastic components to model traffic, the research presented here contributes to the general effort of traffic modelling by applying notions of chaos, whereby the behaviour is random-looking but nonetheless remains deterministic.

Chaos is a property of some non-linear systems. What is interesting about these systems is that, while their behaviour is non-periodic, looks random and is sometimes self-similar, they are actually governed by rules. Therefore we have greater scope for prediction and control of traffic if the dynamics are indeed chaotic.

There is a further reason to suspect that chaotic behaviour is present here: chaos is often a feature of feedback systems. It is reasonable to suppose that the dynamics may be governed by some, albeit complex, feedback system. After all, while the network’s performance depends on how much its users demand of it, the users may at the same time change the way they use the network according to its performance[6]. Feedback could also originate purely from within the network, say, as a result of network protocol interactions such as those of TCP.

METHODS

Classical signal analysis methods, such as Fourier analysis, are inadequate when dealing with chaos, often dismissing the dynamics as noise. It is therefore necessary to invoke techniques which specifically handle chaos. The main tool used here is that of ‘global false nearest neighbours’ (GFNN), formulated by Abarbanel and his colleagues, and presented in a book[7] and in more detail in an earlier paper[8]. We outline the method here.

Chaotic systems are governed by a set of underlying deterministic dynamics, so the phase space of the system, and its time-evolution, takes on a specific structure. The way to identify chaos is to look for this structure. In contrast, noise does not have a comparable structure. The technique of ‘embedding’ is used to take the scalar output data from the system and build it into an n-dimensional space. In some sense it unfolds the hidden dynamics which are buried inside the scalar data, somewhat analogous to unfolding a cube into its net (Figure 1). The GFNN method successively increases the dimension of the embedding and, roughly speaking, computes what proportion of the structure is still folded.
If the system appears never to unfold, we have what can loosely be termed ‘uncorrelated noise’. With this powerful tool, it is possible to detect patterns in a scalar data series which could arise from deterministic chaos.

[Figure 1 diagram: a cube and its unfolded net, with vertices labelled A, B and C.]

Figure 1: Unfolding a cube. Notice how vertices A, B and C, which were coincident when folded, are no longer so when the cube is unfolded.

An important add-on feature of this method is to compute the same statistic for a particular kind of randomisation of the data (see [8], concerning ‘Last Component Shuffling’). This gives us a ‘noise ceiling’ with which we can compare the GFNN results, as if to say ‘this is what we would get if the signal were pure noise’. The results of the analysis are plotted as the proportion of ‘unsuccessfully-embedded’ points against the dimension of the embedding space (circles). The statistic for the shuffled data is shown as stars.

[Figure 2 plots: three panels, ‘GFNN for a Chaotic System’, ‘GFNN for Noise’ and ‘GFNN for Chaos+Noise’, each plotted against Dimension (2 to 10) on a vertical scale of 0 to 100. Legend entries: ‘computed GFNN’ and ‘with last component shuffled’.]

Figure 2: Comparison of GFNN for (i) chaos, (ii) noise, and (iii) chaos+noise, showing the ‘noise ceiling’ computed by ‘shuffling the last component’

Figure 2 shows this for three systems:

(i) a 2.06-dimensional (fractional-dimensional) chaotic system (a Lorenz system[9]), which requires 3 internal variables to uniquely specify the system’s state;

(ii) a pure noise system, for which no degree of embedding is sufficient to make the dynamics appear deterministic (the working definition of noise for our purposes);

(iii) a mixture of the outputs of the two systems above, noise plus chaos. The value of the statistic settles down at the same dimension as in (i), but remains above the x-axis. This indicates that the data can be ‘unfolded’ to a point, but beyond this it remains somewhat stochastic.

The initial tail-off in the graph is in fact an artefact of the method, and this necessitates the use of the shuffling technique.
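The embedding and neighbour-counting steps outlined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used to produce the figures. It applies only a single distance-ratio test for a false neighbour, in the spirit of [8]. The function names, the delay `tau`, the threshold `r_tol = 15` and the Hénon map used as a test signal are all hypothetical choices made for the example. The `shuffle` option is one plausible reading of last-component shuffling; the exact procedure is given in [8].

```python
import math
import random


def delay_embed(x, dim, tau=1):
    """Return delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this embedding")
    return [tuple(x[t + i * tau] for i in range(dim)) for t in range(n)]


def fnn_percent(x, dim, tau=1, r_tol=15.0, shuffle=False, seed=0):
    """Percentage of nearest neighbours in `dim` dimensions that are false.

    A neighbour is counted as false when adding component dim + 1 separates
    the pair by more than r_tol times their distance in `dim` dimensions.
    (r_tol = 15 is an assumed, illustrative threshold.)
    """
    vectors = delay_embed(x, dim + 1, tau)
    base = [v[:dim] for v in vectors]    # first `dim` components
    extra = [v[dim] for v in vectors]    # the added, last component
    if shuffle:
        # Assumed reading of the shuffle in [8]: permute the last component
        # so that it no longer depends on the first `dim` components.
        random.Random(seed).shuffle(extra)
    false_count = 0
    for i, p in enumerate(base):
        # Brute-force search for the nearest neighbour of point i.
        best_j, best_d2 = -1, math.inf
        for j, q in enumerate(base):
            if j == i:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if d2 < best_d2:
                best_j, best_d2 = j, d2
        if abs(extra[i] - extra[best_j]) > r_tol * math.sqrt(best_d2):
            false_count += 1
    return 100.0 * false_count / len(base)


def henon_series(n, a=1.4, b=0.3, burn_in=100):
    """Scalar output of the Henon map, used here only as a chaotic test signal."""
    x = [0.0, 0.0]
    for _ in range(n + burn_in):
        x.append(1.0 - a * x[-1] ** 2 + b * x[-2])
    return x[-n:]
```

Calling `fnn_percent(henon_series(400), d)` for d = 1, 2, 3 and repeating with `shuffle=True` gives the two curves of a plot like Figure 2. For the Hénon signal the unshuffled value falls to zero at dimension 2, because two delay coordinates determine the next value of the map exactly, while the shuffled value stays well above zero. With this single criterion the count for pure noise also tends to fall as the dimension grows, which is why the shuffled comparison is needed. The sketch uses only the standard library and a brute-force neighbour search, so it is suitable for short series only.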


RESULTS

We applied the GFNN method to measured Web data. The data used in our investigation included: the global cache hit rate; the request rate for the whole cache; the request rate for some individual URLs (www.hotmail.com, www.msn.com, home.netscape.com); and some of these statistics after various normalisations, deperiodisation, and smoothing using a simple moving average. The data were obtained from traces published for the NLANR[10] top-level caches, and covered one- and three-week periods. See the NLANR web site for more information about these caches. The data were confirmed to be self-similar, in accordance with previous research[1][11][12], showing Hurst parameters of around 0.7 to 0.9.

The top row of Figure 3 shows the GFNN statistic computed for the cache hit rate. It is typical in appearance of that for the other data sets we examined, such as the URL request rates. It shows a partially successful embedding, suggesting that the data express a noisy chaotic motion. Nevertheless, a deeper analysis is necessary to confirm this, or to establish whether it is instead an oddity in the data. For example, it is known that this kind of result can be produced by a nonlinear transformation of white noise[8].

Figure 3 also shows a simulated trace which we constructed by adding (self-similar) fractional Gaussian noise to a 7-cycle sine wave (in the proportions 3:1 respectively, measured as RMS values) and a comparable linear downward trend (falling from 25% to 20%). Not only are the two time traces similar in appearance, but the GFNN plots are effectively indistinguishable.

[Figure 3 plots: top row, ‘Cache Hit Rate %’ against time (0 to 168 hours) beside its ‘Global False Nearest Neighbours’ statistic against Dimension (2 to 10); bottom row, the same pair of plots for ‘Simulated Hit Rate %’. Legend entries: ‘computed GFNN’ and ‘with last component shuffled’.]

Figure 3: Global cache hit rate, with its GFNN analysis, together with a simulated hit rate and its GFNN analysis
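The construction of the simulated trace described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used. The text gives only the 3:1 RMS proportion, the seven cycles and the 25% to 20% trend. The sample count `n = 168` (one point per hour), the Hurst parameter `hurst = 0.8`, the absolute noise level `noise_rms = 3.0` and the Cholesky method of generating fractional Gaussian noise are all choices made for the example, and the function names are hypothetical.

```python
import math
import random


def fgn_autocov(lag, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at a given lag."""
    h2 = 2.0 * hurst
    return 0.5 * (abs(lag + 1) ** h2 - 2.0 * abs(lag) ** h2 + abs(lag - 1) ** h2)


def fgn(n, hurst, rng):
    """Draw n samples of fGn by Cholesky-factorising its covariance matrix."""
    cov = [fgn_autocov(k, hurst) for k in range(n)]
    low = [[0.0] * n for _ in range(n)]  # lower-triangular Cholesky factor
    for i in range(n):
        for j in range(i + 1):
            s = cov[i - j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlate the white noise: sample = L * white.
    return [sum(low[i][k] * white[k] for k in range(i + 1)) for i in range(n)]


def rms(values):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def simulated_hit_rate(n=168, hurst=0.8, cycles=7, noise_rms=3.0,
                       start=25.0, end=20.0, seed=0):
    """Trend + fGn + sine wave, with noise RMS : sine RMS fixed at 3 : 1."""
    rng = random.Random(seed)
    raw_noise = fgn(n, hurst, rng)
    raw_sine = [math.sin(2.0 * math.pi * cycles * t / n) for t in range(n)]
    # noise_rms is an assumed absolute level; only the 3:1 ratio is from the text.
    noise = [v * noise_rms / rms(raw_noise) for v in raw_noise]
    sine = [v * (noise_rms / 3.0) / rms(raw_sine) for v in raw_sine]
    trend = [start + (end - start) * t / (n - 1) for t in range(n)]
    trace = [trend[t] + noise[t] + sine[t] for t in range(n)]
    return {"trace": trace, "trend": trend, "noise": noise, "sine": sine}
```

The components are returned separately so that the 3:1 proportion can be checked with `rms`. The Cholesky approach is exact but costs on the order of n cubed operations, so it is practical only for short traces such as this one.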


The necessary conclusion from this is that we cannot distinguish, by this method, between the measured cache data and a noisy periodicity. Indeed, it is likely that the periodicity in the data is the cause of the partially successful embedding, since periodicity also embeds well: just like chaos, it represents a deterministic system.

Further research we undertook looked at the underlying trends visible in the various data sets, obtained by smoothing the measured traces. The residuals which overlay these trends were also examined. None of these were found to be clearly chaotic in their raw forms, and we therefore presume that any chaos which may be present is at a very low level, not detectable by the method used. For a full presentation of this research the reader is referred to [13].

CONCLUSIONS

The aim of this investigation has been to determine whether or not chaotic dynamics are present in web cache traces, in the context of analysing the expected demand for non-bandwidth resources of active networks. We have looked at a range of traces from the caches studied, together with processed versions of the data obtained by normalisation, smoothing and taking residuals. In this study no signs of chaos have been found. This could mean that less-ideal stochastic modelling will have to be used for this kind of analysis.

There are some simple possible causes that may have led to our negative results. It may be that the time resolution used is too coarse. Alternatively, the data may need to be pre-processed in the correct manner, such as by deperiodisation or normalisation, to reveal any chaos. The hypothesis that chaos should be present, based on the notion of feedback loops in the network, nonetheless remains a very reasonable proposition. Thus, if chaos is present, it would appear to be there only in a small proportion compared with the stochastic contribution of network and user ‘noise’.

A more interesting proposition is that the stochastic contributions to the system could conceivably be ‘tempered’ by chaos due to the supposed feedback mechanism. If so, we require somewhat more sophisticated methods for finding the chaos than the one used here.

REFERENCES

[1] Marshall, M. Boissaux & C. Roadknight; Periodicity and Self-similarity of Web Cache Traces, Technical report available by e-mail from [email protected].

[2] M.E. Crovella & A. Bestavros; Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, IEEE/ACM Transactions on Networking, 5(6):835-846, December 1997.

[3] W.E. Leland, W. Willinger, M.S. Taqqu & D.V. Wilson; On the Self-Similar Nature of Ethernet Traffic, presented at ACM SIGCOMM, September 1993.

[4] W.L. Ditto, M.L. Spano & J.F. Lindner; Techniques for the Control of Chaos, Physica D 86 (1995) 198-211.

[5] E. Ott & M. Spano; Controlling Chaos, Physics Today, May 1995, 34-40.

[6] B.A. Huberman & R.M. Lukose; Social Dilemmas and Internet Congestion, Science Vol. 277, 25 July 1997.

[7] H. Abarbanel; Analysis of Observed Chaotic Data, Springer-Verlag 1995. ISBN 0-387-94523-7 (HB) / 0-387-98372-4 (PB).

[8] M. Kennel & H. Abarbanel; False Neighbours and False Strands: A Reliable Minimum Embedding Dimension Algorithm, available at ftp://inls.ucsd.edu/pub/inlsucsd/false_strands.tar.Z

[9] E.N. Lorenz; Deterministic Non-Periodic Flow, J. Atmos. Sci., 20, 130 (1963). See also general books on chaos, such as E. Ott; Chaos in Dynamical Systems, Cambridge University Press 1993.

[10] NLANR (National Laboratory for Applied Network Research). See http://ircache.nlanr.net

[11] M. Hawa; A Study of Self-Similarity and Deterministic Trends in Web Cache Traffic using Wavelet Analysis, M.Sc. dissertation at UCL (1999), supervised by Lionel Sacks, available at http://www.ee.ucl.ac.uk/~lsacks/masters/proj99/m_hawa.pdf

[12] N. Dao; Internet Traffic Characterisation, M.Sc. dissertation at UCL (1999), supervised by Lionel Sacks, available at http://www.ee.ucl.ac.uk/~lsacks/masters/proj99/n_dao.pdf

[13] J. Young; Characterising World Wide Web File Popularity Dynamics: Searching for Chaos in Web Traffic Traces, M.Res. dissertation (1999), available at http://www.ee.ucl.ac.uk/~lsacks/masters/proj99/j_young.pdf
