1. Introduction

Analytical modeling of complex systems is crucial to detect error conditions or misbehaviors that arise from different realities such as bottlenecks, capacity planning problems and scalability issues, to name a few [17, 18, 6, 3]. It is possible to represent a system using stochastic modeling formalisms such as Markov Chains [23] or more structured approaches such as Petri Nets [1], Markovian Process Algebras [14] or Stochastic Automata Networks (SAN) [19, 5]. We direct our attention to the SAN formalism since, like other structured formalisms, it provides simple means to depict components and the communications among its elements (synchronous and asynchronous).

∗ The authors receive grants from Petrobras (0050.0048664.09.9). The order of authors is merely alphabetical. Paulo Fernandes is also funded by CNPq-Brazil (PQ 307272/2007-9). Afonso Sales receives grants from CAPES-Brazil (PNPD 02388/09-0).

Since its first definition, SAN has been used to create modular representations of systems with local and independent behavior that occasionally interacts and synchronizes with other modules. As with other Markovian-based formalisms, SAN is used to derive performance indices for analysis and interpretation. Briefly, the process multiplies an initial probability vector by a non-trivial data structure called the descriptor, i.e., a representation of the underlying Markovian transition matrix [11]. SAN uses state-of-the-art algorithms to efficiently compute the probability vector and return the performance indices to its modelers [9, 8]. However, SAN solutions are bounded by specific limits: the current SAN solver, the PEPS software tool [4], works with fewer than 65 million states. For Markovian modeling this limit is quite low, as even small-sized realities sometimes require massive quantities of states to represent all possibilities. This problem is frequently called the state space explosion problem, and several approaches are used to mitigate its harmful effects. One valid technique is simulation [15, 21], where approximate solutions can be successfully derived at an associated computational cost. Simulation itself has several drawbacks, such as burn-in time, initial state definition and halting problems; however, it allows the solution of huge models without storage bounds. This advantage justifies its usage in several contexts where the associated precision must be measured in relation to the amount of samples produced, i.e., a process known as sampling. Sequential sampling of large Markovian models has a high computational cost, requiring huge amounts of resources to produce the desired number of samples.
In light of this problem, recent years witnessed the adoption of parallel sampling techniques, where the task of producing large amounts of samples is divided among several workers, all synchronized by a master entity. This computational model helps produce more samples in less time. Given the set of different simulation techniques available to modelers, we

are interested in investigating how one can profit from parallel sampling when traditional [15, 21], forward [13, 21] and backward [12, 20] simulation are used, as well as the impact on result accuracy. The aim of this paper is to present the parallel generation of samples for the solution of SAN models, using the aforementioned simulation techniques, and to assess the accuracy of the results on several examples. It is worth mentioning that our results could be applied to other structured Markovian formalisms without loss of generality, since they all have an underlying transition matrix used in the sampling process. The remainder of this paper is organized as follows. Section 2 presents the Stochastic Automata Networks (SAN) formalism and its principles. In Section 3, we discuss three well-known simulation techniques for Markovian models. Section 4 presents the methodology for parallel sampling applied to the simulation techniques previously mentioned. Finally, Section 5 presents final remarks and future work.

2. Stochastic Automata Networks (SAN)

Markovian modeling formalisms are commonly used to investigate complex systems and infer performance indices. To profit from such representations, one must depict the system under consideration using simple primitives such as states and transitions. Each transition informs how the system is interconnected and at which frequency state changes occur. Although very simple in its basic operations, a Markov Chain (MC) usually requires large amounts of states to compose any given system, causing what is often referred to as the state space explosion problem. Structured formalisms were proposed to mitigate the harmful effects of representing systems with MCs. Such representations strongly rely upon system composition and modularization, creating high-level mechanisms that eventually generate an underlying MC [23]. The emergence of structured formalisms enabled more complex interactions among components and presented simpler means to extract performance indices of different realities. Structured formalisms are numerous; examples are Petri Nets [1], Markovian Process Algebras such as the Performance Evaluation Process Algebra (PEPA) [14] and Stochastic Automata Networks (SAN) [19, 5].

We focus our attention on SAN, which was originally proposed by Plateau. This formalism defines autonomous entities termed automata. Each automaton represents an element within a broad system, and it can possess local (independent) behavior and occasional dependencies, i.e., synchronizing behavior. SAN is specially suited to represent parallel and distributed systems that operate intensively in a local or independent manner but sometimes need to synchronize activities with other automata. Each automaton can be viewed as an MC whose states and transitions map how this subsystem is interconnected. The transitions map a single event or a list of events that can be of two distinct types: local or synchronizing. Local events occur independently, whereas synchronizing events depend upon other automata in order to be fired. Events are mapped to rates for a continuous-time MC or to probabilities for a discrete-time MC (the solution must be adapted to each case). Finally, in a SAN model, rates can be constant (a scalar) or functional (dependent on the states of some automata to be fired). A key feature of SAN is to work with such functional rates, which provide a slightly different concept for users than pure synchronizations, i.e., a simple representation to describe complex interactions.

Figure 1. SAN model with two automata: S(1) with states A, B and C, and S(2) with states X and Y, with the event table:

Type | Event | Rate
syn  | s1    | r1
loc  | l1    | f
loc  | l2    | r3
loc  | l3    | r4

f = (S(1) ≠ A) × r2

Figure 1 depicts a SAN model with two automata (S(1) and S(2)), three local events (l1, l2 and l3) and one synchronizing event (s1). Event l1 has a functional rate defined by the function1 f that verifies the state of automaton S(1); this event can occur with rate r2 if the state is different from A. If the functional condition is not satisfied, event l1 is not allowed to occur. In a structured model, the product state space (PSS) is the combination of all local automata states, representing the global states of the model. For the example presented in Figure 1, the PSS is equal to 3 × 2 = 6 states. However, depending on the nature of the synchronizations and functions of the model, only some global states are valid, i.e., reachable. The set of reachable states (RSS) comprises the actual states of the underlying MC. The Markovian representation for the model is shown in Figure 2. The functional rate causes one state (AY) to be unreachable, drawn with dotted lines in the figure. The function forces event l1 to be fired only from BX (to BY) and from CX (to CY), not from AX (to AY).

1 The interpretation of a function can be viewed as the evaluation of an expression in non-typed programming languages, e.g., the C language. Each comparison is evaluated to 1 (for true) and to 0 (for false).
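The footnote's C-style evaluation rule can be made concrete. The sketch below (a hypothetical encoding for illustration, not the PEPS implementation) evaluates the functional rate f of Figure 1 for each state of S(1):

```cpp
#include <cassert>

// States of automaton S(1) in the model of Figure 1.
enum class S1 { A, B, C };

// Functional rate f = (S(1) != A) * r2: as the footnote explains, the
// comparison is evaluated C-style to 1 (true) or 0 (false), so the
// rate of event l1 is r2 whenever S(1) is not in state A, and 0 otherwise.
double f(S1 s, double r2) {
    return (s != S1::A ? 1.0 : 0.0) * r2;
}
```

With r2 = 0.5, f evaluates to 0 for state A and to 0.5 for states B and C, which is precisely why l1 can fire from BX and CX but not from AX.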

Figure 2. Underlying MC obtained from the SAN model in Figure 1 (states AX, BX, CX, AY, BY and CY, with transition rates r1 to r4; state AY is unreachable).

We stress the fact that, in SAN, the underlying MC is never stored, only accessed through tensor operators (using tensor algebra properties [11]). In the following subsection, we discuss some SAN modeling examples.

2.1. Modeling examples

Our numerical analysis is based on three SAN models with different characteristics: Alternate Service Patterns (ASP), First Available Server (FAS) and Resource Sharing (RS). For a more detailed description of these models, we refer the reader to previous works in the literature [22, 10, 2].

The ASP model describes an Open Queueing Network [23] with servers that present different service patterns. In this model, four queues with capacity K are represented by four automata, and another automaton represents the set of service patterns. All states are reachable and the model has PSS = (K+1)^4 × P, where P is the number of service patterns.

The FAS model evaluates the availability of N servers, where every server is a two-state automaton representing the two possible server conditions: available or busy. Tasks are assigned to the first server; if this server is busy, the task is sent to the second server, and so on, i.e., the first available server processes the task. This model has PSS = 2^N and all states are also reachable.

The classical RS model maps R shared resources to P processes. Each process is represented by an automaton with two states: idle or busy. The quantity of available resources is indicated by an automaton that counts the number of resources in use. This model has PSS = 2^P × (R+1) and, due to the nature of the model transitions, not every state is reachable.
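As a quick sanity check, the PSS sizes above can be computed directly from the formulas in the text (the helper names below are ours, not from any tool):

```cpp
#include <cassert>

// Product state space (PSS) sizes of the three example models.
long pssASP(long K, long P) { long q = K + 1; return q * q * q * q * P; } // (K+1)^4 x P
long pssFAS(long N)         { return 1L << N; }                           // 2^N
long pssRS(long P, long R)  { return (1L << P) * (R + 1); }               // 2^P x (R+1)
```

With the parametrization used later in Section 4 (ASP: K = 2, P = 2; FAS: N = 9; RS: P = 10, R = 5), this gives 162, 512 and 6144 states, respectively.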

After the reality is modeled, the system is ready for solution, i.e., the calculation of indices for the set of components, transitions and rates. In the case of SAN, a Cartesian product comprising the local automata states is derived, originating the PSS of the model. The PSS corresponds to all possible states for any given system and, depending on how the automata are described, it can potentially contain unreachable states, a phenomenon that is absent in pure Markovian modeling. Despite this drawback, SAN is a powerful modeling formalism. Its main advantage is memory savings, since tensor algebra properties are used to access the transition matrix without instantiating it in memory [11]. Specialized algorithms solve SAN models in a procedure called vector-descriptor product (VDP). The VDP multiplies an initial probability vector by a non-trivial structure, i.e., the descriptor, in an iterative fashion [9, 8]. When this operation has been performed for a sufficient number of iterations, a convergence test is applied and stationary regime properties are analyzed. The final probability vector contains the state permanence probabilities once the initial state no longer affects the solution. These results are then subjected to the modelers for further interpretation and refinement. This alternative is particular to the solution of SAN models and suffers from scalability issues, i.e., the current solution software tool PEPS [4] works with models having no more than 65 million states, although recent research based on the monotonicity property has overcome these bounds [12]. The motivation behind the use of simulation relies on the fact that, depending on the model under study, its numerical solution is computationally expensive. SAN, as a matter of fact, does not store the transition matrix, so it is theoretically the best option for huge models. However, SAN is bounded by the PSS model size, because it needs to allocate at least three vectors of that size in order to compute the VDP accordingly.

3. Simulation techniques

Rather than using the VDP to extract meaningful performance indices from Markovian models, one alternative is simulation. This approach considers only the reachable state space of the model and approximates the solution as a function of the number of generated samples, which is a clear advantage of simulation over the numerical solution. The accuracy of the solution is heavily determined by the amount of simulation runs (or trials) that are executed, and different statistical methods, such as confidence intervals, can be used to attest its quality. It is worth mentioning that, using simulation, otherwise intractable models can produce samples that are proved to be generated from stationarity, by using backward coupling procedures [13, 20]. In this paper, we focus on three types of simulation applied to Markov Chains: i) traditional [15, 21]; ii) forward

[13, 21]; and iii) backward [20, 13, 24, 12] simulation. Before explaining each type, some basic notions are needed to fully comprehend Markovian simulation:

• sample: a state from the model's RSS, collected through a sampling process;

• trajectory: a sequence of visited states (or steps) taken from a given initial state. The trajectory length comprises the amount of steps that were performed;

• run or execution: a full simulation experiment, eventually producing samples;

• simulation time: the total time of simulation considering the production of n samples.

In the following we discuss the characteristics of the different simulation approaches:

i) Traditional Simulation: this type of simulation was the first to be used in Markovian realities and consists of, from an initial state, performing random walks on the set of possible states and saving each step as a sample for further inspection - Figure 3 (a);

ii) Forward Simulation: the forward simulation approach is slightly different from traditional simulation. Rather than storing each trajectory state, the simulation process performs a pre-defined number of steps (a previously defined trajectory length) and only then collects a sample - Figure 3 (b);

iii) Backward Simulation: all possible states are fired in parallel, going back in simulation time in different trajectories. Depending on the model transitions, some trajectories couple, merging their subsequent steps. The process stops when all trajectories have coupled into a unique state, which is the sample that is stored. An obvious disadvantage of backward simulation is the need to fire all trajectories in parallel; however, it is proven that monotonic models are particularly suitable for this method, since only two (or a few) trajectories must be considered [24, 12] - Figure 3 (c).

Figure 3. Simulation methods: (a) traditional, where every visited state along the trajectory is stored as a sample; (b) forward, where a sample is collected only when a pre-defined stopping criterion (number of steps) is reached; (c) backward, where trajectories are run from time -τ towards 0 and the stopping criterion is the coupling of all trajectories.

The three simulation types presented produce similar results in terms of accuracy. However, note that traditional and forward simulation suffer from the initial state definition, and it is hard to determine when to stop the simulation, i.e., to determine the amount of samples needed to achieve relevant statistical results. On the other hand, backward simulation does not have such problems, but it has a high computational cost to collect samples. It is evident that the major problem, when using simulation approaches, is to generate samples in a timely manner.
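The backward (coupling-from-the-past) procedure of item iii can be sketched on a toy chain. The sketch below assumes a hypothetical 3-state birth-death chain as a stand-in for a model's RSS; the transition rule, generator and names are ours, not the authors' implementation:

```cpp
#include <cstdint>
#include <vector>

// Deterministic update rule: the same uniform draw u drives every
// trajectory (shared randomness is what makes coupling possible).
static int step(int s, double u) {
    if (u < 0.4) return s > 0 ? s - 1 : s;  // move down with prob 0.4
    if (u < 0.8) return s < 2 ? s + 1 : s;  // move up with prob 0.4
    return s;                               // stay put with prob 0.2
}

// Small deterministic pseudo-random generator so runs are repeatable.
static double unif(uint64_t& x) {
    x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return (x >> 11) * (1.0 / 9007199254740992.0);  // 53-bit uniform in [0,1)
}

// Backward coupling: start one trajectory per state at time -t and run
// them to time 0, reusing the draw attached to each time index; if all
// trajectories coincide at time 0, that state is an exact stationary
// sample, otherwise go further back (double t) and try again.
int backwardSample(uint64_t seed) {
    for (int t = 1;; t *= 2) {
        uint64_t rng = seed;
        std::vector<double> u(t);            // u[j] is the draw used at time -(j+1)
        for (int j = 0; j < t; ++j) u[j] = unif(rng);
        int a = 0, b = 1, c = 2;             // one trajectory per state
        for (int k = t - 1; k >= 0; --k) {   // apply draws from time -t up to -1
            a = step(a, u[k]); b = step(b, u[k]); c = step(c, u[k]);
        }
        if (a == b && b == c) return a;      // coupled: exact sample
    }
}
```

Because `step` here is monotone (ordered states stay ordered under the same draw), running only the extreme trajectories would suffice, which is exactly the property exploited for monotonic models in [24, 12].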

In order to compensate for the sequential generation of samples, we now investigate how to accelerate sampling procedures through parallel execution, and its impact on the overall accuracy, as presented in the next section.

4. Parallel generation of samples

Simulation techniques often demand large numbers of samples in order to calculate statistically relevant performance indices, and such samples can be produced independently. Independent samples are thus well suited to parallelization efforts. The basic principle is to divide the sampling process into work units and assign a load to different processors. Note that the computational complexity is shifted towards: i) faster ways to compute each sample; and ii) the adaptation of parallel sampling techniques to the aforementioned simulation approaches. We focus our attention on parallel sampling and the problems involved in coordinating distributed efforts in parallel environments. In simulation, it is clear that higher quantities of samples only help to improve the solution accuracy. This section presents the main idea of parallel sampling applied to all three techniques in order to obtain the simulation time for our set of examples, as well as to observe the differences among these approaches in terms of accuracy.

The parallel model follows a master-worker pattern [25], where entities known as workers perform sample generation and the master assembles all produced data, adding the samples to the corresponding probability vector entries and normalizing the results for later inspection. When a sample is computed, it is saved locally and, when the amount of collected samples reaches the required quantity, it is sent to the master. The final vector is compared to the analytical solution to verify the disparity between both distributions. It is worth mentioning that this calculated distance directly influences the quality of the estimations.

To validate the desired results, we run our models on a multiprocessor machine and perform the three simulation approaches for a variable number of samples. Our parallel version of these approaches was implemented in C++ using MPI primitives. Our tests were executed on a Dell PowerEdge R610 with two Intel Xeon E5520 Quad-Core 2.27 GHz processors, Hyper-Threading technology and 16 GB of memory, running Linux. The prototype was compiled with gcc version 4.2.4 and MPICH library version 1.2.7p1. All simulation runs were repeated 30 times in order to produce samples within a 95% confidence interval.

Figures 4, 5 and 6 show the total simulation time needed to compute one million samples (10^6) for the ASP, FAS and RS models respectively, where the number of processors varies from one to 16. This set of models was parametrized as follows: ASP model - every queue with capacity two (K = 2) and two service patterns (P = 2); FAS model - nine servers (N = 9); RS model - 10 processes (P = 10) and five resources (R = 5). These models are parametrized in a way that one can easily obtain the analytical solution in order to calculate the precision achieved using those simulation techniques. Note that the parametrization for huge models impacts only the amount of samples needed to obtain accurate results; the time spent to generate a single sample is not affected in the traditional and forward techniques. It is important to observe that we use a logarithmic scale for the Y-axis in our graphics, and the bars for each number of processors show the time spent by each simulation approach.

In Figure 4, looking only at the traditional simulation bars, we notice a U-shaped curve, where the optimal result occurs for four processors. This technique demands a low computational cost to collect samples and, beyond four processors, the communication between workers and master directly impacts the total simulation time. That behavior is not evident for the forward and backward techniques, where the simulation time drops significantly as the number of processors increases (from one to 16). For example, on a single processor it took around 600 seconds to compute the whole set of samples whereas, for 16 processors, the simulation time plummeted to only around 60 seconds (with small variations between the backward and forward techniques)2.

2 The dramatic decrease is more perceptible on a non-logarithmic Y-axis scale.

Figure 4. Simulation time - ASP model (n = 10^6 samples)

A similar set of results was obtained for the FAS model (Figure 5). In contrast to the ASP model, the difference between backward and forward simulation is slightly larger in the FAS case. The difference, particularly for one or two processors, can be explained by the nature of the model synchronizations and local behaviors, which influence the coupling time in the backward technique. For the traditional case, the results followed the expected pattern, as also shown in Figure 4, while the number of processors varied.

Figure 5. Simulation time - FAS model (n = 10^6 samples)

At last, Figure 6 shows the results for the RS model. It is noticeable that the backward technique performed better than forward simulation, even for a large number of processors, where, for instance, 16 processors needed about 84 seconds for forward and 57 seconds for backward. For the RS model, backward thus outperforms forward simulation in terms of time spent. The traditional approach also took more time than in the previous examples, due to the nature of the model's synchronizations.

Figure 6. Simulation time - RS model (n = 10^6 samples)
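The master-worker scheme can be condensed into a few lines. The paper's prototype uses MPI processes; the sketch below substitutes std::thread so the idea is self-contained (all names, the toy chain and the per-worker seeds are hypothetical): workers generate samples locally and ship their counts to the master only once, at the end.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// The "master" accumulates the local sample counts sent by workers.
struct Master {
    std::mutex m;
    std::vector<uint64_t> counts{0, 0, 0};
    void collect(const std::vector<uint64_t>& local) {
        std::lock_guard<std::mutex> g(m);
        for (size_t i = 0; i < counts.size(); ++i) counts[i] += local[i];
    }
};

static int walk(int s, double u) {            // toy 3-state transition rule
    if (u < 0.4) return s > 0 ? s - 1 : s;
    if (u < 0.8) return s < 2 ? s + 1 : s;
    return s;
}

// Worker: traditional simulation (every step is a sample), counted locally.
void worker(Master& master, uint64_t seed, uint64_t nSamples) {
    std::vector<uint64_t> local(3, 0);
    uint64_t x = seed; int s = 0;
    for (uint64_t i = 0; i < nSamples; ++i) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        s = walk(s, (x >> 11) * (1.0 / 9007199254740992.0));
        ++local[s];
    }
    master.collect(local);                    // one "message" to the master
}

// Spawn workers, join them, and normalize counts into a probability vector.
std::vector<double> parallelSampling(int nWorkers, uint64_t perWorker) {
    Master master;
    std::vector<std::thread> pool;
    for (int w = 0; w < nWorkers; ++w)
        pool.emplace_back(worker, std::ref(master), uint64_t(1000 + w), perWorker);
    for (auto& t : pool) t.join();
    double total = double(nWorkers) * double(perWorker);
    std::vector<double> pi;
    for (uint64_t c : master.counts) pi.push_back(c / total);
    return pi;
}
```

With MPI, the `collect` step would become a point-to-point send of the local count array to the master rank (or an MPI_Reduce over all workers); the rest of the structure is unchanged.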

For all three models, the same pattern emerged: as the number of processors augmented, the traditional simulation time behaved as a U-shaped curve from one processor to 16, while forward and backward simulation required around at least 10 times less effort to compute the one million samples. This is a remarkable result, showing that parallel sampling performance depends on the simulation technique: traditional simulation begins to underperform as the number of processors is increased, whereas the exact opposite occurs for forward and backward simulation, whose results improve as the number of processors increases. This outcome is not readily evident since, intuitively, the traditional technique was expected to behave at least like the others. The ASP, FAS and RS examples demonstrated the behavior of the three simulation approaches and the time needed to generate one million samples. Our parallel results showed impressive time reductions when going from one to 16 processors. The next step is to verify the accuracy sensitivity in relation to the sampling technique and different amounts of samples.

4.1. Accuracy analysis

In order to test the accuracy of the results, we have conducted experiments where the number of processors was fixed at 16, varying the amount of samples from 10^4 to 10^9. As the amount of produced samples is directly related to the overall simulation precision, we proceed with our analysis by investigating the role of precision itself for the available models. Our basis of comparison is the analytical solution computed with PEPS [4], against which we test the effect of the total number of samples on the accuracy of the results. Figures 7, 8 and 9 show the mean and maximum relative errors for the ASP, FAS and RS models. In each figure, the graphic presents the mean relative error on the Y-axis, varying the amount of samples from 10^4 to 10^9, whereas the table indicates the maximum relative error found for each experiment using the three simulation techniques.

Maximum relative error - ASP model:

            10^4     10^5    10^6    10^7    10^8    10^9
Traditional 14.9475  2.5496  0.1438  0.1034  0.0840  0.0805
Forward     1.0000   0.3814  0.1255  0.0648  0.0789  0.0831
Backward    1.2910   0.2862  0.1147  0.0668  0.0488  0.0522

Figure 7. Relative error - ASP model

Observing the results for the ASP model (Figure 7), one can verify a greater distance from the analytical solution when a small quantity of samples (10^4) is used. Note that as the amount of samples increases, the mean (and also the maximum) relative error decreases. Since simulation gives an approximation of the stationary solution, for the ASP case 10^7 samples already produce significant results, i.e., there is only a small accuracy variation between 10^7 and 10^9 samples.
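One plausible reading of the reported metrics is an entry-wise comparison between the analytical stationary vector and the simulated estimate; the paper does not spell out the exact formula, so the sketch below is an assumption:

```cpp
#include <cmath>
#include <vector>

// Mean and maximum relative error between an analytical probability
// vector and a simulated estimate (assumed formula, for illustration).
struct RelError { double mean; double max; };

RelError relativeError(const std::vector<double>& analytical,
                       const std::vector<double>& estimated) {
    double sum = 0.0, worst = 0.0;
    for (size_t i = 0; i < analytical.size(); ++i) {
        double e = std::fabs(estimated[i] - analytical[i]) / analytical[i];
        sum += e;
        if (e > worst) worst = e;
    }
    return {sum / double(analytical.size()), worst};
}
```

For instance, comparing the estimate (0.72, 0.28) against the analytical vector (0.8, 0.2) gives per-entry errors of 0.1 and 0.4, hence a mean relative error of 0.25 and a maximum of 0.4.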

Moreover, 10^4 samples are not adequate for any of the three techniques, whereas with 10^9 the result practically resembles the numerical model solution (with a precision below 10^-2).

Figure 8 shows the results for the FAS model. In comparison to the ASP model, it is noticeable that for 10^4 and 10^5 samples the mean relative error remained very high, only dropping for 10^8 and 10^9 samples. In terms of techniques, the FAS model presents the worst results for traditional simulation, except for 10^4 samples. From 10^5 samples on, the forward simulation technique presents better results than backward. For this example, the maximum relative error is larger than for ASP for all simulation approaches and amounts of samples.

Maximum relative error - FAS model:

            10^4     10^5     10^6    10^7    10^8    10^9
Traditional 50.0041  21.7341  3.6133  1.6371  0.2895  0.1809
Forward     52.4735  9.2008   1.7860  0.6074  0.2618  0.1491
Backward    47.2950  11.0172  2.3051  0.8220  0.2830  0.1768

Figure 8. Relative error - FAS model

Figure 9 shows the results for the RS model. This particular example produces the smallest mean and maximum relative errors of the three examples. For all simulation techniques, the accuracy variation is limited to low thresholds; from 10^7 samples and up, the precision is very adequate, as is the maximum relative error. Only for 10^4 samples do we notice a broader variation in the mean and maximum relative errors; however, as stated before, 10^4 samples is unrepresentative for any simulation study, where it is common to use at least 10^6 samples to achieve comprehensive results.

Maximum relative error - RS model:

            10^4    10^5    10^6    10^7    10^8    10^9
Traditional 2.4080  0.6529  0.1604  0.0466  0.0135  0.0065
Forward     1.3856  0.3632  0.1204  0.0367  0.0135  0.0043
Backward    1.3856  0.3802  0.1156  0.0349  0.0133  0.0046

Figure 9. Relative error - RS model

The numerical indices presented in this section demonstrate the importance of accuracy when performing a complete simulation study, and its relation to the quantity of collected samples. We remark that as larger quantities of samples are generated, better accuracy results are obtained for all techniques applied to the aforecited models. Moreover, it is important to notice that we studied a set of models with different characteristics (such as different synchronization patterns) and, in spite of presenting distinct levels of accuracy, all simulation techniques show a similar behavior for all models. We finalize the paper by discussing some final remarks and future works.

5. Conclusion

The solution of Markovian models is not a trivial task, mainly when stationary behavior is needed, and simulation techniques can be applied to it. Each technique has advantages and drawbacks when compared to the analytical solution. When taking advantage of sampling procedures, one must study their impact in order to guarantee accuracy while balancing the simulation time. In this context, parallel implementations are suitable for obtaining a large number of samples in a timely manner.

We worked with three types of simulation techniques widely discussed in the literature: traditional, forward and backward. Moreover, we investigated two aspects for different sets of models: i) the time to generate many samples in parallel; and ii) the observed accuracy gain (or loss) considering different quantities of collected samples. We conclude that traditional simulation, despite its effectiveness in terms of time to generate a large quantity of samples, spends more time communicating with the master as the number of processors augments than actually generating samples. Splitting the task of generating samples across different processors lets us quantify the accuracy gains as huge amounts of data are produced in parallel. In terms of accuracy, our results show that, depending on the size of the model, the process can be stopped earlier and still produce statistically relevant results, very near the analytical solution.

Our future work is directed towards more elaborate sampling procedures, for instance Bootstrapping [16], a well-known statistical technique applied in many research fields to improve accuracy when performing sample estimations for complex distributions. Preliminary results using this technique produced even better precision in comparison with the numerical solution [7]. Another work in progress is to use variance reduction techniques and advanced programming efforts to improve the generation and storage of each sample within a single processor.

References

[1] M. Ajmone-Marsan, G. Balbo, G. Conte, S. Donatelli, and G. Franceschinis. Modelling with Generalized Stochastic Petri Nets. John Wiley & Sons, 1995.
[2] C. Bertolini, L. Brenner, P. Fernandes, A. Sales, and A. F. Zorzo. Structured Stochastic Modeling of Fault-Tolerant Systems. In Proceedings of the 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS 2004), pages 139-146. IEEE Computer Society, 2004.
[3] L. Brenner, P. Fernandes, J.-M. Fourneau, and B. Plateau. Modelling Grid5000 point availability with SAN. Electronic Notes in Theoretical Computer Science (ENTCS), 232:165-178, March 2009.
[4] L. Brenner, P. Fernandes, B. Plateau, and I. Sbeity. PEPS 2007 - Stochastic Automata Networks Software Tool. In Proceedings of the Fourth International Conference on Quantitative Evaluation of Systems (QEST '07), pages 163-164. IEEE Computer Society, 2007.
[5] L. Brenner, P. Fernandes, and A. Sales. The Need for and the Advantages of Generalized Tensor Algebra for Kronecker Structured Representations. International Journal of Simulation: Systems, Science & Technology, 6(3-4):52-60, February 2005.
[6] R. Chanin, M. Corrêa, P. Fernandes, A. Sales, R. Scheer, and A. F. Zorzo. Analytical Modeling for Operating System Schedulers on NUMA Systems. Electronic Notes in Theoretical Computer Science (ENTCS), 151(3):131-149, 2006.
[7] R. M. Czekster, P. Fernandes, A. Sales, D. Taschetto, and T. Webber. Simulation of Markovian models using Bootstrap method. In Proceedings of the 2010 Summer Simulation Multiconference, pages 564-569, July 2010.
[8] R. M. Czekster, P. Fernandes, A. Sales, and T. Webber. Restructuring tensor products to enhance the numerical solution of structured Markov chains (accepted). In Proceedings of the 6th International Conference on the Numerical Solution of Markov Chains (NSMC '10), pages 1-4, 2010.
[9] R. M. Czekster, P. Fernandes, J.-M. Vincent, and T. Webber. Split: a flexible and efficient algorithm to vector-descriptor product. In Proceedings of the 2nd International Conference on Performance Evaluation Methodologies and Tools (ValueTools 2007), volume 321 of ACM International Conference Proceeding Series, pages 1-8, 2007.
[10] F. L. Dotti, P. Fernandes, A. Sales, and O. M. Santos. Modular Analytical Performance Models for Ad Hoc Wireless Networks. In Proceedings of the 3rd International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt 2005), pages 164-173. IEEE Computer Society, 2005.
[11] P. Fernandes, B. Plateau, and W. J. Stewart. Efficient descriptor-vector multiplication in stochastic automata networks. Journal of the ACM (JACM), 45(3):381-414, May 1998.
[12] P. Fernandes, J.-M. Vincent, and T. Webber. Perfect Simulation of Stochastic Automata Networks. In International Conference on Analytical and Stochastic Modelling Techniques and Applications (ASMTA '08), volume 5055 of LNCS, pages 249-263, 2008.
[13] O. Häggström. Finite Markov Chains and Algorithmic Applications. Cambridge University Press, 2002.
[14] J. Hillston. A compositional approach to performance modelling. Cambridge University Press, USA, 1996.
[15] A. M. Law and D. W. Kelton. Simulation Modeling and Analysis. McGraw-Hill Higher Education, 2000.
[16] B. F. J. Manly. Randomization, Bootstrap and Monte Carlo Methods in Biology. Chapman & Hall/CRC, second edition, 1997.
[17] R. Marculescu and A. Nandi. Probabilistic Application Modeling for System-Level Performance Analysis. In Design Automation & Test in Europe (DATE), pages 572-579, March 2001.
[18] D. A. Menasce and V. A. F. Almeida. Capacity planning for web services: metrics, models, and methods. Prentice Hall, 2002.
[19] B. Plateau. On the stochastic structure of parallelism and synchronization models for distributed algorithms. In Proceedings of the 1985 ACM SIGMETRICS Conference on Measurements and Modeling of Computer Systems, pages 147-154. ACM, 1985.
[20] J. G. Propp and D. B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9(1-2):223-252, 1996.
[21] S. M. Ross. Simulation. Academic Press, Orlando, FL, USA, 2002.
[22] A. Sales and B. Plateau. Reachable state space generation for structured models which use functional transitions. In International Conference on the Quantitative Evaluation of Systems (QEST '09), pages 269-278, September 2009.
[23] W. J. Stewart. Probability, Markov Chains, Queues, and Simulation. Princeton University Press, USA, 2009.
[24] J.-M. Vincent. Perfect Simulation of Queueing Networks with Blocking and Rejection. In Proceedings of the IEEE/IPSJ International Symposium on Applications and the Internet Workshops (SAINT 2005 Workshops), pages 268-271. IEEE Computer Society, February 2005.
[25] B. Wilkinson and M. Allen. Parallel Programming: techniques and applications using networked workstations and parallel computers. Prentice-Hall, 1999.