Performance Issues for Parallel Implementations of Bootstrap Simulation Algorithm
22nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2010)

Ricardo M. Czekster, Paulo Fernandes, Afonso Sales and Thais Webber
Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
PaleoProspec Project - PUCRS/Petrobras
Also funded by CAPES and CNPq - Brazil

Context

Interest: solving large and complex state-based stochastic models to extract performance indices.

Solution
• Numerical (iterative methods)
  ◮ Power method [Stewart 94]
  ◮ Arnoldi [Arnoldi 51]
  ◮ GMRES [Saad and Schultz 86]
• Simulation
  ◮ Traditional [Ross 96]
  ◮ Monte Carlo [Häggström 02]
  ◮ Backward [Propp and Wilson 96]
  ◮ Bootstrap [Czekster et al. 10]
    ⋆ reliable estimations
    ⋆ high computational cost to generate repeated batches of samples

Context

Markovian simulation
• Generation of independent samples (parallel execution)
• Parallel sampling (e.g., master-worker approach)
• Possible sequence of states using the transition matrix
  ◮ random walk or simulation trajectory
  ◮ huge size → huge memory cost
  ◮ Stochastic Automata Networks (structured formalism with an underlying Markov chain)

Objective

Goal: present a parallel implementation of Bootstrap simulation, focusing on the overall performance of the technique, that generates a large number of samples in less time.

Discussion
• processing vs. communication times
• model size vs. amount of generated samples

Outline

• Stochastic Automata Networks (SAN)
• Bootstrap simulation
• Parallelization
• Experiments and results
• Conclusion and future works

Stochastic Automata Networks (SAN)
• Allows a large system to be described in a structured manner, by its parts (automata)

[Figure: SAN model with three automata A, B and C (local states 0 and 1 each) and its underlying Continuous-Time Markov Chain over the reachable global states 000, 111 and 100, with transitions labeled α, γ and β.]

Events of the SAN model:

Type  Event  Rate
syn   s1     α
syn   s2     β
loc   l1     f

where f = [(B == 0) && (C == 0)] × γ
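The functional rate f makes the local event l1 depend on the states of the other automata. A minimal sketch of how such a rate can be evaluated, assuming a global state is encoded as a tuple (a, b, c) of the local states of A, B and C; the function name `rate_l1` and the numeric value of γ are hypothetical, for illustration only.

```python
from typing import Tuple

GlobalState = Tuple[int, int, int]  # local states of automata (A, B, C)


def rate_l1(state: GlobalState, gamma: float) -> float:
    """Functional rate f = [(B == 0) && (C == 0)] x gamma of local event l1."""
    _, b, c = state
    # The Boolean condition evaluates to 1 or 0, so l1 fires at rate gamma
    # only when both B and C are in local state 0, and is disabled otherwise.
    return gamma if (b == 0 and c == 0) else 0.0


# Hypothetical value of gamma (not given on the slides).
GAMMA = 2.0
for s in [(0, 0, 0), (1, 0, 0), (1, 1, 1)]:
    print(s, rate_l1(s, GAMMA))
```

Evaluating the rate from the global state, instead of storing one rate per CTMC transition, is what lets a SAN describe the chain compactly.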

Stochastic Automata Networks (SAN), continued

[Figure: the same SAN model, with the reachable states of the underlying Continuous-Time Markov Chain relabeled 000 → 0, 111 → 1, 100 → 2.]

Bootstrap

Method: a well-known statistical method, applied in many fields to improve accuracy when estimating from samples of complex distributions.

In the simulation context: Bootstrap simulation provides more reliable estimations than traditional simulation [SCSC'10].

Main feature: generation of repeated batches of samples, which improves the accuracy of the method.

Traditional simulation

Transition matrix (states 0, 1, 2):

       0     1     2
0    0.10  0.65  0.25
1    0.25  0.55  0.20
2    0.30  0.25  0.45

• Starting from an initial state (state 0), a uniform random number U selects the next state from the current row of the matrix.
• Each visited state is a sample and increments that state's counter π′ = (π′0, π′1, π′2).
• Example walk: U = 0.08 keeps the chain in state 0 (π′ = 1 0 0), U = 0.87 moves it to state 2 (π′ = 1 0 1), U = 0.32 moves it to state 1 (π′ = 1 1 1), and so on until time n.
• Mean permanence: after a trajectory of length n, the probability estimate is

$$\pi = \frac{\pi'}{n}$$

n = trajectory length; each visited state = sample
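The walk above is an inverse-CDF draw on the current row of the matrix followed by a visit count. A minimal sketch, assuming states are integers 0..2 and that the counted sample is the state reached after each transition (as in the example walk); `next_state` and `traditional_simulation` are hypothetical names, and the seeded `random.Random` generator is an illustrative choice.

```python
import random
from typing import List, Optional, Sequence

# Transition matrix from the slide (rows: current state, columns: next state).
P = [
    [0.10, 0.65, 0.25],
    [0.25, 0.55, 0.20],
    [0.30, 0.25, 0.45],
]


def next_state(row: Sequence[float], u: float) -> int:
    """Inverse-CDF selection: first state whose cumulative probability exceeds u."""
    cumulative = 0.0
    for j, p in enumerate(row):
        cumulative += p
        if u < cumulative:
            return j
    return len(row) - 1  # guard against floating-point round-off


def traditional_simulation(P: Sequence[Sequence[float]], n: int, initial: int = 0,
                           uniforms: Optional[Sequence[float]] = None,
                           seed: int = 42) -> List[float]:
    """Simulate n transitions and return pi = pi' / n."""
    rng = random.Random(seed)
    counts = [0] * len(P)          # pi': visits per state
    state = initial
    for step in range(n):
        u = uniforms[step] if uniforms is not None else rng.random()
        state = next_state(P[state], u)
        counts[state] += 1         # each visited state is one sample
    return [c / n for c in counts]


# Reproduce the slide's walk: U = 0.08, 0.87, 0.32 visits states 0, 2, 1.
print(traditional_simulation(P, 3, uniforms=[0.08, 0.87, 0.32]))
```

With a long trajectory (e.g. `traditional_simulation(P, 100000)`) the estimate approaches the stationary distribution of this matrix, roughly (0.23, 0.49, 0.28).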

Bootstrap simulation

Same transition matrix and initial state (state 0) as in traditional simulation.

• The trajectory is generated exactly as before (U = 0.08 → state 0, U = 0.87 → state 2, U = 0.32 → state 1, ...) up to time n.
• Instead of a single counter vector, z bootstraps K1, K2, ..., Kz each keep their own counter per state (0, 1, 2).
• At each visited state, n̄ trials are performed for each bootstrap to execute the resamplings.

n = trajectory length; K = bootstrap; z = number of bootstraps
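The slides do not spell out how a single trial decides whether a bootstrap counts the current sample. A minimal sketch under one plausible reading, assumed here and not taken from the slides: each of the n̄ trials selects the current sample with probability 1/n̄, so every sample appears a Binomial(n̄, 1/n̄) number of times in each bootstrap, which approximates resampling with replacement. `resample_visit` and `n_trials` (standing for n̄) are hypothetical names.

```python
import random
from typing import List


def resample_visit(state: int, bootstraps: List[List[int]], n_trials: int,
                   rng: random.Random) -> None:
    """Update every bootstrap's counter for one visited state.

    ASSUMPTION: each trial picks the current sample with probability
    1 / n_trials (binomial approximation of resampling with replacement).
    """
    p_pick = 1.0 / n_trials
    for counters in bootstraps:        # bootstraps K_1, ..., K_z
        for _ in range(n_trials):      # n-bar trials per bootstrap
            if rng.random() < p_pick:
                counters[state] += 1


# Hypothetical run: z = 4 bootstraps, 3 states, visits 0 -> 2 -> 1 as on the slide.
rng = random.Random(7)
z, n_states, n_trials = 4, 3, 3
bootstraps = [[0] * n_states for _ in range(z)]
for visited in [0, 2, 1]:
    resample_visit(visited, bootstraps, n_trials, rng)
print(bootstraps)  # each row holds one bootstrap's resampled visit counts
```

The cost of this step is z × n̄ random draws per visited state, which is the "high computational cost" that motivates distributing the bootstraps over several nodes.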

Bootstrap simulation (continued)

• At the end of the trajectory, the counters of each bootstrap Ki are normalized, giving its mean permanence vector x̄i (i = 1, ..., z).
• The probability estimate is the average of the z normalized vectors:

$$\pi = \frac{1}{z}\sum_{i=1}^{z} \bar{x}_i, \qquad \pi_j = \frac{\bar{x}_1[j] + \bar{x}_2[j] + \cdots + \bar{x}_z[j]}{z}, \quad j = 0, 1, 2$$

n = trajectory length; K = bootstrap; z = number of bootstraps
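The normalize-and-average step maps directly to a few lines of code. A minimal sketch, assuming each bootstrap's counters are stored as a list of integers indexed by state; `normalize`, `bootstrap_estimate` and the counter values in the example are hypothetical.

```python
from typing import List, Sequence


def normalize(counters: Sequence[int]) -> List[float]:
    """Mean permanence vector x-bar_i of one bootstrap: counts over their total."""
    total = sum(counters)
    if total == 0:
        raise ValueError("bootstrap has no samples")
    return [c / total for c in counters]


def bootstrap_estimate(bootstraps: Sequence[Sequence[int]]) -> List[float]:
    """pi_j = (x1[j] + x2[j] + ... + xz[j]) / z."""
    means = [normalize(k) for k in bootstraps]
    z = len(means)
    return [sum(x[j] for x in means) / z for j in range(len(means[0]))]


# Hypothetical counters for z = 2 bootstraps over states 0, 1, 2.
print(bootstrap_estimate([[2, 1, 1], [1, 2, 1]]))  # [0.375, 0.375, 0.25]
```

Normalizing each bootstrap before averaging gives every bootstrap the same weight, even when the resampling produced a different number of samples in each.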

Parallelization

Approach
• Split the bootstrap sampling tasks over the processing nodes
• Each node performs the full trajectory simulation but produces a different set of samples
• Master-worker pattern
• Implementation: C++ language and MPI primitives
• Executed on a cluster of 8 Dell PowerEdge R610 nodes connected by a Gigabit Ethernet network

Experiments

Number of bootstraps assigned to each node in each configuration (36 bootstraps in total):

Configuration (nodes)  Bootstraps per node
1                      36
2                      18 18
3                      12 12 12
4                      9 9 9 9
5                      7 7 7 7 8
6                      6 6 6 6 6 6
7                      5 5 5 5 5 5 6
8                      4 4 4 4 5 5 5 5
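The configuration table is consistent with giving each node ⌊z/p⌋ bootstraps and handing the remainder, one each, to the last nodes. A minimal sketch of that split, plus a hypothetical master step that assumes each worker returns the sum of its normalized bootstrap vectors (one value per reachable state). Function names are illustrative; the actual implementation uses C++ and MPI, which this sequential Python sketch does not reproduce.

```python
from typing import List, Sequence


def split_bootstraps(z: int, nodes: int) -> List[int]:
    """Bootstraps per node: an even share, remainder given to the last nodes."""
    base, extra = divmod(z, nodes)
    return [base] * (nodes - extra) + [base + 1] * extra


def combine_partial_sums(partial_sums: Sequence[Sequence[float]], z: int) -> List[float]:
    """Master step (assumed): add the workers' summed vectors and divide by z."""
    return [sum(column) / z for column in zip(*partial_sums)]


for p in range(1, 9):
    print(p, split_bootstraps(36, p))
```

Under this assumption each worker sends a single vector whose length is the number of reachable states, so the message size grows with the model, not with the trajectory length.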

Models (examples)
• ASP - Alternate Service Patterns: describes an Open Queueing Network with servers that map P different service patterns.
• FAS - First Available Server: indicates the availability of N servers.
• RS - Resource Sharing: maps R shared resources to P processes.

Results: large models (millions of states)

[Figure: processing (Proc.) and communication (Comm.) times in seconds versus number of nodes (1 to 8) for the ASP, FAS and RS models; four panels for trajectory lengths n = 10^6, 10^7, 10^8 and 10^9.]

Results: small models (hundreds of states)

[Figure: processing (Proc.) and communication (Comm.) times in seconds versus number of nodes (1 to 8) for the ASP, FAS and RS models; four panels for trajectory lengths n = 10^6, 10^7, 10^8 and 10^9.]

Conclusion and future works

Summary
• An efficient implementation of a novel simulation algorithm
• Considerable speedup for very large models
  ◮ especially for long trajectories
• The speedup was consistent across different SAN models
  ◮ nearly 5 times speedup for 8 nodes
• The processing demands depend only on the simulation trajectory length (n)
• The communication demands depend only on the reachable state space size of the model
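Reading the reported "nearly 5 times speedup for 8 nodes" in the usual terms of speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p gives an efficiency of roughly 60%. This is a back-of-the-envelope reading of the stated figure, not a number reported in the experiments.

```latex
S_8 = \frac{T_1}{T_8} \approx 5, \qquad
E_8 = \frac{S_8}{8} \approx \frac{5}{8} \approx 0.62
```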

Conclusion and future works

Future works
• Study of bootstrap distribution over non-uniform memory architectures
  ◮ some levels of shared memory could be highly beneficial to cope with high communication costs (short trajectories for large models)
• Blending methods
  ◮ combination of the parallel Bootstrap approach with more sophisticated simulation approaches (e.g., Perfect Simulation)

Thank you for your attention.
