Performance Issues for Parallel Implementations of Bootstrap Simulation Algorithm
22nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2010)

Ricardo M. Czekster, Paulo Fernandes, Afonso Sales and Thais Webber
Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
PaleoProspec Project - PUCRS/Petrobras
Funded also by CAPES and CNPq - Brazil

Context

Interest
The solution of complex and large state-based stochastic models to extract performance indices.

Solution
Numerical (iterative methods)
◮ Power method [Stewart 94]
◮ Arnoldi [Arnoldi 51]
◮ GMRES [Saad and Schultz 86]
Simulation
◮ Traditional [Ross 96]
◮ Monte Carlo [Häggström 02]
◮ Backward [Propp and Wilson 96]
◮ Bootstrap [Czekster et al. 10]
  ⋆ reliable estimations
  ⋆ high computational cost to generate repeated batches of samples

Context

Markovian simulation
Generation of independent samples (parallel execution)
Parallel sampling (e.g., master-worker approach)
Possible sequence of states using the transition matrix
◮ random walk or simulation trajectory
◮ huge size → huge memory cost
◮ Stochastic Automata Networks (structured formalism, underlying Markov Chain)

Objective

Goal
To present a parallel implementation of Bootstrap simulation, focusing on the overall performance of the technique, through a method that generates a large number of samples in less time.

Discussion
processing × communication times
model size × number of generated samples

Outline

Stochastic Automata Networks (SAN)
Bootstrap simulation
Parallelization
Experiments and results
Conclusion and future works

Stochastic Automata Networks (SAN)
• It allows the description of a large system in a structured manner by its parts (automata)

[Figure: a SAN model with three automata A, B and C, with local states 0 and 1 and transitions labeled with the events l1, s1 and s2, shown next to its underlying continuous-time Markov chain. The chain has three states, shown as 000, 111 and 100 in the first build of the slide and relabeled 0, 1 and 2 in the second build, with transition rates α, β and γ.]

Events of the SAN model (syn = synchronizing event, loc = local event):

Type   Event   Rate
syn    s1      α
syn    s2      β
loc    l1      f

f = [(B == 0) && (C == 0)] × γ
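The functional rate f of the local event l1 can be read as an indicator multiplied by γ: the event fires with rate γ only while automata B and C are both in state 0, and is disabled otherwise. A minimal sketch of that evaluation follows; the function name and argument names are illustrative and do not come from the slides.

```python
def functional_rate_l1(b_state: int, c_state: int, gamma: float) -> float:
    """Evaluate f = [(B == 0) && (C == 0)] * gamma for the local event l1.

    b_state and c_state are the current local states of automata B and C;
    gamma is the base rate (a hypothetical numeric value chosen by the caller).
    """
    indicator = 1 if (b_state == 0 and c_state == 0) else 0
    return indicator * gamma
```

For example, with γ = 2.5 the call `functional_rate_l1(0, 0, 2.5)` gives 2.5, while any other combination of B and C states gives 0.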

Bootstrap

Method
A well-known statistical method applied in many fields to improve accuracy when performing sample estimations for complex distributions.

In the simulation context
Bootstrap simulation provides more reliable estimations than traditional simulation [SCSC’10].

Main feature
Generation of repeated batches of samples, which helps to improve the accuracy of the method.

Traditional simulation

Transition Matrix

       0      1      2
0    0.10   0.65   0.25
1    0.25   0.55   0.20
2    0.30   0.25   0.45

Initial state: 0
n = trajectory length
each visited state = sample
π′ = vector of visit counters (π0′, π1′, π2′), all starting at 0

[Figure: state versus time plot of the trajectory, built step by step across the slide animation.]

Steps shown in the animation (at each step a uniform random number U is drawn and compared against the cumulative probabilities of the current state's row):

Time 0 → 1: U = 0.08, current state 0, 0.08 < 0.10, next state 0; π′ = (1, 0, 0)
Time 1 → 2: U = 0.87, current state 0, 0.87 ≥ 0.10 + 0.65 = 0.75, next state 2; π′ = (1, 0, 1)
Time 2 → 3: U = 0.32, current state 2, 0.30 ≤ 0.32 < 0.30 + 0.25 = 0.55, next state 1; π′ = (1, 1, 1)
...
Time n: end of the trajectory

mean permanence probability: $\pi = \pi' / n$
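The walk-through above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' C++ implementation: the names `next_state` and `simulate` are hypothetical, the successor is chosen by comparing U against the cumulative row probabilities, and the initial state itself is not counted as a sample (as in the slide animation, where the counters are still zero at time 0).

```python
import random
from itertools import accumulate

# Transition matrix from the slides (rows: current state, columns: next state).
P = [
    [0.10, 0.65, 0.25],
    [0.25, 0.55, 0.20],
    [0.30, 0.25, 0.45],
]

def next_state(current, u, matrix=P):
    """Return the first state whose cumulative row probability exceeds u."""
    for state, threshold in enumerate(accumulate(matrix[current])):
        if u < threshold:
            return state
    return len(matrix) - 1  # guard against round-off in the row sum

def simulate(n, initial=0, draws=None, seed=0):
    """Walk n steps and return (visited states, visit counts, counts / n).

    `draws` optionally supplies the uniform numbers (useful to replay the
    slide example); otherwise a seeded generator is used.
    """
    rng = random.Random(seed)
    counts = [0] * len(P)
    state = initial
    visited = []
    for step in range(n):
        u = draws[step] if draws is not None else rng.random()
        state = next_state(state, u)
        counts[state] += 1          # each visited state is one sample
        visited.append(state)
    return visited, counts, [c / n for c in counts]

# Replaying the three draws shown on the slides (0.08, 0.87, 0.32).
visited, counts, pi = simulate(3, initial=0, draws=[0.08, 0.87, 0.32])
```

With these three draws the walk visits states 0, 2 and 1 in that order, so every counter ends at 1, matching the last animated frame.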

Bootstrap simulation

Transition Matrix

       0      1      2
0    0.10   0.65   0.25
1    0.25   0.55   0.20
2    0.30   0.25   0.45

Initial state: 0
n = trajectory length
K: bootstrap (K1, K2, ..., Kz, each holding one counter per state 0, 1, 2)
z: number of bootstraps

[Figure: state versus time plot of the trajectory with the bootstraps K1, K2, ..., Kz drawn below it, built step by step across the slide animation.]

Steps shown in the animation:

The trajectory advances as in the traditional simulation (time 0 → 1 with U = 0.08, time 1 → 2 with U = 0.87, time 2 → 3 with U = 0.32, ..., up to time n).
For each bootstrap, n̄ trials are performed to execute the resamplings.
At the end of the trajectory, each bootstrap Ki is normalized into x̄i (i = 1, ..., z).

mean permanence probability: $\pi = \frac{1}{z} \sum_{i=1}^{z} \bar{x}_i$, that is,

$\pi_0 = \frac{\bar{x}_1[0] + \bar{x}_2[0] + \cdots + \bar{x}_z[0]}{z}$
$\pi_1 = \frac{\bar{x}_1[1] + \bar{x}_2[1] + \cdots + \bar{x}_z[1]}{z}$
$\pi_2 = \frac{\bar{x}_1[2] + \bar{x}_2[2] + \cdots + \bar{x}_z[2]}{z}$
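The final combination step (normalize each bootstrap, then average component by component) can be sketched as follows. The sketch covers only this last step; the resampling that fills each Ki is not detailed enough on the slides to reproduce here, so the counts are taken as given input. The function name `combine_bootstraps` is hypothetical.

```python
def combine_bootstraps(bootstrap_counts):
    """Normalize each bootstrap K_i into x_i and average the x_i over z bootstraps.

    bootstrap_counts is a list of z lists, one per bootstrap, each holding
    the number of samples recorded for every state.
    """
    z = len(bootstrap_counts)
    num_states = len(bootstrap_counts[0])
    normalized = []
    for counts in bootstrap_counts:
        total = sum(counts)
        normalized.append([c / total for c in counts])   # x_i = K_i / sum(K_i)
    # pi[j] = (x_1[j] + x_2[j] + ... + x_z[j]) / z
    return [sum(x[j] for x in normalized) / z for j in range(num_states)]

# Two made-up bootstraps over three states, for illustration only.
pi = combine_bootstraps([[2, 1, 1], [1, 2, 1]])
```

Because every normalized vector sums to 1, the averaged vector π also sums to 1.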

Parallelization

Approach
Split the bootstrap sampling tasks over the processing nodes
Each node performs the full trajectory simulation but produces a different set of samples
Master-worker pattern
Implementation: C++ language and MPI primitives
Executed on a cluster of 8 Dell PowerEdge R610 nodes connected by a Gigabit Ethernet network

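Experiments

Number of bootstraps assigned to nodes in each configuration (one column per configuration, one row per node; every configuration totals 36 bootstraps)

Configuration    1    2    3    4    5    6    7    8
Node 1          36   18   12    9    7    6    5    4
Node 2               18   12    9    7    6    5    4
Node 3                    12    9    7    6    5    4
Node 4                          9    7    6    5    4
Node 5                               8    6    5    5
Node 6                                    6    5    5
Node 7                                         6    5
Node 8                                              5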
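The assignment in the table is consistent with an even split of 36 bootstraps in which the remainder goes to the last nodes. The sketch below reproduces the table under that assumption; it is inferred from the numbers only, and the function name `bootstraps_per_node` is hypothetical.

```python
def bootstraps_per_node(total, nodes):
    """Split `total` bootstraps over `nodes` nodes as evenly as possible.

    Every node gets total // nodes bootstraps; the remaining ones are handed
    out one each to the last nodes, which matches the table above.
    """
    base, extra = divmod(total, nodes)
    return [base] * (nodes - extra) + [base + 1] * extra

# One row of assignments per configuration (1 to 8 nodes, 36 bootstraps).
table = {nodes: bootstraps_per_node(36, nodes) for nodes in range(1, 9)}
```

For instance, `table[5]` is `[7, 7, 7, 7, 8]` and `table[8]` is `[4, 4, 4, 4, 5, 5, 5, 5]`, as in columns 5 and 8 of the table.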

Models (examples)
ASP - Alternate Service Patterns: describes an Open Queueing Network with servers that map P different service patterns.
FAS - First Available Server: indicates the availability of N servers.
RS - Resource Sharing: maps R shared resources to P processes.

Results

Large models (millions of states)

[Figure: four panels, one per trajectory length (n = 10^6, n = 10^7, n = 10^8, n = 10^9). Each panel shows processing (Proc.) and communication (Comm.) time in seconds against the number of nodes (1 to 8) for the ASP, FAS and RS models. The time axes run up to 20 s, 100 s, 1000 s and 10000 s respectively.]

Results

Small models (hundreds of states)

[Figure: four panels, one per trajectory length (n = 10^6, n = 10^7, n = 10^8, n = 10^9). Each panel shows processing (Proc.) and communication (Comm.) time in seconds against the number of nodes (1 to 8) for the ASP, FAS and RS models. The time axes run up to 10 s, 100 s, 1000 s and 10000 s respectively.]

Conclusion and future works

Summary
An efficient implementation of a novel simulation algorithm
Considerable speedup for very large models
◮ especially for long trajectories
The speedup was consistent across different SAN models
◮ nearly 5 times speedup for 8 nodes
The processing demands depend only on the simulation trajectory length (n)
The communication demands depend only on the reachable state space size of the model
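As a rough reading of the reported figure, the usual definition of parallel efficiency (speedup divided by number of nodes) can be applied to the "nearly 5 times" speedup on 8 nodes. The symbols S, p and E are introduced here for illustration and do not appear on the slides; the value is approximate because the slides give the speedup only as "nearly 5".

$$E = \frac{S}{p} \approx \frac{5}{8} \approx 0.63$$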

Conclusion and future works

Future works
Study of bootstrap distribution over non-uniform memory architectures
◮ some levels of shared memory could be highly beneficial to cope with high communication (short trajectories for large models)
Blending methods
◮ combination of the parallel Bootstrap approach with more sophisticated simulation approaches (e.g., Perfect Simulation)

Thank you for your attention.
