Sub-Consensus hierarchy conjecture is false (for symmetric, participation-aware tasks)

Piotr Zieliński
[email protected]
Google
November 11, 2009

Abstract

Each query to failure detector ¬Ωk outputs n − k processes; at least one correct process is eventually never output. The "folklore" sub-Consensus hierarchy conjecture states that any task not solvable with ¬Ωk requires ¬Ωk−1. This paper disproves this conjecture for symmetric, participation-aware tasks.

Consider any sequence (ki) = k1, ..., kn with ki ≤ ki+1 ≤ ki + 1. The agreement task T(ki) decides on at most ki proposals, where i is the number of participating processes. Detector D(ki) consists of sub-detectors D1, ..., Dn, such that at least one of D1, ..., Dkc behaves like Ω, where c is the number of correct processes. This paper shows that detector D(ki) can implement task T(k'i) iff ki ≤ k'i for all i. As a result, no two different D(ki)'s are equivalent. Moreover, D(ki) is the weakest failure detector for T(ki). The number of D(ki) detectors (2^{n−1}) exceeds the number of ¬Ωi detectors (n), violating the sub-Consensus conjecture.

1 Introduction

Consider an n-process asynchronous shared-memory system equipped with the k-anti-Ω (¬Ωk) failure detector. Each query to ¬Ωk outputs n − k processes, in such a way that at least one correct process is eventually never output [7]. The "folklore" sub-Consensus hierarchy conjecture states that any task not solvable with ¬Ωk requires ¬Ωk−1 [7]. The case k = n is true: any non-wait-free-implementable task requires ¬Ωn−1 [13].

Previous work showed that, if we allow non-symmetric tasks, the conjecture is false (Theorem 1.1). Gafni and Kuznetsov [6] showed that the conjecture holds for tasks that are symmetric, participation-oblivious, and require only weak termination. This paper shows that the conjecture is false for general symmetric, participation-aware tasks (Theorem 1.2).

Theorem 1.1. The "folklore" sub-Consensus conjecture is false for non-symmetric tasks.

Proof. Adapted from [9]. Consider a 3-process system {p1, p2, p3}. Let task T12 require processes p1 and p2 to agree; p3 can decide on anything. Consider detector Ω12 such that if at least one p ∈ {p1, p2} is correct, it eventually outputs the same correct p ∈ {p1, p2} forever; otherwise Ω12 can behave arbitrarily. To disprove the conjecture, we need to show that (i) ¬Ω2 cannot implement Ω12, (ii) Ω12 can implement T12, and (iii) Ω12 cannot implement ¬Ω1.


(i) If we kill p3, detector Ω12 becomes a 2-process Ω, which is not implementable. On the other hand, ¬Ω2 is now implementable by always outputting p3.

(ii) T12 can be implemented with Ω12 by using any Ω-based Consensus protocol to reach agreement between p1 and p2.

(iii) If we kill p1, then Ω12 becomes implementable by always outputting p2. On the other hand, ¬Ω1 = Ω is still not implementable in the 2-process system {p2, p3}.

Theorem 1.2. The "folklore" sub-Consensus conjecture is false even for symmetric tasks.

Proof. Consider a 3-process system {p1, p2, p3}. Consider an agreement task T that guarantees that (i) there are no more than two different decisions, and (ii) if at most two processes participate, all processes decide on the same value. In other words, T is set agreement that becomes Consensus if some process does not participate. Consider a detector D that consists of two sub-detectors, D1 and D2, such that (i) at least one of D1, D2 behaves as Ω, and (ii) if some process is faulty, then D1 behaves like Ω. It is sufficient to show that (i) ¬Ω2 cannot implement D, (ii) D can implement T, and (iii) D cannot implement ¬Ω1.

(i) Kill p1. Detector ¬Ω2 is now trivially implementable by always outputting p1. On the other hand, D corresponds to an Ω detector for {p2, p3}, which is not wait-free implementable.

(ii) Consider two Ω-based instances of Consensus: C1, which uses D1, and C2, which uses D2. Processes propose their input to both C1 and C2, except that nothing is proposed to C2 until all processes participate. Processes adopt the decision of either C1 or C2.

Termination. If all processes are correct, then all participate and propose to both C1 and C2. At least one of D1, D2 behaves like Ω, so at least one of C1, C2 will decide. If some process is faulty, D1 behaves like Ω, so C1 will eventually decide, because processes always propose to C1.

Agreement. Since instances C1 and C2 each decide on at most one value, there are no more than two decisions. If not all processes participate, nothing is proposed to C2, so only C1 can decide: at most one decision.

(iii) Assume D can implement ¬Ω1 = Ω. Consider a run r with all three processes correct, and D2 always outputting p1. The emulated Ω must reach a state, say at time t, in which it will always output some fixed process p, regardless of what happens after time t. Now consider a run r′ which is identical to r, except that p fails at some point after time t, and D1 eventually stabilizes on some correct process q ≠ p. However, Ω will keep outputting the faulty process p. This contradiction proves that D cannot implement Ω.


k1 k2 k3    task T(ki)                        detector D(ki)
1  2  3     trivial (output your proposal)    trivial (Section 3)
1  2  2     set agreement                     anti-omega (¬Ω1)
1  1  2     T in Theorem 1.2                  D in Theorem 1.2
1  1  1     Consensus                         omega (Ω)

Figure 1: Examples of 3-process T(ki) and D(ki) for various sequences (ki) = k1, k2, k3.

1.1 Roadmap

Theorem 1.2 showed that the sub-Consensus conjecture is false in a three-process system, using task T and detector D. Section 2 introduces generalized n-process versions of these: tasks T(ki) and detectors D(ki), where (ki) = k1, ..., kn is a non-decreasing integer sequence. Section 3 shows that D(ki) can implement T(k'i) iff ki ≤ k'i for all i, and uses this fact to falsify the conjecture. Section 5 proves that D(ki) is the weakest failure detector for T(ki). Section 4 justifies the assumption ki ≤ ki+1 ≤ ki + 1.

2 Task T(ki) and detector D(ki)

Consider an n-process system. Consider any non-decreasing integer sequence (ki) = k1, ..., kn. Let task T(ki) be an agreement task such that if exactly i processes participate, then there can be at most ki decisions. Similarly, let detector D(ki) consist of n sub-detectors D1, ..., Dn. If c processes are correct, at least one of D1, ..., Dkc behaves like Ω, that is, eventually keeps outputting the same correct process forever.

Figure 1 shows examples of tasks T(ki) and detectors D(ki) for various sequences k1, k2, k3. In general, the smaller the sequence (element-wise), the stronger the corresponding task or detector. The weakest case is 123, which corresponds to the trivial task and detector. The strongest is 111, which corresponds to Consensus and Ω [3]. In the middle, 112 corresponds to T and D from the proof of Theorem 1.2.

Many possible sequences k1, k2, k3 are not included in Figure 1, for example, 134 or 233. This is because this paper considers only sequences k1, ..., kn with the following one-jump property:

Definition 2.1. k1, ..., kn is one-jump iff k1 = 1 and ki ≤ ki+1 ≤ ki + 1 for all i.

The detailed justification of this assumption is postponed to Section 4. In brief, every task T(ki) is equivalent to some T(k'i) with a one-jump (k'i). The same holds for detectors D(ki). Note that k-set agreement [5] and k-vector-Ω [13] correspond to T(ki) and D(ki) with ki = min{i, k}.
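For concreteness, the following Python sketch (an illustration added here; the function names are ours, not the paper's) checks the one-jump property of Definition 2.1 and the sequence ki = min{i, k} corresponding to k-set agreement.

```python
# Illustrative sketch: the one-jump property of Definition 2.1 and the
# sequence ki = min{i, k} corresponding to k-set agreement.
def one_jump(ks):
    """ks = (k1, ..., kn); True iff k1 = 1 and ki <= k_{i+1} <= ki + 1."""
    return ks[0] == 1 and all(a <= b <= a + 1 for a, b in zip(ks, ks[1:]))

def set_agreement_sequence(n, k):
    """The sequence (ki) with ki = min{i, k}, i.e. k-set agreement."""
    return tuple(min(i, k) for i in range(1, n + 1))

assert one_jump((1, 2, 3))        # trivial task/detector
assert one_jump((1, 1, 2))        # T and D from Theorem 1.2
assert not one_jump((1, 3, 4))    # jumps by 2, excluded by Definition 2.1
assert set_agreement_sequence(3, 2) == (1, 2, 2)   # 2-set agreement for n = 3
```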

3 Implementability of T(ki) and D(ki)

For any two sequences (ki) and (k'i), let (ki) ⪯ (k'i) mean that ki ≤ k'i for all i. This section shows that detector D(ki) can implement task T(k'i) if and only if (ki) ⪯ (k'i).

Theorem 3.1. D(ki) can implement T(k'i) if (ki) ⪯ (k'i).


 1  C1, ..., Ck'n are Ω-based Consensus instances using detectors D1, ..., Dk'n
 2  participates ← [false, ..., false]
 3  function propose(v) is
 4    participates[p] ← true
 5    start an asynchronous task:
 6      for i = 1, ..., n do
 7        wait until participates has at least i true elements
 8        propose v to Ck'i if not proposed yet
 9    wait until any of C1, ..., Ck'n decides, say on v′
10    return v′

Figure 2: Using detector D(ki) to implement task T(k'i); code for process p.
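For illustration, here is a minimal runnable Python sketch of Figure 2. A trivial first-proposal-wins mock stands in for the Ω-based Consensus instances, and the names (MockConsensus, run_process) are ours; this is a sketch of the wiring, not the paper's artifact.

```python
# Illustrative sketch of Figure 2 (mock Consensus instead of Omega-based instances).
import threading, time

class MockConsensus:
    """Stand-in for an Omega-based Consensus instance: first proposal wins."""
    def __init__(self):
        self._lock = threading.Lock()
        self._decision = None
    def propose(self, v):
        with self._lock:
            if self._decision is None:
                self._decision = v
    def decision(self):
        return self._decision           # None until some proposal has arrived

def run_process(p, v, n, k_prime, C, participates):
    participates[p] = True                                   # line 4
    def feeder():                                            # lines 5-8
        for i in range(1, n + 1):
            while sum(participates) < i:                     # wait for >= i participants
                time.sleep(0.01)
            C[k_prime[i - 1]].propose(v)                     # propose to C_{k'_i}
    threading.Thread(target=feeder, daemon=True).start()
    while True:                                              # lines 9-10
        for inst in C.values():
            if inst.decision() is not None:
                return inst.decision()
        time.sleep(0.01)

# Example run: n = 3, (k'_i) = (1, 1, 2), so only C_1 and C_2 are needed.
n, k_prime = 3, [1, 1, 2]
C = {k: MockConsensus() for k in range(1, max(k_prime) + 1)}
participates = [False] * n
results = []
threads = [threading.Thread(target=lambda q=q: results.append(
               run_process(q, "v%d" % q, n, k_prime, C, participates)))
           for q in range(n)]
for t in threads: t.start()
for t in threads: t.join()
print(results)   # at most two distinct values: consistent with T(1,1,2) when all three participate
```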

1  steps ← [0, 0, ..., 0]                                        { shared }
2  when process p takes a step do
3    increment steps[p]
4  let q1, ..., qn be the permutation of 1, ..., n ordered        { local }
5    wrt decreasing steps[i] (ties broken deterministically)
6  rank[qi] ← i for i = 1, ..., n

Figure 3: Auxiliary variables steps and rank, used throughout the paper.

Proof. Figure 2 presents the algorithm. For each i, it uses a Consensus instance Ci based on sub-detector Di of D, executed in parallel. Each process p proposes its value to instance Ck'i as soon as there are at least i participants. The decision of any Ci is adopted as the final decision. To simplify the proof, let us count a process as participating only after completing line 4.

Let i be the number of participating processes. First, only instances C1, ..., Ck'i will be proposed to, which means at most k'i different decisions (Agreement). On the other hand, all correct processes propose to all instances C1, ..., Ck'i. Each correct process participates, so there are at most i correct processes. This means that at least one of D1, ..., Dki stabilizes (note ki ≤ k'i), so at least one of C1, ..., Cki decides (Termination).

Before proving the complementary Theorem 3.2, Figure 3 introduces some auxiliary variables that are periodically recomputed at each process, and will be used for several detector transformations in this paper. Local variable rank[p] is an estimate of the speed rank of process p: eventually, all correct processes p have a lower rank[p] than any faulty process. This is achieved by counting the number of steps taken by each process, and assigning rank[p] = 1 to the process p that took the most steps, rank[p′] = 2 to the runner-up, and so on.

Variable rank lets us easily implement D(ki) for (ki) = 1, 2, ..., n in an asynchronous system by setting Di = min{ p | rank[p] ≤ i }. Each Di outputs the minimum id among the i most often stepping processes. In particular, Dc will eventually keep outputting the correct process with the minimum id (c is the number of correct processes).
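As a concrete reading of Figure 3 and of the construction Di = min{ p | rank[p] ≤ i }, the following Python sketch (ours, for illustration) computes rank from step counts and the resulting trivial detector.

```python
# Illustrative sketch (ours): rank from step counts as in Figure 3, and the
# asynchronous detector D_(1,2,...,n) with D_i = min{ p | rank[p] <= i }.
def compute_rank(steps):
    """steps[p] = number of steps taken by process p (0-indexed).
    rank[p] = 1 for the most active process; ties broken by process id."""
    order = sorted(range(len(steps)), key=lambda p: (-steps[p], p))
    rank = [0] * len(steps)
    for position, p in enumerate(order, start=1):
        rank[p] = position
    return rank

def trivial_D(steps):
    """D_i = smallest id among the i most often stepping processes."""
    rank = compute_rank(steps)
    n = len(steps)
    return [min(p for p in range(n) if rank[p] <= i) for i in range(1, n + 1)]

steps = [120, 7, 95]            # p1 and p3 keep stepping, p2 appears crashed
print(compute_rank(steps))      # [1, 3, 2]
print(trivial_D(steps))         # [0, 0, 0]: here every D_i outputs p1 (id 0)
```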


Theorem 3.2. D(ki) can implement T(k'i) only if (ki) ⪯ (k'i).

Proof. To obtain a contradiction, assume that D(ki) can implement T(k'i) despite ki > k'i for some i. Consider runs in which (i) processes pi+1, ..., pn never take steps, and (ii) of the processes p1, ..., pi, fewer than ki are faulty (more than i − ki are correct). This is equivalent to an i-process system with fewer than ki faulty processes. The number c of correct processes satisfies i − ki < c ≤ i, that is, c = i − ki + k∗ for some k∗ with 1 ≤ k∗ ≤ ki.

We can implement D(ki) by setting Dk = min{ p | rank[p] ≤ i − ki + k } for all k = 1, ..., ki. In other words, for each k, we take the i − ki + k processes with the most steps so far, and output the one with the minimum id as Dk. Since exactly c = i − ki + k∗ processes are correct, Dk∗ will stabilize. Moreover, the one-jump property of (ki) implies kc ≥ ki − i + c, so kc ≥ ki − i + i − ki + k∗ = k∗, which means that we have implemented D(ki).

By our assumption, by implementing D(ki) we have also implemented T(k'i). Since at most i processes ever take steps (i), T(k'i) trivially implements k'i-set agreement. In other words, we have implemented k'i-set agreement in an i-process system with ki > k'i possibly faulty processes, which is impossible [1, 8, 11]. This contradicts our assumption that D(ki) can implement T(k'i), and proves the assertion.
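The emulation used in this proof can be sketched in Python as follows (an illustration; the function name is ours): given rank from Figure 3, each Dk outputs the smallest id among the i − ki + k most active processes.

```python
# Illustrative sketch (ours) of the emulation in the proof of Theorem 3.2:
# in a run where only p1..pi take steps, D_k outputs the smallest id among
# the (i - k_i + k) most often stepping processes.
def emulated_D(rank, i, k_i):
    """rank as in Figure 3 (rank 1 = most steps, processes 0-indexed);
    returns the outputs [D_1, ..., D_{k_i}]."""
    return [min(p for p in range(len(rank)) if rank[p] <= i - k_i + k)
            for k in range(1, k_i + 1)]

print(emulated_D(rank=[1, 2, 3], i=3, k_i=2))   # [0, 0]
```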

3.1 Consequences

Detector ∆ is weaker than ∆′ if ∆′ can implement ∆. Detector ∆ is task-weaker than ∆′ if any task implementable with ∆ is implementable with ∆′. The preorder weaker, although natural, is too fine for practical detector classification: it has at least 654 equivalence classes for just 3 processes [12]. Its Hasse diagram does not exhibit a regular structure either [12].

The preorder task-weaker is coarser than weaker (weaker implies task-weaker but not vice versa, e.g., ◇P is task-weaker than Ω, but not weaker). For this reason, task-weaker may seem a better candidate for detector classification, as it produces fewer equivalence classes. Indeed, the sub-Consensus hierarchy conjecture basically states that the equivalence classes of task-weaker form an order linear both in size and structure (each detector is task-equivalent to some ¬Ωk). Theorem 3.3 shows that detectors D(ki) provide a counterexample to this conjecture.

Theorem 3.3. Detector D(k'i) is task-weaker than D(ki) iff (ki) ⪯ (k'i).

Proof. If (ki) ⪯ (k'i), then, by definition, any detector of class D(ki) is also of class D(k'i). As a result, any task implementable with D(k'i) is also implementable with D(ki). In other words, D(k'i) is task-weaker than D(ki), as required.

If (ki) ⪯ (k'i) does not hold, then D(k'i) can implement T(k'i) (Theorem 3.1), but D(ki) cannot (Theorem 3.2). This means that D(k'i) is not task-weaker than D(ki), as needed.

As a result, D(k'i) and D(ki) are task-equivalent iff (k'i) ⪯ (ki) ⪯ (k'i), that is, iff (ki) = (k'i). Therefore, each sequence (ki) corresponds to its own equivalence class. Since ki+1 ∈ {ki, ki + 1}, the number of such classes is 2^{n−1}. This number is significantly higher than n, the number predicted by the sub-Consensus conjecture. For example, as we have seen in Figure 1, for n = 3 the hierarchy consists of (at least) four classes (k1, k2, k3):

(1, 2, 3) ≻ (1, 2, 2) ≻ (1, 1, 2) ≻ (1, 1, 1).

1  proposals ← [⊥, ..., ⊥]
2  decision1 ← ⊥
3  function T(k'i)(v) at process p is
4    proposals[p] ← v
5    if decision1 = ⊥ then decision1 ← T1(ki)(v)
6    if count of proposals = i∗ + 1 then
7      v ← min(proposals)
8    else v ← decision1
9    return T2(ki)(v)

Figure 4: Solving task T(k'i) using two instances of T(ki).

For n = 4, the structure of the task-weaker preorder becomes non-linear:

(1, 2, 3, 4) ≻ (1, 2, 3, 3) ≻ (1, 2, 2, 3) ≻ X, Y ≻ (1, 1, 2, 2) ≻ (1, 1, 1, 2) ≻ (1, 1, 1, 1),

where X = (1, 1, 2, 3) and Y = (1, 2, 2, 2), and neither X ⪯ Y nor Y ⪯ X holds. In other words, neither DX nor DY is task-weaker than the other; they are task-incomparable. This contradicts the sub-Consensus conjecture.
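Both the 2^{n−1} count and the incomparability of X and Y are easy to check mechanically. The following Python sketch (ours, for illustration) enumerates one-jump sequences and compares them element-wise.

```python
# Illustrative check (ours) of the counting and incomparability claims above.
def one_jump_sequences(n):
    """All one-jump sequences (k1, ..., kn): k1 = 1 and k_{i+1} in {k_i, k_i + 1}."""
    seqs = [(1,)]
    for _ in range(n - 1):
        seqs = [s + (s[-1] + d,) for s in seqs for d in (0, 1)]
    return seqs

def preceq(a, b):
    """Element-wise comparison: a preceq b iff a_i <= b_i for all i."""
    return all(x <= y for x, y in zip(a, b))

print(len(one_jump_sequences(3)), len(one_jump_sequences(4)))   # 4 8, i.e. 2^(n-1)
X, Y = (1, 1, 2, 3), (1, 2, 2, 2)
print(preceq(X, Y), preceq(Y, X))   # False False: X and Y are task-incomparable
```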

4 Justification of ki+1 ≤ ki + 1

The definitions of T(ki) and D(ki) work for arbitrary non-decreasing integer sequences k1, ..., kn. This section shows that, without loss of generality, we can assume the one-jump property defined in Section 2: ki+1 ≤ ki + 1.

4.1 Task T(ki)

First, any implementation of T(ki) satisfies ki ≤ i: there are no more decisions than proposals. In particular, k1 ≤ 1, which implies k1 = 1.

Lemma 4.1. Consider (ki) such that k_{i∗+1} > ki∗ + 1 for some i∗. The algorithm in Figure 4 uses T(ki) to implement T(k'i) with k'i = ki for all i, except that k'_{i∗+1} = ki∗ + 1 < k_{i∗+1}.

Proof. The algorithm in Figure 4 uses two instances of T(ki), named T1(ki) and T2(ki). We need to show that (i) when i processes participate, T(k'i) decides on at most ki values, and (ii) when exactly i∗ + 1 processes participate, T(k'i) decides on at most ki∗ + 1 values.

For (i), observe that the decision of T2(ki) becomes the decision of T(k'i). Since the former decides on at most ki values, so does the latter, which proves this part.

For (ii), consider the case when exactly i∗ + 1 processes participate. Let us investigate how many different values v can be proposed to T2(ki) in line 9:

• Case 1: v is assigned in line 7. This means that all i∗ + 1 participating processes registered their proposals in proposals. Therefore, v can be assigned only one value here: the minimum proposal of all i∗ + 1 participating processes.

• Case 2: v is assigned in line 8. This means that, at the point before line 6, at most i∗ processes passed line 4, so at most ki∗ values have been written to decision1. Since decision1 ≠ ⊥ at that point, the test in line 5 ensures that no other values will ever be written. Therefore, at most ki∗ different values can ever be written to decision1, and subsequently to v in line 8.

As a result, at most ki∗ + 1 different proposals can be passed to T2(ki) (one from line 7, and ki∗ from line 8). Therefore T2(ki), and thus T(k'i), can make at most ki∗ + 1 decisions, as required.

1  for p = p1, ..., pn do                                        { shared }
2    for k = 1, ..., n do
3      changed[p, k] ← 0
4  when Dk changes at process p do
5    increment changed[p, k]
6  when process p takes a step do
7    for k = 1, ..., n do
8      changed[k] ← changed[p1, k] + ··· + changed[pn, k]        { local }

Figure 5: Auxiliary variable changed, used in Section 4.2.

Theorem 4.2. Any task T(ki) is equivalent to some T(k'i) such that (k'i) has the one-jump property.

Proof. Let us construct a sequence of sequences (k^1_i), (k^2_i), (k^3_i), ... in the following way. First, (k^1_i) = (ki). If sequence (k^j_i) is one-jump, it is the last element. Otherwise, let i∗ be such that k^j_{i∗+1} > k^j_{i∗} + 1, and let k^{j+1}_i = k^j_i for all i, except that k^{j+1}_{i∗+1} = k^j_{i∗} + 1. By construction, Σi k^1_i > Σi k^2_i > ..., so the sequence (k^1_i), (k^2_i), ... is finite. Let (k'i) be its last element. By construction, (k'i) is one-jump. By repeated application of Lemma 4.1, the tasks T(k^1_i), T(k^2_i), ..., T(k'i) are all equivalent, which proves the assertion.
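The construction in this proof can be sketched as follows (an illustrative Python fragment, assuming k1 = 1 as argued at the start of Section 4.1); each iteration performs one application of Lemma 4.1.

```python
# Illustrative sketch (ours) of the construction in the proof of Theorem 4.2:
# repeatedly apply the step of Lemma 4.1 until the sequence becomes one-jump.
def make_one_jump(ks):
    ks = list(ks)
    while True:
        bad = [i for i in range(len(ks) - 1) if ks[i + 1] > ks[i] + 1]
        if not bad:
            return tuple(ks)
        i_star = bad[0]                      # any non-one-jump position works
        ks[i_star + 1] = ks[i_star] + 1      # the step of Lemma 4.1

print(make_one_jump((1, 3, 4)))   # (1, 2, 3)
print(make_one_jump((1, 1, 4)))   # (1, 1, 2)
```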

4.2 Detector D(ki)

This section shows that any D(ki) is equivalent to some D(k'i) such that (k'i) is one-jump. Consider a detector D(ki). Figure 3 defined an auxiliary local variable rank[p], which estimates the "speed rank" of each process p. Figure 5 defines an auxiliary local variable changed[k], which estimates how many times the sub-detector output Dk has changed at any process.

Let i∗ be an index at which (ki) = k1, ..., kn is not one-jump: ki∗ > k_{i∗−1} + 1. Let k∗ = ki∗, and consider the following algorithm that emulates some detector D(k'i):

    D'_{k∗}   = min arg_k { changed[k] | k ∈ {k∗, k∗ − 1} }                                 (1)
    D'_{k∗−1} = min arg_p { rank[p] | p ∈ {Dk∗, D_{k∗−1}} }     if rank[D'_{k∗}] > i∗,       (2)
    D'_{k∗−1} = D'_{k∗}                                          if rank[D'_{k∗}] ≤ i∗,       (3)
    D'_k      = Dk                                               for k ∉ {k∗, k∗ − 1}.        (4)

Here, all D'_k except D'_{k∗} and D'_{k∗−1} are simply copied from the corresponding Dk. The value D'_{k∗} is taken to be the output of whichever of Dk∗ and D_{k∗−1} has changed less often so far (the "more stable" sub-detector). For D'_{k∗−1}, we have two cases. If D'_{k∗}, computed above, is one of the i∗ most-often-stepping processes so far, then it is used for D'_{k∗−1} too. Otherwise, D'_{k∗−1} becomes the more-often-stepping of Dk∗ and D_{k∗−1} (the "more correct" process).

Lemma 4.3. Consider (ki) such that ki∗ > k_{i∗−1} + 1 for some i∗. Algorithm (1)–(4) implements D(k'i) with k'i = ki for all i, except that k'_{i∗} = ki∗ − 1.

Proof. Let c be the number of correct processes. We need to show that there is a k ≤ k'c such that D'_k eventually stabilizes on a correct process. Let us make two assumptions:

(i) We have i∗ ≤ c. Otherwise, c ≤ i∗ − 1 implies k'c = kc ≤ k_{i∗−1} ≤ ki∗ − 2 = k∗ − 2. This implies that some Dk with k ≤ kc ≤ k∗ − 2 is stable and correct, which by (4) implies the same for D'_k. Since k ≤ kc = k'c, detector D(k'i) would be good.

(ii) At least one of Dk∗ and D_{k∗−1} is stable and correct. Otherwise, some Dk with k ≤ kc and k ∉ {k∗, k∗ − 1} is stable and correct, which by (4) implies the same for D'_k. We need to show that k ≤ k'c:
  • if i∗ < c, then k ≤ kc = k'c,
  • if i∗ = c, then k ≤ kc = k∗, and k ∉ {k∗, k∗ − 1} means k < k∗ − 1 = kc − 1 ≤ k'c.

From (ii), at least one of changed[k∗] and changed[k∗ − 1] will eventually stop increasing, so D'_{k∗} will eventually be stable (but possibly faulty).

• Case 1: D'_{k∗} is stable and correct.
  – If i∗ < c, then k∗ = ki∗ ≤ kc = k'c, as needed.
  – If i∗ = c, then rank[D'_{k∗}] ≤ c = i∗, so D'_{k∗−1} = D'_{k∗} by (3), and it is also stable and correct. We have k∗ − 1 = ki∗ − 1 = k'_{i∗} = k'c, as needed.

• Case 2: D'_{k∗} is stable and faulty. Then eventually rank[D'_{k∗}] > c ≥ i∗, so D'_{k∗−1} is determined by (2). Then, (ii) implies that exactly one of {p1, p2} = {Dk∗, D_{k∗−1}}, say p1, is correct, and p2 is faulty. Therefore, eventually rank[p1] < rank[p2] forever, so D'_{k∗−1} = p1 is correct and stable. We have k∗ − 1 = ki∗ − 1 = k'_{i∗} ≤ k'c, as needed.

Theorem 4.4. Any detector D(ki) is equivalent to some D(k'i) such that (k'i) has the one-jump property.

Proof. Similar to that of Theorem 4.2. Let us construct a sequence of sequences (k^1_i), (k^2_i), (k^3_i), ... in the following way. First, (k^1_i) = (ki). If sequence (k^j_i) is one-jump, it is the last element. Otherwise, let i∗ be such that k^j_{i∗} > k^j_{i∗−1} + 1, and let k^{j+1}_i = k^j_i for all i, except that k^{j+1}_{i∗} = k^j_{i∗} − 1. By construction, Σi k^1_i > Σi k^2_i > ..., so the sequence (k^1_i), (k^2_i), ... is finite. Let (k'i) be its last element. By construction, (k'i) is one-jump. By repeated application of Lemma 4.3, the detectors D(k^1_i), D(k^2_i), ..., D(k'i) are all equivalent, which proves the assertion.
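One evaluation of rules (1)–(4) can be sketched as follows (an illustrative Python fragment; the function and argument names are ours). It takes the current outputs of the original sub-detectors together with the changed and rank estimates, and produces the emulated outputs D'.

```python
# Illustrative sketch (ours) of one evaluation of rules (1)-(4): D maps k to the
# current output of sub-detector D_k, changed holds the counters of Figure 5,
# rank the speed ranks of Figure 3, and i_star is the index at which (k_i) is
# not one-jump, with k_star = k_{i_star}.
def emulate_D_prime(D, changed, rank, k_star, i_star):
    Dp = dict(D)                                         # rule (4): copy every other D_k
    a, b = k_star, k_star - 1
    more_stable = b if changed[b] <= changed[a] else a   # rule (1): fewer changes wins
    Dp[a] = D[more_stable]
    if rank[Dp[a]] <= i_star:                            # rule (3): D'_{k*} looks correct,
        Dp[b] = Dp[a]                                    # so reuse it for D'_{k*-1}
    else:                                                # rule (2): otherwise take the
        Dp[b] = min((D[a], D[b]), key=lambda p: rank[p]) # faster of D_{k*}, D_{k*-1}
    return Dp
```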

1  for each I ⊆ {p1, ..., pn} with |I| = i do
2    start extract∆(I) in the background
3  loop forever
4    let I = { p | rank[p] ≤ i }                                 { Figure 3 }
5    ¬Ωi ← extract∆(I)

Figure 6: Extracting ¬Ωi from Alg∆. Detector ¬Ωi behaves like ki-anti-Ω if i ≥ c, and arbitrarily otherwise.

Note. This paper assumes that the sequences (ki) are non-decreasing. As opposed to tasks T(ki), there is no obvious reason for this requirement for detectors D(ki). Moreover, it seems that detectors D(ki) for which this requirement does not hold might not be equivalent to any detectors D(ki) for which it does. Such detectors are beyond the scope of this paper.

5 D(ki) is the weakest failure detector for T(ki)

Section 3 showed that D(ki) is the weakest failure detector in the D(k'i) family that solves T(ki). This section drops the "D(k'i) family" restriction:

Theorem 5.1. D(ki) is the weakest failure detector for T(ki).

Proof sketch. Theorem 3.1 proves sufficiency. For the necessity part, consider an algorithm Alg∆ that implements T(ki) using some failure detector ∆. We need to show that ∆ can implement D(ki). The proof proceeds in two parts. First, Section 5.1 uses Alg∆ to extract detectors ¬Ω1, ..., ¬Ωn; detector ¬Ωi behaves as ki-anti-Ω if i ≥ c, and arbitrarily otherwise. Then, Section 5.2 uses the extracted detectors ¬Ω1, ..., ¬Ωn to construct detector D(ki).

5.1 Extracting ¬Ω1, ..., ¬Ωn from Alg∆

Recall that Alg∆ is an algorithm that implements T(ki) using detector ∆. This section shows how to use Alg∆ to extract detectors ¬Ω1, ..., ¬Ωn. Detector ¬Ωi behaves like ki-anti-Ω if i ≥ c, and arbitrarily otherwise (c is the number of correct processes).

Consider an n-process system in which only processes from a known set I of size i are allowed to take steps. This is equivalent to an i-process system, so Alg∆ implements ordinary ki-set agreement in this system. We can now use the algorithm from [7] to extract ki-anti-Ω. Since the extraction depends on I, let us call this extraction algorithm extract∆(I). Algorithm extract∆(I) treats I as the only processes in the system, and ignores all others.

The ki-anti-Ω extraction algorithm in Figure 6 runs in the original n-process system. It runs instances of extract∆(I) in parallel for all possible I of size i. Each process repeatedly computes the set I of the i processes that have taken the most steps so far. Then, it returns the ki-anti-Ω output from the corresponding extract∆(I).

Lemma 5.2. If i ≥ c, then ¬Ωi from Figure 6 behaves like ki-anti-Ω.

Proof. Since i ≥ c, all processes will eventually have the same I forever. Since I contains at least one correct process, the conclusion follows.
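The selection step of Figure 6 amounts to indexing a table of extraction instances by the current set of most active processes. The following Python sketch (ours; the snapshot dictionary of instance outputs is hypothetical) illustrates it.

```python
# Illustrative sketch (ours) of the selection step in Figure 6: all instances
# extract_Delta(I) run in the background, one per candidate set I of size i; the
# emulated detector reads the output of the instance whose I consists of the i
# most active processes.
from itertools import combinations

def current_anti_omega(i, steps, extract_output):
    most_active = sorted(range(len(steps)), key=lambda p: (-steps[p], p))[:i]
    return extract_output[frozenset(most_active)]

# Hypothetical snapshot of instance outputs for n = 3, i = 2.
outputs = {frozenset(I): "output of extract(%s)" % sorted(I)
           for I in combinations(range(3), 2)}
print(current_anti_omega(2, steps=[120, 7, 95], extract_output=outputs))   # instance for {0, 2}
```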

 1  for i = 1, ..., n do
 2    ¬Ωi ← Figure 6
 3    Ω̃^i ← extract vector-Ω from ¬Ωi using the algorithm in [13]
 4    Ω̃^i_1, ..., Ω̃^i_ki ← Ω̃^i_ki, ..., Ω̃^i_1               { reverse the order }
 5  i ← n
 6  for k = kn, ..., 1 do
 7    Dk ← Ω̃^i_k
 8    i∗ ← rank[Ω̃^i_k]
 9    if ki∗ < k then
10      i ← max{ i | ki = k − 1 }

Figure 7: Using detectors Ω̃^1, ..., Ω̃^n to extract D(ki).

5.2 Implementing D(ki) from detectors ¬Ω1, ..., ¬Ωn

Having extracted ¬Ω1, ..., ¬Ωn, we can now proceed to implementing D(ki). Lines 1–4 in Figure 7 process each i independently. The processing is only meaningful for i ≥ c, so let us assume that this is the case. For each i, the algorithm extracts the ki-anti-Ω detector ¬Ωi. Then, it transforms it into a ki-vector-Ω detector Ω̃^i, which consists of ki sub-detectors Ω̃^i_1, ..., Ω̃^i_ki. For reasons explained later, we reverse the order of the sub-detectors. As a result, we have a family of ki-vector-Ω detectors, one for each i ≥ c.

The goal is to take this family and transform it into a single detector D(ki) = D1, ..., Dkn. If we knew the number c of correct processes, then we could just copy the outputs of Ω̃^c_1, ..., Ω̃^c_kc into D1, ..., Dkc, and set D_{kc+1}, ..., Dkn to arbitrary values. Since we do not know c, we have to estimate it.

Lines 5–10 in Figure 7 populate Dkn, ..., D1, while maintaining an upper bound i on c, initially n. If we ignored lines 8–10, the algorithm would just copy Ω̃^n_kn, ..., Ω̃^n_1 into Dkn, ..., D1. This is correct behaviour if c = n. For c < n, it may not be correct. The problem is that the only stable and correct Ω̃^n_k, copied to Dk, may fail to satisfy k ≤ kc, which violates the definition of D(ki). In this case, however, correctness of Ω̃^n_k implies rank[Ω̃^n_k] ≤ c, which implies ki∗ ≤ kc < k, triggering the "if" in lines 9–10. Line 10 then lowers i so that k = ki at the beginning of the next iteration. This essentially restarts lines 5–10 with a new n = i. Several such "restarts" can happen before the loop in lines 6–10 terminates.

Theorem 5.3. The algorithm in Figure 7 extracts D(ki).

Proof. To obtain a contradiction, assume that the extracted D(ki) is not good. Let c be the number of correct processes. Consider the time after all sub-detectors Ω̃^i_k that eventually stabilize have done so, all faulty processes have died, and rank[p] ≤ c for all correct processes p. In order to obtain a contradiction, I will show that the following two invariants hold:

(i) The value of i is always an upper bound on the number of correct processes (i ≥ c). This is important because the Ω̃^i_k for i < c are meaningless.


(ii) The algorithm is deterministic: it always outputs the same sequence D1, ..., Dn. This means we only need to show that some Dk with k ≤ kc is correct.

Note that these invariants hold only assuming that the extracted D(ki) is not good. The algorithm in Figure 7 is executed periodically. An expression is stable if it is the same in all executions of the algorithm. In order to show (ii), we need to ensure that every Ω̃^i_k read in line 7 is stable, and that the condition "ki∗ < k" in line 9 is also stable.

First, the algorithm is well formed, meaning that it never accesses a nonexistent Ω̃^i_k. For this, we need that we always have k ≤ ki in line 7. In the first iteration, k = kn = ki. Executing line 10 ensures that k = ki at the beginning of the next iteration. Iterations that do not execute line 10 only decrease k, maintaining the invariant k ≤ ki. Finally, note that, since k ≤ ki, line 10 always decreases i.

Consider the possible types of Ω̃^i_k in line 7.

• Case 1: Ω̃^i_k is stable and faulty. This means that i∗ = rank[Ω̃^i_k] is stable, so the "if" condition "ki∗ < k" in line 9 is stable too (ii). We need to prove that (i) still holds if line 10 is executed. From line 9, we have ki∗ ≤ k − 1. Since Ω̃^i_k is faulty, c < i∗, which implies kc ≤ ki∗ ≤ k − 1. This means that the new i satisfies kc ≤ k − 1 = ki, and since i = max{ i | ki = k − 1 }, we have c ≤ i, as needed (i).

• Case 2: Ω̃^i_k is stable and correct. This implies i∗ = rank[Ω̃^i_k] ≤ c, so ki∗ ≤ kc.
  – Case kc < k. Then ki∗ ≤ kc ≤ k − 1. This means that the "if" condition in line 9 always holds (ii), and the new i ≥ c, by the same argument as in Case 1 above (i).
  – Case kc ≥ k. This means that Dk has been set to a correct and stable process Ω̃^i_k. Since k ≤ kc, we have implemented detector D(ki), a contradiction.

• Case 3: Ω̃^i_k is not stable. This case cannot happen. In the iterations of the loop (lines 6–10) with the current i, we have already encountered all Ω̃^i_{k+1}, ..., Ω̃^i_ki in line 7. The ordering of Ω̃^i_1, ..., Ω̃^i_ki (line 4) guarantees that we have already encountered a correct stable Ω̃^i_{k′} (Case 2) [13]. However, Case 2 always results in lowering i, a contradiction.

Consider the value of i after the last execution of line 10. In the subsequent iterations, line 7 is executed for all Ω̃^i_1, ..., Ω̃^i_ki. Line 10 is not executed for any of these, so all of them must be Case 1. This means that all processes Ω̃^i_1, ..., Ω̃^i_ki are faulty, which contradicts the definition of ki-vector-Ω. This contradiction shows that our assumption that the algorithm in Figure 7 does not implement D(ki) is incorrect, which proves the assertion.
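For illustration, one pass of lines 5–10 can be sketched in Python as follows (the function name and the omega/rank encodings are ours, chosen only to mirror the pseudocode).

```python
# Illustrative sketch (ours) of one pass over lines 5-10 of Figure 7 (the pass is
# re-executed periodically). k is the sequence as a 1-indexed dict, omega[i][j] is
# the current output of the j-th sub-detector of the (reversed) k_i-vector-Omega
# extracted for participation bound i, and rank maps each process to its speed rank.
def populate_D(k, omega, rank, n):
    D, i = {}, n                                                     # line 5
    for kk in range(k[n], 0, -1):                                    # line 6
        D[kk] = omega[i][kk]                                         # line 7
        i_star = rank[D[kk]]                                         # line 8
        if k[i_star] < kk:                                           # line 9
            i = max(j for j in range(1, n + 1) if k[j] == kk - 1)    # line 10
    return D
```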

5.3 Implementing T(ki) using ki-set agreement tasks

Section 5.2 presented an algorithm that implements D(ki) using a family of ki-vector-Ω detectors Ω̃^1, ..., Ω̃^n. For completeness, this section presents an algorithm that implements T(ki) using a family of objects T1, ..., Tn. Each Ti implements ki-set agreement, except that if i is lower than the number of participating processes, then Ti does not have to terminate.

1  participates ← [false, ..., false]
2  function T(ki)(v) at process p is
3    participates[p] ← true
4    for i = 1, ..., n do
5      propose v to Ti
6      wait until either
7        when Ti decides, say on v′ then v ← v′
8        when more than i entries in participates are true then pass
9    return v

Figure 8: Implementing T(ki) using subtasks Ti.

The algorithm in Figure 8 loops over i from 1 to n. In each iteration i, it proposes the current estimate v to Ti. If the number of participants grows above i, it stops waiting and progresses to the next iteration. Otherwise, Ti eventually decides, on at most ki values. These values become estimates for later rounds, so no other values can ever be decided on.

Theorem 5.4. The algorithm in Figure 8 implements T(ki).

Proof. Termination. We need to show that line 6 always terminates. Let i∗ be the number of participating processes. For i < i∗, line 8 will eventually stop the "wait". For i ≥ i∗, object Ti will eventually decide (line 7).

Agreement. Object Ti∗ will decide on at most ki∗ different values, which will be assigned to the estimates v. Since only these estimates can be decided on by later rounds, the total number of different decisions cannot exceed ki∗.

Validity follows directly from Validity of the individual Ti objects.

6 Related work

The study of unreliable failure detectors was initiated by Chandra and Toueg [2], who also introduced the Ω failure detector and showed it to be the weakest failure detector for Consensus [3].

Set agreement was first proposed by Chaudhuri [5], and then proved not wait-free solvable in [1, 8, 11]. My earlier papers showed that anti-Ω and vector-Ω are two equivalent weakest failure detectors for set agreement [12, 13]. Raynal [10] generalized these detectors to k-anti-Ω and k-vector-Ω, and conjectured that they are the weakest failure detectors for k-set agreement. This conjecture was proven by Gafni and Kuznetsov [7].

Recently, Gafni and Kuznetsov [6] attacked the "folklore" conjecture problem from the other end. They showed that the conjecture is true if we consider only tasks that are participation-oblivious (do not depend on the set of participating processes) and require only weak termination (only one process needs to decide). Both [6] and this paper assume that tasks are symmetric (invariant under permutations of process ids).


7 Conclusion

This paper investigated the "folklore" sub-Consensus hierarchy conjecture. The conjecture states that any task not solvable with ¬Ωk requires ¬Ωk−1 [7]; equivalently, every failure detector is task-equivalent to some ¬Ωk. The conjecture is disproved by presenting a family of tasks T(ki) and a family of detectors D(ki). For any sequence (ki) = k1, ..., kn with ki ≤ ki+1 ≤ ki + 1, the agreement task T(ki) decides on at most ki proposals, where i is the number of participating processes. Detector D(ki) consists of sub-detectors D1, ..., Dn, such that at least one of D1, ..., Dkc behaves like Ω, where c is the number of correct processes.

Detector D(ki) can implement task T(k'i) iff (ki) ⪯ (k'i). This implies that no two different D(ki)'s are task-equivalent, which disproves the conjecture: the number of equivalence classes of the task-equivalence relation (2^{n−1}) is significantly higher than conjectured (n). Detector D(ki) is also the weakest failure detector for T(ki).

Finally, the impossibility and weakest-failure-detector results proved in this paper build on top of similar results for k-set agreement and k-anti-Ω. In particular, the proofs avoid the complexities of simulation forests etc. [4, 13] that are needed to prove the original results.

References

[1] E. Borowsky and E. Gafni. Generalized FLP impossibility result for t-resilient asynchronous computations. In Proceedings of the 25th Annual ACM Symposium on Theory of Computing, pages 91–100, San Diego, CA, USA, May 1993.

[2] Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267, 1996.

[3] Tushar Deepak Chandra, Vassos Hadzilacos, and Sam Toueg. The weakest failure detector for solving Consensus. Journal of the ACM, 43(4):685–722, 1996.

[4] Tushar Deepak Chandra, Vassos Hadzilacos, Sam Toueg, and Bernadette Charron-Bost. On the impossibility of group membership. In Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing, pages 322–330, New York, USA, 1996.

[5] Soma Chaudhuri. More choices allow more faults: Set Consensus problems in totally asynchronous systems. Information and Computation, 105, 1993.

[6] Eli Gafni and Petr Kuznetsov. On set consensus numbers. In Distributed Computing, 23rd International Symposium, DISC 2009, Elche, Spain, September 2009, volume 5805 of LNCS, pages 35–47. Springer, 2009.

[7] Eli Gafni and Petr Kuznetsov. The weakest failure detector for solving k-set agreement. In Proceedings of the 28th Annual ACM Symposium on Principles of Distributed Computing (PODC 2009), Calgary, Canada, August 2009.

[8] Maurice Herlihy and Nir Shavit. The topological structure of asynchronous computability. Journal of the ACM, 46, 1999.

[9] Petr Kouznetsov. The "folklore" conjecture. Private communication.

[10] Michel Raynal. k-anti-omega failure detector. Rump session at PODC 2007.

[11] Michael Saks and Fotios Zaharoglou. Wait-free k-set agreement is impossible: The topology of public knowledge. SIAM Journal on Computing, 29, 2000.

[12] Piotr Zieliński. Automatic classification of eventual failure detectors. In Proceedings of the 21st International Symposium on Distributed Computing (DISC 2007), Lemesos, Cyprus, September 2007.

[13] Piotr Zieliński. Anti-omega: the weakest failure detector for set agreement. In Proceedings of the 27th Annual ACM Symposium on Principles of Distributed Computing (PODC 2008), Toronto, Canada, August 2008, pages 55–64.
