LITERATURE REVIEW: Concurrent Lock-Free and Wait-Free Algorithms

Medha Vasanth
School of Computer Science, Carleton University
Ottawa, Canada K1S 5B6
[email protected]

October 16, 2012
1 Introduction
Significant enhancements in hardware have led to the development of large multicore systems with hundreds of processors capable of performing complex computations simultaneously. This creates the need to develop software that runs in parallel, making optimal use of all the processors. Parallel data structures form an integral part of such software, since data structures are the basic building blocks of every program. Parallel data structures are a long-standing research area, with contributions to both blocking (lock-based) and non-blocking (lock-free) structures. This project focuses on the implementation of a method for creating fast wait-free data structures. Wait-freedom ensures that each process completes its operation in a finite number of its own steps. This document provides a literature review of the state-of-the-art algorithms for implementing parallel data structures.
2 Literature Review
The traditional way to achieve parallelism among processes is the mutual exclusion technique. In this technique, a process has exclusive access to the shared object in its critical section; before and after the critical section, the process executes the entry and exit sections. One of the earliest implementations of mutual exclusion was given by Lamport [8]. Lamport assumes that contention among processes is rare, so that processes can usually gain access to the critical section immediately, and that the code outside the critical section does not modify the shared variables. He presents an algorithm that uses exactly seven accesses to the shared memory object in the absence of contention. While he provides a proof of correctness and deadlock freedom, the algorithm is not starvation-free.

The starvation problem of [8] was addressed by Yang and Anderson [12]. They propose a mutual exclusion algorithm using atomic reads and writes with O(log N) time complexity, and the algorithm scales well under heavy contention. Processes begin execution at the leaves of a binary arbitration tree and, after executing their critical sections, traverse the tree in reverse order to execute their exit sections. They also propose a fast-path algorithm with O(1) time complexity in the absence of contention; however, its time complexity increases to O(N) under high contention. In [1], Anderson and Kim combine Lamport's technique [8] with their earlier results to achieve O(log N) time complexity under high contention, and they prove that their implementation is starvation-free.

A universal construction is one that accepts a sequential implementation of any object and automatically converts it into a parallel implementation. A lock-free (non-blocking) algorithm ensures that some process completes its operation in a finite number of steps. In contrast, a wait-free algorithm ensures that each process completes its operation in a finite number of its own steps, regardless of the actions performed by other processes; consequently, no process can starve. Herlihy [4] shows that atomic read/write primitives cannot be used to construct concurrent implementations of even simple structures such as stacks, queues, or sets, and that a universal construction exists for a system of n processes only if the available primitives have consensus number greater than or equal to n. The following are some implementations of universal constructions.

Herlihy [5] proposes a universal construction using the Fetch&Add primitive. The basic idea of his approach is twofold: every operation is implemented sequentially, without synchronization, and special memory management strategies are then used to convert the sequential implementation into a lock-free parallel one. However, this approach is applicable only to small objects, since the algorithm copies the entire shared object in order to modify it.

Chuong et al. [2] propose a transaction-friendly universal construction. An implementation is said to be transaction friendly if it allows a process to abandon an uncompleted operation.
In Chuong et al.'s construction, a process executes the Perform procedure when it wishes to apply an operation, and it receives cooperative helping from other processes through the Help procedure. Compare&Swap (CAS) primitives are used to implement this construction.

Fatourou and Kallimanis [3] propose a wait-free universal construction using the Fetch&Add and load-linked (LL)/store-conditional (SC) primitives. They show experimentally that their algorithm (Sim) outperforms existing algorithms, although its theoretical complexity is poor. They provide practical wait-free implementations of the stack and queue structures. The limitation of this approach is that it copies the shared object into local memory and is not transaction friendly; it is also unsuitable for large structures such as search trees.

M. M. Michael and M. L. Scott [9] propose a non-blocking and a blocking (two-lock) implementation of a queue; it is one of the most widely accepted algorithms in the literature. They prove that their algorithm is linearizable and non-blocking. The CAS primitive is used to construct the non-blocking implementation. Another implementation, inspired by this approach, is the one proposed by Kogan and Petrank [6]. Their paper proposes the first wait-free FIFO queue supporting multiple concurrent enqueuers and dequeuers. They provide a cooperative helping mechanism, in which one process helps another complete its operation. Although the algorithm is slower than existing lock-free implementations, its performance depends heavily on the operating system configuration and can be improved. They extend the idea of cooperative helping in their fast-path/slow-path methodology [7], where a process helps another process only if that process stands in the way of its own operation. This approach follows the fast path (lock-free) when there is no contention and reverts to the slow path (wait-free) when it detects contention. They prove that their approach is wait-free and linearizable, and they have extended the same idea to implement wait-free linked lists [11].

Another approach to implementing concurrent objects is to use multiword CAS (MWCAS) primitives instead of single-word CAS. An MWCAS generalizes single-word CAS: it accepts the number of words, a list of addresses, and lists of old and new values as arguments, and it succeeds only if every address still holds its expected old value, in which case all locations are updated atomically. Moir [10] proposes a conditionally wait-free algorithm using MWCAS primitives; this approach follows the wait-free path only when necessary. The difficulty of implementing MWCAS operations efficiently has limited the adoption of this approach.
References

[1] James H. Anderson and Yong-Jik Kim. A new fast path mechanism for mutual exclusion. Distributed Computing, 14:17–29, 2001.
[2] Phong Chuong, Faith Ellen, and Vijaya Ramachandran. A universal construction for wait-free transaction friendly data structures. In Proc. ACM Symposium on Parallel Algorithms and Architectures, pages 164–164, 2010.
[3] Panagiota Fatourou and Nikolaos D. Kallimanis. A highly efficient wait-free universal construction. In Proc. ACM Symposium on Parallel Algorithms and Architectures, pages 325–334, 2011.
[4] Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 11(1):124–141, January 1991.
[5] Maurice Herlihy. A methodology for implementing highly concurrent data objects. ACM Transactions on Programming Languages and Systems, 15(5):745–770, November 1993.
[6] Alex Kogan and Erez Petrank. Wait-free queues with multiple enqueuers and dequeuers. In Proc. ACM Symposium on Principles and Practice of Parallel Programming, pages 223–234, 2011.
[7] Alex Kogan and Erez Petrank. A methodology for creating fast wait-free data structures. In Proc. ACM Symposium on Principles and Practice of Parallel Programming, pages 141–150, 2012.
[8] Leslie Lamport. A fast mutual exclusion algorithm. ACM Transactions on Computer Systems, 5(1):1–11, February 1987.
[9] M. M. Michael and M. L. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proc. ACM Symposium on Principles of Distributed Computing (PODC), pages 267–275, 1996.
[10] Mark Moir. Transparent support for wait-free transactions. In Proc. Conference on Distributed Computing, 1998.
[11] Shahar Timnat, Anastasia Braginsky, Alex Kogan, and Erez Petrank. Wait-free linked lists. To appear in Proc. ACM Symposium on Principles and Practice of Parallel Programming, 2012.
[12] Jae Heon Yang and James H. Anderson. A fast, scalable mutual exclusion algorithm. Distributed Computing, 9:1–9, 1994.