
J. Parallel Distrib. Comput. 64 (2004) 285–300

Three-dimensional reconstruction of cellular structures by electron microscope tomography and parallel computing

José-Jesús Fernández (a), José-María Carazo (b), and Inmaculada García (a)

(a) Department of Computer Architecture and Electronics, University of Almería, 04120 Almería, Spain
(b) Biocomputing Unit, Centro Nacional de Biotecnología, Universidad Autónoma, 28049 Madrid, Spain

Received 4 December 2002; revised 15 April 2003

Abstract

Electron microscope tomography has emerged as the leading technique for structure determination of cellular components with a resolution of a few nanometers, opening up exciting perspectives for visualizing the molecular architecture of the cytoplasm. This work describes and analyzes the parallelization of tomographic reconstruction algorithms for their application in electron microscope tomography of cellular structures. Efficient iterative algorithms characterized by a fast convergence rate have been used to tackle the image reconstruction problem. The use of smooth basis functions provides the reconstruction algorithms with an implicit regularization mechanism, very appropriate for highly noisy conditions such as those present in high-resolution electron tomographic studies. Parallel computing techniques have been applied to meet the computational requirements demanded by the reconstruction of large volumes. An efficient domain decomposition scheme has been devised that leads to a parallel approach capable of hiding interprocessor communication latency. The combination of efficient iterative algorithms and parallel computing techniques has proved to be well suited for the reconstruction of large biological specimens in electron tomography, yielding solutions in reasonable computational times. This work concludes that parallel computing will be key to affording high-resolution structure determination of cells, so that the location of molecular signatures in their native cellular context can be made a reality.
© 2003 Elsevier Inc. All rights reserved.

Keywords: Parallel computing; Electron tomography; High performance computing; Iterative reconstruction algorithms

1. Introduction

Electron microscopy is central to the study of many structural problems in disciplines related to the Biosciences. Electron microscopy, together with sophisticated image processing and three-dimensional (3D) reconstruction techniques, yields quantitative structural information about the 3D structure of biological specimens [8,15,22]. Knowledge of three-dimensional structure is critical to understanding biological function at all levels of detail [29]. In contrast to earlier instruments, high-voltage electron microscopes (HVEMs) are able to image relatively thick specimens that contain complex 3D structure. Electron tomography simplifies the determination of complex 3D structures and their subsequent analysis.



This method requires a set of HVEM images acquired at different orientations, obtained by tilting the specimen around one or more axes. Rigorous structural analyses require that image acquisition and reconstruction introduce as little noise and artifact as possible at the spatial scales of interest, for a proper interpretation and measurement of the structural features. As a consequence of the need for structural information over a relatively wide range of spatial scales, electron tomography of complex biological specimens requires large projection HVEM images (typically 1024 × 1024 pixels or larger). Electron tomography on this scale yields large reconstruction files and requires an extensive use of computational resources and considerable processing time. High-performance computing addresses such computational requirements by means of parallel computing on supercomputers or networks of workstations, sophisticated code optimization techniques, intelligent use of the hierarchical memory systems in the computers, and awareness of communication latencies.



Weighted backprojection (WBP) [28] has been one of the most popular methods in the field of electron tomography of large specimens due to its computational simplicity. Nevertheless, this method is susceptible to artifacts due to the particularities of limited-angle data and the extremely low signal-to-noise ratio present in electron tomography. Series expansion reconstruction methods constitute one of the main alternatives to WBP for image reconstruction. Despite their potential advantages over WBP [6,18], these methods have still not been extensively used in the field of electron tomography because of their high computational costs. This work explores the use of recently developed series expansion methods [3,4] (component averaging methods) characterized by a very fast convergence rate, which achieve least-squares solutions in very few iterations. One of the main advantages of these methods is the flexibility to represent the volume elements by means of basis functions different from the traditional voxels. During the 1990s [17,21], spherically symmetric volume elements (blobs) were thoroughly investigated and, as a consequence, the conclusion was drawn in Medicine [14,20] and Electron Microscopy [6,18] that blobs yield better reconstructions than voxels. The use of blob basis functions provides the series expansion methods with an implicit regularization mechanism which makes them better suited for noisy conditions than WBP [6].

This work addresses the parallelization of series expansion methods using blobs as basis functions for their application in electron tomography of complex biological specimens. Parallelization and other high-performance computing techniques make it possible to determine the 3D structure of large volumes in reasonable computation time. Previous work on parallelization in image reconstruction has dealt with either the problem of 2D image reconstruction from 1D projections (e.g. [3,5]) or the problem of 3D reconstruction using voxels as basis functions (e.g. [9,24,26,30–33]). In the latter, parallelization is straightforward because the 3D problem can be decomposed into a set of independent 2D reconstruction problems. Nevertheless, the use of blobs as basis functions in 3D reconstruction involves significant additional difficulties in the parallelization and, in particular, requires substantial communication among the processors. In this work, a latency hiding strategy has been devised which has proven to be very efficient in dealing with these communications. The aim of this work is two-fold: first, to show that blob-based series expansion methods may constitute real alternatives to WBP as far as the trade-off between computation time and reconstruction quality is concerned; second, to show that parallel computing is indispensable to face the computational requirements demanded by high-resolution structure determination.

This work is structured as follows: Section 2 presents an introduction to electron tomography and describes the standard reconstruction method currently used for structure determination, weighted backprojection. Section 3 is devoted to the iterative reconstruction methods: a review of the main concepts and a description of the component averaging methods. The concept of blobs and their main properties are also briefly described. Section 4 is devoted to the high-performance computing approach, including the latency hiding strategy, devised for this application. Section 5 then presents the experimental results, in terms of computation time and speedup, that have been obtained. Results from the application of the approach to experimental biological data are also shown, for purposes of quality comparison. Finally, the results are analyzed in detail in Section 6.

2. Electron tomography

Electron microscope tomography allows the 3D investigation of structures over a wide range of sizes, from cellular structures to single macromolecules. Depending upon the nature of the specimen and the type of structural information being sought, different approaches to data collection and 3D reconstruction are used (refer to [8,15,22] for a review). This work is focused on electron tomography of complex biological structures (e.g., cell organelles, subcellular assemblies, or even whole cells). In this approach, due to the thickness of the specimens (50–150 nm), high-voltage electron microscopes (HVEM, 300–500 kV) are used because of the penetration power of their electron beam. Electron tomography of cellular structures has great potential to bridge the gap between methods providing near-atomic resolution and microscopic techniques which allow the examination of living cells, such as confocal or dark field light microscopy. In electron tomography of complex structures, a 3D resolution of 4–10 nm is currently obtained. Resolutions on the order of 2–4 nm would allow the study of the 3D organization of structural components at a level sufficient for the identification of single macromolecules and molecular interactions in their native cellular context [12]. Such information about supramolecular organization and interactions is critical to understanding cellular function [29]. Steps in this direction are already being taken [7]. Examples of recent applications of electron tomography for structural determination of cellular components are: mitochondria [26], the Golgi apparatus [16], and whole cells [10,23].

In electron tomography of complex biological specimens, the so-called single-axis tilting is the data collection geometry of choice to record the set of projection images required to compute the 3D reconstruction.


For the collection of a single-axis tilt series, a single specimen is tilted over a range of typically ±60° or ±70° in small tilt increments (1°–2°), and an image of the same object area is recorded at each tilt angle via, typically, CCD cameras (see the sketch in Fig. 1). Sometimes, for better angular coverage, another tilt series is taken with the specimen rotated by 90° (the so-called double-axis tilting geometry) [19,25]. Typical electron tomographic data sets then range from 60 to 280 images. Due to the resolution requirements, the image size typically ranges from 1024 × 1024 up to 2048 × 2048 pixels.

The computation of a distortion-free 3D reconstruction from a single-axis tilt series would require that a data set covering the full tilt range (±90°) be available. Due to physical limitations of microscopes, the angular tilt range is limited and, as a result, tomographic tilt series have a wedge of missing data corresponding to the uncovered angular range. This limitation, which is not present in medical tomography, causes distortions in 3D reconstructions. First, the structural features in the 3D reconstruction become elongated along the Z-direction (there is a significant loss of resolution in the Z-direction). Second, the missing wedge also causes anisotropic (i.e., direction-dependent) resolution, so that features oriented perpendicular to the tilt axis tend to fade from view. In order to obtain high-resolution reconstructions, another requirement that has to be satisfied is that, in imaging the specimen, the electron dose must be kept within tolerable limits to prevent radiation damage from erasing the finer details of the structure. As a result, the signal-to-noise ratio in the projection images taken with the microscope is extremely low, currently lower than 0.1.

Fig. 1. Single-tilt axis data acquisition geometry. The specimen is imaged in the microscope by tilting it over a range of typically ±60° or ±70° in small tilt increments. As a result, the set of projection images needed for structure determination is collected.


As a consequence, high-resolution electron tomography of subcellular structures requires a method of "3D reconstruction from projections" able to deal with limited-angle conditions and the extremely low signal-to-noise ratio of the projection images. Currently, the standard method in the field is the well-known weighted backprojection (WBP). Weighted backprojection works in the following manner: it assumes that projection images represent the amount of mass density encountered by imaging rays. The method simply distributes the known specimen mass present in the projection images evenly over computed backprojection rays. In this way, specimen mass is projected back into a reconstruction volume (i.e., backprojected). When this process is repeated for a series of projection images recorded from different tilt angles, backprojection rays from the different images intersect and reinforce each other at the points where mass is found in the original structure. Therefore, the 3D mass of the specimen is reconstructed from a series of 2D projection images. The backprojection process involves an implicit low-pass filtering that makes reconstructed volumes strongly blurred. In practice, in order to compensate for the transfer function of the backprojection process, a high-pass filter (i.e., weighting) is previously applied to the projection images, hence the term "weighted backprojection". Fig. 2 shows a sketch of the backprojection approach to 3D reconstruction from projections. For a more in-depth description of the method, refer to [28].

The relevance of WBP in electron tomography mainly stems from the linearity and the computational simplicity of the method (O(N³ · M), where N³ is the number of voxels in the volume and M is the number of projection images). The main disadvantages of WBP are that (i) the results may be strongly affected by the limited tilt-angle data obtained with the microscope, and (ii) WBP does not implicitly take into account the transfer function of the HVEM or the noise conditions. As a consequence of the latter, a posteriori regularization techniques (such as low-pass filtering) may be needed to attenuate the effects of the noise. On the other hand, series expansion reconstruction methods are characterized by an inherent regularization nature which makes them robust to the particularities of electron microscope tomography: limited-angle data and extremely noisy conditions [6,18]. However, they have not been used extensively in this field because of their computational demands.

Fig. 3 illustrates the behavior of WBP and SIRT (simultaneous iterative reconstruction technique, a series expansion method) in a simple, noiseless case. On the left-hand side of the figure, an artificial volume resembling a biological macromolecule is shown. Sixty-one projection images were computed by tilting that volume around the tilt axis indicated in the figure, covering an angular range of [−60°, +60°] at intervals of 2°. From those projection images, reconstructions were computed using WBP and SIRT. At the center of the figure, the reconstruction obtained using WBP is shown. On the right-hand side, the reconstruction resulting from 100 iterations of SIRT is presented. The strong effects of elongation in the Z-direction and fading around the tilt axis are clearly appreciable in the WBP reconstruction. SIRT, however, exhibits an excellent behavior, minimizing the artifacts present in the WBP reconstruction. Nonetheless, SIRT required two orders of magnitude more computation time than WBP.



Fig. 2. Three-dimensional reconstruction from projections by backprojection. The projection images are projected back into the volume to be reconstructed.
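As a rough illustration of the backprojection operation just described, the following sketch (our own simplified example, not the authors' code) smears each 1D projection row of a single slice back along its tilt direction and averages over the tilt angles; the array layout, the nearest-neighbor interpolation and the absence of the weighting filter are simplifying assumptions.

```c
#include <math.h>

/* Unweighted backprojection of one slice under a single-axis tilt geometry. */
void backproject_slice(const float *proj,   /* M x N: one detector row per tilt angle */
                       const float *theta,  /* M tilt angles in radians               */
                       int M, int N,        /* number of angles, detector bins        */
                       float *slice)        /* output: N x N reconstructed slice      */
{
    for (int y = 0; y < N; y++) {
        for (int x = 0; x < N; x++) {
            float xc = x - 0.5f * N;         /* coordinates relative to the centre */
            float yc = y - 0.5f * N;
            float sum = 0.0f;
            for (int m = 0; m < M; m++) {
                /* detector coordinate of the ray through (x, y) at tilt angle m */
                float t = xc * cosf(theta[m]) + yc * sinf(theta[m]) + 0.5f * N;
                int bin = (int)(t + 0.5f);   /* nearest detector bin */
                if (bin >= 0 && bin < N)
                    sum += proj[m * N + bin];
            }
            slice[y * N + x] = sum / M;      /* average the backprojected rays */
        }
    }
}
```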

Fig. 3. Comparison of reconstruction methods in a simple case. Left: the model to be reconstructed. Center: reconstruction by WBP. Right: reconstruction by 100 iterations of SIRT.

3. Iterative image reconstruction methods

3.1. Series expansion methods

Series expansion reconstruction methods assume that the 3D object or function f to be reconstructed can be approximated by a linear combination of a finite set of known and fixed basis functions b_j,

$$f(r, \phi_1, \phi_2) \approx \sum_{j=1}^{J} x_j \, b_j(r, \phi_1, \phi_2) \qquad (1)$$

(where (r, \phi_1, \phi_2) are spherical coordinates), and that the aim is to estimate the unknowns x_j. These methods also assume an image formation model in which the measurements depend linearly on the object, in such a way that

$$y_i \approx \sum_{j=1}^{J} l_{i,j} \, x_j, \qquad (2)$$

where y_i denotes the ith measurement of f and l_{i,j} the value of the ith projection of the jth basis function. Under those assumptions, the image reconstruction problem can be modeled as the inverse problem of estimating the x_j's from the y_i's by solving the system of linear equations given by Eq. (2). Such systems of equations are typically solved by means of iterative methods. Algebraic reconstruction techniques (ART) constitute one of the best-known families of iterative algorithms to solve such systems [13].

Series expansion methods have some advantages over weighted backprojection: First, they exhibit better behavior with limited tilt-angle geometries due to the inherent way in which these methods fill up the missing wedge. Second, they show better robustness under extreme noise conditions because of the inherent regularization involved in iterative approaches. Third, they offer flexibility in the spatial relationships between the object to be reconstructed and the measurements taken, which allows the use of different basis functions, different grids to arrange them, and different spacings among them. And finally, the possibility of incorporating spatial constraints implies that any a priori information about the object or the image formation process may be used to control the solution.


The main drawbacks of series expansion methods are (i) their high computational demands and (ii) the need for parameter optimization to properly tune the model (basis functions, the image formation model, etc.). The high computational requirements may be reduced by using the efficient iterative methods that are described below. Also, the proper selection of parameters may speed up the computations [6,13].

3.2. Component averaging methods

Component averaging methods have recently arisen [3,4] as efficient iterative algorithms for solving large and sparse systems of linear equations. In essence, these methods have been derived from the ART methods [13], with the important innovation of a weighting related to the sparsity of the system. This component-related weighting provides the methods with a convergence rate that may be far superior to that of the ART methods, especially at the early iteration steps. Assuming that the whole set of equations in the linear system (Eq. (2)) may be subdivided into B blocks, each of size S, a generalized version of the component averaging methods can be described via its iterative step from the kth estimate to the (k+1)th estimate by

$$x_j^{k+1} = x_j^k + \lambda_k \sum_{s=1}^{S} \frac{y_i - \sum_{v=1}^{J} l_{i,v} \, x_v^k}{\sum_{v=1}^{J} s_v^b \, (l_{i,v})^2} \; l_{i,j}, \qquad (3)$$

where \lambda_k denotes the relaxation parameter; b = (k mod B) denotes the index of the block; i = bS + s represents the ith equation of the whole system; and s_v^b denotes the number of times that the component x_v of the volume contributes with a nonzero value to the equations in the bth block. The processing of all the equations in one of the blocks produces a new estimate. All blocks are processed in one iteration of the algorithm. This technique produces iterates which converge to a weighted least-squares solution of the system of equations provided that the relaxation parameters are within a certain range and the system is consistent [4].

The efficiency of component averaging methods stems from the explicit use of the sparsity of the system, represented by the s_v^b term in Eq. (3). The component-related weighting makes component averaging methods progress through the iterations based on oblique projections onto the hyperplanes constituting the linear system [3,4]. Traditional ART methods, on the other hand, are based on orthogonal projections (in ART methods, s_v^b = 1). Oblique projections allow component averaging methods to have a fast convergence rate, especially at the early iteration steps.
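As a concrete illustration of the iterative step in Eq. (3), the following sketch (our own, not the authors' implementation) applies one block update to a small dense system stored row-wise; the function and variable names are illustrative, and a real electron tomography code would exploit the sparsity of l_{i,j} and the blob look-up tables described in Section 3.3.

```c
#include <stdlib.h>

/* One BICAV block update (Eq. (3)) for a dense S x J block.
 * A[s*J + j] plays the role of l_{i,j} with i = b*S + s, s_b[j] of s_j^b,
 * and lambda of the relaxation parameter lambda_k. */
void bicav_block_update(const float *A,   /* S x J block of the system matrix   */
                        const float *y,   /* S measurements of this block       */
                        const int *s_b,   /* nonzero counts per component       */
                        float lambda, int S, int J,
                        float *x)         /* current estimate, updated in place */
{
    /* The equations of the block are treated simultaneously: corrections are
       accumulated first and applied to the estimate only at the end. */
    float *corr = calloc(J, sizeof(float));
    for (int s = 0; s < S; s++) {
        const float *a = A + (size_t)s * J;
        float num = y[s], den = 0.0f;
        for (int j = 0; j < J; j++) {
            num -= a[j] * x[j];              /* y_i - sum_v l_{i,v} x_v^k */
            den += s_b[j] * a[j] * a[j];     /* sum_v s_v^b (l_{i,v})^2   */
        }
        if (den > 0.0f)
            for (int j = 0; j < J; j++)
                corr[j] += (num / den) * a[j];   /* weighted residual times l_{i,j} */
    }
    for (int j = 0; j < J; j++)
        x[j] += lambda * corr[j];
    free(corr);
}
```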


Component averaging methods, like ART methods, can be classified into the following categories as a function of the number of blocks involved:

Sequential. This method cycles through the equations one-by-one, producing consecutive estimates (S = 1). It exactly matches the well-known row-action ART method [13] and is characterized by a fast convergence as long as the relaxation factors are optimized.

Simultaneous. This method, known simply as "component averaging" (CAV) [3], uses only one block (B = 1), considering all equations in the system in every iterative step. This method is inherently parallel, in the sense that every equation can be processed independently from the others. Simultaneous methods are characterized, in general, by a slow convergence rate. However, CAV has an initial convergence rate far superior to that of simultaneous ART methods (SIRT, for instance).

Block-iterative. Block-iterative component averaging methods (BICAV) represent the general case. In essence, these methods sequentially cycle block-by-block, and every block is processed in a simultaneous way. BICAV exhibits an initial convergence rate significantly superior to CAV and on a par with row-action ART methods, provided that the block size and the relaxation parameter are optimized. BICAV methods (with S > 1) are also inherently amenable to parallelization.

Conceptually, the iterative methods (either CAV or ART) proceed in the following way (see Fig. 4): they start from an initial model (either a null model, a constant model, or even the result of a previous reconstruction method, e.g. WBP), and the model is progressively refined as the iterations evolve. In every iteration, (1) projections are computed from the current model; (2) the error between the experimental projections and those computed from the model is calculated; (3) the model is refined so that the error is minimized (a step known as "error backprojection"). In essence, this process is the one analytically expressed by Eq. (3).

Fig. 4. Conceptual scheme of the iterative methods for three-dimensional reconstruction from projections.



3.3. Basis functions

In the field of image reconstruction, the choice of the set of basis functions to represent the object to be reconstructed greatly influences the result of the algorithm [17]. Spherically symmetric volume elements (blobs) with a smooth transition to zero have been thoroughly investigated [17,21] as alternatives to voxels for image representation, with the conclusion that the properties of blobs make them well suited for representing natural structures of all physical sizes. The use of blob basis functions provides the reconstruction algorithm with an implicit regularization mechanism. In that sense, blobs are especially suited for working under noisy conditions, yielding smoother reconstructions in which artifacts and noise are reduced with relatively unimpaired resolution. Specifically in electron tomography, the potential of blob-based iterative reconstruction algorithms with respect to WBP has already been highlighted by means of an objective task-oriented comparison methodology [6,18].

Blobs are a generalization of a well-known class of window functions in digital signal processing called Kaiser-Bessel [17]. Blobs are spatially limited and, in practice, can also be considered band-limited. The shape of the blob and its spectral features are controlled by three parameters: the radius, the differentiability order and the density drop-off. The appropriate selection of these parameters is highly important. The blob full-width at half-maximum (FWHM), determined by the parameters, is chosen on the basis of a compromise between the desired resolution and the suppression of noise in the data: narrower blobs provide better resolution, wider blobs allow better noise suppression. For a detailed description of blobs, refer to [17].

The basis functions in Eq. (1) are shifted versions of the selected blob arranged on a simple cubic grid, which is the same grid as the one used with voxels, with the important difference that blobs have an overlapping nature. Due to this overlapping property, the arrangement of blobs covers the space with a pseudo-continuous density distribution (see Figs. 5 and 6) very suitable for image representation. On the contrary, voxels are only capable of modeling the structure by nonoverlapping density cubes, which involve certain spatial discontinuities.

The use of blobs allows an efficient computation of the forward- and backward-projection stages in any iterative method, either ART or component averaging. The spherical symmetry of blobs makes the projection of a blob along any direction the same. Consequently, it is possible to pre-compute the projection of the generic blob and store it in a look-up table [21]. In this way, the computation of the l_{i,j} terms (Eq. (3)) in the forward- and backward-projection stages is transformed into simple references to the look-up table. Furthermore, a blob-driven approach allows the use of incremental algorithms. Consequently, the use of blobs as basis functions enables substantial computational savings compared to the use of voxels [21].
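To make the look-up table idea concrete, the following sketch (our own illustration; the blob profile, the table size and the sampling are assumptions, not the Kaiser-Bessel blobs of [17,21]) tabulates the line integral of a generic spherically symmetric blob as a function of the ray-to-center distance, so that each l_{i,j} reduces to a table reference at run time.

```c
#include <math.h>

#define TABLE_SIZE 1024

/* A simple smooth bump with transition to zero at d = a, used here as a
 * stand-in for a Kaiser-Bessel blob of radius a and order m. */
static double blob_value(double d, double a, int m)
{
    if (d >= a) return 0.0;
    double t = 1.0 - (d / a) * (d / a);
    return pow(t, m);
}

/* table[k] ~ line integral of the blob along a ray passing at distance
 * r = k*a/TABLE_SIZE from its centre (by spherical symmetry the integral
 * depends only on r). */
void build_blob_projection_table(double a, int m, double *table)
{
    const int NSTEPS = 2048;                       /* integration resolution */
    for (int k = 0; k < TABLE_SIZE; k++) {
        double r = (double)k * a / TABLE_SIZE;
        double sum = 0.0, dz = 2.0 * a / NSTEPS;
        for (int n = 0; n < NSTEPS; n++) {
            double z = -a + (n + 0.5) * dz;        /* position along the ray */
            sum += blob_value(sqrt(r * r + z * z), a, m) * dz;
        }
        table[k] = sum;
    }
}

/* l_{i,j} is then obtained from the distance between ray i and blob centre j. */
double blob_ray_weight(const double *table, double a, double dist)
{
    if (dist >= a) return 0.0;
    return table[(int)(dist / a * TABLE_SIZE)];
}
```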

4. Parallel computing in electron tomography

Parallel computing has been widely investigated for many years as a means of providing high-performance computational facilities for large-scale and grand-challenge applications.

Fig. 6. Image representation with blobs. Left: the single-tilt axis acquisition geometry is sketched. The slices are the one-voxel-thick planes orthogonal to the tilt axis. Center: the slices are depicted as columns, and the circle represents a generic blob which, in general, extends beyond the slice where it is located. Right: blobs create a pseudo-continuous 3D density distribution representing the structure.

Fig. 5. Blobs. Right: profile of a blob: continuous function with smooth transition to zero. Center: Arrangement of overlapping blobs on a simple cubic grid. Left: For comparison purposes, arrangement of voxels on the same grid.


In the field of electron tomography of large specimens, parallel computing has already been applied to large-scale reconstructions by means of WBP as well as voxel-based iterative methods [24,26,31,32]. The use of voxels as basis functions makes the single-tilt axis reconstruction algorithms relatively straightforward to implement on massively parallel supercomputers: the reconstruction of each of the one-voxel-thick slices orthogonal to the tilt axis of the volume (see Fig. 6) is assigned to an individual node of the parallel computer. In that sense, this is an example of an embarrassingly parallel application. This property no longer holds for blob-based methodologies, as will be shown below: for the blob case, more effort is needed for a proper data decomposition and distribution across the nodes.

4.1. Data decomposition

The single-tilt axis data acquisition geometries typically used in electron tomography allow the application of the single-program multiple-data (SPMD) computational model for parallel computing. In the SPMD model, all the nodes in the parallel computer execute the same program, each on its own data subdomain. Single-tilt axis geometries allow a data decomposition into slabs of slices orthogonal to the tilt axis. In the SPMD model for this decomposition, the number of slabs equals the number of nodes, and each node reconstructs its own slab. Those slabs of slices would be independent if voxel basis functions were used. However, due to their overlapping nature, the use of blobs as basis functions makes the slices, and consequently the slabs, interdependent (see Fig. 6). Therefore, the nodes in the parallel computer have to receive a slab composed of their corresponding subdomain together with additional redundant slices from the neighbor nodes. The number of redundant slices depends on the blob extension.


Fig. 7(a) shows a scheme of the data decomposition. The slices in the slab received by a given node are classified into the following categories (see the scheme in Fig. 7(b)):

Halo: These slices are only needed by the node to reconstruct some of its other slices. They are the redundant slices mentioned above, and they are located at the extremes of the slab. Halo slices come from the neighbor nodes, where they are reconstructed.

Unique: These slices are to be reconstructed by the node. In the reconstruction process, information from neighbor slices is used. These slices are further divided into the following subcategories:

Edge: Slices that require information from the halo slices coming from the neighbor node.

Own: Slices that do not require any information from halo slices. As a result, these slices are independent from those in the neighbor nodes.

It should be noted that edge slices in a slab are halo slices in a neighbor slab. A small sketch of this slab decomposition in code is given below, after Fig. 7.

Fig. 7. Data decomposition. (a) Every node in the parallel system receives a slab including unique slices (light-gray), and additional redundant slices (dark-gray) according to the blob extension. (b) Classification of the slices in the slab.
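The following is a minimal sketch of the slab decomposition (our own illustration; the Slab struct and its field names are assumptions): each node receives a contiguous range of unique slices plus halo slices on each side, whose width equals the blob extension (one slice for blobs of radius 2.0, two slices for blobs of radius 3.0).

```c
/* Slab assigned to one node of the parallel system. */
typedef struct {
    int first_unique, last_unique;   /* slices this node reconstructs            */
    int halo_left, halo_right;       /* redundant slices received from neighbors */
} Slab;

Slab decompose(int n_slices, int n_nodes, int rank, int blob_extent)
{
    Slab s;
    int base = n_slices / n_nodes, rest = n_slices % n_nodes;
    /* distribute the remainder among the first 'rest' nodes for load balance */
    s.first_unique = rank * base + (rank < rest ? rank : rest);
    s.last_unique  = s.first_unique + base - 1 + (rank < rest ? 1 : 0);
    /* halo slices only exist where there is a neighbor node */
    s.halo_left  = (rank > 0)           ? blob_extent : 0;
    s.halo_right = (rank < n_nodes - 1) ? blob_extent : 0;
    return s;
}
```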



4.2. The parallel iterative reconstruction method

In this work, the block-iterative (with S > 1) and the simultaneous versions of the component averaging methods have been parallelized following the SPMD model and the data decomposition just described. The row-action version of those methods has been discarded since BICAV yields a convergence rate roughly equivalent to it but with better speedup, due to its inherently parallel nature. Conceptually, as mentioned above, block-iterative and simultaneous reconstruction algorithms may be decomposed into three subsequent stages: (i) computation of the forward-projection of the model; (ii) computation of the error between the experimental and the calculated projections; and (iii) refinement of the model by means of backprojection of the error. The reconstruction algorithms iteratively pass through those stages for every block of equations and for every iteration, as sketched in Fig. 8(a). Initially, the model may be set to zero, to a constant value, or even to the result of another reconstruction method.

The interdependence among neighbor slices due to the blob extension implies that, in order to compute either the forward-projection or the error backprojection for a given slab, there has to be a proper exchange of information between neighbor nodes. Specifically, updated halo slices are required for a correct forward-projection of the edge slices. On the other hand, updated halo error differences are needed for a proper error backprojection of the edge slices. The need for communication between neighbor nodes for a mutual update of halo slices is therefore clear. The flow chart in Fig. 8(a) shows a scheme of the iterative algorithm, pointing out the communication points: just before and after the error backprojection. Fig. 8(b) shows another scheme depicting the slices involved in the communications: halo slices are updated with edge slices from the neighbor node.

Our parallel SPMD approach then allows all the nodes in the parallel computer to progress independently, each with its own slab of slices. Notwithstanding this fact, there are two implicit synchronization points in every pass of the algorithm in which the nodes have to wait for the neighbors' messages. Those implicit synchronization points are the communication points just described.

In any parallelization project where communication between nodes is involved, latency hiding becomes an issue. That term stands for overlapping communication and computation so as to keep the processor busy while waiting for the communications to be completed. In this work, an approach that further exploits the data decomposition has been devised for latency hiding.

In essence, the approach is based on ordering the way the slices are processed between communication points. Fig. 9 sketches this approach (a sketch in code, using non-blocking MPI calls, is also given below, after Fig. 9). First of all, the left edge slices are processed, and they are sent as soon as they are ready. The communication of the left edge slices is then overlapped with the processing of the right slices. Similarly, the communication of the right edge slices is overlapped with the processing of the own slices. This strategy is applied to both communication points, just before and after the error backprojection stage. On the other hand, the ordered way in which the nodes communicate with each other also makes this parallel approach deadlock-free.

Finally, the data decomposition described in the previous subsection makes our parallel SPMD approach implicitly load balanced in homogeneous parallel systems, since all the nodes receive a slab of the same (or as similar as possible) size. This load balancing capability holds as long as the parallel system where the application is executed is homogeneous in workload (for example, in dedicated systems).

Fig. 8. The parallel iterative reconstruction method. (a) Flow chart of the iterative reconstruction algorithm, including communication/synchronization points. (b) Communications in the parallel algorithm.



Fig. 9. Latency hiding: Overlapping communication and computation. The boxes represent stages of the iterative algorithm, and the discontinuous lines denote the transmission of the data already processed. The latency hiding is applied for both communication points in the parallel algorithm: Just before and after the error backprojection stage.
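The following is a hedged sketch of this latency-hiding pattern using non-blocking MPI (our own illustration, not the authors' implementation; the slab layout, the process_slice() placeholder and the message tags are assumptions): the edge slices are processed and sent first, their transmission overlaps with the processing of the remaining own slices, and the final wait acts as the implicit synchronization point.

```c
#include <stddef.h>
#include <mpi.h>

/* Stand-in for the per-slice work (forward-projection of one slice, or
 * backprojection of its error); the real code would do the blob-based
 * computation here. */
static void process_slice(float *slice, int slice_sz) { (void)slice; (void)slice_sz; }

/* Slab layout (contiguous): [left halo | unique slices | right halo],
 * with 'halo' slices on each side and 'n_unique' unique slices. */
void pass_with_latency_hiding(float *slab, int n_unique, int halo,
                              int slice_sz, int rank, int nprocs)
{
    float *unique     = slab + (size_t)halo * slice_sz;              /* first unique slice */
    float *right_edge = unique + (size_t)(n_unique - halo) * slice_sz;
    float *right_halo = unique + (size_t)n_unique * slice_sz;
    MPI_Request req[4];
    int nreq = 0;

    /* 1. Left edge slices: process them and send them immediately (non-blocking);
          post the matching receive for the left halo. */
    for (int i = 0; i < halo; i++)
        process_slice(unique + (size_t)i * slice_sz, slice_sz);
    if (rank > 0) {
        MPI_Isend(unique, halo * slice_sz, MPI_FLOAT, rank - 1, 0,
                  MPI_COMM_WORLD, &req[nreq++]);
        MPI_Irecv(slab, halo * slice_sz, MPI_FLOAT, rank - 1, 0,
                  MPI_COMM_WORLD, &req[nreq++]);
    }

    /* 2. Right edge slices: their communication overlaps the previous one. */
    for (int i = n_unique - halo; i < n_unique; i++)
        process_slice(unique + (size_t)i * slice_sz, slice_sz);
    if (rank < nprocs - 1) {
        MPI_Isend(right_edge, halo * slice_sz, MPI_FLOAT, rank + 1, 0,
                  MPI_COMM_WORLD, &req[nreq++]);
        MPI_Irecv(right_halo, halo * slice_sz, MPI_FLOAT, rank + 1, 0,
                  MPI_COMM_WORLD, &req[nreq++]);
    }

    /* 3. Own slices: processed while the edge-slice messages are in flight. */
    for (int i = halo; i < n_unique - halo; i++)
        process_slice(unique + (size_t)i * slice_sz, slice_sz);

    /* Implicit synchronization point: wait for the halo exchange to complete. */
    MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);
}
```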

4.3. Cluster computing

The availability of high-speed networks and increasingly powerful commodity microprocessors are making clusters of workstations a readily available alternative to expensive, large and specialized high-performance computing platforms. Cluster computing [2] has turned out to be a cost-effective vehicle for supercomputing, based on the use of commodity hardware and standard software components, and it is an increasingly popular alternative. Furthermore, clusters of workstations have the important advantage over supercomputers that the turnaround time (the time elapsed from the launching of the program until the results are available) is much lower, due to the usually long wait times in the queues of supercomputers.

The availability of APIs (application programming interfaces) such as MPI (message-passing interface) [11] allows programmers to implement parallel applications independently of the computing platform (either clusters of workstations, supercomputers, or even the computational grid). In the work reported here, we have implemented the parallel iterative reconstruction method using standard C and MPI, and it has been evaluated on a cluster of workstations whose technical features are described below.

5. Evaluation of the parallel approach: results

The experiments that were carried out had several aims. First, the effective speedup and the computation times that our parallel approach yields were measured so as to evaluate its global efficiency. Second, the efficiency of the latency hiding approach devised in this work was also evaluated. Finally, a comparison between WBP and the blob-based component averaging methods was also carried out on experimental data. For those purposes, we tested the CAV and BICAV algorithms under different parameters (see [6] for a detailed description of the influence of the parameters and guidelines on their use):

* Number of blocks for the component averaging methods: 1, 10, ..., 70. We considered that the block size should be a multiple of the size of the projection images, so that all the pixels of the same image belong to the same block.

* Blob parameters. We tested two different blob radii: 2.0 and 3.0. These two values allow us to analyze the whole spectrum of blob parameters in terms of the computation and communication costs of the algorithm. All blobs whose radius is in the range ]1.0, 2.0] involve a neighborhood of one slice. All blobs whose radius is in the range ]2.0, 3.0] involve a neighborhood of two slices. With these two values of blob radius, the whole range of blob sizes that are used in practice is covered (see [6] for details).

The performance results that are described and analyzed below were measured on a cluster of workstations at the MRC-LMB (Medical Research Council—Laboratory of Molecular Biology, Cambridge, UK). The cluster consists of 20 single-CPU 1.3 GHz AMD Athlon-based computers with 768 MB of memory each, using Linux as the operating system. The computers in the cluster are connected by a switched Fast Ethernet network (100 Mbit/s). We evaluated the scalability of the parallel approach using different numbers of processors in the cluster: 1, 4, 8, 12, 16, 20. Finally, we evaluated the efficiency of the latency hiding approach of the application. For that purpose, we used two different implementations of the algorithm: one version provided with the latency hiding approach described above, and another one in which the communications are not overlapped with the computation.

For the evaluation, we used a data set consisting of 70 projection images of 480 × 480 pixels taken from the specimen tilted in the range [−70°, +68°] at intervals of 2°. The volume to be reconstructed was 480 × 480 × 480 volume elements (represented by single-precision floating point numbers) in size. This size resembles the one typically used in current studies in structural biology by electron tomography (volumes in the range [256, 512] Mbytes). We selected this specific size because it allows a uniform data distribution across the processors in the cluster. With this data set, the communication of the errors between the experimental and the calculated projections consists of 480 × 70 floating point numbers for each neighbor processor if blob size 2.0 is chosen (twice that number for blob size 3.0). Likewise, the communication of the reconstructed slices consists of 480 × 480 floating point numbers for each neighbor processor if blob size 2.0 is chosen (twice that number for blob size 3.0).



5.1. Speedup


The efficiency of any parallelization approach is usually evaluated in terms of the speedup [34], defined as the ratio between the computation time required for the application to be executed in a single-processor environment and the corresponding time in the parallel system. The speedup was computed for the different numbers of blocks and the different blob sizes, with and without latency hiding, using the data set just described. Fig. 10 shows the speedups obtained for the component averaging methods (BICAV) using three representative numbers of blocks: 70 blocks, 35 blocks and 1 block. The results obtained for the other intermediate numbers of blocks that were tested and are not shown here follow a similar tendency.
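In symbols (our notation, not the paper's), with T(1) the turn-around time on a single processor and T(N) the turn-around time on N processors,

$$S(N) = \frac{T(1)}{T(N)},$$

with the ideal, linear case corresponding to S(N) = N.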

Fig. 10. Speedups obtained for the component averaging methods. Results for BICAV with 70 blocks, 35 blocks and 1 block are shown. L.H. stands for Latency Hiding. (a) blob size 2.0; (b) blob size 3.0.

This speedup was computed over the turn-around time for 10 iterations of the algorithms. The speedup for WBP was also computed but is not shown here, since the differences with respect to the ideal case are negligible. The sequential versions of WBP and of the blob-based iterative methods that were used to compute the speedups implemented, respectively, the algorithm described in Section 2 (Fig. 2) and the algorithm in Eq. (3) (Fig. 8(a)) applied to the whole volume, with no data decomposition, no communications and no MPI overhead.

Fig. 10(a) shows the speedup obtained for the case in which blob size 2.0 was used, whereas Fig. 10(b) corresponds to blob size 3.0. The former implies that the communications with the two neighbor nodes involve one slice; the latter implies that the communications are two slices in size. This makes the speedups for the latter slightly lower than for the former, which is especially evident in the cases in which no latency hiding is used. On the other hand, since the amount of communication in BICAV scales with the number of blocks, using a small number of blocks implies less communication per iteration and, consequently, a better speedup. It is clearly observed in Fig. 10 that the fewer the blocks used, the higher the speedup.

Fig. 10 also clearly shows that there is a significant difference between using the latency hiding approach and not using it. In general, the speedup curves in Fig. 10 corresponding to the use of the latency hiding approach exhibit a global behavior that approaches the ideal, linear speedup. Still, using fewer blocks yields better speedup rates. If latency hiding is not used, however, the speedup quickly (at around 8 processors) changes to a lower slope.

5.2. Influence of the number of blocks

In order to quantify the influence of the number of blocks used in BICAV, we measured the percentages of the time per iteration dedicated to the different stages of the parallel algorithm: (a) forward projection and error computation; (b) communication of the errors; (c) error backprojection; (d) communication of the reconstructed slices. We took those measurements for the different values of the blob size, both with and without latency hiding. We restricted ourselves to the case in which the maximum number of processors, 20, was used. We evaluated 8 different values of the number of blocks: 1, 10, 20, 30, 40, 50, 60, 70. Fig. 11 shows the results. Fig. 11(a) shows the results measured in the case in which the blob size was 2.0, and Fig. 11(b) those for blob size 3.0. In each figure, there are two groups of graphs, according to whether latency hiding is used or not. In each group of graphs, the results for the different numbers of blocks are shown. For every number of blocks, there are four bars which represent the percentage of the time of the iteration dedicated to the four different stages just mentioned (a–d), in the same order as presented above.


Fig. 11. Percentage of the time of the iteration dedicated to the different stages of the algorithm as a function of the number of blocks. The stages are: (1) forward projection and error computation; (2) communication of the errors; (3) error backprojection; (4) communication of the reconstructed slices. Each group of four bars represents these four percentages in that order: (a) blob size 2.0; (b) blob size 3.0.

(Note that in almost all the cases, the second bar, which represents the time dedicated to the communication of the errors, has a negligible height.) From Fig. 11 it is clearly observed that the percentage of time dedicated to the communication of reconstructed slices scales with the number of blocks in all the cases (with and without latency hiding, and with both blob sizes). Correspondingly, the percentage of time dedicated to computation decreases. In the cases in which no latency hiding was used, it is evident that a significant percentage of time is dedicated to communication. Even the time dedicated to the communication of the errors, which is normally negligible, becomes onerous. From approximately 30 blocks on, the turn-around time of the application is dominated by communications. In the cases in which latency hiding was used, the turn-around time is dominated by computations, and the communication of the errors is absolutely negligible.



Concerning the communication of the reconstructed slices, the highest rate of communication takes place for the maximum number of blocks, and involves around 10–14% of the turn-around time, depending on the blob size. These results mean that the latency hiding approach succeeds in hiding the potential latency due to the communication of the errors. However, the communication of the slices (d) is not completely hidden, although that time remains significantly lower than the time dedicated to computation. That is the reason why the speedups shown above approach the linear behavior relatively independently of the number of blocks (although the more blocks, the farther away from the ideal case).

5.3. Influence of the number of processors

The number of processors may significantly influence the time dedicated to communications because of the potential overhead in the network. With the aim of analyzing that influence, we measured the percentage of time dedicated to the different stages of the algorithm (the same four stages as in the previous subsection) using different numbers of processors. We also took into account the blob sizes and the use of latency hiding. Figs. 12 and 13 show the results for the cases in which the blob size was 2.0 and 3.0, respectively. Figs. 12(a) and (b) show the results with and without the use of latency hiding for blob size 2.0; similarly, Figs. 13(a) and (b) are the corresponding ones for blob size 3.0. In all of those figures, there are three groups of graphs. The group on the left corresponds to the case of BICAV with 70 blocks; the group in the middle corresponds to BICAV with 35 blocks; and finally, on the right, BICAV with 1 block. Each of those groups of graphs shows the percentages of time for the different stages of the algorithm measured for different numbers of processors: 1, 4, 8, 12, 16, 20. For every number of processors, there are four bars which represent the percentage of the time of the iteration dedicated to the four different stages (a–d) mentioned in the previous subsection.

From those figures, it is clearly evident that the number of processors influences the percentages of time dedicated to the different stages. However, this influence also depends on the number of blocks used. If 1 block is used (i.e., the CAV version of the algorithm), most of the iteration time is dedicated to computation, independently of whether the latency hiding approach is used or not. The reason underlying this independence from the latency hiding approach is that, in either case, the time dedicated to communication in BICAV with 1 block is almost negligible compared to the computation time.

Figs. 12(b) and 13(b) show the corresponding results in the case in which no latency hiding is used. It is clearly observed that the use of more blocks and more processors involves a significant penalty due to the communications.

Fig. 12. Percentage of the time of the iteration dedicated to the different stages of the algorithm as a function of the number of processors, using a blob size 2.0. The stages are: (1) forward projection and error computation; (2) communication of the errors; (3) error backprojection; (4) communication of the reconstructed slices. Each group of four bars represents these four percentages in that order: (a) latency hiding; (b) no latency hiding.

In the extreme case in which 70 blocks and 20 processors are used, the communications significantly dominate the turn-around time. Figs. 12(a) and 13(a) show the results when latency hiding is used. In those cases, the percentage of communication time follows an increasing curve as a function of the number of processors. However, the latency hiding allows a much better control of the penalties due to communications and, in the worst case, the communication of the reconstructed slices implies 10–14% of the turn-around time of the iteration. In all the cases, the communication of the errors is completely hidden.

5.4. WBP vs. BICAV in turn-around time

A comparison of the component averaging methods with WBP in terms of computational burden was also carried out. For that purpose, we measured the time demanded by WBP and the time required by the BICAV methods, measuring the time per iteration and using different numbers of blocks, in the task of reconstructing the 480 × 480 × 480 volume under the conditions described at the beginning of this section.


Undoubtedly, WBP is the least resource-consuming method, since it requires 170 s for the whole reconstruction using 20 processors.


However, one iteration of the component averaging methods using the same number of processors requires only a little more time. If a blob radius of 2.0 is used, the time is in the range 252–288 s (the minimum corresponds to BICAV with 1 block, the maximum to BICAV with 70 blocks). If a blob radius of 3.0 is used, the time is in the range 362–426 s. Taking into account that these iterative methods are really efficient and yield suitable solutions after a few iterations (in a range usually between 1 and 10, see [6]), these computation times make component averaging methods real alternatives to WBP, according to the trade-off between computation time and quality of the reconstruction (as will be illustrated below with an experimental example).

Fig. 13. Percentage of the time of the iteration dedicated to the different stages of the algorithm as a function of the number of processors, using a blob size 3.0. The stages are: (1) forward projection and error computation; (2) communication of the errors; (3) error backprojection; (4) communication of the reconstructed slices. Each group of four bars represents these four percentages in that order: (a) latency hiding; (b) no latency hiding.

5.5. Experimental application to electron tomography of mitochondria data

As an example of the behavior of the regularized algorithms described in this article, we have applied the BICAV and WBP reconstruction methods to real mitochondria data obtained with an HVEM and prepared using photooxidation procedures, which are characterized by the low contrast and the extremely low signal-to-noise ratio (approaching SNR = 1.0) exhibited by the images. Seventy projection images (with tilt angles in the range [−70°, +68°]) were combined to obtain the reconstructions. The projection images were 1024 × 480 pixels, and the volume 1024 × 480 × 256. We tested the algorithms under the same conditions as described in [6]. A montage showing one z-section of the volume reconstructed with the different methods is presented in Fig. 14. Fig. 14(a) shows the result coming from WBP. Fig. 14(b) shows the reconstruction obtained from BICAV with 70 blocks after 4 iterations with a relaxation factor of 1.0 and a blob of radius 2.0.

Fig. 14 clearly shows the potential of blob-based BICAV in dealing with extremely noisy conditions, yielding solutions much cleaner than WBP and, moreover, at the same resolution level. The excellent behavior exhibited by BICAV under such noisy conditions comes from the regularization nature of the iterative algorithms and the blobs.

Fig. 14. Comparison of WBP and component averaging methods in an experimental application. One of the slices along the Z-axis is shown. (a) Result from WBP. (b) Result from BICAV with 70 blocks after 4 iterations. Reprinted from [6], Copyright 2002, with permission from Elsevier Science.



WBP, however, provides a "noisy" solution due to the high noise level in the experimental projection images. We have also tested wider blobs in the BICAV methods, and the results prove to be smoother than those presented. However, the strong smoothing provided by wider blobs may not be desirable from the point of view of the biologist who wants to measure structural features in the reconstructions. Regarding the computation times, the reconstructions were done on the cluster of workstations at the MRC-LMB, using 20 processors. For the results shown in Fig. 14, WBP took around 225 s of computation time to obtain the solution, whereas BICAV took around 1600 s (which amounts to around 400 s per iteration).

6. Discussion and conclusion

In this work we have described and analyzed the parallelization of blob-based series expansion methods for their application in electron tomography of complex biological specimens. We have made use of efficient iterative methods to tackle the problem of image reconstruction. Parallel computing has been applied so as to face the high computational requirements that structure determination by electron tomography demands. The results that we have obtained clearly show that the combination of those reconstruction methods and high-performance computing is well suited to tackle the determination of cellular structures by electron tomography, yielding solutions in reasonable computation times.

A parallel approach for the iterative algorithms has been devised which exhibits a speedup nearly linear with the number of processors in the system. This approach is provided with a latency hiding strategy that has proved to be very efficient in dealing with the communications among the processors. The results indicate that the latency hiding strategy succeeds in hiding most of the communication by overlapping it with computation. This parallel strategy, combined with the use of 20 processors, allows the iterative methods to take around 5 min of computation per iteration in the reconstruction of volumes of 420 Mbytes in size, similar to the sizes used in typical electron tomographic studies. The parallel approach is also exploited for WBP, in such a way that a result of the same size is obtained after nearly 3 min of computation. Computation times for the reconstruction of volumes of 1 Gbyte (1024 × 1024 × 256 volume elements represented with single-precision floating point numbers) from 140 projection images of 1024 × 1024 pixels using this approach with 20 processors have proved to take around 16 min for WBP and around 38 min for one iteration of BICAV with 140 blocks.

The current challenge in the field of electron tomography of cells is the reconstruction of whole cells with sufficient resolution for visualizing the molecular architecture of the cytoplasm. In that way, the location of "molecular signatures" in their native cellular context will make it possible to find evidence about how the fundamental cellular functions are carried out [1,27]. To reach such high resolution levels (around 2 nm), it has been estimated that around 300 large projection images (around 2048 × 2048 pixels in size), taken at tilt increments of 0.5°, need to be combined to yield high-resolution reconstructions, which may involve volumes of 2048 × 2048 × 512 volume elements. High-performance computing will make it possible to afford those "grand challenge" applications, currently unapproachable by uni-processor systems due to the computational resource requirements.

On the other hand, high-resolution structure determination of subcellular assemblies by electron tomography implies a number of unfavorable conditions, such as limited-angle data and extremely low signal-to-noise ratios. The use of reconstruction techniques that account for those conditions to avoid, compensate or alleviate the potential artifacts is highly desirable. The regularized iterative reconstruction methods we have used here are very valuable under those conditions [6] and, moreover, are very efficient, providing least-squares solutions in very few iterations. Specifically, BICAV with a large number of blocks yields good reconstructions in a number of iterations in the range [1,10], depending on the parameters. Furthermore, the use of blobs as basis functions provides the reconstruction algorithms with an implicit regularization mechanism which makes them well suited for noisy environments. As a consequence, the solutions yielded by blob-based iterative methods are smoother than those by WBP, but with relatively unimpaired resolution. In particular, under extremely noisy conditions, this type of algorithm clearly outperforms WBP. In those situations, WBP would require a strong low-pass filtering post-reconstruction stage which would limit the maximum attainable resolution.

In conclusion, this work has shown that the combination of efficient iterative methods and parallel computing is well suited to tackle high-resolution structure determination of large, complex biological specimens by electron microscope tomography.

Acknowledgments

This work was partially developed while J.J. Fernández was on a research stay at the Medical Research Council Laboratory of Molecular Biology (MRC-LMB), Cambridge, UK. The authors wish to acknowledge the MRC-LMB for the use of the cluster, and especially Dr. Terry Horsnell for his help and support. The authors also wish to thank Drs. G. Perkins and M.H. Ellisman, who kindly provided the real mitochondria data. This work has been partially supported by grants from the Spanish CICYT TIC2002-00228 and BIO2001-1237.

References

[1] W. Baumeister, Electron tomography: towards visualizing the molecular organization of the cytoplasm, Curr. Opin. Structural Biol. 12 (2002) 679–684.
[2] R. Buyya (Ed.), High Performance Cluster Computing, Vols. I and II, Prentice-Hall, Englewood Cliffs, NJ, 1999.
[3] Y. Censor, D. Gordon, R. Gordon, Component averaging: an efficient iterative parallel algorithm for large and sparse unstructured problems, Parallel Comput. 27 (2001) 777–808.
[4] Y. Censor, D. Gordon, R. Gordon, BICAV: a block-iterative, parallel algorithm for sparse systems with pixel-related weighting, IEEE Trans. Med. Imaging 20 (2001) 1050–1060.
[5] Y. Censor, S.A. Zenios, Parallel Optimization: Theory, Algorithms and Applications, Oxford University Press, Oxford, 1997.
[6] J.J. Fernandez, A.F. Lawrence, J. Roca, I. Garcia, M.H. Ellisman, J.M. Carazo, High performance electron tomography of complex biological specimens, J. Structural Biol. 138 (2002) 6–20.
[7] A.S. Frangakis, J. Bohm, F. Forster, S. Nickell, D. Nicastro, D. Typke, R. Hegerl, W. Baumeister, Identification of macromolecular complexes in cryoelectron tomograms of phantom cells, Proc. Natl. Acad. Sci. USA 99 (2002) 14153–14158.
[8] J. Frank (Ed.), Electron Tomography: Three-Dimensional Imaging with the Transmission Electron Microscope, Plenum Press, New York, 1992.
[9] I. Garcia, J. Roca, J. Sanjurjo, J.M. Carazo, E.L. Zapata, Implementation and experimental evaluation of the constrained ART algorithm on a multicomputer system, Signal Process. 51 (1996) 69–76.
[10] R. Grimm, H. Singh, R. Rachel, D. Typke, W. Zillig, W. Baumeister, Electron tomography of ice-embedded prokaryotic cells, Biophys. J. 74 (1998) 1031–1042.
[11] W. Gropp, E. Lusk, A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, MA, 1994.
[12] K. Grunewald, O. Medalia, A. Gross, A.C. Steven, W. Baumeister, Prospects of electron cryotomography to visualize macromolecular complexes inside cellular compartments: implications of crowding, Biophys. Chem. 100 (2003) 577–591.
[13] G.T. Herman, Algebraic reconstruction techniques in medical imaging, in: C. Leondes (Ed.), Medical Imaging: Systems, Techniques and Applications, Gordon and Breach Science, London, 1998, pp. 1–42.
[14] P.E. Kinahan, S. Matej, J.S. Karp, G.T. Herman, R.M. Lewitt, A comparison of transform and iterative reconstruction techniques for a volume-imaging PET scanner with a large axial acceptance angle, IEEE Trans. Nuclear Sci. 42 (1995) 2281–2287.
[15] A.J. Koster, R. Grimm, D. Typke, R. Hegerl, A. Stoschek, J. Walz, W. Baumeister, Perspectives of molecular and cellular electron tomography, J. Structural Biol. 120 (1997) 276–308.
[16] M.S. Ladinsky, D.N. Mastronarde, J.R. McIntosh, K.E. Howell, L.A. Staehelin, Golgi structure in three dimensions: functional insights from the normal rat kidney cell, J. Cell Biol. 144 (1999) 1135–1149.
[17] R.M. Lewitt, Alternatives to voxels for image representation in iterative reconstruction algorithms, Phys. Med. Biol. 37 (1992) 705–716.
[18] R. Marabini, G.T. Herman, J.M. Carazo, 3D reconstruction in electron microscopy using ART with smooth spherically symmetric volume elements (blobs), Ultramicroscopy 72 (1998) 53–56.
[19] D.N. Mastronarde, Dual-axis tomography: an approach with alignment methods that preserve resolution, J. Structural Biol. 120 (1997) 343–352.
[20] S. Matej, G.T. Herman, T.K. Narayan, S.S. Furuie, R.M. Lewitt, P.E. Kinahan, Evaluation of task-oriented performance of several fully 3D PET reconstruction algorithms, Phys. Med. Biol. 39 (1994) 355–367.
[21] S. Matej, R.M. Lewitt, G.T. Herman, Practical considerations for 3-D image reconstruction using spherically symmetric volume elements, IEEE Trans. Med. Imaging 15 (1996) 68–78.
[22] B.F. McEwen, M. Marko, The emergence of electron tomography as an important tool for investigating cellular ultrastructure, J. Histochem. Cytochem. 49 (2001) 553–564.
[23] O. Medalia, I. Weber, A.S. Frangakis, D. Nicastro, G. Gerisch, W. Baumeister, Macromolecular architecture in eukaryotic cells visualized by cryoelectron tomography, Science 298 (2002) 1209–1213.
[24] S.T. Peltier, A.W. Lin, D. Lee, S. Mock, S. Lamont, T. Molina, M. Wong, L. Dai, M.E. Martone, M.H. Ellisman, J. Parallel Distributed Comput. 63 (2003) 539–550.
[25] P. Penczek, M. Marko, K. Buttle, J. Frank, Double-tilt electron tomography, Ultramicroscopy 60 (1995) 393–410.
[26] G.A. Perkins, C.W. Renken, J.Y. Song, T.G. Frey, S.J. Young, S. Lamont, M.E. Martone, S. Lindsey, M.H. Ellisman, Electron tomography of large, multicomponent biological structures, J. Structural Biol. 120 (1997) 219–227.
[27] J.M. Plitzko, A.S. Frangakis, S. Nickell, F. Forster, A. Gross, W. Baumeister, In vivo veritas: electron cryotomography of cells, Trends Biotechnol. 20 (2002) 40–44.
[28] M. Rademacher, Weighted back-projection methods, in: J. Frank (Ed.), Electron Tomography: Three-Dimensional Imaging with the Transmission Electron Microscope, Plenum Press, New York, 1992, pp. 91–115.
[29] A. Sali, R. Glaeser, T. Earnest, W. Baumeister, From words to literature in structural proteomics, Nature 422 (2003) 216–225.
[30] D.W. Shattuck, J. Rapela, E. Asma, A. Chatzioannou, J. Qi, R.M. Leahy, Internet2-based 3D PET image reconstruction using a PC cluster, Phys. Med. Biol. 47 (2002) 2785–2795.
[31] S. Smallen, H. Casanova, F. Berman, Applying scheduling and tuning to on-line parallel tomography, in: Proceedings of the ACM/IEEE Conference on Supercomputing, ACM Press, New York, 2001, pp. 12–12.
[32] S. Smallen, W. Cirne, J.F. Frey, F. Berman, R. Wolski, M.H. Su, C. Kesselman, S. Young, M.H. Ellisman, Combining workstations and supercomputers to support grid applications: the parallel tomography experience, in: Proceedings of the Ninth Heterogeneous Computing Workshop, Cancun, Mexico, 2000, pp. 241–252.
[33] S. Vollmar, C. Michel, J.T. Treffert, D.F. Newport, M. Casey, C. Knoss, K. Wienhard, X. Liu, M. Defrise, W.D. Heiss, HeinzelCluster: accelerated reconstruction for FORE and OSEM3D, Phys. Med. Biol. 47 (2002) 2651–2658.
[34] B. Wilkinson, M. Allen, Parallel Programming, Prentice-Hall, Englewood Cliffs, NJ, 1999.

José-Jesús Fernández received the M.S. and Ph.D. degrees in Computer Science from the University of Granada, Spain, in 1992 and 1997, respectively. He was a Ph.D. student at the BioComputing Unit of the Spanish National Center for Biotechnology (CNB) of the Spanish National Research Council (CSIC), Madrid, Spain. He became an Assistant Professor in October 1997 and, subsequently, Associate Professor in 2000 in Computer Architecture at the University of Almería, Spain. He is a member of the supercomputing-algorithms research group. His current research interests include high performance computing, image processing and tomographic reconstruction.

José-María Carazo is a Senior Scientist of the Spanish National Research Council (CSIC). He leads the BioComputing Unit of the National Center for Biotechnology in Madrid, Spain, having served as its Deputy Director for Research between 1998 and 2002. He also holds an adjunct Associate Professorship in the area of Computer Architecture at the Universidad Autónoma de Madrid (UAM). At present he is Deputy Director for research planning and monitoring of the Spanish Ministry of Science and Technology. His research interests include image processing, parallel computing and structural databases.

Inmaculada García received her B.Sc. degree in Physics in 1977 from the Complutense University of Madrid, Spain, and her Ph.D. degree in 1986 from the University of Santiago de Compostela, Spain. From 1977 to 1987 she was an Assistant Professor at the University of Almería, and since 1997 she has been a Full Professor at the University of Almería and the head of its Department of Computer Architecture and Electronics. During 1994–1995 she was a visiting researcher at the University of Pennsylvania. She is head of the supercomputing-algorithms research group. Her research interests lie in the field of parallel algorithms for irregular problems related to image processing, global optimization and matrix computation.
