Journal of Structural Biology 164 (2008) 1–6


Minireview

High performance computing in structural determination by electron cryomicroscopy

J.J. Fernández

Department of Computer Architecture, University of Almeria, Almeria 04120, Spain
Centro Nacional de Biotecnologia – CSIC, Campus Universidad Autonoma, Cantoblanco, 28049 Madrid, Spain

Article history: Received 28 May 2008; received in revised form 4 July 2008; accepted 7 July 2008; available online 16 July 2008

Keywords: High performance computing; Parallel computing; Electron cryomicroscopy; Single particle electron cryomicroscopy; Electron cryotomography; Electron tomography

Abstract

Computational advances have significantly contributed to the current role of electron cryomicroscopy (cryoEM) in structural biology. The needs for computational power are constantly growing with the increasing complexity of algorithms and the amount of data needed to push the resolution limits. High performance computing (HPC) is becoming paramount in cryoEM to cope with those computational needs. Since the 1990s, different HPC strategies have been proposed for specific problems in cryoEM and, in fact, some of them are already available in common software packages. Nevertheless, the literature is scattered across the areas of computer science and structural biology. In this communication, the HPC approaches devised for the computation-intensive tasks in cryoEM (single particles and tomography) are retrospectively reviewed and the future trends are discussed. Moreover, the HPC capabilities available in the most common cryoEM packages are surveyed, as evidence of the importance of HPC in addressing the future challenges.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction

Electron cryomicroscopy (cryoEM) plays an essential role in structural biology. Single particle cryoEM allows structural elucidation of macromolecular assemblies at subnanometer resolution and, recently, up to near-atomic level (Zhou, 2008). There are exciting prospects of reaching atomic resolution soon (Henderson, 2004). On the other hand, electron cryotomography allows structural studies of complex specimens at near-molecular resolution as well as visualization of macromolecular complexes in their native cellular context (Lucic et al., 2005). The integrative combination of these cryoEM modalities with other complementary experimental approaches is expected to provide a comprehensive description of cell function (Robinson et al., 2007).

Numerous technical and computational advances have contributed to the current position of cryoEM. The importance of increasingly complex algorithms to address the different processing stages has been significant (Fernandez et al., 2006; Frank, 2006; Leis et al., 2006). So have the needs for computational power. In single particle cryoEM, the computational demands come from the complexity of the algorithms themselves (e.g. classification, angular determination, and 3D reconstruction) together with the increasing number of particles, and their size, necessary to reach higher resolution. In electron cryotomography, the demands derive from the size of the tomograms or the number of subtomograms to align, along with the algorithmic complexity of the methods (e.g. 3D reconstruction, denoising, and template matching).

High performance computing (HPC) has traditionally been applied to address the computationally demanding problems in cryoEM. Different types of HPC platforms, following the evolution in computer architecture, have been used. In the 1990s, large specialized parallel computers from institutional HPC centres were used for structural studies by single particle cryoEM and tomography (e.g. Martino et al., 1994; Perkins et al., 1997). In the last decade, however, computer clusters have had a significant impact in the HPC field. Their cost has made it possible for relatively modest research groups or centres to have their own cluster, and many cryoEM software packages already have the ability to distribute work on clusters. Some of the computational problems in cryoEM are simply decomposed into independent tasks that are then launched to the parallel system. Others, however, have to be tackled by specially devised HPC strategies, which involve domain decomposition, intertask communications, etc. In the last few years, new computational infrastructures and technologies have been emerging in the HPC arena, and their suitability for the complex problems in cryoEM is successfully being explored.

This concise review intends to show the growing role of HPC as the computational needs to push the limits of cryoEM progressively increase. First, the predominant HPC platforms and paradigms currently available to cryoEM labs are introduced. The computational problems in both cryoEM modalities (single particles and tomography) suitable for HPC are then reviewed.


The literature describing the HPC strategies used, which is scattered in journals in the areas of computer science and structural biology, is gathered and put into perspective. The HPC capabilities available in the most common software packages are surveyed (Table 1), evidencing the importance of HPC in addressing future challenges in cryoEM.

2. HPC platforms and programming paradigms

Multiprocessor platforms broadly fall into two categories, depending on the memory organization and the interconnection scheme (Hennessy and Patterson, 2007; van Heel et al., 2000). First, tightly coupled shared-memory computers, lately called symmetric multiprocessors (SMPs), are made up of a relatively small number of processors sharing a single centralized memory. Second, distributed-memory computers consist of individual nodes, each containing one or a small number of processors sharing a local memory, interconnected via a bus or a networking technology. An architectural approach for distributed-memory computers, the scalable shared-memory or distributed-shared-memory (DSM) architecture, allows the physically separate memories to be addressed as one logically shared address space (i.e. as a virtually unique memory system). In contrast to SMPs, the processors in DSMs have non-uniform memory access (NUMA), as the latency depends on the physical location of the data.

The principal use of multiprocessors is for true parallel programming, where a set of tasks execute in a collaborative fashion aiming at minimising the time to solve a problem, which is the primary goal of HPC. When the work carried out by the individual tasks is significant with respect to the global work, the parallelization strategy is called coarse-grain; otherwise it is fine-grain. Intertask communication is usually needed for data exchange and for synchronization purposes, though some problems, so-called embarrassingly parallel, may be decomposed into completely independent tasks.

There are two primary intertask communication mechanisms (Wilkinson and Allen, 2004). For shared-memory systems (either SMPs or DSMs), communications can be carried out by simple reads/writes through the memory accessible to all the processors. For distributed-memory systems, communications must be performed through the interconnection network in the form of messages among the processors, hence the message-passing paradigm. OpenMP directives (http://www.openmp.org/) and threads are typically used for implementations based on the shared-memory programming paradigm, whereas the standard message-passing interface (MPI, http://www-unix.mcs.anl.gov/mpi/) is commonly used for the latter. Although shared-memory programming is easier, message-passing is more flexible as it can be used on both architectural types.

In contrast, there are other situations where the parallel system is merely used as a "computer farm" or "processor farm". Here, independent jobs, in general from different users or possibly from an embarrassingly parallel problem, are executed in parallel without any interaction. The goal is the exploitation of the high-throughput computing (HTC) capabilities of multiprocessors, i.e. to maximise the number of completed jobs per time unit.

Inter-processor communications may become an issue in distributed-memory systems, especially for larger processor counts. High-bandwidth interconnect technologies, such as Gigabit Ethernet, Myrinet or InfiniBand, supporting data rates of up to 10 Gbit/s, or other proprietary solutions, are currently used to reduce network latency.
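To make the two paradigms concrete, the following sketch (not taken from any of the packages discussed; process_image and the image count are hypothetical placeholders) processes a set of independent images first with OpenMP threads on a shared-memory machine and then with an MPI partition and an explicit reduction on a distributed-memory one.

```c
/* Hypothetical illustration of shared-memory (OpenMP) versus
 * message-passing (MPI) parallelization of an embarrassingly
 * parallel per-image task. Compile, e.g., with:
 *   mpicc -fopenmp paradigms.c -o paradigms
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_IMAGES 1000

/* Placeholder for any independent per-particle computation. */
static double process_image(int i) { return (double)i; }

int main(int argc, char **argv)
{
    double results[N_IMAGES];
    int i, rank, size;

    /* Shared-memory paradigm: threads split the loop iterations
     * and communicate implicitly through the shared array.       */
    #pragma omp parallel for
    for (i = 0; i < N_IMAGES; i++)
        results[i] = process_image(i);

    /* Message-passing paradigm: each process works on its own
     * subset of images; partial results are combined explicitly. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local_sum = 0.0, global_sum = 0.0;
    for (i = rank; i < N_IMAGES; i += size)   /* cyclic partition */
        local_sum += process_image(i);

    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
               MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum over all images: %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```

In the shared-memory version the threads communicate implicitly through the shared results array, whereas in the message-passing version each process only sees its own data and the partial results are combined explicitly over the network.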

Table 1
HPC capabilities in the most common packages in cryoEM

Package | Reference | Modality | Parallelized tasks | Implementation
AUTO3DEM | Yan et al. (2007) | Single particles (icosahedral) | Angular determination; reconstruction | MPI
BSOFT | Heymann and Belnap (2007) | Single particles | Reconstruction | Custom
BSOFT | Heymann et al. (2008) | Tomography | Reconstruction; denoising; resolution estimation | Custom
EMAN | Ludtke et al. (1999) | Single particles | Classification; angular determination; reconstruction; Helixhunter, foldhunter | Custom; MPI; OpenMP, pthreads
FREALIGN | Grigorieff (2007) | Single particles | Angular determination | Custom
IMAGIC | van Heel et al. (1996) | Single particles | Angular determination; reconstruction | MPI
IMIRS | Liang et al. (2002) | Single particles (icosahedral) | Angular determination; reconstruction | OpenMP, MPI
IMOD | Kremer et al. (1996) | Tomography | CTF correction; reconstruction; denoising; dual-axis tomogram combination | Custom
PRIISM/IVE | Chen et al. (1996) | Tomography | Reconstruction; alignment of two tilt series | Custom, GPUs
SPIDER | Frank et al. (1996) | Single particles | Angular determination; reconstruction; template matching (fitting) | OpenMP, MPI, custom
SPIDER | Frank et al. (1996) | Tomography | Reconstruction; template matching | OpenMP, custom
UCSF TOMOGRAPHY | Zheng et al. (2007) | Tomography | Reconstruction | MPI
XMIPP | Marabini et al. (1996) | Single particles | Classification & alignment via maximum-likelihood; angular determination; reconstruction | MPI, pthreads

Notes: 1. This information has been extracted either from the published articles describing the packages, from their documentation or website, or by going through the source code. 2. Packages specifically designed for icosahedral particles are indicated as such in the Modality column. 3. Implementations based on MPI are intended for distributed-memory machines, such as clusters, but they can also be used on shared-memory platforms. Implementations based on OpenMP and pthreads are, however, only for shared-memory platforms, either SMPs or DSMs. 4. Custom implementation of HPC capabilities is usually done through shell scripts, instead of using standards like MPI, OpenMP or pthreads. In the particular case of Spider, these capabilities are implemented by its PubSub system. Normally, these custom implementations are valid for both distributed-memory and shared-memory computers. 5. Links to most of the packages can be found at the Wikipedia page "Software tools for molecular microscopy": http://en.wikipedia.org/wiki/Software_tools_for_molecular_microscopy.


2.1. Supercomputers

Large-scale supercomputers are distributed-memory multiprocessors, mostly based on the DSM architecture in order to provide virtual shared-memory capabilities. Distributed memory supports the memory bandwidth demands of a large number of processors (up to several thousand). Examples of such integrated supercomputers are the Cray T3E, Cray X1, IBM SP2, Intel iPSC/860, Intel Paragon, SGI Origin, SGI Altix and NEC SX. On the other hand, SMP supercomputers can have a relatively small number of processors (at most 64–128), e.g. the Sun Sunfire. Traditional supercomputers have been relatively expensive, although technological advances are causing the price to decrease with time. Supercomputers are normally available through institutional HPC centres, shared by many scientists.

2.2. Clusters

Over the past decade, the concept of the computer cluster has driven an important trend in HPC (Dongarra et al., 2005; Hennessy and Patterson, 2007). Clusters, comprising an integrated collection of independent computers based on either custom or commodity components interconnected by a network, deliver an unprecedented performance-to-cost ratio and provide exceptional flexibility and scalability. Their low cost allows individual groups or centres to have their own cluster dedicated to their research. Such clusters can provide results faster than waiting in the long job queues of large HPC facilities at supercomputing centres. To some extent, modern parallel supercomputers could be considered clusters composed of replicated custom components connected via a high performance interconnection network. There are usually two levels of parallelism in clusters. The first is the number of nodes connected by the network, which are configured as a distributed-memory system. The second is the number of processors in each node, usually configured as a small SMP. In addition to HPC, clusters are often used for HTC, i.e. as "computer farms" to execute the jobs launched by many users in a lab.

2.3. Graphics units

Vector processing and the SIMD (single instruction, multiple data) model formed the basis of the dominant supercomputers of the 1980s and 1990s (Hennessy and Patterson, 2007). They lost importance with the progressive increase in performance of general-purpose computers. Today vector processing is again becoming important because the graphics processing units (GPUs) included in computer video cards rely heavily on the SIMD model. The key idea behind SIMD is that a single instruction operates on many data items (i.e. arrays) at once, using many processing elements. In the last few years, the exploitation of the processing power of GPUs for general-purpose applications (GPGPU, http://www.gpgpu.org/) has been actively investigated, yielding outstanding performance (Pharr and Fernando, 2005). Although programming GPUs may still be tricky, recent development technologies are making it easier (http://www.nvidia.com/object/cuda_home.html). On the other hand, commodity CPUs also include SIMD instructions for multimedia applications (e.g. MMX, SSE), which may also be applied to general-purpose computations (e.g. four floating-point operations at a time) and, in fact, modern compilers can use them to automatically vectorize particular pieces of code.
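As a minimal illustration of CPU SIMD processing (a generic sketch, not code from any cryoEM package), the routine below scales an image row by per-pixel weights four single-precision values at a time using SSE intrinsics; an equivalent plain scalar loop could be auto-vectorized by a modern compiler.

```c
/* Hypothetical sketch: weighting an image row with SSE intrinsics,
 * processing 4 single-precision values per instruction.           */
#include <xmmintrin.h>

void weight_row(float *row, const float *weights, int n)
{
    int i;
    /* Vectorized part: 4 floats at a time. */
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 r = _mm_loadu_ps(row + i);
        __m128 w = _mm_loadu_ps(weights + i);
        _mm_storeu_ps(row + i, _mm_mul_ps(r, w));
    }
    /* Scalar remainder. */
    for (; i < n; i++)
        row[i] *= weights[i];
}
```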

2.4. Distributed computing

Distributed computing technologies, mainly focused on HTC, have been developed to harness the power of heterogeneous computational resources scattered across the lab, centre or campus, or even geographically distributed across continents, in order to tackle complex problems. Peach was developed to distribute independent jobs across the set of computers in a structural biology lab, taking advantage of the idle time on interactive desktop workstations or even on the computing nodes of a cluster (Leong et al., 2005). Condor (http://www.cs.wisc.edu/condor/) is a more sophisticated distributed processing system, capable of taking advantage of computing platforms from different administrative domains. Finally, computational grids have emerged as powerful large-scale high-performance distributed computing infrastructures where computation and storage can be spread across a myriad of geographically dispersed machines (Foster and Kesselman, 2003; Buetow, 2005). The European EGEE grid (http://www.eu-egee.org/) and the American TeraGrid (www.teragrid.org) provide hundreds of teraflops of computing power and dozens of petabytes of storage distributed across thousands of computers.

2.5. Single processor optimization

Last but not least, single processor code optimization techniques should always be used to produce efficient software (Wadleigh and Crawford, 2000). Apart from algorithmic modifications to reduce the computational complexity, simple code transformations may yield exceptional reductions in computation time. For instance, algorithmic transformations to ensure an efficient use of the memory hierarchy when accessing data, code reordering to separate dependent instructions, and similar techniques may boost software performance. Also, already optimized mathematical libraries (e.g. BLAS for basic linear algebra operations) should be used when applicable.

3. HPC in single particle electron cryomicroscopy

Single particle cryoEM routinely allows structure determination at subnanometer resolution at the expense of enormous computational costs (Zhou, 2008); recent near-atomic work needed up to 10⁶ CPU h (Jiang et al., 2008). HPC has usually been applied to the most demanding tasks, mainly orientation determination and 3D reconstruction.

3.1. Orientation determination

Initially, rough angular parameters of the particles are commonly determined against templates by, typically, angular reconstitution (based on common lines) or multi-reference alignment (Frank, 2006). At this step individual particles can be treated independently, so the set of particles is commonly split into disjoint groups that are processed in parallel (Zhou et al., 1998; van Heel et al., 2000). In the mid-1990s, however, an opposite approach was used for icosahedral viruses, which also obtained linear speedup (Martino et al., 1994; Lanczycki et al., 1998). Here, the particles were processed sequentially and the angular parameters of each particle were estimated in parallel by partitioning the search over the asymmetric unit.

Angular refinement is by far the most time-consuming task in single particle cryoEM. The orientation parameters of the particles are iteratively refined by comparing the experimental images with projections calculated from the reconstruction at decreasing angular step sizes. Each iteration involves (1) refinement of the orientations, followed by (2) a whole 3D reconstruction. There have been different approaches to this projection matching procedure, working in different spaces (real, Fourier, Radon, wavelet) (Fernandez et al., 2006; Frank, 2006; Baker and Cheng, 1996). Different parallel strategies have been proposed to tackle the refinement step (reconstruction will be discussed later). Spider has long used OpenMP for refinement tasks on SMPs. Recently, a distributed-memory approach has been introduced (Yang et al., 2007) that achieves linear speedup on clusters. It consists of distributing the experimental images among the processors and broadcasting the reference projections. The multi-reference 2D alignment can then be executed in parallel on all processors, each with its own subset of particles.
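A hedged sketch of such a strategy is given below (dummy data, a toy matching function and assumed sizes; the actual Spider implementation differs): the reference projections are broadcast to all ranks, each rank aligns only its own block of particles, and the per-particle orientation assignments are gathered at the root.

```c
/* Hypothetical MPI sketch of distributed projection matching:
 * the reference projections are broadcast to every rank and each
 * rank aligns only its own subset of experimental images.         */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define IMG_PIX  (64 * 64)  /* assumed image size (pixels)      */
#define N_REFS   100        /* assumed number of references     */
#define N_LOCAL  50         /* assumed particles per rank       */

/* Toy stand-in for real-space cross-correlation matching. */
static int best_match(const float *img, const float *refs)
{
    int best = 0;
    double best_cc = -1e30;
    for (int r = 0; r < N_REFS; r++) {
        double cc = 0.0;
        for (int p = 0; p < IMG_PIX; p++)
            cc += img[p] * refs[(size_t)r * IMG_PIX + p];
        if (cc > best_cc) { best_cc = cc; best = r; }
    }
    return best;
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The root would compute the references from the current model;
     * here they are just filled with dummy values.                  */
    float *refs = malloc((size_t)N_REFS * IMG_PIX * sizeof(float));
    if (rank == 0)
        for (size_t k = 0; k < (size_t)N_REFS * IMG_PIX; k++) refs[k] = 1.0f;
    MPI_Bcast(refs, N_REFS * IMG_PIX, MPI_FLOAT, 0, MPI_COMM_WORLD);

    /* Each rank holds its own block of experimental images (dummy data). */
    float *imgs = malloc((size_t)N_LOCAL * IMG_PIX * sizeof(float));
    for (size_t k = 0; k < (size_t)N_LOCAL * IMG_PIX; k++) imgs[k] = 0.5f;

    int assign[N_LOCAL];
    for (int i = 0; i < N_LOCAL; i++)
        assign[i] = best_match(imgs + (size_t)i * IMG_PIX, refs);

    /* Collect the per-particle orientation assignments at the root. */
    int *all = (rank == 0) ? malloc((size_t)size * N_LOCAL * sizeof(int)) : NULL;
    MPI_Gather(assign, N_LOCAL, MPI_INT, all, N_LOCAL, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("first particle matched reference %d\n", all[0]);

    MPI_Finalize();
    return 0;
}
```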


A similar approach, available in AUTO3DEM, is applied to projection matching in Fourier space, where the particles are scattered among the processors and the reference reconstruction is broadcast (Ji et al., 2006; Marinescu and Ji, 2003; Martin and Marinescu, 1998). In general, similar parallel strategies are currently used by the standard packages (e.g. EMAN, Frealign, Imagic, and Xmipp) to deal with angular refinement based on projection matching, mostly focused on clusters (e.g. Fotin et al., 2004) and recently on distributed environments using Condor (Jiang et al., 2008). On the other hand, the distributed-memory refinement approach of Martino et al. (1994) and Lanczycki et al. (1998) was based on cross common lines (any two particles share a common line in Fourier space, so their orientations mutually constrain each other). Here, the particles were distributed among the processors, and every processor broadcast its particles for cross-comparison by the other processors. Due to the amount of communication, the speedup dropped for large numbers of processors. A shared-memory parallelization of this method (available in IMIRS) showed better speedup as communication latency was avoided, though at the expense of memory contention (Zhou et al., 1998).

3.2. 3D Reconstruction

Numerous parallel approaches for reconstruction have been proposed. The first parallelizations of reconstruction algorithms for electron microscopy date back to the early-to-mid 1990s, when weighted backprojection (WBP) and iterative methods were implemented with fine-grain strategies for shared-memory and distributed-memory computers (Zapata et al., 1990, 1992; Carazo et al., 1992; Garcia et al., 1996).

The parallelization of WBP, a common method in cryoEM, is straightforward. Spider has a parallel implementation based on its PubSub scheduling system for clusters. Essentially, the images are accessible to all processors, and each processor independently reconstructs a slab of the final 3D reconstruction. At the end, the slabs are assembled to yield the reconstruction. An alternative approach scatters the images across the processors, which compute, in parallel, partial full-sized reconstructions from their own subsets of images. Those partial results are finally summed together via a global reduction operation to yield the final reconstruction at linear speedup factors (Bilbao-Castro et al., 2006). This work also implemented parallel versions of iterative methods (e.g. SIRT, ART) using blobs as basis functions, which usually outperform WBP. Here, every processor computes a partial result from its own subset of particles, and the reconstruction at the current iteration (or current processing block) is obtained through a weighted sum of the partial results. This reconstruction is broadcast again to all processors and a new iteration (or block) is processed, and so forth. Despite the amount of communication per iteration, excellent speedup factors were obtained. Spider has a similar implementation of voxel-based SIRT and conjugate gradient methods with MPI (Yang et al., 2007). The optimization of the parameters associated with these iterative methods has been addressed by grid computing, launching multiple reconstructions with different parameters following a global optimization method (Fernandez et al., 2005; Bilbao-Castro et al., 2007b).

On the other hand, distributed-memory implementations of the Fourier method, available in AUTO3DEM, have shown good scalability (Marinescu and Ji, 2003; Marinescu et al., 2001). Here, each processor computes a reciprocal slab of the 3D Fourier transform (FT) of the reconstruction. After inverse 2D FTs, global exchanges and inverse 1D FTs, the slabs can be assembled in real space to yield the final reconstruction. A similar approach is now available in Bsoft. EMAN has an MPI implementation of its iterative Fourier algorithm similar to the approach for real-space iterative methods of Bilbao-Castro et al. (2006). Finally, a parallel implementation of the Fourier–Bessel expansion based on similar decomposition ideas was presented by Lanczycki et al. (1998).
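The "partial volumes plus global reduction" idea can be sketched as follows (assumed sizes and a dummy backprojection routine, not the actual package code): each rank backprojects its own images into a private full-sized volume, and a single reduction produces the complete reconstruction on every rank, ready to be reprojected in the next iteration.

```c
/* Hypothetical sketch of the 'partial volumes + global reduction'
 * strategy: each rank backprojects its own images into a private
 * full-sized volume and an MPI reduction sums the contributions.  */
#include <mpi.h>
#include <stdlib.h>

#define VOL_VOX  (64 * 64 * 64)   /* assumed volume size     */
#define N_LOCAL  25               /* assumed images per rank */

/* Toy stand-in for backprojecting one image into the volume. */
static void backproject(float *vol, int image_id)
{
    for (int v = 0; v < VOL_VOX; v++)
        vol[v] += (float)image_id * 1e-6f;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *partial = calloc(VOL_VOX, sizeof(float));
    float *full    = calloc(VOL_VOX, sizeof(float));

    /* Each rank processes only its own subset of images. */
    for (int i = 0; i < N_LOCAL; i++)
        backproject(partial, rank * N_LOCAL + i);

    /* Global reduction: the final volume is the sum of all partial
     * volumes and becomes available on every rank, as needed when
     * an iterative method reprojects it in the next iteration.     */
    MPI_Allreduce(partial, full, VOL_VOX, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    free(partial);
    free(full);
    MPI_Finalize();
    return 0;
}
```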

3.3. Classification and fitting

Image classification guarantees that the projections come from the same conformational structure. The common parallel approach for classification follows a strategy similar to that sketched above for multi-reference alignment (e.g. EMAN). Likewise, the iterative method for combined multi-reference classification and refinement based on maximum likelihood is parallelized by dividing the particles into disjoint subsets that are processed independently, per iteration, on supercomputers and clusters (Scheres et al., 2007). Finer-grain parallelism was used in the early 1990s for fuzzy C-means, by scattering the images and class representatives across the nodes of distributed-memory computers and making global exchanges at each iteration (Carazo et al., 1992).

On the other hand, fitting atomic models within cryoEM maps is important for their interpretation. An intensive computational effort is required to search for the template (i.e. the atomic model properly sampled and filtered) in multiple orientations throughout the cryoEM map by cross-correlation. The searches for the template at the different orientations can be run in parallel and, at the end, the results are joined to obtain the matching parameters, as implemented in Spider (Rath et al., 2003).

4. HPC in electron cryotomography

Electron cryotomography (cryoET) allows structural analysis of complex cellular environments at near-molecular resolution, with a unique potential to bridge the gap between molecular and cell biology (Lucic et al., 2005). Reconstruction and tomogram post-processing are the stages with the highest computational requirements.

4.1. Reconstruction

WBP is the standard reconstruction method in electron cryotomography, though iterative methods (e.g. SIRT, ART) are attracting increasing interest because of their better performance (Leis et al., 2006). Despite their potential, the computational demands of iterative methods have prevented their extensive use. The single-tilt axis geometry typically used in cryoET allows decomposition of the 3D reconstruction problem into multiple reconstructions of slabs of slices orthogonal to the tilt axis. When WBP or voxel-based iterative methods are used, the slabs can be reconstructed in parallel on different processors. This is the basis of most parallel approaches for reconstruction in cryoET on clusters (e.g. Spider, Priism, and Imod). An earlier approach addressed the 3D problem as 2D reconstructions of individual slices, not slabs, following a master–worker model (i.e. a master processor assigns a 2D reconstruction task to worker processors as they become idle) on distributed-memory supercomputers (Perkins et al., 1997). Nevertheless, packaging slices into slabs ameliorates the network overhead. Bsoft's approach is also based on decomposition into slabs, but in Fourier space, as described for single particles.

The use of blobs in iterative methods provides them with a further regularization mechanism, better suited to the conditions in cryoET. However, the overlapping nature of blobs makes the slices, and hence the slabs, interdependent. Therefore, there must be communications among the processors containing neighbouring slices, several times per iteration (Fernandez et al., 2002). When intensive inter-processor communication is involved, latency hiding becomes an issue. The term stands for overlapping communication and computation so as to keep the processors busy while waiting for the communications to complete. Latency hiding makes it possible to keep the speedup near optimal (Fernandez et al., 2004; Fritzsche et al., 2005; Alvarez et al., 2007).
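Latency hiding is typically achieved with non-blocking communication. The fragment below is a generic sketch (not any of the cited implementations): the boundary slices of the local slab are exchanged with the neighbouring ranks while the interior slices are being processed, and only the boundary work waits for the messages to arrive.

```c
/* Hypothetical sketch of latency hiding: boundary slices are
 * exchanged with neighbouring ranks using non-blocking MPI calls
 * while the interior of the local slab is processed.             */
#include <mpi.h>
#include <stdlib.h>

#define SLICE_PIX (256 * 256)   /* assumed slice size           */
#define N_SLICES  32            /* assumed slices in local slab */

static void process_slices(float *slab, int first, int last)
{
    for (int s = first; s < last; s++)
        for (int p = 0; p < SLICE_PIX; p++)
            slab[(size_t)s * SLICE_PIX + p] *= 0.5f;   /* dummy work */
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    float *slab    = calloc((size_t)N_SLICES * SLICE_PIX, sizeof(float));
    float *halo_lo = calloc(SLICE_PIX, sizeof(float));
    float *halo_hi = calloc(SLICE_PIX, sizeof(float));
    int lo = rank - 1, hi = rank + 1;

    MPI_Request req[4];
    int nreq = 0;

    /* Post the halo exchange with the neighbouring slabs ... */
    if (lo >= 0) {
        MPI_Irecv(halo_lo, SLICE_PIX, MPI_FLOAT, lo, 0, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Isend(slab, SLICE_PIX, MPI_FLOAT, lo, 0, MPI_COMM_WORLD, &req[nreq++]);
    }
    if (hi < size) {
        MPI_Irecv(halo_hi, SLICE_PIX, MPI_FLOAT, hi, 0, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Isend(slab + (size_t)(N_SLICES - 1) * SLICE_PIX,
                  SLICE_PIX, MPI_FLOAT, hi, 0, MPI_COMM_WORLD, &req[nreq++]);
    }

    /* ... and overlap it with computation on the interior slices. */
    process_slices(slab, 1, N_SLICES - 1);

    /* Only the boundary slices have to wait for the communication. */
    MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);
    process_slices(slab, 0, 1);
    process_slices(slab, N_SLICES - 1, N_SLICES);

    MPI_Finalize();
    return 0;
}
```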


Recently, it has been shown that the combination of efficient blob-based iterative methods, based on row-action ART and requiring only a few iterations, with cluster computing and single processor optimization makes them viable alternatives to WBP (Fernandez et al., 2008).

Decomposition into independent slices or slabs also makes this approach well suited to grid or distributed computing. The Telescience project, an integrated solution for end-to-end tomography that allows remote data acquisition, processing, analysis and archival, computes the tomograms by spreading the WBP reconstruction of individual slices across a computational grid (Peltier et al., 2003; Lee et al., 2003). A later work, focused on the quantitative assessment of grid computing for voxel-based iterative reconstruction, showed that it is possible to find the optimal slab size for a particular problem and grid (Fernandez et al., 2007). Under optimal conditions, the speedup may turn out to be exceptional (around 70×). That work also confirmed that there are still key issues, such as stability or difficult user–grid interaction, that currently preclude full exploitation of grid computing. That drove us to develop a user-friendly, fault-tolerant application for this purpose (Bilbao-Castro et al., 2007a).

Recently, the application of GPUs to reconstruction has been investigated, concluding that an outstanding reduction of the computation time (by factors of 60–80) is achievable (Castano-Diez et al., 2007). Although there are still some limitations (e.g. the instruction limit), there are opportunities to exploit GPUs in cryoEM tasks suitable for vectorization. The power of GPUs is already accessible for cryoET through commercial software (Schoenmakers et al., 2005). Public packages (Priism) also allow the exploitation of specific GPUs, even in the form of clusters. On the other hand, the multimedia SIMD instructions included in modern CPUs are also being applied to reconstruct several slices at a time. Together with single processor optimization, this approach succeeds in significantly accelerating reconstruction on commodity CPUs (Agulleiro et al., in preparation; preliminary results in Agulleiro et al. (2008)).

4.2. Denoising, template matching, and angular refinement

Noise reduction is a post-processing stage that significantly helps scientists interpret tomograms. Anisotropic nonlinear diffusion is the standard denoising method in cryoET, but it is extremely demanding. Single processor optimization has been used in the sequential implementations to cope with the huge memory requirements (Fernandez and Li, 2003). Further memory optimization and parallel implementations have been developed recently, with special focus on clusters of SMPs (Tabik et al., 2007). Here, a hybrid implementation that combines the message-passing and shared-memory paradigms with MPI and pthreads has proven better than implementations based solely on MPI. Good scalability has been obtained thanks to the lower communication penalty of the hybrid approach. A different approach is available in Bsoft and Imod, where the tomogram is divided into overlapping tiles (typically with as much overlap as denoising cycles) that can then be processed independently in parallel, at the expense of redundant computation. At the end, the denoised tiles are reassembled. This approach is well suited to distributed and grid computing besides clusters.

Template matching allows automated identification of macromolecules in tomograms as well as objective annotation of cellular structures. The tomogram is scanned in search of structures matching templates of the macromolecules or features of interest. The procedure is similar to that described for fitting atomic models into cryoEM maps, though the computational demands rise dramatically due to the size of the tomograms. Therefore, the same straightforward parallelization is applicable and is already implemented in Spider (Rath et al., 2003) and in Omnimatch/Molmatch for distributed-memory supercomputers (Bohm et al., 2000), clusters (Frangakis et al., 2002; Ortiz et al., 2006) and grids (Lebbink et al., 2007). Alternatively, the tomogram could be divided into independent tiles, each subjected to the whole search (i.e. all orientations) for the template.

Afterwards, subtomograms containing macromolecules can be extracted and their angular parameters can then be refined by iterative procedures. The parallel approaches for refinement in single particle cryoEM are also applicable here.
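The overlapping-tile decomposition used for denoising can be sketched as follows (a generic illustration with assumed sizes, not the Bsoft or IMOD code): each tile is extended by a halo as wide as the number of denoising iterations, so it can be processed independently, and only the interior slices are written back when the tiles are reassembled.

```c
/* Hypothetical sketch of overlapping-tile decomposition along one
 * axis: each tile is padded by a halo as wide as the number of
 * denoising iterations, processed independently, and only its
 * interior is written back to the output volume.                  */
#include <stdio.h>

typedef struct {
    int core_start, core_end;   /* slices owned by this tile        */
    int halo_start, halo_end;   /* slices actually read (with halo) */
} Tile;

static Tile make_tile(int t, int n_tiles, int n_slices, int n_iters)
{
    Tile tile;
    int per_tile = (n_slices + n_tiles - 1) / n_tiles;
    tile.core_start = t * per_tile;
    tile.core_end   = (t + 1) * per_tile < n_slices ? (t + 1) * per_tile : n_slices;
    /* The halo width matches the number of diffusion iterations. */
    tile.halo_start = tile.core_start - n_iters > 0 ? tile.core_start - n_iters : 0;
    tile.halo_end   = tile.core_end + n_iters < n_slices ? tile.core_end + n_iters : n_slices;
    return tile;
}

int main(void)
{
    int n_slices = 300, n_tiles = 4, n_iters = 10;
    for (int t = 0; t < n_tiles; t++) {
        Tile tile = make_tile(t, n_tiles, n_slices, n_iters);
        /* Each tile could now be denoised on a different node;
         * only slices [core_start, core_end) are written back.   */
        printf("tile %d: read [%d,%d) write [%d,%d)\n",
               t, tile.halo_start, tile.halo_end, tile.core_start, tile.core_end);
    }
    return 0;
}
```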

5. Future trends

HPC is expected to play a paramount role in cryoEM in facing the challenges ahead. Extraordinary computational needs will have to be met in order to reveal structural details of large macromolecular assemblies at atomic scale or of complex pleomorphic specimens at molecular resolution, or to build molecular atlases describing the spatial distribution of proteins and their interactions in the cellular environment. HPC will thus be an invaluable resource to cope with those demands. Clusters and distributed infrastructures will be increasingly exploited for the independent tasks or subtasks commonly found in cryoEM. Local inexpensive clusters and small SMPs will remain predominant over institutional supercomputers for the execution of parallel cryoEM applications. On the other hand, the nature of cryoEM applications makes them well suited to the newly emerging platforms, GPUs and multi-cores. The advent of multi-cores (several independent processors sharing on-chip resources and computer memory) has signalled a historic switch in HPC whereby the road to higher performance will be via multiple, rather than faster, processors. There are excellent opportunities to obtain an incomparable performance-to-cost ratio by taking advantage of the tremendous computing power in commodity multi-core computers or GPUs. These trends also mark an inflection point for cryoEM software developers, as HPC paradigms should be increasingly used to take full advantage of these systems.

Acknowledgments

The author is deeply grateful to his colleagues at UAL (Almeria), CNB-CSIC (Madrid) and MRC-LMB (Cambridge). Work supported by EU-LSHG-CT-2004-502828, MEC-TIN2005-00447 and JA-P06TIC1426.

References

Agulleiro, J.I., Garzon, E.M., Garcia, I., Fernandez, J.J., 2008. Fast tomographic reconstruction with vectorized backprojection. In: Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), IEEE Computer Society, pp. 387–390.
Alvarez, J.A., Roca, J., Fernandez, J.J., 2007. Multithreaded tomographic reconstruction. Lect. Notes Comput. Sci. 4757, 81–88.
Baker, T.S., Cheng, R.H., 1996. A model-based approach for determining orientations of biological macromolecules imaged by cryoelectron microscopy. J. Struct. Biol. 116, 120–130.
Bilbao-Castro, J.R., Carazo, J.M., Garcia, I., Fernandez, J.J., 2006. Parallelization of reconstruction algorithms in three-dimensional electron microscopy. Appl. Math. Model. 30, 688–701.
Bilbao-Castro, J.R., Garcia, I., Fernandez, J.J., 2007a. EGEETomo: a user-friendly, fault-tolerant and grid-enabled application for 3D reconstruction in electron tomography. Bioinformatics 23, 3391–3393.
Bilbao-Castro, J.R., Merino, A., Garcia, I., Carazo, J.M., Fernandez, J.J., 2007b. Parameter optimization in 3D reconstruction on a large scale grid. Parallel Comput. 33, 250–263.
Bohm, J., Frangakis, A.S., Hegerl, R., Nickell, S., Typke, D., Baumeister, W., 2000. Toward detecting and identifying macromolecules in a cellular context: template matching applied to electron tomograms. Proc. Natl. Acad. Sci. USA 97, 14245–14250.
Buetow, K.H., 2005. Cyberinfrastructure: empowering a 'third way' in biomedical research. Science 308, 821–824.


Carazo, J.M., Benavides, I., Rivera, F.F., Zapata, E.L., 1992. Detection, classification and 3D reconstruction of biological macromolecules on hypercube computers. Ultramicroscopy 40, 13–32. Castano-Diez, D., Mueller, H., Frangakis, A.S., 2007. Implementation and performance evaluation of reconstruction algorithms on graphics processors. J. Struct. Biol. 157, 288–295. Chen, H., Hughes, D.D., Chan, T.A., Sedat, J.W., Agard, D.A., 1996. IVE (image visualization environment): a software platform for all three-dimensional microscopy applications. J. Struct. Biol. 116, 56–60. Dongarra, J., Sterling, T., Simon, H., Strohmaier, E., 2005. High performance computing: clusters, constellations, MPPs, and future directions. Comput. Sci. Eng. 7 (2), 51–59. Fernandez, J.J., Li, S., 2003. An improved algorithm for anisotropic nonlinear diffusion for denoising cryotomograms. J. Struct. Biol. 144, 152–161. Fernandez, J.J., Lawrence, A.F., Roca, J., Garcia, I., Ellisman, M.H., Carazo, J.M., 2002. High performance electron tomography of complex biological specimens. J. Struct. Biol. 138, 6–20. Fernandez, J.J., Carazo, J.M., Garcia, I., 2004. Three-dimensional reconstruction of cellular structures by electron microscope tomography and parallel computing. J. Parallel Distrib. Comput. 64, 285–300. Fernandez, J.J., Bilbao-Castro, J.R., Marabini, R., Carazo, J.M., Garcia, I., 2005. On the suitability of biological structure determination by electron microscopy to grid computing. New Gen. Comput. 23, 101–112. Fernandez, J.J., Sorzano, C.O.S., Marabini, R.M.H., Carazo, J.M., 2006. Image processing and 3-D reconstruction in electron microscopy. IEEE Signal Process. Mag. 23 (3), 84–94. Fernandez, J.J., Garcia, I., Carazo, J.M., Marabini, R., 2007. Electron tomography of complex biological specimens on the grid. Futur. Gen. Comp. Syst. 23, 435– 446. Fernandez, J.J., Gordon, D., Gordon, R., 2008. Efficient parallel implementation of iterative reconstruction algorithms for electron tomography. J. Parallel Distrib. Comput. 68, 626–640. Foster, I., Kesselman, C. (Eds.), 2003. The Grid 2e: Blueprint for a New Computing Infrastructure. Morgan Kaufmann. Fotin, A., Cheng, Y., Sliz, P., Grigorieff, N., Harrison, S.C., Kirchhausen, T., Walz, T., 2004. Molecular model for a complete clathrin lattice from electron cryomicroscopy. Nature 432, 573–579. Frangakis, A.S., Bohm, J., Forster, F., Nickell, S., Nicastro, D., Typke, D., Hegerl, R., Baumeister, W., 2002. Identification of macromolecular complexes in cryoelectron tomograms of phantom cells. Proc. Natl. Acad. Sci. USA 99, 14153–14158. Frank, J., 2006. Three-Dimensional Electron Microscopy of Macromolecular Assemblies. Visualization of Biological Molecules in Their Native State. Oxford University Press. Frank, J., Radermacher, M., Penczek, P., Zhu, J., Li, Y., Ladjadj, M., Leith, A., 1996. SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. 116, 190–199. Fritzsche, P.C., Fernandez, J.J., Ripoll, A., Garcia, I., Luque, E., 2005. A performance prediction model for tomographic reconstruction in structural biology. Lect. Notes Comput. Sci. 3402, 90–103. Garcia, I., Roca, J., Sanjurjo, J., Carazo, J.M., Zapata, E.L., 1996. Implementation and experimental evaluation of the constrained ART algorithm on a multicomputer system. Signal Process. 51, 69–76. Grigorieff, N., 2007. FREALIGN: high-resolution refinement of single particle structures. J. Struct. Biol. 157, 117–125. Henderson, R., 2004. 
Realising the potential of electron cryomicroscopy. Q. Rev. Biophys. 37, 3–13. Hennessy, J.L., Patterson, D.A., 2007. Computer Architecture: A Quantitative Approach. Morgan Kaufmann. Heymann, J.B., Belnap, D.M., 2007. Bsoft: image processing and molecular modeling for electron microscopy. J. Struct. Biol. 157, 3–18. Heymann, J.B., Cardone, G., Winkler, D.C., Steven, A.C., 2008. Computational resources for cryo-electron tomography in Bsoft. J. Struct. Biol. 161, 232–242. Jiang, W., Baker, M.L., Jakana, J., Weigele, P.R., King, J., Chiu, W., 2008. Backbone structure of the infectious epsilon15 virus capsid revealed by electron cryomicroscopy. Nature 451, 1130–1134. Ji, Y., Marinescu, D.C., Zhang, W., Zhang, X., Yan, X., Baker, T.S., 2006. A model-based parallel origin and orientation refinement algorithm for cryoTEM and its application to the study of virus structures. J. Struct. Biol. 154, 1–19. Kremer, J., Mastronarde, D., McIntosh, J.R., 1996. Computer visualization of threedimensional image data using IMOD. J. Struct. Biol. 116, 71–76. Lanczycki, C.J., Johnson, C.A., Trus, B.L., Conway, J.F., Steven, A.C., Martino, R.L., 1998. Parallel computing strategies for determining viral capsid structure by cryoelectron microscopy. Comput. Sci. Eng. 5 (2), 76–91. Lebbink, M.N., Geerts, W.J.C., van der Krift, T.P., Bouwhuis, M., Hertzberger, L.O., Verkleij, A.J., Koster, A.J., 2007. Template matching as a tool for annotation of tomograms of stained biological structures. J. Struct. Biol. 158, 327–335. Lee, D., Lin, A.W., Hutton, T., Akiyama, T., Shinji, S., Lin, F.P., Peltier, S., Ellisman, M.H., 2003. Global telescience featuring IPv6 at iGrid2002. Futur. Gen. Comp. Syst. 19, 1031–1039.

Leis, A.P., Beck, M., Gruska, M., Best, C., Hegerl, R., Baumeister, W., Leis, J.W., 2006. Cryo-electron tomography of biological specimens. IEEE Signal Process. Mag. 23 (3), 95–103. Leong, P.A., Heymann, J.B., Jensen, G.J., 2005. Peach: a simple Perl-based system for distributed computation and its application to cryo-EM data processing. Structure 13, 505–511. Liang, Y., Ke, E.Y., Zhou, Z.H., 2002. IMIRS: a high-resolution 3D reconstruction package integrated with a relational image database. J. Struct. Biol. 137, 292– 304. Lucic, V., Forster, F., Baumeister, W., 2005. Structural studies by electron tomography: from cells to molecules. Annu. Rev. Biochem. 74, 833–865. Ludtke, S.J., Baldwin, P.R., Chiu, W., 1999. EMAN: semiautomated software for highresolution single-particle reconstructions. J. Struct. Biol. 128, 82–97. Marabini, R., Masegosa, I.M., San Martin, M.C., Marco, S., Fernandez, J.J., de la Fraga, L., Vaquerizo, C., Carazo, J.M., 1996. Xmipp: an image processing package for electron microscopy. J. Struct. Biol. 116, 237–240. Marinescu, D.C., Ji, Y., 2003. A computational framework for the 3D structure determination of viruses with unknown symmetry. J. Parallel Distrib. Comput. 63, 738–758. Marinescu, D.C., Ji, Y., Lynch, R.E., 2001. Space-time tradeoffs for parallel 3D reconstruction algorithms for virus-structure determination. Concurr. Comput. Pract. Exp. 13, 1083–1106. Martin, I.M., Marinescu, D.C., 1998. Concurrent computation and data visualization for spherical-virus structure determination. Comput. Sci. Eng. 5 (4), 40–52. Martino, R.L., Johnson, C.A., Suh, E.B., Trus, B.L., Yap, T.K., 1994. Parallel computing in biomedical research. Science 265, 902–908. Ortiz, J.O., Forster, F., Kurner, J., Linaroudis, A.A., Baumeister, W., 2006. Mapping 70s ribosomes in intact cells by cryoelectron tomography and pattern recognition. J. Struct. Biol. 156, 334–341. Peltier, S.T., Lin, A.W., Lee, D., Mock, S., Lamont, S., Molina, T., Wong, M., Dai, L., Martone, M.E., Ellisman, M.H., 2003. The telescience portal for tomography applications. J. Parallel Distrib. Comput. 63, 539–550. Perkins, G.A., Renken, C.W., Song, J.Y., Frey, T.G., Young, S.J., Lamont, S., Martone, M.E., Lindsey, S., Ellisman, M.H., 1997. Electron tomography of large, multicomponent biological structures. J. Struct. Biol. 120, 219–227. Pharr, M., Fernando, R., 2005. GPU Gems 2: Programming Techniques for HighPerformance Graphics and General-Purpose Computation. Addison-Wesley Professional. Rath, B.K., Hegerl, R., Leith, A., Shaikh, T.R., Wagenknecht, T., Frank, J., 2003. Fast 3D motif search of EM density maps using a locally normalized cross-correlation function. J. Struct. Biol. 144, 95–103. Robinson, C.V., Sali, A., Baumeister, W., 2007. The molecular sociology of the cell. Nature 450, 973–982. Scheres, S.H.W., Gao, H., Valle, M., Herman, G.T., Eggermont, P.P., Frank, J., Carazo, J.M., 2007. Disentangling conformational states of macromolecules in 3D-EM through likelihood optimization. Nat. Methods 4, 27–29. Schoenmakers, R.H.M., Perquin, R.A., Fliervoet, T.F., Voorhout, W., Schirmacher, H., 2005. New software for high resolution, high throughput electron tomography. Microsc. Anal. 19 (4), 5–6. Tabik, S., Garzon, E.M., Garcia, I., Fernandez, J.J., 2007. High performance noise reduction for biomedical multidimensional data. Digit. Signal Prog. 17, 724– 736. van Heel, M., Harauz, G., Orlova, E.V., Schmidt, R., Schatz, M., 1996. A new generation of the IMAGIC image processing system. J. Struct. Biol. 116, 17–24. 
van Heel, M., Gowen, B., Matadeen, R., Orlova, E.V., Finn, R., Pape, T., Cohen, D., Stark, H., Schmidt, R., Schatz, M., Patwardhan, A., 2000. Single-particle electron cryomicroscopy: towards atomic resolution. Q. Rev. Biophys. 33, 307–369. Wadleigh, K.R., Crawford, I.L., 2000. Software Optimization for High Performance Computers. Prentice Hall PTR. Wilkinson, B., Allen, M., 2004. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers (2nd ed). Prentice Hall. Yan, X., Sinkovits, R.S., Baker, T.S., 2007. AUTO3DEM—an automated and high throughput program for image reconstruction of icosahedral particles. J. Struct. Biol. 157, 73–82. Yang, C., Penczek, P.A., Leith, A., Asturias, F.J., Ng, E.G., Glaeser, R.M., Frank, J., 2007. The parallelization of SPIDER on distributed-memory computers using MPI. J. Struct. Biol. 157, 240–249. Zapata, E.L., Carazo, J.M., Benavides, I., Walther, S., Peskin, R., 1990. Filtered back projection on shared memory multiprocessors. Ultramicroscopy 34, 271–282. Zapata, E.L., Benavides, I., Rivera, F.F., Bruguera, J.D., Pena, T.F., Carazo, J.M., 1992. Image reconstruction on hypercube computers: application to electron microscopy. Signal Process. 27, 51–64. Zheng, S.Q., Keszthelyi, B., Branlund, E., Lyle, J.M., Braunfeld, M.B., Sedat, J.W., Agard, D.A., 2007. UCSF tomography: an integrated software suite for real-time electron microscopic tomographic data collection, alignment, and reconstruction. J. Struct. Biol. 157, 138–147. Zhou, Z.H., 2008. Towards atomic resolution structural determination by singleparticle cryo-electron microscopy. Curr. Opin. Struct. Biol. 18, 218–228. Zhou, Z.H., Chiu, W., Haskell, K., Spears, H., Jakana, J., Rixon, F.J., Scott, L.R., 1998. Refinement of herpesvirus B-capsid structure on parallel supercomputers. Biophys. J. 74, 576–588.
