Parallel-Scalable Single-Particle Analysis in 3D Electron Microscopy
| Project | PS-3DEM |
| Research Area | Bio Sciences |
| Principal Investigator(s) | Carlos Sánchez |
| Institution(s) | |
Abstract
Structural biology aims to elucidate the three-dimensional (3D) structure of biological macromolecular complexes in order to fully understand their function in the living cell. A widely used approach to collecting such structural information is to image tens of thousands of copies of the same complex in an electron microscope, a technique called Single-Particle Analysis. These images are extremely noisy and are distorted by the imperfect optical system. Therefore, they have to be intensively processed to obtain the high-resolution 3D structure that is essential for studying the structure-function relationship. The whole field is usually called 3D Electron Microscopy (3DEM).
Currently, solving a structure at high (subnanometer) resolution typically takes 50,000 CPU-hours. This time would be even longer without the parallelization of the most time-consuming steps. Such a long CPU time is mostly due to the fact that the code is not fully optimized, but also to the fact that the same sequence of image analysis steps has to be repeated several times for the same structure (e.g., with new images collected from a new specimen that is purer or biochemically more stable, or with new values for the parameters involved in different image analysis steps). Although not much can be done to remedy the latter problem, an important gain in speed can still be obtained by optimizing the code. One should also keep in mind that future structural biology problems will require processing even larger data sets, which will necessitate applications with high throughput (a large amount of work performed within a given time). For instance, one has to process 50,000-100,000 images of isolated particles in each run for a currently typical high-resolution structure (0.8-1 nm resolution). We estimate that 5-10 times more images will be needed to achieve quasi-atomic resolution (0.4 nm) in the case of completely asymmetric structures.
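As a rough illustration of the figures quoted above, the following Python sketch works out the image counts implied by the 5-10 times estimate. The function names are hypothetical, and the CPU-hour projection rests on an assumption that is not stated in this abstract, namely that processing cost grows linearly with the number of images; it should be read as an order-of-magnitude guide only.

```python
# Figures quoted in the abstract.
IMAGES_PER_RUN = (50_000, 100_000)  # images per run at 0.8-1 nm resolution
SCALE_FACTOR = (5, 10)              # "5-10 times more images" for 0.4 nm
CPU_HOURS_TODAY = 50_000            # typical cost of one subnanometer structure


def projected_image_range(images=IMAGES_PER_RUN, factor=SCALE_FACTOR):
    """Return the (low, high) image counts implied by the quoted ranges."""
    return images[0] * factor[0], images[1] * factor[1]


def projected_cpu_hours(cpu_hours=CPU_HOURS_TODAY, factor=SCALE_FACTOR):
    """Return (low, high) CPU-hours for the larger data sets.

    ASSUMPTION (not from the abstract): cost scales linearly with the
    number of images and the code is otherwise unchanged.
    """
    return cpu_hours * factor[0], cpu_hours * factor[1]


if __name__ == "__main__":
    low, high = projected_image_range()
    print(f"Images for quasi-atomic resolution: {low:,} to {high:,}")
    low_h, high_h = projected_cpu_hours()
    print(f"CPU-hours under linear scaling: {low_h:,} to {high_h:,}")
```

Under these assumptions, a quasi-atomic reconstruction would involve between 250,000 and 1,000,000 particle images, which is why throughput, and not only the time for a single run, matters for the optimization effort.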
Over the last 20 years, we have developed at the National Centre of Biotechnology (CSIC) an open-source software package (Xmipp, http://xmipp.cnb.csic.es) to address this 3D reconstruction problem. The package is publicly available and is used by structural biology groups all over the world. It includes parallel capabilities to reduce the computing time. In this project we aim to optimize the steps involved in the 3D analysis of single particles, which will transform Xmipp into a high-throughput application and further reduce its computing time. This will increase the productivity of structural biologists and optimize the use of available computing resources.
The group at the National Centre of Biotechnology (CSIC) has been appointed to lead the Image Processing Centre for Microscopy associated with INSTRUCT (the European Infrastructure for Structural Biology), thanks in part to the image processing developments collected in the aforementioned software package. The objective of this centre is to support structural biologists all over Europe in performing the image analysis steps needed to compute high-resolution (0.5-0.8 nm) 3D structures of macromolecular assemblies of biological interest. The high-resolution structure is essential for studying the function of these macromolecules in living cells. Typically, the atomic-resolution structures (0.1-0.2 nm) of the domains of a macromolecular complex, obtained by other experimental methods (X-ray crystallography or NMR) or by theoretical methods (prediction), are positioned into the structure obtained by electron microscopy. The regions corresponding to the interfaces between neighbouring domains are then studied to understand what allows these domains to assemble and what their role is in the complex.


