DEISA Services and Operations
The purpose of this section is to discuss how the DEISA Supercomputing Grid will operate, since this is essential to understanding the kinds of applications that can be deployed and run on this infrastructure. The other important point is to understand clearly how the DEISA strategy for integrating supercomputing resources adds substantial value to the pre-existing national infrastructures.
The strong integration of IBM AIX systems aims at providing a single system image of a distributed supercomputer. This will be fully transparent to end users, who will access the super-cluster through the site at which they have a login.
The fundamental purpose of the AIX super-cluster operation is to run bigger and more demanding applications than those that can be run today on each national cluster. One way of doing this would be to “grid enable” the application so that it can run on more than one platform. However, this strategy – which requires modifying the application – does not really work for tightly coupled parallel applications. In this case, the finite signal propagation velocity induces MPI communication latencies in a wide area network that are intolerable for high performance computing.
DEISA adopts a different strategy, based on load balancing the computational workload across national borders. Huge, demanding applications are run by reorganizing the global operation so as to allocate substantial resources at one site. They therefore run “as is”, with no modification. This strategy relies only on network bandwidth, which will keep improving in the years to come.
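The placement decision behind this strategy can be sketched as follows. This is a hypothetical illustration only: the site names, free-node counts and selection rule are invented, and a real scheduler would weigh many more factors.

```python
# Hypothetical sketch: place a large job entirely at ONE site, rather than
# splitting it across sites (which would incur wide-area MPI latency).
# Site names and free-node counts are invented for illustration.

SITES = {
    "IDRIS": {"free_nodes": 24},
    "FZJ":   {"free_nodes": 48},
    "RZG":   {"free_nodes": 12},
}

def place_job(nodes_needed, sites=SITES):
    """Return the name of a single site with enough free nodes, or None.

    Preferring the site with the most free capacity models rebalancing the
    global workload so that one site can absorb a very large job intact.
    """
    candidates = [(name, info["free_nodes"]) for name, info in sites.items()
                  if info["free_nodes"] >= nodes_needed]
    if not candidates:
        return None  # the job waits until the workload has been rebalanced
    return max(candidates, key=lambda c: c[1])[0]
```

A job needing 40 nodes would land entirely at the (fictional) site with 48 free nodes; a job exceeding every site's capacity waits rather than being split.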
The other benefit of the AIX super-cluster comes from the possibility of transparently sharing data through GPFS. European data repositories that require frequent updates – like bio-informatics databases, for example – can be established at one site and accessed by all the others.
Altogether, the AIX super-cluster provides end users with most of the benefits of a single, 20 teraflops supercomputer, in a transparent way.
The software architecture of the DEISA Grid is specific to the needs and requirements of a virtual European supercomputing centre, which deals with a limited number of huge, well-identified supercomputing resources, and with applications that are most often tuned to a specific platform. Many interesting subjects in Grid computing (resource discovery, economic scheduling, etc.) are not of primary importance here. A user who submits a job to a Linux cluster does not expect it to be run on a vector supercomputer. The DEISA Grid focuses on four strategic kinds of services:
Support for workflow applications
Workflow applications are those that need to “visit” different computing resources in succession to accomplish a complex simulation. The simplest case one can imagine is a simulation in which pre-processing, number crunching and post-processing are performed on different platforms. This may be convenient because different platforms are assigned roles for which they are particularly efficient. Or it may happen because the application manipulates a distributed data set, and it is more convenient to pre- or post-process the data locally before sending it through the network.
This is by no means the most general case. Workflow applications may involve more than data manipulation on computing platforms. They can include, for example, data acquisition stages coupled to external instruments. In any case, as the simulated systems grow in complexity, workflow applications become more and more relevant.
Workflow applications are fully supported from the start by the DEISA Grid through UNICORE, a European middleware initially developed by the German supercomputing centres. UNICORE has been tailored for workflow applications: it is capable of handling them as a single job, performing all the required data transfers in the background.
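The chaining that a workflow engine performs can be illustrated with a minimal sketch. The stage functions below are invented placeholders (a real UNICORE workflow is described declaratively and each stage would run on a different platform); the point is only the pattern: each stage receives the previous stage's output, the hand-off standing in for the background data transfers.

```python
# Toy three-stage workflow: pre-process -> number crunching -> post-process.
# Stage bodies are invented stand-ins; in DEISA each stage could run on a
# different platform, with the engine moving intermediate data between sites.

def preprocess(raw):
    return [x * 2 for x in raw]       # e.g. prepare the input data set locally

def crunch(data):
    return sum(data)                  # e.g. the big parallel run on the super-cluster

def postprocess(result):
    return f"result={result}"         # e.g. analysis or visualisation back home

def run_workflow(raw, stages=(preprocess, crunch, postprocess)):
    """Execute the stages in order, handing each stage the previous output
    (the data transfer a workflow engine performs in the background)."""
    data = raw
    for stage in stages:
        data = stage(data)
    return data
```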
UNICORE is supported today by one of the DEISA partners, FZJ-Jülich. It has benefited from development efforts provided by several European projects, and has reached the maturity required for production class grid environments. The UNICORE service on the DEISA Grid will soon be operational.
Global data management
Large simulations very often take input from large data repositories, and produce substantial amounts of new data. Data management is at the heart of high performance computing. All national supercomputing facilities operate sophisticated and efficient data management systems, principally file servers and hierarchical storage systems.
DEISA intends to deploy and operate a global data management infrastructure with continental scope, to serve the applications deployed on the supercomputing Grid. The building blocks and most of the required software technologies are already available. On one hand, there is the possibility of extending the global distributed file system concept to a heterogeneous environment, to enable transparent data sharing across different types of platforms. A number of technology options are currently being assessed and evaluated. One of the most important issues here is the interoperability of AIX and Linux systems. DEISA is exploring with IBM the possibility of providing GPFS clients for non-IBM Linux-on-Itanium systems. These systems would then be able to access data exported by the AIX super-cluster.
However, data sharing through global distributed file systems (that is, allowing different applications or platforms to access the same data) is only one aspect of global data management. The opposite service is also needed, namely efficient access to distributed data sets. Workflow or distributed applications may need to operate on data sets residing in different national data management services. Standalone applications that, for efficiency reasons, are run on remote platforms require efficient access to their home data management services. This leads to the deployment of a coherent global architecture involving efficient data transfers, data staging, and data storage, which should optimize data management at a continental scale.
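The data-staging pattern mentioned above can be sketched in a few lines. This is an illustration under invented assumptions (local directories stand in for a home data service and remote scratch space, and a plain file copy stands in for a wide-area transfer), not a description of the DEISA tooling itself.

```python
import shutil
import tempfile

# Sketch of the stage-in / run / stage-out pattern behind global data
# management: input is copied from the "home" store to scratch space near
# the compute platform, the job runs against fast local storage, and the
# results are copied back home. Paths and the compute step are invented.

def run_with_staging(home_input, compute, home_output_dir):
    """Stage `home_input` to scratch, call compute(staged_path, scratch_dir),
    then stage the returned result file back to `home_output_dir`."""
    scratch = tempfile.mkdtemp(prefix="deisa_scratch_")
    try:
        staged = shutil.copy(home_input, scratch)         # stage-in (network transfer)
        result_path = compute(staged, scratch)            # run with local, fast I/O
        return shutil.copy(result_path, home_output_dir)  # stage-out
    finally:
        shutil.rmtree(scratch)                            # release remote scratch space
```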
The deployment of the global data management infrastructure will be carried out in 2005. DEISA relies very strongly on the next generation European and national network infrastructures to meet the performance requirements imposed by high performance computing.
Co-scheduling services for distributed applications
DEISA has decided to implement a co-scheduling service on the supercomputing grid, to enable grid applications to run concurrently on different platforms. As noted above, it is not a good idea to grid enable homogeneous, tightly coupled parallel applications. However, there are a number of sophisticated multi-scale, multi-physics applications composed of loosely coupled independent software modules that need to exchange information in real time with limited communication overhead. Classical examples are ocean-atmosphere or fluid-structure coupled codes.
The coupled software components generally address completely different physical systems, and use different algorithms and numerical methods. It is not surprising that each may run most efficiently on one specific platform. These applications may therefore be efficiently mapped onto a heterogeneous supercomputing grid. Co-scheduling services are needed to run them without inducing unacceptable perturbations of the national services. Discussions with technology providers are under way. DEISA expects to deploy this service in 2006.
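The coupling pattern of such applications can be caricatured in a few lines. The "physics" below is entirely invented and both components share one process for simplicity; in a co-scheduled DEISA run the two models would execute on different platforms and the variable hand-off would be a real-time message exchange.

```python
# Toy sketch of a loosely coupled ocean-atmosphere style application: two
# independent models advance in lockstep, exchanging a small boundary field
# once per coupling step. The update rules are invented for illustration.

def step_atmosphere(state, sea_surface_temp):
    return state + 0.1 * sea_surface_temp   # atmosphere forced by the ocean

def step_ocean(state, surface_wind):
    return state + 0.05 * surface_wind      # ocean forced by the atmosphere

def coupled_run(n_steps, atmos=1.0, ocean=10.0):
    """Advance both models n_steps, exchanging boundary data each step."""
    for _ in range(n_steps):
        # exchange boundary data, then advance both components concurrently
        atmos_new = step_atmosphere(atmos, sea_surface_temp=ocean)
        ocean_new = step_ocean(ocean, surface_wind=atmos)
        atmos, ocean = atmos_new, ocean_new
    return atmos, ocean
```

Because only a small boundary field crosses between the components once per step, wide-area latency is tolerable here in a way it is not for a tightly coupled MPI application, which is exactly why co-scheduling such codes across sites makes sense.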
DEISA has a very strong support policy for applications of this kind, which are at the forefront of modern software engineering. A few of them are already in operation. Specialized support for development and deployment of distributed applications is provided by the infrastructure.
Portals and Web services
Portals and Web interfaces are critical to enhancing user adoption of sophisticated supercomputing infrastructures, by hiding the complexities of the computational environment from users. This is a major priority for DEISA.
The lines of action in this area are extremely diverse, and the DEISA strategy is driven by the requirements of the DEISA applications. We provide two examples here:
- UNICORE can accept “plugins” that adapt the graphical interface of the UNICORE client to a specific application. In this way, portals to applications can be developed, which enormously simplify the work of preparing the appropriate data sets and launching jobs.
- For demanding problems in bio-informatics, the multithreaded versions of BLAST running on some of the huge shared-memory nodes of the AIX super-cluster are much more efficient than the MPI versions that normally run on Linux clusters. A bio-informatics centre in France (InfoBioGen) currently acts as a portal to the AIX super-cluster. Biologists continue to submit BLAST jobs to their usual working environment at InfoBioGen, and need not be aware of DEISA. The InfoBioGen staff reroute the most demanding jobs to IDRIS.
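The rerouting decision made at such a portal site can be sketched as a simple routing rule. The destination names, the job-size metric and the threshold below are all invented for illustration; the actual InfoBioGen rerouting is an operational decision by its staff, not necessarily an automated one.

```python
# Hypothetical sketch of a portal's rerouting rule: BLAST jobs above a size
# threshold go to the shared-memory AIX super-cluster, smaller jobs stay on
# the local Linux cluster. All names and numbers are invented.

LOCAL_CLUSTER = "local-linux-cluster"
SUPER_CLUSTER = "aix-super-cluster"

def route_blast_job(query_count, threshold=10_000):
    """Return the destination system for a job of `query_count` sequences."""
    return SUPER_CLUSTER if query_count > threshold else LOCAL_CLUSTER
```

The key property, reflected in the sketch, is that users keep one submission interface: the routing happens behind the portal, so biologists see no difference between a local run and a DEISA run.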