Research in a Connected World by Alex Voss, Elizabeth Vander Meer, David Fergusson


Chapter 2 Distributed Systems

2.1 The European e-Infrastructure Ecosystem*

Key Concepts
  • Managed e-Infrastructures

  • Different needs for general-purpose and specialised e-Infrastructures

  • Layers of e-Infrastructures and the role of standards

  • Academic and commercial e-Infrastructures

  • Convergence of e-Infrastructures

Introduction

e-Research requires seamless access to computational, storage, and network resources, which can be provided by a variety of means, ranging from volunteer systems and community-based infrastructures to general-purpose infrastructures that federate resources across different institutions. These resources are made available to different scientific communities via well-defined protocols and interfaces exposed by a software layer (Grid middleware). Such federated infrastructures are referred to as e-Infrastructures and provide a number of advantages to researchers and service providers alike.

Of these different approaches, well-managed e-Infrastructures are of particular importance. Apart from enabling seamless access to heterogeneous, independently managed resources, well-managed e-Infrastructures also provide their users with common operational procedures and services such as accounting, user support and support systems, and usage policies. Moreover, different service levels can be negotiated, allowing users to establish service level agreements with such an e-Infrastructure. As a consequence, researchers can use an e-Infrastructure in the same way as a single system managed by a local resource provider. Achieving this requires both the deployment of standardised services and the harmonisation of operational and security procedures across the different independent resource providers. Multi-purpose e-Infrastructures are also desirable from a resource provider's point of view, as a single infrastructure can serve several communities and thus reduce the need for dedicated community services that require additional operational effort.

However, it is unlikely that a single common infrastructure will ever be able to serve all needs, as different legislative regulations, usage models, and other regional or thematic peculiarities demand the creation of separate e-Infrastructures. As a consequence, national and regional e-Infrastructures have emerged, as well as thematic ones such as e-Infrastructures focusing on the federation of supercomputing resources. Through ambitious national research and infrastructure programmes and dedicated European Commission programmes, Europe is playing a leading role in building multi-national, multi-disciplinary e-Infrastructures and has devised a roadmap for a pan-European e-Infrastructure. This roadmap acknowledges the need for different infrastructures but also envisages them being embedded in an ecosystem that allows users to easily access resources managed by different infrastructures.

Two Ecosystem Paths – the EGI and PRACE Infrastructures

The establishment of a European e-Infrastructure ecosystem is currently progressing along two distinct paths: EGI and PRACE. The European Grid Initiative (EGI) intends to federate national and regional e-Infrastructures, managed locally by National Grid Initiatives (NGIs), into a pan-European, general-purpose e-Infrastructure, as pioneered by the EGEE (Enabling Grids for E-sciencE) project, which unites thematic, national and regional Grid initiatives. EGI is a direct result of the European e-Infrastructure Reflection Group (e-IRG) recommendation to develop a sustainable base for European e-Infrastructures. Most importantly, funding schemes are being changed from short-term project funding (such as the two-year funding periods of EGEE) to sustained funding on a national basis. This provides researchers with the long-term perspective needed for multi-year research engagements. All European countries support the EGI vision and, at the time of writing, the organisational and legal details are being defined with the aim of starting EGI in 2010.

Unlike EGEE, which has strong central control, EGI will consist of largely autonomous NGIs with a lightweight coordination entity at the European level. This set-up calls for increased use of standardised services and operational procedures, so that different NGIs can be integrated smoothly and expose a common layer to the user while preserving their own autonomy. In this context the work of the Open Grid Forum (OGF) is of particular importance. OGF works on standardising services for e-Infrastructures and increasingly tackles operational issues as well.

At the same time, the e-IRG has recognised the need to provide European researchers with access to petaflop-range supercomputers in addition to the high-throughput resources that prevail in EGI. The Partnership for Advanced Computing in Europe (PRACE) aims to define the legal and organisational structures for a pan-European High Performance Computing (HPC) service in the petaflops range. These petaflop-range systems are intended to complement the existing European HPC e-Infrastructure pioneered by the Distributed European Infrastructure for Supercomputing Applications (DEISA) project. DEISA federates major European HPC centres in a common e-Infrastructure, providing seamless access to supercomputing resources and, thanks to a global shared file system, to data stored at the various centres. This leads to a three-tier structure, with the European petaflops systems at tier 0 supported by leading national systems at tier 1. Regional and mid-range systems complement this HPC pyramid at tier 2, as depicted in Figure 2.1 below.

Figure 2.1
The PRACE HPC Ecosystem

Although the goals of PRACE/DEISA are similar to those of EGI/EGEE, the different usage and organisational requirements demand a different approach and hence the establishment of two independent, yet related, infrastructures. For researchers, however, it is important to have seamless access to all infrastructures, so a convergence of services and operational models similar to that discussed for the EGI/NGI case above will be needed.

Layers of the Ecosystem

This convergence, together with the addition of other tools (such as sensors), will eventually form the computing and data layer of the e-Infrastructure ecosystem. Leveraging the wide-area connectivity provided by the network infrastructure (operated by GÉANT and the National Research and Education Networks in Europe), this computing and data layer facilitates the construction of domain-specific knowledge layers that provide user communities with higher-level abstractions, allowing them to focus on their science rather than on computing technicalities. The resulting three-layered ecosystem is depicted in Figure 2.2 below.

Figure 2.2
e-Infrastructure Ecosystem

Europe is also active in technology transfer to foster the use of e-Infrastructures in other regions such as South America, India, China, the Asia-Pacific region and the Mediterranean. These efforts, together with similar efforts in the US, Japan and Australia, ensure that large parts of the world are covered by e-Infrastructures, as shown in Figure 2.3 below.

Figure 2.3
Worldwide e-Infrastructure Coverage

Clouds in the Ecosystem

In the commercial sector, dynamic resource and service provisioning and "pay per use" concepts have been taken further with the introduction of "cloud computing", successfully pioneered by Amazon with its Elastic Compute Cloud (EC2) and Simple Storage Service (S3) offerings. Many other major IT businesses, including Google, IBM and Microsoft, offer cloud services today. Using virtualisation techniques, these infrastructures allow dynamic service provisioning and give the user the illusion of having access to virtually unlimited resources on demand. This computing model is particularly interesting for start-up ventures with limited IT resources, and for dynamically provisioning additional resources to cope with peak demand rather than over-provisioning one's own infrastructure. The suitability of these commercial offerings for research has yet to be demonstrated, although a few promising experiments have already been performed. In principle, clouds could be considered yet another resource provider in e-Infrastructures; however, it is currently not trivial to bridge the different interfaces and operational procedures so as to provide the researcher with a seamless infrastructure. More work is needed on interface standardisation and on how commercial offerings can be made part of the operations of academic research infrastructures.

Summary: The Need for Convergence

In summary, a variety of different e-Infrastructures are available today to support e-Research. Convergence of these infrastructures in terms of interfaces and policies is needed to provide researchers with seamless access to the resources required for their research, independently of how resource provisioning is actually managed. A multi-layer ecosystem will eventually greatly reduce the need for scientists to manage their own computing and data infrastructure, with a knowledge layer providing high-level abstractions tailored to the needs of different disciplines. Initial elements of such an e-Infrastructure ecosystem already exist, and Europe is actively striving for sustainability to ensure that it continues to provide a reliable basis for e-Research.

2.2 The EGEE Distributed Computing Infrastructure*

Key Concepts
  • gLite middleware, providing access to shared resources

  • Security and access to resources

  • Data storage

  • Compute resources

  • Deployment of the infrastructure

  • Application development

Introduction

The Enabling Grids for E-sciencE (EGEE) project provides the European research community and its international collaborators with an e-Research platform for high-throughput data analysis, serving over 17,000 users across 160 projects. With a heritage stretching back nearly a decade, EGEE-III (and its preceding projects EGEE-II, EGEE-I and the European Data Grid) is a €32M project funded by the European Commission to implement and deploy a distributed computing infrastructure supporting researchers in many scientific domains, such as astrophysics, biomedicine, computational chemistry, earth sciences, high energy physics, finance, fusion, geophysics and multimedia. In addition, several business applications run on the EGEE Grid, such as applications from geophysics and the plastics industry. This section introduces the EGEE project in detail, considering middleware, security issues, access to information, data, compute resources, deployment and application development. It concludes with a look at sustainability.

Figure 2.4
EGEE nodes mapped onto the globe

Facilitating Access To Shared Resources – the gLite Middleware

Grids are characterised by decentralised access to shared resources. These resources consist of computers, disks, and the network connections that link them together. Seamless, secure and scalable access to these resources, which may be owned by different organisations and could encompass different operating systems and architectures, is provided through software called middleware. Organisations that wish to cooperatively share their resources with their collaborators can do so without central control.

The gLite middleware distribution produced by the EGEE project is composed primarily of open-source software from many sources – some developed within the project and others from external providers. This software is integrated into a single software distribution before being tested and made available to sites for installation.

Figure 2.5
Layers of the EGEE e-Infrastructure

The software services within gLite enable researchers, through their own applications, to access the physical resources (disks, computers, instruments) attached to the EGEE infrastructure. These services have defined interfaces that allow developers to build their own applications.

Security

The resources that make up the EGEE infrastructure are extremely valuable, and access to them needs to be strictly controlled. Access to resources within EGEE is restricted to members of research collaborations (commonly called Virtual Organisations). Some resource providers may allow only members of a single Virtual Organisation to access their resources, while others provide a shared resource to multiple Virtual Organisations spanning several research communities.

To join a Virtual Organisation you need to be able to prove who you are electronically. This is similar to the way a passport is used to prove your identity when you cross international borders. Within the Grid community this is frequently done with a digital certificate, generally issued after you have proved your identity to someone at your local institute. Some organisations allow you to generate a certificate using your existing credentials for your organisation’s own network.

Once you are able to identify yourself you can apply to join a Virtual Organisation. Different Virtual Organisations exist to cover the needs of different communities. A community may have more than one Virtual Organisation within it with each one having different entry criteria and possibly providing access to different resources.
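The access model described above (prove your identity, join a Virtual Organisation, and let each resource provider decide which Virtual Organisations it admits) can be reduced to a simple authorisation check. The sketch below is a minimal illustration under stated assumptions, not the gLite implementation: the `User` and `Resource` classes, the certificate-style subject strings and the VO names are all hypothetical, and real Grid authorisation relies on cryptographically verified certificates rather than plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical identity string, standing in for a verified certificate subject.
    subject: str
    # Virtual Organisations the user has successfully joined.
    vos: set[str] = field(default_factory=set)


@dataclass
class Resource:
    name: str
    # Virtual Organisations this resource provider has chosen to admit.
    allowed_vos: set[str] = field(default_factory=set)


def authorised_vos(user: User, resource: Resource) -> set[str]:
    """Return the VOs through which the user may use the resource (empty if none)."""
    return user.vos & resource.allowed_vos


def can_access(user: User, resource: Resource) -> bool:
    # Access requires membership of at least one VO that the resource admits.
    return bool(authorised_vos(user, resource))


if __name__ == "__main__":
    alice = User("/C=UK/O=ExampleUni/CN=Alice", {"biomed", "astro"})
    shared = Resource("shared-cluster", {"biomed", "hep", "chem"})
    dedicated = Resource("hep-only-cluster", {"hep"})
    print(can_access(alice, shared))     # True, via the biomed VO
    print(can_access(alice, dedicated))  # False
```

A shared resource simply lists several VOs in `allowed_vos`, while a dedicated one lists a single VO, mirroring the two kinds of provider described in the text.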

Information

With many thousands of services potentially available to a user, discovering which one to use presents many challenges. The infrastructure is continually changing: services appear, disappear or are upgraded as the sites evolve. The choice of service can be driven by discovering, in near real-time, which types of services are available, which Virtual Organisations are able to access them, the characteristics of each service (e.g. the data it stores or the speed of its processors) and its current load.

The information collected by gLite on the resources within the infrastructure can be presented in many ways. The information can be browsed directly through a web portal, searched manually through command line tools, or programmatically from within an application.
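The selection criteria listed above (service type, Virtual Organisation access, current load) and the need for near real-time information can be illustrated by filtering a snapshot of service records programmatically. This is a sketch only: the `ServiceRecord` layout, the endpoint names and the five-minute staleness threshold are hypothetical, and the code does not reproduce the schema or query interface of the gLite information system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class ServiceRecord:
    endpoint: str            # hypothetical service address
    service_type: str        # e.g. "compute" or "storage"
    vos: set[str] = field(default_factory=set)
    load: float = 0.0        # fraction of capacity in use, 0.0 to 1.0
    last_updated: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def select_services(records, service_type, vo, now, max_age=timedelta(minutes=5)):
    """Return matching, recently refreshed services, least loaded first."""
    candidates = [
        r for r in records
        if r.service_type == service_type
        and vo in r.vos
        # Records not refreshed recently may describe services that have
        # since disappeared or changed, so they are skipped.
        and now - r.last_updated <= max_age
    ]
    return sorted(candidates, key=lambda r: r.load)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        ServiceRecord("ce1.example.org", "compute", {"biomed"}, 0.9, now),
        ServiceRecord("ce2.example.org", "compute", {"biomed", "hep"}, 0.2, now),
        ServiceRecord("ce3.example.org", "compute", {"biomed"}, 0.1, now - timedelta(hours=1)),
        ServiceRecord("se1.example.org", "storage", {"biomed"}, 0.0, now),
    ]
    for r in select_services(records, "compute", "biomed", now):
        print(r.endpoint)
```

Here `ce3.example.org` is excluded despite its low load because its record is an hour old, which is the practical reason the information needs to be near real-time.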

Data

Many of the researchers that use EGEE’s infrastructure do so in order to analyse data stored in files. Frequently these files are stored at locations different from those of the currently available computational resources. EGEE provides services that allow users to retrieve a file from off-line tape storage onto disk, and then to move that file to the site where the computational resources for that user are going to be available. (This method of data storage and retrieval is normal in high energy physics experiments and is frequently used in other communities dealing with large archived data sets, such as climate modelling and satellite observation records.)

How does a user locate the file they need on an infrastructure where the location of files is continually changing? File catalogues run by some communities provide a register in which a ‘logical’ file name can be mapped to a number of physical replicas. Storing files in multiple places has many benefits: files remain available even if one of the sites storing them is temporarily disconnected from the network or its service is down. Software can be written to exploit the distributed location of the files by running an application on computing resources located near the files’ storage location, thereby reducing the time taken to move the files to the compute resources.
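The logical-to-physical mapping described above can be sketched as a small in-memory catalogue. The `FileCatalogue` class, the `lfn:` and `srm://` style names and the site names are illustrative assumptions, not the interface of any EGEE catalogue service. The `best_replica` method shows the two benefits mentioned in the text: it skips replicas at unavailable sites and prefers a copy held at the site where the job will run.

```python
from collections import defaultdict


class FileCatalogue:
    """Hypothetical register mapping logical file names to physical replicas."""

    def __init__(self):
        # logical name -> list of (site, physical URL) pairs
        self._replicas = defaultdict(list)

    def register(self, lfn, site, url):
        self._replicas[lfn].append((site, url))

    def replicas(self, lfn):
        # .get() avoids creating empty entries for unknown names.
        return list(self._replicas.get(lfn, []))

    def best_replica(self, lfn, compute_site, available_sites):
        """Choose a reachable replica, preferring one co-located with the compute site."""
        reachable = [(s, u) for s, u in self.replicas(lfn) if s in available_sites]
        if not reachable:
            return None  # every site holding a copy is currently unavailable
        for site, url in reachable:
            if site == compute_site:
                return url  # local copy: no wide-area transfer needed
        return reachable[0][1]  # otherwise fall back to any reachable copy


if __name__ == "__main__":
    cat = FileCatalogue()
    lfn = "lfn:/grid/biomed/scan001.dat"
    cat.register(lfn, "SITE-A", "srm://se.site-a.example/scan001.dat")
    cat.register(lfn, "SITE-B", "srm://se.site-b.example/scan001.dat")
    # SITE-A is down, so the SITE-B copy is chosen for a job running at SITE-C.
    print(cat.best_replica(lfn, "SITE-C", {"SITE-B", "SITE-C"}))
```

A production catalogue would also need consistency between the register and the storage it describes; that concern is outside the scope of this sketch.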

Many of EGEE's sites are linked together through dedicated network connections. EGEE provides a service that coordinates the bulk transfer of files across the network, allowing the connections to be shared between different communities and file transfers to be managed and prioritised.
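One way to share a link between communities while still prioritising transfers is to give each community a weighted share of the link and, within each community, serve requests in priority order. The sketch below assumes hypothetical share values and a simple fairness rule (serve the waiting community that has received the smallest fraction of its share so far); it illustrates the idea only and is not the algorithm used by EGEE's transfer service.

```python
import heapq
import itertools
from collections import defaultdict


class TransferScheduler:
    def __init__(self, shares):
        # shares: community -> relative weight of the link it is entitled to
        self.shares = shares
        self.queues = defaultdict(list)    # community -> heap of pending transfers
        self.served = defaultdict(int)     # community -> bytes scheduled so far
        self._counter = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, community, filename, size, priority=0):
        # Lower number = higher priority; heapq pops the smallest tuple first.
        heapq.heappush(self.queues[community],
                       (priority, next(self._counter), filename, size))

    def next_transfer(self):
        """Pick the waiting community furthest below its share, then its top-priority file."""
        waiting = [c for c, q in self.queues.items() if q]
        if not waiting:
            return None
        community = min(waiting, key=lambda c: self.served[c] / self.shares[c])
        _, _, filename, size = heapq.heappop(self.queues[community])
        self.served[community] += size
        return community, filename


if __name__ == "__main__":
    sched = TransferScheduler({"hep": 3, "biomed": 1})
    sched.submit("hep", "run1.raw", 100)
    sched.submit("biomed", "scan001.dat", 100)
    sched.submit("hep", "calib.db", 100, priority=-1)  # urgent transfer
    while (job := sched.next_transfer()) is not None:
        print(job)
```

Weighting by bytes scheduled rather than by number of files keeps a community sending many small files from being favoured over one sending a few large ones.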

Computing

Key to nearly all communities supported by EGEE is the ability to analyse file-based data. Generally, applications need to be installed on the compute resources before they can be used to analyse data. The availability of an application on a resource can be advertised through the information system, allowing users to select resources where their applications are already available. These applications are then started through services that encapsulate the compute resource, regardless of the operating system or the internal structure of the compute cluster that will be used to analyse the data.

The user, or their application, uses the EGEE information service to select a compute resource that they have access to and where their application is installed. Any input files needed by the application are transferred to the compute resource that will be used for the analysis, either from the user’s computer or from storage on the EGEE infrastructure, located through the file catalogue or by its explicit location. Once the files are in place, the request to start the application and analyse the files is passed to the compute service. When the analysis is complete, any output files will be available on the compute resource. If the user wishes the output files to remain available for future use, they will need to be transferred back to the user’s desktop or stored elsewhere.

Within EGEE some user communities undertake this process manually, knowing where their files are and which compute resources they wish to use. Other communities have written their own applications that mimic these manual steps, thereby simplifying the life of the user. EGEE also provides a generic resource brokering service that can automatically perform these tasks for many of the core scenarios previously handled manually by users.
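The matching step a resource broker performs on the user's behalf can be sketched as follows: keep only the compute resources that the user's Virtual Organisation may use and that advertise the required application, then rank the survivors. The `ComputeResource` attributes, the software tag and the ranking rule (fewest queued jobs per free slot) are assumptions for illustration; the EGEE brokering service uses its own job description and ranking mechanisms, which are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeResource:
    name: str
    vos: set[str] = field(default_factory=set)
    software: set[str] = field(default_factory=set)  # advertised application tags
    free_slots: int = 0
    queued_jobs: int = 0


def match(resources, vo, application):
    """Resources the VO may use that already have the application installed."""
    return [r for r in resources
            if vo in r.vos and application in r.software and r.free_slots > 0]


def rank(resource):
    # Lower is better: queued jobs per free slot as a rough proxy for waiting time.
    return resource.queued_jobs / resource.free_slots


def broker(resources, vo, application):
    """Return the best-ranked matching resource, or None if nothing matches."""
    candidates = match(resources, vo, application)
    return min(candidates, key=rank) if candidates else None


if __name__ == "__main__":
    pool = [
        ComputeResource("ce-a", {"biomed"}, {"APP-X-1.0"}, free_slots=10, queued_jobs=50),
        ComputeResource("ce-b", {"biomed"}, {"APP-X-1.0"}, free_slots=20, queued_jobs=10),
        ComputeResource("ce-c", {"hep"}, {"APP-X-1.0"}, free_slots=100, queued_jobs=0),
    ]
    chosen = broker(pool, "biomed", "APP-X-1.0")
    print(chosen.name)  # ce-b
```

After choosing a resource, a broker would go on to stage the input files, submit the job and make the outputs retrievable, as described in the previous paragraph; those steps are omitted from this sketch.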

Figure 2.6
Overview of the architecture of EGEE

Accounting

As the compute, storage and network resources are contributed by different organisations for shared use by groups outside those organisations, it is important that this use is accounted for. Many organisations share their resources through ‘service level agreements’ that specify the proportions of the resource that can be used by different communities. Within EGEE the use of individual computing resources is accounted for and recorded centrally for later analysis and reporting, and these centralised accounting records are used to validate the usage agreements. Similarly, the volume of data transferred over the dedicated network links between the primary resource centres is also reported. This usage is generally reported for each Virtual Organisation using the infrastructure.
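Validating a usage agreement against accounting records amounts to aggregating usage per Virtual Organisation and comparing each VO's fraction of the total with its agreed proportion. In the sketch below the record format (VO name and CPU hours), the agreed shares and the five-percentage-point tolerance are hypothetical; this is not the record schema used by EGEE's accounting system.

```python
from collections import defaultdict


def usage_by_vo(records):
    """Sum CPU hours per Virtual Organisation from (vo, cpu_hours) records."""
    totals = defaultdict(float)
    for vo, cpu_hours in records:
        totals[vo] += cpu_hours
    return dict(totals)


def compare_with_agreement(records, agreed_shares, tolerance=0.05):
    """Return {vo: (actual_fraction, agreed_fraction, within_tolerance)}."""
    totals = usage_by_vo(records)
    grand_total = sum(totals.values())
    report = {}
    for vo, agreed in agreed_shares.items():
        # Guard against an empty accounting period (no usage recorded at all).
        actual = totals.get(vo, 0.0) / grand_total if grand_total else 0.0
        report[vo] = (actual, agreed, abs(actual - agreed) <= tolerance)
    return report


if __name__ == "__main__":
    records = [("hep", 600.0), ("biomed", 300.0), ("hep", 100.0)]
    shares = {"hep": 0.6, "biomed": 0.4}
    for vo, (actual, agreed, ok) in compare_with_agreement(records, shares).items():
        status = "OK" if ok else "outside tolerance"
        print(f"{vo}: used {actual:.0%}, agreed {agreed:.0%}, {status}")
```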

Operations

Vital to EGEE, as to any project that aims to deploy and support an infrastructure, is its operational effectiveness and availability. EGEE's infrastructure is deployed in over 50 countries on over 280 sites and encompasses over 80,000 processors and 20PB of storage, enough to hold the text from 400 million four-drawer filing cabinets or the contents of 50 million CDs, a stack around 50km high. This infrastructure is available continuously and supports over 300,000 jobs a day, and the research network connecting these sites and the distributed user community sustains transfer rates of over 900MB/s on a daily basis.
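The comparisons in this paragraph can be checked with simple arithmetic. The snippet below derives the per-disc capacity and per-disc thickness implied by the quoted figures, and converts the daily job count and the transfer rate into other units. It assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and, for the daily volume, that the 900MB/s rate is sustained for a full day; the derived values follow from the quoted numbers and are not additional measurements.

```python
# Figures quoted in the text (decimal units assumed: 1 PB = 1e15 B, 1 MB = 1e6 B).
STORAGE_BYTES = 20e15        # 20 PB
CDS = 50e6                   # 50 million CDs
STACK_HEIGHT_M = 50e3        # "around 50 km"
JOBS_PER_DAY = 300_000
RATE_BYTES_PER_S = 900e6     # 900 MB/s
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_cd = STORAGE_BYTES / CDS                       # implied capacity of one disc
mm_per_cd = STACK_HEIGHT_M / CDS * 1000                  # implied thickness of one disc
jobs_per_second = JOBS_PER_DAY / SECONDS_PER_DAY         # average jobs handled per second
tb_per_day = RATE_BYTES_PER_S * SECONDS_PER_DAY / 1e12   # volume if sustained all day

print(f"{bytes_per_cd / 1e6:.0f} MB per CD")      # 400 MB per CD
print(f"{mm_per_cd:.1f} mm per CD")               # 1.0 mm per CD
print(f"{jobs_per_second:.1f} jobs per second")   # 3.5 jobs per second
print(f"{tb_per_day:.1f} TB per day")             # 77.8 TB per day
```

The implied 1 mm per disc is slightly less than the 1.2 mm thickness of a standard CD, so "around 50km" is best read as a rounded figure.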

A Platform for Application Development

Over the last decade the software interfaces to EGEE have stabilised and matured, and now form a platform on which external developers can build their own applications. These developers come both from the research community, which uses EGEE for its own work, and from the broader software community, which provides higher-level tools and services for the research community. Some of the latter work is starting to appear in the RESPECT programme (Recommended External Software for EGEE CommuniTies - http://technical.eu-egee.org/index.php?id=290), which aims to publicise software and services that work well in concert with the EGEE gLite software, in order to expand the functionality of the grid infrastructure for users, promote the reuse of existing software to reduce duplicated development, and provide software more oriented to end users than the core gLite middleware distribution.

This activity has expanded over the last year and now includes various software packages:

  • meta-schedulers able to run unattended applications and their related workflows by dynamically selecting resources

  • portals that provide access to EGEE’s resources through a web interface

  • tools that simplify the specification and execution of applications – especially those that involve the execution of the same application over large sets of data

  • services that provide access to data stored in files or in relational databases

  • tools that help developers to build applications to access grid resources

Summary

As the EGEE-III project enters its final year, work continues on improving the effectiveness, usability, availability and reliability of the infrastructure. The user community continues to expand, and an increasing number of researchers depend on this e-Infrastructure as part of their daily work. In recognition of its increasing maturity, the community has spent the last year studying how this infrastructure can be made more sustainable. The result, the European Grid Infrastructure (EGI), establishes a small organisation that federates and coordinates the work of independent National Grid Infrastructures (NGIs) and is due to start in May 2010.

Solutions