GridUSP as a central facility to e-Science in Brazil
By Alberto Camilli, University of São Paulo, Brazil
The University of São Paulo (USP) has campuses in several cities across São Paulo state, which encourages a decentralized research environment in which every department is responsible for acquiring and deploying its own resources. Many universities around the world are in a similar situation. It arises because research resources cannot come entirely from the university’s own budget and must be supplemented by sponsoring agencies, which favor the relative merit of the research, ranked according to the prestige of the leading researcher. From a researcher’s perspective, exclusive use of hardware is the shortest path to the desired results. Nonetheless, this model promotes competition for existing infrastructure: space, energy and administrative labor are shared and limited resources. In fact, grant requests rarely include this infrastructure in their specifications, which usually aim to maximize processing power. Yet the infrastructure is naturally bound by local limits, so the decentralized model frequently leads to low-availability systems, poor management and unsustainable practices.
Improving the use of common infrastructure is therefore paramount to productivity gains, since it helps scientists focus on the subject of their research instead of researching the infrastructure itself. GridUSP is driven by USP’s central IT campus management board (CTI) to promote e-Science through better utilization of existing IT infrastructure.
GridUSP architectural premises
GridUSP is a central facility housed at CCE (Centro de Computação Eletrônica), the main data center at USP. It is a cloud-based service that provides USP researchers with a stable, monitored, flexible, interoperable and powerful environment, available 24/7. It was conceived to serve two groups of users: those running parallel programs, for whom dedicated clusters with high-performance networks are the only viable option, and collaborative and/or asynchronous serial users, for whom grid-based solutions are needed. The main premises of GridUSP are:
- It should provide on-demand, preconfigured high-performance clusters. Typical HPC users should be able to run MPI jobs transparently, as if a dedicated cluster were available. Initially seeded with modern, centrally acquired hardware hosted at CCE, the architecture should allow the aggregation of other clusters at CCE as well as hardware “donated” by researchers and administered remotely (the latter is accepted only after CCE staff have assessed the hardware’s benefits and the available network connections). Thus, unlike usual grid models, in which Virtual Organizations own the resources, central administration is considered essential to meeting minimum service-level standards.
- It should be able to interoperate with other grid environments. Every node in GridUSP holds several preformatted virtual “images” that a job scheduler can load upon request. This preserves the simplicity of the user interface while allowing for collaboration in grid environments. The current core implementation of GridUSP comes from the OpenNebula/Reservoir project (EGEE).
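The image-on-request premise can be illustrated with a minimal sketch. It is not GridUSP's scheduler and does not use the OpenNebula API: the `Node` class, the `schedule` function and the image names are hypothetical, chosen only to show how one pool of nodes could serve different environments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A physical node holding several preformatted virtual images."""
    name: str
    images: frozenset             # image names available on this node
    loaded: Optional[str] = None  # image currently booted; None means idle


def schedule(nodes, image, count):
    """Load `image` on `count` idle nodes, or raise if too few qualify.

    Nothing is changed unless the whole request can be satisfied.
    """
    candidates = [n for n in nodes if n.loaded is None and image in n.images]
    if len(candidates) < count:
        raise RuntimeError(f"only {len(candidates)} idle nodes offer {image!r}")
    chosen = candidates[:count]
    for node in chosen:
        node.loaded = image
    return [node.name for node in chosen]


# Hypothetical pool: image names are illustrative, not GridUSP's catalogue.
pool = [Node(f"node{i:02d}", frozenset({"debian-mpi", "windows-gis"}))
        for i in range(6)]

mpi_nodes = schedule(pool, "debian-mpi", 4)   # an MPI cluster on demand
gis_nodes = schedule(pool, "windows-gis", 2)  # a Windows environment, same pool
```

Under these assumptions the first request takes four of the six nodes and the second takes the remaining two; a further request would fail until nodes are released, a step this sketch leaves out.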
Example use cases
- A user of a weather forecasting model (WRF) requested an arbitrary number of nodes for a workload with intense inter-processor communication. After experiments with several on-demand configurations, the best performance was achieved using four nodes (32 CPUs) with 16 GB of RAM per node, running MPI jobs under Debian Linux.
- A user of a GIS agriculture research model needed an environment running under Windows. Two nodes (16 cores) with 5 GB of RAM per node were scheduled, accessing 1.8 TB of the cloud-shared storage (Lustre) as a data repository.
- A user from USP central administration requested the configuration of nodes to test a prototype system for processing student enrollments. Student requests are multiplexed through a virtual network infrastructure and seen as a single application. Since the application load is seasonal, central administration (using Ubuntu Linux) must be able to easily preschedule and configure the required number of machines in the cloud. In this task, some GridUSP nodes can be requested by Integrade (a grid maintained by another USP research group), running asynchronously to keep cross-references to professors’ publications in the curriculum database (Lattes) up to date, as an aid to students’ choices.
These examples show the GridUSP infrastructure transparently accessed in a variety of configurations and in practical e-Science situations.
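For a sense of scale, the first two requests can be written down as data and their aggregate resources computed. This is only an illustrative sketch: the `Request` class is hypothetical, and the figure of 8 cores per node is an assumption derived by dividing the quoted totals (32 and 16) evenly across the nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A resource request as described in the use cases (hypothetical type)."""
    label: str
    os: str
    nodes: int
    cores_per_node: int   # assumed: quoted core totals split evenly per node
    ram_gb_per_node: int

    @property
    def total_cores(self):
        return self.nodes * self.cores_per_node

    @property
    def total_ram_gb(self):
        return self.nodes * self.ram_gb_per_node


wrf = Request("WRF forecast", "Debian Linux",
              nodes=4, cores_per_node=8, ram_gb_per_node=16)
gis = Request("GIS model", "Windows",
              nodes=2, cores_per_node=8, ram_gb_per_node=5)

for r in (wrf, gis):
    print(f"{r.label} ({r.os}): {r.total_cores} cores, "
          f"{r.total_ram_gb} GB RAM in total")
```

The WRF request comes to 32 cores and 64 GB of RAM in total, and the GIS request to 16 cores and 10 GB.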
Conclusions
While keeping the term ‘grid’ in its name, GridUSP is provided as a cloud, taking advantage of existing campus infrastructure and focusing on functionality. GridUSP is a pragmatic response to the needs of USP’s scientific user community. Other groups can replicate it to leverage e-Science on their own campuses, and it could possibly serve as a component in the consolidation of global e-Science infrastructures.
