Supercomputers, grids, or clouds?
By Wolfgang Gentzsch, DEISA2 and Open Grid Forum, Germany
Now that we have a new computing paradigm, cloud computing, will clouds replace supercomputers, just as we once thought grids might (and then did not)? Are grids dead, now that we have clouds? Despite all the promising developments in grid and cloud computing, and the avalanche of publications and talks on the subject, many people still seem confused about High Performance Computing (HPC) versus grids versus clouds, and hesitant to take the next step. I think a number of issues are driving this uncertainty:
Grids didn’t keep all their promises
Grids did not evolve (as some of us originally thought) into the next fundamental IT infrastructure for everything and everybody. The diversity of computing environments meant we had to develop different middleware and face different usage models with different benefits. Enterprise grids were (and are) providing better resource utilization and business flexibility, while global grids are best suited to complex R&D collaborations with resource sharing. For enterprise usage, setting up and operating grids was often cumbersome. Researchers saw this as a necessary evil: implementing complex applications on supercomputers has never been easy either. So what?
Grid: the way station to the cloud
After 40 years of dealing with HPC, grid computing was indeed the next big thing for the big-science researcher. For the enterprise CIO, however, grids were a way station on the road to the cloud model, which provides all the missing pieces of a utility: ease of use, economies of scale, business elasticity, and pay-as-you-go accounting (thus reducing capital expenditure). In cases where security matters, there is always the private cloud, run within an enterprise’s firewall. In more complex enterprise environments, where different applications are run under different policies, private clouds can easily connect to (external) public clouds, creating a hybrid cloud infrastructure that balances security with efficiency.
Different policies: what does that mean?
No two application jobs are alike. HPC jobs differ by priority, strategic importance, deadline, budget, and IP and licensing characteristics. In addition, a specific code (and its inherent algorithms) often requires a specific computer architecture, interconnect, operating system, memory, and so on. These differences strongly influence where and when a job will run. For any job, this set of requirements determines the policies that must be defined and programmed, so that the job runs only in accordance with them. Ideally, this is guaranteed by a dynamic resource broker that controls submission to grid or cloud resources, be they local or global, private or public.
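To make the idea of "policies" concrete, here is a minimal sketch in Python of how a broker might filter resources against a job's requirements. Everything in it is an assumption made for illustration: the `Resource` and `JobPolicy` fields, the resource kinds, and the cheapest-eligible selection rule are hypothetical and do not describe any particular grid or cloud middleware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """A hypothetical compute resource known to the broker."""
    name: str
    kind: str                      # "hpc_grid", "private_cloud" or "public_cloud"
    interconnect: str              # e.g. "infiniband" or "ethernet"
    memory_gb_per_node: int
    cost_per_core_hour: float
    licensed_codes: frozenset = frozenset()


@dataclass(frozen=True)
class JobPolicy:
    """Illustrative subset of the requirements that shape where a job may run."""
    code: str
    required_interconnect: Optional[str]   # None means "any interconnect"
    min_memory_gb_per_node: int
    max_cost_per_core_hour: float
    confidential: bool                     # IP-sensitive jobs stay off public clouds
    requires_license: bool


def eligible(job: JobPolicy, resource: Resource) -> bool:
    """Return True only if running the job on this resource breaks no policy."""
    if job.confidential and resource.kind == "public_cloud":
        return False
    if job.required_interconnect and resource.interconnect != job.required_interconnect:
        return False
    if resource.memory_gb_per_node < job.min_memory_gb_per_node:
        return False
    if resource.cost_per_core_hour > job.max_cost_per_core_hour:
        return False
    if job.requires_license and job.code not in resource.licensed_codes:
        return False
    return True


def broker(job: JobPolicy, resources: list) -> Optional[Resource]:
    """Pick the cheapest eligible resource, or None if no resource qualifies."""
    candidates = [r for r in resources if eligible(job, r)]
    return min(candidates, key=lambda r: r.cost_per_core_hour, default=None)
```

A real broker would also weigh priority, deadlines, and current load; the point here is only that each requirement becomes an explicit, checkable rule.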
Grids or clouds?
One question remains: how do I find out (and then ‘tell’ the resource broker) whether my application should run on an HPC grid or in a cloud? The answer depends, among other things, on the algorithmic structure of the program, which might be intolerant of the high latency and low bandwidth typical of today’s clouds. These performance limitations mainly affect tightly coupled, data-intensive applications running in parallel on hundreds or thousands of processors or cores. The good news, however, is that many HPC applications do not require high bandwidth and low latency, and thus can easily run in the cloud. Examples are the parameter studies often seen in science and engineering, where the same application is executed for many parameter values, resulting in many independent jobs (such as analyzing the data from a particle physics collider, identifying the solution parameter in optimization, ensemble runs to quantify climate model uncertainties, identifying potential drugs, studying economic model sensitivity to parameters, and analyzing different materials and their resistance in crash tests, to name just a few).
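The structure of a parameter study can be sketched in a few lines of Python. This is an illustration only: `simulate` is a hypothetical stand-in for a real application run, the parameter names are invented, and a local thread pool takes the place of cloud instances. What matters is that every run depends only on its own parameters, so no run ever waits on another.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(stiffness, damping):
    """Hypothetical stand-in for one independent application run."""
    return stiffness / (1.0 + damping)


def parameter_study(stiffness_values, damping_values, max_workers=4):
    """Run the same application once per parameter combination.

    The runs exchange no data, so they can be spread over any number of
    loosely connected machines; a thread pool plays that role here.
    """
    grid = list(product(stiffness_values, damping_values))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda params: simulate(*params), grid))
    return dict(zip(grid, results))
```

Because the only communication is handing out parameters and collecting results, network latency between workers hardly matters, which is why this class of workload suits the cloud.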
HPC needs grids and clouds
According to the experience of the DEISA Extreme Computing Initiative (DECI), there are still plenty of grand-challenge science and engineering applications that can only run effectively on the largest and most powerful (and thus most expensive) supercomputers. The DEISA HPC grid (also called an HPC Ecosystem) comprises eleven of the fastest supercomputers in Europe. Today, nobody would build an HPC cloud for these particular applications: it simply wouldn’t be profitable as the “market” (composed of just the HPC users) is far too small. In some science application scenarios, with complex workflows of different tasks, a hybrid infrastructure might make sense: cloud whenever possible, and HPC grid whenever necessary, providing the best of both worlds. 
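The rule "cloud whenever possible, HPC grid whenever necessary" can be written down as a simple routing function over the tasks of a workflow. The sketch below is hypothetical: the `Task` fields, the example task names, and in particular the 64-core threshold are assumptions chosen for illustration, not measured limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One step of a hypothetical science workflow."""
    name: str
    cores: int
    tightly_coupled: bool   # processes communicate frequently during the run


def route(task, cloud_core_limit=64):
    """Send a task to the HPC grid only when the cloud would hold it back.

    cloud_core_limit is an assumed threshold above which latency and
    bandwidth limits are taken to dominate for tightly coupled codes.
    """
    if task.tightly_coupled and task.cores > cloud_core_limit:
        return "hpc_grid"
    return "cloud"


def plan(workflow, cloud_core_limit=64):
    """Map every task name in the workflow to its target infrastructure."""
    return {task.name: route(task, cloud_core_limit) for task in workflow}
```

In practice the threshold would come from benchmarking the application on both kinds of infrastructure rather than from a fixed number.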
