
HPC: CLUSTER MANAGEMENT

environments, internet self-service, pooling of resources, and metered usage – then ‘grid’ is missing the first two elements. As for the first element, grids are not really elastic: applications that share the same OS build also typically share the same grid in a relatively static manner, so you need multiple grids (or dedicated resource pools within the same grid) to handle the different OS builds that applications require. Adding virtualisation and/or physical host provisioning to the mix makes it possible to repurpose hosts according to policies that understand each application's specific requirements. For example, a high-priority application on Windows might need more resources at a specific time. Previously that application team would have been out of luck; now the grid can serve up resources repurposed from a lower-priority environment. A set of Linux boxes can be converted quickly, either by tearing down the Linux virtual machine (VM) and firing up the Windows VM, or by switching the physical hosts over with techniques such as dual boot or a bare-metal rebuild, and the ‘new’ Windows servers automatically join the application environment.
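To make policy-driven repurposing more concrete, here is a minimal Python sketch, assuming a simple model in which every host records its current OS, the environment that owns it and that environment's priority. The `Host` class and `plan_repurposing` function are hypothetical illustrations, not part of any grid product's API. A request may only take hosts from strictly lower-priority environments, and it prefers hosts that already run the right OS, then idle hosts over busy ones.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical host model for illustration; not a real grid or scheduler API.
    name: str
    os: str         # OS currently running, e.g. "linux" or "windows"
    env: str        # application environment that currently owns the host
    priority: int   # priority of that environment (higher number = more important)
    busy: bool      # True if the host is running work for its current owner


def plan_repurposing(hosts, os_needed, priority, count):
    """Choose up to `count` hosts for a request of the given priority.

    Only hosts owned by strictly lower-priority environments are candidates.
    Hosts already running the required OS come first (no rebuild needed),
    then idle hosts before busy ones, then the lowest-priority owners.
    Returns (host_name, action) pairs: "reassign" or "reprovision".
    """
    candidates = [h for h in hosts if h.priority < priority]
    candidates.sort(key=lambda h: (h.os != os_needed, h.busy, h.priority))
    plan = []
    for h in candidates[:count]:
        # A host on the wrong OS must be switched, e.g. by swapping VMs,
        # rebooting via dual boot or rebuilding from bare metal.
        action = "reassign" if h.os == os_needed else "reprovision"
        plan.append((h.name, action))
    return plan


hosts = [
    Host("n1", "linux", "batch", 1, busy=False),
    Host("n2", "linux", "batch", 1, busy=True),
    Host("n3", "windows", "reporting", 2, busy=False),
    Host("n4", "windows", "risk", 9, busy=False),
]
# A priority-5 Windows application needs three more hosts.
print(plan_repurposing(hosts, "windows", 5, 3))
# [('n3', 'reassign'), ('n1', 'reprovision'), ('n2', 'reprovision')]
```

Here the idle Windows host is simply reassigned, the two Linux hosts would have to be switched over, and the host belonging to the higher-priority `risk` environment is never touched.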


For the second element, grids provide APIs and command line interfaces (CLIs) for job submission, but don’t really provide self-service tools to request infrastructure on demand or to adapt to calendar-based requirements. This is where an IaaS self-service resource request framework, accessed via a web portal, API and CLI, enables application teams to request and manage the resources they need more flexibly. Harris points to Platform Computing’s cooperation with CERN as an example: the organisation is delegating more control to its application teams so that they can deliver better performance and responsiveness for the services they provide. CERN also benefits from the automation required to make these self-service components operate (specifically, application service definitions that contain the instructions to auto-provision and auto-scale complete application environments), which significantly reduces the manual IT labour needed to serve its application teams.

To address such needs, Platform Computing recently announced Platform ISF Adaptive Cluster, which turns static clusters and grids into dynamic, shared environments built from heterogeneous physical and virtual HPC resources. It allocates resources dynamically, based on Platform LSF and Platform Symphony workload demands. The product helps eliminate cluster and queue sprawl, removes application-stack silos and reduces the starvation of large jobs.

[Figure: number of cores allocated to App A (Linux), App B (Win) and App C (Win) by hour of day; 40 more cores are provisioned to run Application B during an unplanned spike.]

Reprovisioning a system for a different OS with Platform ISF Adaptive Cluster can help adapt to changing usage patterns.
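The auto-scaling part of an application service definition comes down to a small calculation. The sketch below assumes a hypothetical `ServiceDefinition` with a fixed number of cores per instance and minimum/maximum instance counts; a real service definition would also describe the whole application stack to provision, and nothing here reflects Platform ISF's actual format. The demand numbers in the usage line are illustrative only, chosen so that the spike calls for 40 extra cores like the one described in the figure.

```python
import math
from dataclasses import dataclass


@dataclass
class ServiceDefinition:
    # Hypothetical fields for illustration; not Platform ISF's actual format.
    name: str
    os: str
    cores_per_instance: int
    min_instances: int
    max_instances: int


def desired_instances(svc, demanded_cores):
    """Instances needed to cover the demand, clamped to the service's limits."""
    needed = math.ceil(demanded_cores / svc.cores_per_instance)
    return max(svc.min_instances, min(svc.max_instances, needed))


def scaling_action(svc, running, demanded_cores):
    """Instances to add (positive) or release (negative)."""
    return desired_instances(svc, demanded_cores) - running


app_b = ServiceDefinition("App B", "windows", cores_per_instance=8,
                          min_instances=2, max_instances=10)
# Illustrative spike: demand jumps to 56 cores while 2 instances (16 cores) run.
print(scaling_action(app_b, running=2, demanded_cores=56))  # 5, i.e. 40 more cores
```

Clamping to `max_instances` is what stops one application's surge from consuming the whole shared pool, while `min_instances` keeps a baseline environment alive when demand drops.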

New names reflect new realities

There’s such a sea-change taking place that at least one company has changed its name to reflect the new environment. Previously known as Cluster Resources, the company is now Adaptive Computing, with its well-known Moab product line. Particularly relevant is Moab Adaptive HPC Suite, which likewise creates an adaptive operating environment that responds to the changing requirements of applications and workloads, allowing a compute environment to accommodate workload surges dynamically. To do so, it changes a node’s operating system, as well as its software and other resources, on the fly in response to workload needs. It automatically triggers an OS change on the required number of nodes using the site’s preferred OS-modification technology, whether dual boot, diskful or stateless provisioning.

Beyond the operating system, these other resources can take a number of forms, networking being one example. Moab can be made aware of the network topology, so that jobs requiring high throughput are routed to nodes or clusters that provide it. When it comes to GPUs, you configure the software to identify which resources have CUDA-capable processors. Some applications can run either on GPUs or on conventional processors; you don’t want to block them when certain resources are busy, but you can set up an affinity to a particular resource.
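The difference between a hard requirement and an affinity can be sketched as follows. This is a hypothetical Python model, not Moab's configuration syntax or API: network throughput and a required GPU filter the candidate nodes, whereas a preferred GPU only reorders them, so a job falls back to a CPU-only node instead of waiting when the CUDA nodes are full.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node model for illustration only.
    name: str
    free_cores: int
    has_cuda_gpu: bool      # node advertises a CUDA-capable GPU
    high_throughput: bool   # node sits on a high-throughput part of the network


@dataclass
class Job:
    name: str
    cores: int
    needs_high_throughput: bool = False
    gpu: str = "none"       # "none", "required" or "preferred"


def place(job, nodes):
    """Return the name of a node for the job, or None if it must wait.

    Hard constraints (network throughput, a required GPU) filter candidates;
    a "preferred" GPU is only an affinity that orders GPU nodes first.
    """
    fits = [n for n in nodes if n.free_cores >= job.cores]
    if job.needs_high_throughput:
        fits = [n for n in fits if n.high_throughput]
    if job.gpu == "required":
        fits = [n for n in fits if n.has_cuda_gpu]
    elif job.gpu == "preferred":
        # sorted() is stable: GPU nodes first, original order otherwise kept.
        fits = sorted(fits, key=lambda n: not n.has_cuda_gpu)
    return fits[0].name if fits else None


nodes = [
    Node("cpu1", free_cores=16, has_cuda_gpu=False, high_throughput=True),
    Node("gpu1", free_cores=0, has_cuda_gpu=True, high_throughput=True),
    Node("gpu2", free_cores=8, has_cuda_gpu=True, high_throughput=False),
]
print(place(Job("md", 4, gpu="preferred"), nodes))  # gpu2
print(place(Job("md-fast", 4, needs_high_throughput=True, gpu="preferred"), nodes))  # cpu1
print(place(Job("cfd", 4, needs_high_throughput=True, gpu="required"), nodes))  # None
```

The second job would prefer a GPU, but the only GPU node on the fast network (`gpu1`) is full, so it runs on `cpu1` rather than blocking; the third job has a hard GPU requirement and therefore waits.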

A final point in this regard is adaptive energy savings. With good cluster management, says Adaptive Computing’s president Michael Jackson, you can make maximum use of nodes already running and it should never be necessary to wait for a power-on cycle to take place. He says that one customer saved enough in energy bills in one month to pay for the software. With grid computing, you can provision computing resources as a utility that can be turned on or off. Cloud computing goes one step further with on-demand resource provisioning.
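A minimal sketch of energy-aware node management, assuming a hypothetical model in which the scheduler knows which nodes are powered on, which are busy and how long each has been idle. New work goes to idle nodes that are already running before any node is booted, and only nodes idle beyond a threshold are switched off. The small warm pool kept running so that work does not normally wait for a power-on cycle is an illustrative assumption, not a documented Moab feature.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node model for illustration only.
    name: str
    powered_on: bool
    busy: bool
    idle_minutes: int = 0  # time a powered-on node has been idle


def allocate(nodes, count):
    """Pick up to `count` free nodes, using running idle nodes before powered-off ones.

    Returns (allocated_names, names_that_must_be_powered_on_first).
    """
    running_idle = [n for n in nodes if n.powered_on and not n.busy]
    powered_off = [n for n in nodes if not n.powered_on]
    chosen = (running_idle + powered_off)[:count]
    return [n.name for n in chosen], [n.name for n in chosen if not n.powered_on]


def nodes_to_power_off(nodes, idle_threshold_minutes, keep_warm):
    """Names of idle nodes to switch off, leaving up to `keep_warm` idle nodes running."""
    idle = sorted((n for n in nodes if n.powered_on and not n.busy),
                  key=lambda n: n.idle_minutes)
    # The most recently used idle nodes form the warm pool; the rest are
    # switched off once they have been idle for at least the threshold.
    return [n.name for n in idle[keep_warm:]
            if n.idle_minutes >= idle_threshold_minutes]


nodes = [
    Node("a", powered_on=True, busy=True),
    Node("b", powered_on=True, busy=False, idle_minutes=5),
    Node("c", powered_on=True, busy=False, idle_minutes=90),
    Node("d", powered_on=False, busy=False),
]
print(allocate(nodes, 3))                # (['b', 'c', 'd'], ['d'])
print(nodes_to_power_off(nodes, 60, 1))  # ['c']
```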

Another name change to be aware of is Bright Computing, a spin-out from the European HPC cluster company ClusterVision. Bright Computing specialises in cluster provisioning but works with third parties for workload management.

SCIENTIFIC COMPUTING WORLD APRIL/MAY 2010
