
HPC: CLUSTER MANAGEMENT

director of product marketing William Lu, before cluster management software can do a complete job. Specifically, the software needs to detect when a particular GPU is used by a specific application to determine when capacity is or will be available. Today it’s possible to detect the number of GPUs in a node, and that’s a static assignment; however, Cuda provides no indication whether a particular core is in use. Without this information, users might make wrong assumptions about GPU usage and inadvertently leave a GPU idle or try to use two different GPUs – the cluster software just doesn’t know. He relates that his company is working with Nvidia to resolve this technology gap, and future versions of Cuda are likely to have enhancements along these lines.
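To illustrate the bookkeeping described above, here is a minimal sketch in Python of a per-node registry that records which job holds which GPU. It is a hypothetical illustration rather than any vendor's implementation: the class `NodeGpuRegistry` and its methods are invented names, and the registry only knows what the cluster software itself has assigned, which is exactly the limitation the paragraph describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeGpuRegistry:
    """Tracks which job (if any) holds each GPU on one node.

    Hypothetical bookkeeping kept by the cluster software itself;
    it is not information reported by Cuda.
    """
    gpu_count: int
    owners: Dict[int, Optional[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every GPU starts out idle (no owning job).
        self.owners = {gpu: None for gpu in range(self.gpu_count)}

    def acquire(self, job_id: str) -> Optional[int]:
        """Assign the first idle GPU to job_id; return None if all are busy."""
        for gpu, owner in self.owners.items():
            if owner is None:
                self.owners[gpu] = job_id
                return gpu
        return None

    def release(self, job_id: str) -> None:
        """Mark every GPU held by job_id as idle again."""
        for gpu, owner in self.owners.items():
            if owner == job_id:
                self.owners[gpu] = None

    def idle_gpus(self) -> List[int]:
        """List the GPUs that no job currently holds."""
        return [gpu for gpu, owner in self.owners.items() if owner is None]
```

With two GPUs on a node, a third job asking for a GPU would receive `None` until an earlier job releases its device. An application that picks a GPU on its own, outside this registry, would go unnoticed, which is why per-device usage information from the driver matters.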

Cluster management software can also include a total view of all the cores, here showing their temperature with ‘rack view’ from the Bright Cluster Manager.


For GPUs, the Bright Cluster Manager installs Cuda libraries in the proper places. An ‘environment module’ allows you to manage multiple versions and ensures that an application or compiler knows where to find everything it needs. Commercial director Matthijs van Leeuwen gives an example: if user A and user B need two different versions of Cuda, then without the environment module each of them would have to state this explicitly. In addition, with the ‘rack view’ feature you can visualise any metric that is made available to the operating system, such as teraflops or the chip’s temperature.
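The idea behind an environment module can be sketched in a few lines of Python. This is a simplified illustration, not how Bright Cluster Manager implements it: the install prefixes, the version numbers chosen, the `CUDA_HOME` variable and the function `load_cuda` are all assumptions made for the example.

```python
import os
from typing import Dict, Mapping

# Hypothetical install prefixes; real locations depend on the site.
CUDA_PREFIXES = {
    "2.3": "/opt/cuda/2.3",
    "3.0": "/opt/cuda/3.0",
}


def load_cuda(version: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with the chosen Cuda version first on the search paths."""
    prefix = CUDA_PREFIXES[version]
    new_env = dict(env)  # leave the caller's environment untouched
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib64")):
        entry = f"{prefix}/{subdir}"
        current = new_env.get(var, "")
        # Prepend so this version is found before any other installation.
        new_env[var] = entry if not current else entry + os.pathsep + current
    new_env["CUDA_HOME"] = prefix  # assumed convention, not required by the source
    return new_env


# User A and user B each get their own view of the same machine.
env_user_a = load_cuda("2.3", {"PATH": "/usr/bin"})
env_user_b = load_cuda("3.0", {"PATH": "/usr/bin"})
```

Because each user receives a separate environment, neither has to spell out library locations by hand, and both versions can stay installed side by side.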

The company also touts Bright Health, which addresses the fact that an HPC cluster evolves as hardware and software are subject to change. An interesting feature is the ‘prejob checker’, which consists of a number of tests that run in a few seconds before a job starts. If a test fails, the faulty node is taken offline in the workload manager, the administrator is notified and the job is re-queued – the workload manager’s queue is not flushed empty if one of the nodes is faulty, and the job doesn’t have to go to the bottom of the queue.
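The flow of a pre-job check can be sketched as follows. This is a hypothetical Python outline of the behaviour described above, not Bright’s code: the function names, the shape of the tests and the choice to put the job back at the front of the queue are assumptions drawn from the description.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set

# A health test takes a node name and returns True when the node passes.
HealthTest = Callable[[str], bool]


def run_prejob_checks(node: str, tests: Dict[str, HealthTest]) -> List[str]:
    """Return the names of the tests that fail on this node."""
    return [name for name, test in tests.items() if not test(node)]


def dispatch(job: str, nodes: List[str], tests: Dict[str, HealthTest],
             queue: Deque[str], offline: Set[str], notices: List[str]) -> bool:
    """Start the job only if every assigned node passes its pre-job checks."""
    results = {node: run_prejob_checks(node, tests) for node in nodes}
    faulty = {node: failed for node, failed in results.items() if failed}
    if not faulty:
        return True  # all nodes healthy, the job may start
    for node, failed in faulty.items():
        offline.add(node)  # take the faulty node out of service
        notices.append(f"{node} taken offline, failed: {', '.join(failed)}")
    queue.appendleft(job)  # re-queue at the front, not the bottom
    return False
```

Other jobs already waiting in `queue` are left alone, so a single bad node costs one job a short delay rather than emptying the queue.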


All on a single DVD

Like the previous company mentioned, Clustercorp specialises primarily in provisioning, but company president Tim McIntire says its main added value is putting all the pieces needed into one distribution, which is called Rocks. That, in turn, uses a paradigm called ‘Rolls’, which automates the installation of software across a cluster and prevents ‘software skew’. A Roll can contain any software package built for Linux. Clustercorp’s Rolls include critical stacks from suppliers such as Mellanox (OFED Infiniband), Intel (compilers), Portland Group (compilers), Platform (LSF), Cluster Resources (Moab), TotalView (debugger), Nvidia (Cuda) and Panasas (storage). Further, with the Xen Roll you can use the same tools to spin off virtual clusters inside virtual clusters, each with a VPN (virtual private network). When you provision a node for the first time, you select an appliance type such as a compute node, storage node or, for GPUs, a Cuda node. During provisioning, each Cuda appliance automatically gets the required modules. With GPUs there is one issue to be aware of, points out Platform Computing’s

SCIENTIFIC COMPUTING WORLD APRIL/MAY 2010

Another limit is that, today, a workload manager can only send a job and hope that the application benefits from the selected environment. The advice from Jochen Krebs, director of enterprise solutions sales for Altair, is to consider the application and your users (their privileges and rights) and then put a definition into a site policy that reserves GPUs for the applications that benefit most from them.
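A site policy of this kind could be expressed, in very reduced form, as a lookup that considers both the application and the user’s rights. The Python sketch below is purely illustrative: the application names, the group names and the functions are invented, and it does not reflect the policy syntax of any particular workload manager.

```python
from typing import Dict, Set

# Hypothetical site policy; application and group names are invented.
SITE_POLICY: Dict[str, Set[str]] = {
    "gpu_applications": {"md_solver", "seismic_fft"},  # codes that benefit from GPUs
    "gpu_groups": {"research", "engineering"},         # users allowed on GPU nodes
}


def may_use_gpu(application: str, user_groups: Set[str],
                policy: Dict[str, Set[str]] = SITE_POLICY) -> bool:
    """A job qualifies only if both the application and the user qualify."""
    app_ok = application in policy["gpu_applications"]
    user_ok = bool(user_groups & policy["gpu_groups"])
    return app_ok and user_ok


def choose_node_type(application: str, user_groups: Set[str],
                     policy: Dict[str, Set[str]] = SITE_POLICY) -> str:
    """Route qualifying jobs to GPU nodes and everything else to CPU nodes."""
    return "gpu" if may_use_gpu(application, user_groups, policy) else "cpu"
```

Under these assumptions, a job that cannot exploit a GPU is kept off the GPU nodes even when its owner has the rights to use them, so the accelerators remain free for the codes that gain most.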

Built with the OS

Besides those firms that provide cluster management software for available Linux operating systems, some are developing an OS with the cluster software already included. One such company is T-Massive Computing, Russia’s largest HPC supplier, with its Clustrx HPC OS. It includes a POSIX-compliant resource manager (based on SLURM, the open-source Simple Linux Utility for Resource Management) along with deployment, job-scheduling, monitoring, power-management, cluster management and provisioning subsystems. This software is being used on the 420-teraflop Lomonosov supercomputer at Moscow State University, a system with 35,776 cores that holds the 12th position in the current TOP500 list. Clustrx does provide hybrid and GPU-based support with a hybrid MPI, and the firm is working to enhance it with adaptive task management and scheduling, including dynamic task profiling, as well as some level of GPU (or any other accelerator) node virtualisation.

www.scientific-computing.com