HPC YEARBOOK 2012

HPC 2012 Workload management

to have consistent I/O performance. With multiple applications running simultaneously in a cluster, the workload manager needs to understand and satisfy the I/O needs of each application. A Big Data-aware workload manager will be able to schedule the various applications, such that their peak I/O demands do not overlap, and that the required storage performance is available when it is needed.

‘Of course, I/O performance is heavily dependent on data locality. Generally speaking, today’s workload managers treat data as blobs of raw bytes, to be shuffled about with little understanding of their content. In the future, workload managers will increasingly understand the data’s structure and attributes. For example, by understanding the temporal and spatial locality of a dataset, the workload manager can exploit those factors in their scheduling decisions,’ he adds.

Today, a major part of workload management is dealing with cloud computing and virtualisation. ‘Historically, virtualisation was anathema to high-performance computing practitioners,’ says Harrington. ‘The so-called virtualisation tax, or performance penalty caused by virtualisation, was too high a price to pay for high-performance workloads. However, in recent years, this penalty has decreased to the point of being almost negligible for many applications. This trend, combined with virtualisation’s greatly increased flexibility, has made virtualisation a growing tool in the HPC arsenal.

‘Taking virtualisation to the next level, HPC clouds combine the provisioning of virtual and physical machines with workload management technologies. Since virtual machines can be easily started, moved, stored and altered, they are much easier for the workload manager to schedule and control. This increased flexibility results in higher overall system utilisation and greater return on investment.

HPC clouds also enable on-demand, pay-per-use compute models that increase the accessibility and flexibility of HPC systems. This brings HPC to a wider audience and lowers its overall cost. In the future, workload managers will have to cope with these realities, dealing with more users and a wider variety of workloads, both physical and virtual. Workload managers must continue to evolve along with the ever-changing computing landscape.

‘At our recent user group meeting, two themes emerged when it came to workload management and cloud computing,’ says Nitzberg. ‘These were consolidation and the need for a nice interface.

‘When we talk about consolidation here, it’s about administrators assessing the cost and productivity of the resources available to them, whether that’s on the local nodes or via the cloud. The emergence of the cloud and the fact that it has a pricing structure has allowed direct comparisons to be made. Administrators are now seeing the need for consolidation of resources to reduce costs.

‘User interface is also incredibly important. There are a lot of big advantages in providing a simple user interface for both end users and administrators. In the case of the latter, they need to be able to refresh what’s behind the portal – that is, the configuration of computers, networks, storage and so on – without affecting what the user sees.’

Looking ahead, Nitzberg sees a reduction in complexity, in keeping with the cyclical nature of HPC development. ‘Thirty years ago, clusters as we know them today didn’t exist,’ he says. ‘Instead, it was a complex web of interconnected high-performance computing devices. So, clusters built from commodity nodes made things easier, but now we have GPU processors and co-processors and so on, which have once again increased the complexity of systems. It may be that things will become even more complex in the near future, before they become simpler. Workload managers will have to monitor this trend and respond accordingly.’

Cluster management

There is a distinction between cluster management and workload management, but it is vital that the two work together in harmony. Dr Matthijs van Leeuwen is chief executive officer of Bright Computing, a company that specialises in cluster management software. ‘Almost all high-performance computing is now done on clusters,’ he says, ‘so, almost all HPC systems need both cluster management and workload management. Newcomers to HPC don’t always understand that there are two separate products.’

So, what should a cluster manager do? ‘As an absolute minimum, it should offer provisioning,’ says van Leeuwen. ‘The next feature one would expect is monitoring, which is keeping track of all sorts of metrics around hardware and software, and visualising these statistics. Users should also look for configuration management, which enables you to configure many aspects of compute nodes.

‘It’s important that you have proper integration between your cluster manager and your workload manager. When we first started the company, we were surprised that nobody was taking cluster management that seriously, and also that nobody had really looked at integration with workload managers. So, that’s what drove us to create our own product, Bright Cluster Manager. We work closely with all the common workload management software products in HPC – both commercial and open source.’

The level of integration between the cluster manager and the workload manager will differ among products, but for Bright Cluster Manager, this starts at the point of installation. ‘Our software will automatically install and configure the workload manager,’ says van Leeuwen, ‘so when somebody installs our software, they only need provide the information about their cluster once. Similarly, we provide just one user interface that caters for both cluster management and workload management, and provide one API for both of them.

‘Other features include health checking, whereby our software will monitor the health of nodes, and let the workload manager know when the health of any given node is failing. That way, the workload manager knows not to schedule any jobs for that node. Power consumption is also managed; if the workload manager tells us that a particular node is not being used, then our software will close that node down to save power.

‘Finally, the deep integration handles “data aware scheduling to the cloud”, whereby when a user submits a job, we let the workload manager take care of the data transfer,’ he adds.

Looking ahead, van Leeuwen believes there are two drivers in cluster management – cloud computing and Big Data. ‘Until a couple of years ago, no cluster management
