
workload management

middleware packages, there is a recent business development that flooded the email inboxes of those involved in this sector. Specifically, IBM announced in October that it was acquiring Platform Computing, and the impact on the overall market will be interesting to follow.

Workload management software attempts to route jobs to each core to make sure the overall system runs at peak efficiency.

Dr Songnian Zhou, CEO of Platform, says there is a clear intention to keep Platform as a separate business group and that it enhances IBM’s current product line. Over the past decade, he notes, IBM has focused on supercomputers, and while that business is solid, it is still a niche market; IBM is looking to diversify into the broader market for general scientific and technical computing. For instance, IBM’s LoadLeveler workload-management middleware has been focused on AIX and the PowerPC series, but going forward IBM is pressing into the x86 market for clusters, grids and clouds, and that is where Platform’s products make a solid contribution.

He also believes this acquisition represents a shift in the computing landscape, in which middleware such as load managers is becoming a critical component as people scale out. He sees IBM’s purchase of Platform as validation of this, because until now the big hardware companies have not placed a strong focus on middleware, and the area has been underappreciated.

Finally, he notes: ‘We at Platform are going from a symmetrical structure where we partnered with all hardware vendors to an asymmetrical structure where, with Platform owned by IBM, we are committed to do whatever it takes to help IBM provide end-to-end solutions. However, our commitment to supporting all hardware and software vendors remains and is backed up by even more resources. Users always live in a heterogeneous world, and it’s a major part of the value of this acquisition and in IBM’s interest to ensure Platform software supports all platforms.’

www.scientific-computing.com

Devil in the details

As for differentiating factors, ‘the devil is in the details,’ says Bill Nitzberg, CTO for PBS Works at Altair Engineering. ‘Everyone says their software is secure, but to my knowledge we’re the only ones with an EAL 3+ certification. Next, our scheduler is dynamic in that we allow administrators to express their needs as a formula, rather than a set of coefficients for a static table. We also work with job arrays more efficiently, where a user can submit tens of thousands of jobs, and we manipulate it as a single object.’

Nitzberg also points out that Altair is unique in the HPC market, because it not only supplies middleware such as PBS Professional, its commercial workload management solution, but is also an engineering company creating its own applications such as the HyperWorks suite for engineering analysis. ‘Various suppliers focus on different areas, and our base is traditional HPC where we know the applications well. We have a 500-strong team who do HPC as their day job using our tools, so we get immediate unbiased user feedback.’
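The difference between a static table of coefficients and an administrator-supplied formula can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Job` fields, the queue weights and the formula are invented for illustration and are not PBS Professional’s syntax or policy.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    queue: str
    wait_hours: float
    cores: int


# Static approach: one fixed weight per queue, looked up in a table.
# (Hypothetical weights.)
STATIC_WEIGHTS = {"express": 100.0, "normal": 10.0, "low": 1.0}


def static_priority(job: Job) -> float:
    return STATIC_WEIGHTS[job.queue]


def formula_priority(job: Job) -> float:
    # Formula approach: the administrator writes an expression over job
    # attributes. This example policy (an assumption) combines the queue
    # weight with time spent waiting and a small bonus for wide jobs.
    return STATIC_WEIGHTS[job.queue] + 5.0 * job.wait_hours + 0.1 * job.cores


def order(jobs, priority):
    """Return job names sorted from highest to lowest priority."""
    return [j.name for j in sorted(jobs, key=priority, reverse=True)]


jobs = [
    Job("a", "normal", wait_hours=30.0, cores=64),
    Job("b", "express", wait_hours=1.0, cores=8),
    Job("c", "low", wait_hours=2.0, cores=512),
]

print(order(jobs, static_priority))
print(order(jobs, formula_priority))
```

With the table alone, the express job always runs first; with the formula, the long-waiting normal job overtakes it, which is the kind of site-specific trade-off a formula lets an administrator state directly.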


Penguin Computing, which is the home of Scyld Beowulf cluster technology, the original Linux cluster software co-invented by Penguin’s CTO Donald Becker, has been deploying a variety of schedulers on its clusters. It pre-installs Torque (Terascale Open-source Resource and QUeue manager) on every cluster that is under management of its Scyld ClusterWare software; for customers with more complex requirements the company’s scheduler of choice is Moab from Adaptive Computing (which also provides commercial support for Torque).

As for confusion in the market, Arend Dittmer, director of product management at Penguin, agrees. He says: ‘There are a lot of overlapping claims as to what the different products can do, and for the most part these claims are justified. Once a vendor comes out with a “new” feature it’s typically a matter of time until that is replicated by other vendors.’

He adds that one of the things that matter most from a practical perspective is each scheduler’s architecture and implementation. How does the main scheduler go through the list of pending jobs? Is this main scheduler (the decision-making engine) multi-threaded? What are the shortest configurable scheduling intervals? Can main schedulers talk to each other, thus allowing for enterprise grid installations? How is the scheduling mechanism integrated with parallel MPI applications? These software design choices have implications for the scheduler’s scalability and its ability to process large numbers of small jobs, interactive jobs and large parallel jobs. Unfortunately, but not surprisingly, he adds, information on these implementation details is not available from the respective vendors, so you only have each vendor’s scalability claims to go by.
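To see why the way a scheduler walks its pending list matters, consider a minimal Python sketch of two textbook strategies for a single scheduling pass. Neither is taken from any vendor’s product; the `Job` type, the function names and the core counts are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int


def strict_fifo_pass(pending, free_cores):
    """Start jobs in submission order and stop at the first one that
    does not fit, so a large job at the head blocks everything behind it."""
    queue = list(pending)
    started = []
    while queue and queue[0].cores <= free_cores:
        job = queue.pop(0)
        free_cores -= job.cores
        started.append(job.name)
    return started, [j.name for j in queue], free_cores


def first_fit_pass(pending, free_cores):
    """Walk the whole list and start every job that fits. Small jobs flow
    through, but a large job can wait a long time if nothing reserves
    cores for it."""
    started, waiting = [], []
    for job in pending:
        if job.cores <= free_cores:
            free_cores -= job.cores
            started.append(job.name)
        else:
            waiting.append(job.name)
    return started, waiting, free_cores


pending = [Job("small1", 2), Job("big", 64), Job("small2", 4)]

print(strict_fifo_pass(pending, free_cores=16))
print(first_fit_pass(pending, free_cores=16))
```

With 16 free cores, the strict pass starts only `small1` and leaves 14 cores idle behind the 64-core job, while the first-fit pass also starts `small2`. Production schedulers layer reservations, backfill and priorities on top of choices like these, which is why the implementation details Dittmer lists affect how well small, interactive and large parallel jobs coexist.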

What are they using it for?

In speaking with potential workload manager suppliers, it is helpful to ask for references and see what other people are doing with the software. Adaptive Computing, for example, recently announced that the National Oceanic and Atmospheric Administration (NOAA), in conjunction with Oak Ridge National Laboratory (ORNL) and Computer Sciences Corporation (CSC), has selected Moab technology as the workload management software for NOAA’s new Gaea supercomputer. In choosing a workload manager, one of NOAA’s primary considerations was location-aware scheduling. NOAA’s Geophysical Fluid Dynamics Laboratory (GFDL), located in Princeton, NJ, supports local researchers as well as others across the country, while Gaea is physically located at ORNL in Tennessee. The disparate locations of users and systems, current and future, create challenges in networking, data transfer and job submission. Moab solves the job submission problem by allowing a local instance of Moab to be installed in New Jersey where users can interact with the system, manipulate their data sets, and analyse their results. Moab then communicates with the instance of Moab running on Gaea in Tennessee and migrates jobs and data between the two sites. This model can grow organically as new users and compute resources come online.

Another topic of increasing importance is workload managers’ cloud-bursting capability: the seamless movement into the cloud of workload that cannot be accommodated locally. Two companies that have combined efforts along these lines are RightScale, a cloud-management platform, and Univa, which sells Univa Grid Engine 8.0. That software started as Sun Grid Engine and then became Oracle Grid Engine. According to Gary Tyreman, CEO of Univa, Oracle has started to move away from the HPC market,

DECEMBER 2011/JANUARY 2012 25
