HPC virtualisation

‘If you run several virtual machines on one processor, then as soon as you run a parallel programme – and nearly all HPC programmes are parallel – instead of getting the power of one processor, you get the power divided by the number of virtual machines that are running on it,’ Parsons says. ‘To get real performance in an HPC application, you want to cut the application up into pieces, and give each piece to a processor that works independently of the others – and every now and then they have a conversation. As soon as you add virtualisation there, you have two problems. One is that, to get real performance, you need to have one “blob” of work being done by one processor, and in virtualisation you don’t really know what’s going on underneath. And the second is that you need the fastest communication between processors possible, and when you’re using virtual hosts there’s a whole layer of technology in between the network and the virtual host that slows down communication – you’re immediately not getting optimum communication,’ Parsons says.


HPC applications are generally trying to work at the very limit of what’s possible, and so specialised, physical systems are always likely to have the edge, Parsons says.
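
The pattern Parsons describes – one piece of the problem per processor, with only occasional conversations between them – is, in essence, the message-passing style used by many HPC codes. The sketch below illustrates it with mpi4py; the library choice, domain size and update rule are illustrative assumptions rather than any particular production code.

# A minimal sketch, assuming mpi4py and NumPy are available; the domain size
# and update rule are invented for illustration (and size is assumed to
# divide the domain evenly).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n_global = 1_000_000                  # total cells in a hypothetical 1D domain
n_local = n_global // size            # each rank owns one contiguous 'blob'
u = np.random.rand(n_local)

halo_left = np.empty(1)
halo_right = np.empty(1)

for step in range(100):
    # Independent work: a purely local smoothing sweep on this rank's blob.
    u[1:-1] = 0.5 * u[1:-1] + 0.25 * (u[:-2] + u[2:])

    # The occasional 'conversation': swap edge values with the neighbours.
    if rank > 0:
        comm.Sendrecv(u[0:1], dest=rank - 1, recvbuf=halo_left, source=rank - 1)
        u[0] = 0.5 * (u[0] + halo_left[0])
    if rank < size - 1:
        comm.Sendrecv(u[-1:], dest=rank + 1, recvbuf=halo_right, source=rank + 1)
        u[-1] = 0.5 * (u[-1] + halo_right[0])

    # A global reduction: every rank waits here, which is why a slow or
    # virtualised interconnect drags the whole job down.
    total = comm.allreduce(float(np.abs(u).sum()), op=MPI.SUM)
    if rank == 0 and step % 25 == 0:
        print(f"step {step}: global sum {total:.3f}")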


A prime example of this on-the-edge work is going on at Team Lotus, where engineers are studying aerodynamics in Formula 1 cars using 12 Dell M1000e chassis blade server enclosures, with 16 blades in each, eight CPUs and 24GB of memory per blade.

Computational fluid dynamics, analysing the trade-off between downforce, to keep the car on the track, and drag, holding the car’s speed back, is a complex business – and it’s not helped by regulations from FOTA, the Formula One Teams Association, that govern how much processing is allowed. Each team must balance real-world wind tunnel work and computerised analysis, and keep the two within a defined limit at all times – currently, CFD teraflops usage must be less than or equal to 40 minus two thirds of the wind tunnel hours (CFD ≤ 40 − 2/3 × WT).
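
That resource rule is simple enough to check on the back of an envelope; the snippet below just restates the arithmetic, and the wind tunnel figures are invented for illustration rather than being Team Lotus’s actual allocations.

# The FOTA limit quoted above, CFD <= 40 - (2/3) x wind tunnel hours,
# expressed as a helper; the sample figures are purely illustrative.
def cfd_teraflops_allowed(wind_tunnel_hours: float) -> float:
    return 40.0 - (2.0 / 3.0) * wind_tunnel_hours

for wt in (0, 15, 30, 45):
    print(f"{wt:>2} wind tunnel hours -> up to "
          f"{cfd_teraflops_allowed(wt):.1f} teraflops of CFD")

Every extra hour in the tunnel therefore costs two thirds of a teraflop of CFD allowance, which is why the compute budget tightens as the physical testing programme grows.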


As Team Lotus is currently increasing the real wind tunnel work it does, senior HPC support engineer Geoff Dunk is tightly constrained in the compute power he can use – so he needs to drag every last ounce of processing out of it that he can. ‘The Dells run Linux, stripped down so we only use what we need to. We want to get the most out of every machine and, adding the machines together, the best resource out of the environment,’ Dunk says. Virtualisation would just get in the way.


‘To run a virtualised system, the host has to use overhead. And we want to get the most out of our hardware, the best bang for our buck, and we don’t want to waste energy running a host. So it’s 100 per cent physical.’ Despite seeing 92 per cent efficiency from the HPC setup, Dunk is still not satisfied, and is determined to cut down some of the eight per cent loss. That will involve tuning the operating system, to make things as efficient as possible, and tuning the interconnects to cut latency. ‘They’re InfiniBand high-speed interconnects, out of the box, and I’ve still some work to do on that.’
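
Interconnect tuning of that kind is usually guided by simple latency measurements. The ping-pong test below is a generic sketch of the approach, written with mpi4py for consistency with the earlier example – it is not Team Lotus’s actual tooling, and the message size and repetition count are arbitrary.

# A minimal two-rank ping-pong latency probe; run with e.g.
#   mpirun -np 2 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

reps = 10_000
msg = np.zeros(1, dtype=np.uint8)     # tiny payload, so timing is dominated by latency

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(msg, dest=1)
        comm.Recv(msg, source=1)
    elif rank == 1:
        comm.Recv(msg, source=0)
        comm.Send(msg, dest=0)
t1 = MPI.Wtime()

if rank == 0:
    # Each repetition is a full round trip, so halve it for one-way latency.
    print(f"one-way latency: {(t1 - t0) / reps / 2 * 1e6:.2f} microseconds")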


Dunk stresses that he does recognise the benefits of virtualisation – the enterprise system for Team Lotus is ‘quite a virtualised environment’, and the powerful post-processing machines used to turn the ‘files full of numbers’ produced by the HPC set-up into comprehensible images could potentially be virtualised too. ‘Though [the post-processing applications] do use a lot of memory, one of the machines, a 32-core machine with half a terabyte of memory, can be used for just one job. We’re not running hundreds of small jobs – we run six slots, or six large jobs at a time, and they can take between 17 and 24 hours.’

It’s not all about servers, these days – HPC is increasingly being run on general-purpose graphics processing units (GPGPUs). For massively parallel, multi-threaded workloads, GPUs are ideal – fast and efficient, yet using very little power. Peer 1 Hosting runs two data centres in the UK, including a new 58,000 square metre facility in Portsmouth. Amanda Dunn, director of business development at EMEA, is responsible for increasing the use of the company’s GPUs in the HPC market. She says this is one area where virtualisation isn’t even a consideration.
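
That suitability for massively parallel work is easy to see in code: the same element-wise operation fans out across thousands of GPU threads. The fragment below uses CuPy as one possible interface – an assumption, since no particular framework is named here – and the array size is arbitrary.

# A minimal sketch of a data-parallel workload offloaded to a GPGPU via CuPy
# (assumed; requires a CUDA-capable GPU).
import numpy as np
import cupy as cp

n = 10_000_000
x_host = np.random.rand(n).astype(np.float32)

x_gpu = cp.asarray(x_host)            # copy the input onto the GPU
y_gpu = cp.sqrt(x_gpu) * 2.0 + 1.0    # one element-wise kernel across all n values
y_host = cp.asnumpy(y_gpu)            # copy the result back to the host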


While it is possible to connect a GPU to a virtual server using virtualisation technology, it’s not currently possible to link to more than one virtual machine at a time – the same GPU cannot be shared with multiple virtual machines, Dunn says.

‘GPGPUs don’t contain any native ways to share with virtual environments, or to isolate or protect memory between the many processes needed by virtualisation. Ideally, what’s needed is some form of abstraction layer, vendor-independent and enabling multiple virtual machines to communicate with the GPGPU to process workloads,’ she says.

One of the reasons for the lack of that abstraction layer is the use of proprietary methods of communication across devices, Dunn says. While efforts have been made to create open standards, they are not widely adopted to date.




[Image courtesy of Team Lotus]

