SCW_FEBMAR16

wavebreakmedia/Shutterstock.com

high-performance computing

failure recovery, in a way where you are not losing too much momentum as the application progresses, and you can recover’. The challenges of disappearing nodes and of removing dependencies ‘are not that easy for people who have written their own applications and are trying to provide more features. I think it is right for them long-term – to provide more recovery features – but these are the challenges that they have seen,’ he said.

Allinea’s David Lecomber said that the company had started using the Amazon cloud three or four years ago, to deliver training: ‘We could spin up a cluster in the training room for everyone to play with. They could debug; they could profile; without interfering with anyone else. We basically had on-demand access to clusters out in the cloud, rather than having to configure something locally or disrupt a local service while training was going on.’

Originally, back in 2010, it was hard work to get started in the cloud, he observed: ‘It was like building your own HPC system, but with virtual hardware.’ However, like Khosla, he believes that it is now easier than before. ‘There is a good package on Amazon called CfnCluster. Within 10 minutes you can create a one-time set-up there and boot up a cluster in less than five minutes.’ All the usual software stack, job schedulers and queues are included and ‘it even comes with a dynamic number of nodes. So, as someone asks to do more work on a system, it will spin up another couple of nodes and add them to the cluster – not something you can do in a regular HPC centre.’

Cloud providers interested in HPC

The major cloud providers themselves are showing more interest in HPC. According to David Power: ‘We have started to see some cloud providers put together higher-end offerings for HPC. You’ve got 10G systems; VMs that can go up to 16 cores with a decent amount of RAM in them; they’re beginning to look a bit more like your traditional HPC compute node.’ (Although InfiniBand interconnects dominate the Top500 list of the world’s fastest supercomputers, there are still nearly 200 machines on the list with Ethernet/10GigE interconnects.)

Bios-IT is now putting together its own cloud solution. ‘The ability to burst – get additional capacity for short periods of time – is something that a lot of our customers are now interested in investigating, so we have set up a proof-of-concept cloud-based system in our lab. We put in all the usual HPC hardware – parallel file system, InfiniBand interconnects. We were able to use Docker and Ironic to do bare-metal provisioning, so you didn’t have the performance hits from virtualisation.’

28 SCIENTIFIC COMPUTING WORLD

THE DEGREE OF INTEREST AMONG HPC CUSTOMERS FOR THE OPENSTACK OPTION WAS SURPRISING

Acceptance of the cloud in HPC is growing, Power believes: ‘I think in the future you’re going to see more and more people using this sort of approach towards HPC’. There has been interest in Bios-IT’s trial cloud service from two categories of customers. One group are those who want to burst out into the cloud, who want extra capacity for a short period of time ‘because of some research, a grant proposal, or conference coming up where they needed urgent access to something and were willing to pay us for that instead of waiting in queues for the local resource’.

Leapfrog into HPC

However, both Power and Khosla highlighted one category of user for whom the cloud may offer the opportunity to ‘leap-frog’ into HPC direct from a workstation-based infrastructure, without having to invest in the hardware of an HPC cluster (or the management overheads of running it). Khosla said: ‘We are seeing people on workstations wanting to move to HPC, but they don’t have an existing HPC infrastructure. Now it becomes a serious conversation for them to say “should we look at the cloud and see how it goes because we’ve heard that getting up and running in the cloud is a lot easier? I don’t have to buy facilities – and who would manage it for me?” We’re beginning to see more interest there,’ he observed.

According to Power, for the infrastructure that Bios-IT is creating, ‘we have had a number of requests from people just to do hosted HPC. They don’t want to look after the data centre themselves, just submit their codes and get the information out of it.’

What makes the cloud so attractive in this scenario, Khosla continued, is that the return on investment (ROI) is easier to assess. In general, it’s hard for people to say what their ROI is if they buy their own cluster, he explained; they have to factor in not just the cost of the hardware, but power requirements, support and management costs. It is one of the areas on which X-ISS provides specialist advice. Whether for cloud or cluster, however, ‘getting the right expertise is a big challenge. We have a range of HPC solutions just for that issue, and at predictable cost. The smaller guys don’t need a full-time person to manage their cluster, so we provide the service and they can easily assess the ROI.’ It may turn out that, with experience gained by leap-frogging to HPC in the cloud, a company will decide that HPC is worth the investment and buy its own dedicated cluster.

But Khosla warned that there are still pitfalls in trying to use the cloud. Specialist expertise is needed to meet the challenge of different types of users coming in with Open Source variants, or other commercial code that they want to try and test: ‘The Open Source stuff is very, very painful, because each application has its own dependencies and often they conflict. Some of the cloud technologies – OpenStack, Containers and so on – are now beginning to be leveraged for private clouds to provide resources to these researchers on a quick basis, so they can test and validate their stuff before they go into production. The nice thing about that is it starts making sense; once their signature is understood, they will maybe be able to move to a public cloud also.’

The future: federated clouds

The importance of the cloud in HPC is therefore growing, and there are further benefits to come. Allinea’s Lecomber said: ‘One thing I do find nicer about it is that it’s constantly improving. If you have an HPC system you are tied to it and its architecture for three to four years until the next one comes along, whereas Amazon does a refresh at least once a year in terms of the kinds of processors that you can pay for, so you’ve always got the ability to try something new. It may not be the absolute leading edge, but you do always get the ability to pay for access to some decent new hardware.’

Bright’s Van Leeuwen thinks the reach of the cloud will extend still further: ‘The next step is to burst from one private cloud to another private cloud – the term for that is “cloud federation”.’ This development will be really interesting for multinational companies that have data centres in multiple locations, he continued, so that they can balance compute resources across all their data centres and, rather than just sharing workloads within one data centre, can do so between multiple data centres.

@scwmagazine | www.scientific-computing.com
