




and set up a business of their own. The two developers (Moe and Danny) started the company, SchedMD, which stands for 'scheduling by Moe and Danny'.

'In the early years, from 2010 to 2012, the company focused primarily on custom development of SLURM. In early 2013, when I joined the company, we started exploring providing commercial support to help fund future SLURM development,' said Jenson, who explained that although the software is completely open source, SchedMD does provide commercial support for the platform, helping users to get their implementations up and running as fast as possible. While all users get exactly the same version of the software, paid users get access to support through SchedMD, or one of the partner organisations.

'WE DO NOT WANT THE UK TO LOSE THE EDGE WITH REGARD TO RESEARCH COMPETITION. THE SYSTEM FROM THE STFC WAS A WELCOME ADDITION AT AROUND 8,000 CORES'

'SchedMD is the only company that provides level 3 support. There are several companies, such as Bull/Atos, HPE, Cray, SGI, Dell, Lenovo and others, that provide level 1 and 2 support for SLURM, but for any level 3 issues they all outsource those to us,' said Jenson.




The paid users help to contribute to further development of the software, which is then made available to the free users through the open-source model, although setting up advanced features without the help of commercial support can be time-consuming.


Jenson stated that the next version of the SLURM software would introduce a new feature known as federated SLURM, or grid computing, which allows multiple systems to share job allocation.

'It allows multiple SLURM systems to work together to share jobs. Right now, if you want to submit a job to a system, you have to be on that system, and the job goes to the system you are on,' explained Jenson. 'With this new feature in place, sites will be able to have all of their systems communicating. Based on how it is configured, the jobs can be routed to the correct systems to meet the organisation's policy,' stated Jenson.
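To make the change concrete, here is a minimal sketch, assuming hypothetical names (job.sh, myfed, clusterA, clusterB); the exact commands and options depend on the SLURM release. Today, a batch script is submitted from a login node of the cluster that will run it:

  #!/bin/bash
  # job.sh -- a typical SLURM batch script
  #SBATCH --job-name=example
  #SBATCH --ntasks=64
  #SBATCH --time=01:00:00
  srun ./my_simulation

  $ sbatch job.sh    # the job is queued on the cluster you submitted from

With federated SLURM, an administrator would register the member clusters in the accounting database; a job submitted to one member could then be routed to a sibling cluster, subject to site policy:

  $ sacctmgr add federation myfed clusters=clusterA,clusterB
  $ scontrol show federation    # confirm which clusters belong to the federation
  $ sbatch job.sh               # may now be scheduled on any cluster in myfed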


However, as with all factors in HPC management, there is a trade-off between the time spent fixing a problem and the cost of paying for professional support. Many academic centres do not have the resources, and so must invest their own time to get these systems running, whereas commercial companies may want to spend the money to ensure maximum cluster utilisation at all times, to maximise profits and return on investment (ROI).

For the HPC users at Durham University, there was no choice but to implement an open-source workload manager, because they could not cover the cost of licensing LSF on the newest addition to the COSMA HPC set-up.


'LSF is a good batch system with wonderful features, and we made good use of that, but if you do not have the money, you do not have the money, and you have to make do with what you have,' stressed Heck.

'With SLURM, the benefit in our opinion is that the community is behind its development. It is an active community, and many of the supercomputing systems around the world are using SLURM,' said Heck.


Heck explained that this is important to HPC users, because they do not want to be left 'high and dry'. They do not want to invest in a system and then have that system fail, ending up with a lack of support from developers. An active community helps to allay these fears, as there are many users invested in the success of the technology.

Members of this community can also help each other, and expertise has started to develop between like-minded users. Heck commented that the ICC was not the only team at Durham using SLURM, as the main Durham HPC facility is also using this technology. Heck's DiRAC colleagues at Cambridge are also using SLURM, so there is clearly some local expertise being developed among the diverse user community.





