HIGH PERFORMANCE COMPUTING
In a blog post by OCF, Coates stated that ‘Spare capacity can be utilised when users are not using all HPC resources and any donation of clock cycles doesn’t need to impact on any current workloads. GPU capacity is the most sought after at this time, but all donated resources help.’
Maximising utilisation
What started out as a technical exercise for OCF staff looking to make better use of spare computing cycles has enabled researchers to donate their unused computing resources to help combat Covid-19. With many universities closed or only partially operational, it could be a good time to use spare computing resources in this manner. Even if only a small number of critical applications are running on a cluster, it may be impossible to cease all computing operations. While some sites may have smaller clusters or partitions that can be switched off, many modern systems rely on dual-socket servers that share power supplies. Equally, integrated cooling solutions can make it difficult to run a cluster efficiently at less than full utilisation. In many cases, using those compute cycles to produce scientific output is the most efficient way of exploiting the leftover capacity of a large cluster.
Coates noted that the work did not take long because the team had already been looking at using the scheduler for idle tasks; it was just a case of switching that over to focus on F@h. ‘Off the back of that we decided that it would be a technical exercise for us to just say if we want to use all of the spare cycles on a cluster for our pre-defined workload,’ said Coates.
Proteins are molecular machines made of a linear chain of chemicals called amino acids that, in many cases, spontaneously ‘fold’ into compact, functional structures. Much like any other machine, it is the arrangement and movement of a protein’s components that determine its function. Viruses also have proteins that they use to suppress the immune system and reproduce themselves. To help tackle coronavirus, the F@h project is trying to determine how these viral proteins work and how we can design therapeutics to stop them.
8 Scientific Computing World Summer 2020
Coates also noted that they chose to develop these instructions for the Slurm Workload Manager because of its popularity with research centres and universities. ‘A lot of our educational establishments use Slurm as a workload manager anyway and we have deployed quite a few clusters out there into the world with Slurm. It is quickly becoming somewhat of a standard for educational centres; they can expand it to their needs quickly because it is open source,’ added Coates. ‘Because we know that a lot of those educational sector customers have an underutilised cluster resource it was another easy choice. We wanted to make sure that people could get the most out of this if they decided to jump onto it. We realised that a lot of universities are in semi shut down at the moment so there was no point letting those cycles go to waste.’
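The principle Coates describes, running donated work only on otherwise idle capacity and giving it up as soon as a user job needs the node, can be illustrated with a toy Python model. This is a sketch under simplifying assumptions (one job per node, no runtimes or priorities beyond "user work first"), and every name in it, such as `ToyCluster` and `submit_user_job`, is hypothetical. It is not how Slurm works internally; in Slurm itself, comparable behaviour is normally configured through partition priorities and preemption settings (for example `PreemptType=preempt/partition_prio` with `PreemptMode=REQUEUE`) rather than written by hand.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model in which each node runs at most one job at a time.

    All names are hypothetical; this is not how Slurm works internally.
    """
    num_nodes: int
    user_jobs: dict = field(default_factory=dict)     # node -> user job id
    donated_jobs: dict = field(default_factory=dict)  # node -> donated work unit
    donated_queue: deque = field(default_factory=deque)

    def idle_nodes(self):
        """Nodes running neither a user job nor donated work."""
        busy = set(self.user_jobs) | set(self.donated_jobs)
        return [n for n in range(self.num_nodes) if n not in busy]

    def fill_idle(self):
        """Start queued donated work units on every idle node."""
        for node in self.idle_nodes():
            if not self.donated_queue:
                break
            self.donated_jobs[node] = self.donated_queue.popleft()

    def submit_user_job(self, job_id):
        """Place a user job, preempting donated work if no node is idle.

        Returns the node used, or None if every node already runs a user job.
        """
        idle = self.idle_nodes()
        if idle:
            node = idle[0]
        elif self.donated_jobs:
            node = next(iter(self.donated_jobs))
            # Requeue the preempted work unit at the front so it restarts first.
            self.donated_queue.appendleft(self.donated_jobs.pop(node))
        else:
            return None
        self.user_jobs[node] = job_id
        return node

    def finish_user_job(self, node):
        """Free a node and hand it straight back to donated work."""
        del self.user_jobs[node]
        self.fill_idle()


# Usage: four nodes, five donated work units waiting.
cluster = ToyCluster(num_nodes=4, donated_queue=deque(["wu1", "wu2", "wu3", "wu4", "wu5"]))
cluster.fill_idle()                        # nodes 0-3 run wu1-wu4; wu5 waits
node = cluster.submit_user_job("sim-42")   # node 0 is preempted; wu1 is requeued
cluster.finish_user_job(node)              # wu1 restarts on node 0
```

The design choice that matters is that user jobs never wait for donated work: donated units are requeued rather than protected, which is why, as Coates puts it, a donation of clock cycles need not impact current workloads.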
Learning at a distance
Deep learning has long been known to be a powerful tool for research computing. However, one of the stumbling blocks has often been the availability of good-quality data: without enough high-quality data, a model will struggle to reach the required accuracy or deliver useful insight. Federated learning is one attempt to combat this challenge. Mona Flores, global head of medical AI at Nvidia, explains how federated learning enables AI research using decentralised data. ‘It means that if we have three hospitals, A, B and C, each one of us has our own data set. We do not share these data sets but the eventual model is learning from all of our data,’ said Flores. ‘Now people can keep their data and their intellectual property and they can have all of their privacy concerns addressed. You do not need to send any of the data back and forth.’
This is particularly useful for sensitive data, such as healthcare records or medical image scans. Because patient data never has to leave the institution, hospitals, research centres and academic institutions can instead collaborate quickly, exchanging model updates rather than raw data through a centralised server hosted by Nvidia. Federated learning could add real benefit to research, particularly in areas where AI is just emerging and a single organisation may not have enough data to train a model. In such cases, organisations can collaborate rather than abandon the research or share large-scale datasets that include private information.
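Flores's description matches federated averaging (often called FedAvg), one common way to implement federated learning: each site trains a copy of a shared model on its own data, and only the resulting model parameters are sent to a central server, which averages them into the next global model. The sketch below is a toy in pure Python, not Nvidia's implementation: the model is a one-variable linear regression, the hospitals A, B and C and their synthetic data are invented, and the function names (`local_train`, `federated_average`, `run_federated`, `make_site`) are hypothetical.

```python
import random


def local_train(w, b, data, lr=0.1, epochs=20):
    """Gradient descent on one site's private (x, y) pairs.

    Model: y ≈ w * x + b, trained on mean squared error.
    """
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates):
    """Server step: average (w, b, n_samples) updates, weighted by n_samples."""
    total = sum(n for _, _, n in updates)
    w = sum(wi * n for wi, _, n in updates) / total
    b = sum(bi * n for _, bi, n in updates) / total
    return w, b


def run_federated(sites, rounds=50):
    """Alternate local training at every site with server-side averaging."""
    w, b = 0.0, 0.0  # global model held by the central server
    for _ in range(rounds):
        updates = []
        for data in sites.values():
            # Training happens at the site; only w, b and the sample count leave it.
            wi, bi = local_train(w, b, data)
            updates.append((wi, bi, len(data)))
        w, b = federated_average(updates)
    return w, b


def make_site(rng, n):
    """Synthetic private data drawn from y = 3x + 1 plus small noise."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 2)
        points.append((x, 3 * x + 1 + rng.gauss(0, 0.1)))
    return points


rng = random.Random(0)
sites = {"A": make_site(rng, 30), "B": make_site(rng, 50), "C": make_site(rng, 20)}
w, b = run_federated(sites)
print(f"global model: y = {w:.2f} * x + {b:.2f}")  # expected to be close to y = 3x + 1
```

The server never sees a single (x, y) pair, only parameters and sample counts. Weighting by sample count keeps a large hospital from being outweighed by smaller ones; with similarly distributed data, as here, the averaged model ends up roughly where a model trained on the pooled data would.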
@scwmagazine |
www.scientific-computing.com