HIGH PERFORMANCE COMPUTING


“Now people can keep their data and their intellectual property and they can have all of their privacy concerns addressed. You do not need to send any of the data back and forth”


‘Today if I want to run deep learning on computerised tomography (CT) scans and I am studying a certain problem, I only have 50 patient scans with this condition. You can imagine if I tried to train a model just using my 50 CT scans – depending on what the model is – I may not be able to get a very accurate result. But if I am learning not just on my own cases but on yours and on all the other available cases, now I have so much more data to train the model,’ stated Flores. ‘As you know, deep learning needs a lot of data, so now the researchers can have a model that works better, at least at their own institution, through federated learning, as opposed to what they could have had just training on their own dataset.’
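As a rough sketch of the workflow Flores describes, and not a description of any particular product, the Python example below simulates three hypothetical hospitals of very different sizes. Each site trains a toy model on its own synthetic data, only the learned weights leave the site, and a coordinator combines them with a sample-size-weighted average (the rule used in the widely cited federated averaging, or FedAvg, scheme).

```python
# Minimal federated learning sketch: hypothetical hospitals, toy linear model,
# synthetic data. Patient data never leaves a site; only weights are exchanged.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5, 3.0])          # "ground truth" for the synthetic task

def make_site(n_cases):
    X = rng.normal(size=(n_cases, 4))
    y = X @ true_w + rng.normal(scale=0.1, size=n_cases)
    return X, y

sites = {"hospital_a": make_site(50),             # e.g. only 50 scans with the condition
         "hospital_b": make_site(500),
         "hospital_c": make_site(200)}

def local_training(w_global, X, y, lr=0.1, epochs=20):
    """Gradient descent on one site's private data (stand-in for real DL training)."""
    w = w_global.copy()
    for _ in range(epochs):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
    return w

w_global = np.zeros(4)
for _round in range(5):
    updates = {name: local_training(w_global, X, y) for name, (X, y) in sites.items()}
    sizes = [len(sites[name][1]) for name in updates]
    # Federated averaging: each site's weights count in proportion to its data volume.
    w_global = np.average(np.stack(list(updates.values())), axis=0, weights=sizes)

print("federated model:", np.round(w_global, 2), "  target:", true_w)
```

In a real deployment the toy gradient step would be replaced by full training of a deep network at each hospital, but the shape of the exchange, weights out and an aggregated model back, is the same.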


Social engineering or standardisation?

While federated learning can solve key challenges in data availability for AI, it can also create problems of its own. Sharing this kind of data requires that collaborators think not only about the key parameters that must be collected, but also about how the data will be collected and stored. ‘Federated learning is really just coming up and there is not just a single way of doing this. Even once you choose the specific way of doing federated learning there are many things that can change,’ stressed Flores.


‘What we are noticing now is that, initially, when you start doing federated learning there is a lot of social engineering that needs to happen. Being able to collect the data in the same manner in each place, for example. You may need to annotate the data, so you have to make sure all of the places can agree on what certain parameters mean.

‘All of this stuff is happening today by what I call social engineering. Over time a lot of that will become automated, and this makes the experimentation and iteration much faster,’ added Flores.

There are many questions that are either developing over time or that must be addressed on a case-by-case basis by the organisations collecting the data. For example, how do you decide on and exchange the weights underlying the model? ‘There are many different variables, and also the methodology to come up with a better model at the end,’ stated Flores. ‘How do you aggregate this data? There are many questions still left to answer.

‘This is a field that is just starting and it is going to continue to emerge and get better and more efficient, and will become much easier for people to do. It really depends on what you are studying,’ noted Flores.
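To make the aggregation question concrete, the short sketch below (with invented site weights and sizes) contrasts two plausible rules: a plain mean that treats every site equally, and a sample-size-weighted average in which larger sites pull the shared model harder. Which rule is appropriate, how often weights are exchanged, and whether they are encrypted or noised are exactly the kinds of choices Flores says still have to be settled case by case.

```python
# Two hypothetical rules for aggregating model weights from three sites.
import numpy as np

site_weights = {"hospital_a": np.array([0.90, -1.80, 0.40]),   # trained on  50 cases
                "hospital_b": np.array([1.10, -2.10, 0.60]),   # trained on 500 cases
                "hospital_c": np.array([1.00, -2.00, 0.50])}   # trained on 200 cases
site_sizes = {"hospital_a": 50, "hospital_b": 500, "hospital_c": 200}

stacked = np.stack([site_weights[name] for name in site_weights])
sizes = np.array([site_sizes[name] for name in site_weights])

uniform = stacked.mean(axis=0)                         # every site counts equally
weighted = np.average(stacked, axis=0, weights=sizes)  # FedAvg-style, data-volume weighting

print("uniform mean     :", np.round(uniform, 3))
print("size-weighted avg:", np.round(weighted, 3))
```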


While there are certain parameters that must be agreed upon in order to ensure that the data sets can be used in a single shared model, some heterogeneity in the data can be beneficial. Small variations, such as different scanners providing the same type of image, or two populations of different age groups or other population metrics, can help to normalise the data and provide a more varied insight for the researchers creating the model.

‘If you are doing something where you need to collect blood count in terms of haemoglobin and someone reports it in terms of haematocrit, then it is not going to make sense,’ said Flores. ‘The old adage of garbage in, garbage out still exists. You definitely need to have some sort of standardisation. Having said that, you can use instruments from multiple vendors as long as the characteristics of the scan and the setup parameters can be matched up.

‘The deep learning model can correct for a certain amount of noise in the image relating to differences in the make and model of a scanner, for instance,’ added Flores. ‘To the extent that you have lots of heterogeneous data, that can actually make the model more robust.’

Using this approach it would theoretically be possible to ‘generalise the model to someone in Spain even though we have different scanners and we do things slightly differently in the US,’ Flores noted.

While deep learning and AI are still in their infancy for many research areas, it is important that data sources can be made available. Even more imperative is that standards and methods exist to ensure that data is usable and accessible, so that it is not siloed away but is available for researchers to use effectively. ‘Deep learning needs a large dataset, and this is really what is making AI, and specifically deep learning, more common these days,’ stated Flores. ‘We have had DL algorithms since the 1950s, but only now are you hearing about them being used in clinical medicine, and that is because we have an abundance of data that was not available before.’
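The standardisation point lends itself to a small illustration. The hypothetical checks below, with invented field names and units, are the kind of ‘social engineering’ a consortium might encode before training: verify that every site reports blood count as haemoglobin in the agreed unit, and z-score normalise image intensities per site so that scanner make-and-model differences look like noise the model can absorb rather than a systematic offset.

```python
# Hypothetical pre-training harmonisation checks for a federated consortium.
import numpy as np

AGREED_SCHEMA = {"haemoglobin": "g/dL", "age": "years"}   # invented, agreed-upon fields

def check_schema(site, reported):
    """Fail fast if a site reports a different field or unit than was agreed."""
    for field, unit in reported.items():
        if AGREED_SCHEMA.get(field) != unit:
            raise ValueError(f"{site}: '{field}' reported in '{unit}', "
                             f"expected '{AGREED_SCHEMA.get(field)}'")

def normalise_volume(volume):
    """Per-site z-score normalisation of image intensities."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)

check_schema("hospital_a", {"haemoglobin": "g/dL", "age": "years"})       # passes
try:
    check_schema("hospital_b", {"haemoglobin": "haematocrit fraction", "age": "years"})
except ValueError as err:
    print(err)                                   # garbage in, garbage out, caught early

scan = np.random.default_rng(1).normal(loc=40.0, scale=12.0, size=(32, 32, 32))
scan = normalise_volume(scan)
print("normalised mean/std:", round(float(scan.mean()), 3), round(float(scan.std()), 3))
```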





