high-performance computing


about compute, but storage was always the child that’s left behind. Everybody needs it, but not a lot of thought is given to it. But once users start running, they realise that they do need storage and need it fast. So we have to plan it out and think about it,’ said Michelle Butler, technical programme manager for storage enabling technologies, at the US National Center for Supercomputing Applications (NCSA) in Illinois.


Take the burden from the users
Scientists use computers as a means to an end – they want to focus on the science, she continued, ‘but you have to know what is under the covers to take advantage of what is there. You have to know how it’s laid out; what the storage is underneath it.’ At the NCSA, she explained, ‘we’re working with users that have very large files. Right now the burden is on the users to move that data, and we have to take that burden off them. We also have to take the burden off them to find their data.’

Butler pointed out that distributed data is not confined to the LHC project. At the NCSA, the most data-intensive sciences, storing up to 4PB, are weather and atmospheric sciences – including the solar and space environment, since solar flares are counted as part of weather. Nor is distributed data unknown: she cited the example of an earthquake engineering team that uses the Cray Blue Waters supercomputer at the NCSA but which stores its data on three different machines in geographically separate places. ‘How can they find the files that they have stored?’ she asked. It is a complex problem, she went on, involving data management and the management of file locations; and the location problem is not just within one computer, it is also geographical. ‘I think there is a lot of different hardware and software that needs to be built here.’ And, she stressed, it was imperative to work with the vendors: ‘They can’t just do their own thing with their hardware anymore; it is much more global than that. Everyone has to have more standards and more flexibility and realise they do not own all the data at every level anymore.’
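The file-finding problem Butler describes is essentially a metadata problem: some catalogue has to map a logical file name to the machines and sites that actually hold copies, so users do not have to remember where their data physically lives. The sketch below is purely illustrative; the catalogue class, site names and paths are hypothetical, not NCSA tooling.

```python
# Minimal sketch of a file-location catalogue: logical names map to the
# physical sites/machines holding replicas. All names below are made up.

from dataclasses import dataclass, field

@dataclass
class Replica:
    site: str   # e.g. "ncsa-bluewaters", "remote-site-a"
    path: str   # physical path on that system

@dataclass
class CatalogEntry:
    logical_name: str
    size_bytes: int
    replicas: list = field(default_factory=list)

class FileCatalogue:
    """Toy catalogue: answers 'where are the copies of this file?'."""
    def __init__(self):
        self._entries = {}

    def register(self, logical_name, size_bytes, site, path):
        entry = self._entries.setdefault(
            logical_name, CatalogEntry(logical_name, size_bytes))
        entry.replicas.append(Replica(site, path))

    def locate(self, logical_name):
        entry = self._entries.get(logical_name)
        return [] if entry is None else list(entry.replicas)

# Usage: a file stored at three geographically separate sites can be found
# with a single query, without the user tracking the locations themselves.
cat = FileCatalogue()
for site, path in [("ncsa-bluewaters", "/scratch/eq/run42.h5"),
                   ("site-b", "/data/eq/run42.h5"),
                   ("site-c", "/archive/eq/run42.h5")]:
    cat.register("eq/run42.h5", 2_000_000_000, site, path)

print([r.site for r in cat.locate("eq/run42.h5")])
```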


Butler’s theme of taking the burden away from the users is very much in evidence also at Cern. Years of effort have gone into building a system where the ‘users do not know about the details of our infrastructure. They have the illusion of an infinite file system,’ according to Massimo Lamanna, who is section leader, file systems and disk operations section, at Cern. He explained that, since the 1990s, all the experiments have been directly connected to the computer centre.




‘They use the computer centre first of all as an archival system. In some senses, this may not seem very exciting, but it’s clearly super-important.’

With the beams running inside the LHC, the experiments register thousands of collisions between the fundamental particles, but local filters at the detectors down in the tunnels select those signals that might be interesting for the physics the experiment is trying to study. Once an event has met these criteria, the data is put on fibre cables directly connected to the data centre. ‘We receive these on disks and from the disks we stream to tape, because essentially it’s the quickest way to create a second copy. Which would typically stay for ever – it’s the archival component of what we are doing,’ Lamanna continued.
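The flow Lamanna outlines – accept the filter-selected data onto disk, then stream it straight on to tape so a long-lived second copy exists while the disk copy stays available for processing – can be sketched in a few lines. This is only an outline under assumed names; the paths and functions below are invented for illustration, not Cern’s actual software.

```python
# Illustrative outline of the ingest flow described above: data lands on a
# disk buffer first, then a second copy is streamed to an archive (tape)
# area as soon as possible. Paths and names are stand-ins for the demo.

import shutil
import tempfile
from pathlib import Path

SCRATCH = Path(tempfile.mkdtemp())       # scratch area for this example
DISK_BUFFER = SCRATCH / "disk_buffer"    # stand-in for the disk pool
TAPE_ARCHIVE = SCRATCH / "tape_archive"  # stand-in for the tape system

def receive_from_detector(name: str, payload: bytes) -> Path:
    """Write an incoming, already filter-selected data file to the disk buffer."""
    DISK_BUFFER.mkdir(parents=True, exist_ok=True)
    target = DISK_BUFFER / name
    target.write_bytes(payload)
    return target

def stream_to_tape(buffered_file: Path) -> Path:
    """Create the archival second copy; the disk copy remains available
    for prompt reconstruction while the archive copy stays long term."""
    TAPE_ARCHIVE.mkdir(parents=True, exist_ok=True)
    copy = TAPE_ARCHIVE / buffered_file.name
    shutil.copy2(buffered_file, copy)
    return copy

# Usage: each selected data file is buffered on disk, then archived.
f = receive_from_detector("run1234_events.raw", b"\x00" * 1024)
print("archived to", stream_to_tape(f))
```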


Archiving is the first of three activities that the data centre supports, he explained. The second is to make the disk files immediately available for processing – starting the reconstructions of the sub-nuclear events in preparation for the final analysis across all the data. The important point of this initial analysis, he said, is to get an idea of the quality of the data and the calibration of the instruments, and to gauge whether the data is okay or whether there are some subtle variations that the experimentalists need to correct immediately.

The third activity is export. From Cern’s disks, the data is distributed to the LHC computing grid. A network of ‘Tier 1’ centres around the world takes the data – depending on which experiment they are collaborating on – and the data is copied to tape there as well.




This process ensures that, distributed across the 10 to 12 independent Tier 1 centres, there is a complete copy of the LHC data – a second copy in addition to the one held at Cern itself. Lamanna said: ‘At, for example, CMS [one of the LHC experiments] they are taking 1PB in a week. If you go to RAL [the UK’s Tier 1 centre at the Rutherford Appleton Laboratory] you get 10 per cent; go to Fermilab [the US particle physics laboratory] you get 20 per cent. So across Tier 1 you get a second copy – for business continuity/disaster recovery reasons. My team is in the middle. We are getting the data right from the pit, putting it on tape, giving the data for reconstruction – this is done on our batch machines – and then we run the services that can be interrogated by the experimenters to put us in connection with the Tier 1.’
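The shares Lamanna quotes only add up to a usable disaster-recovery copy if, across all the Tier 1 sites, the fractions cover the whole dataset. Here is a back-of-the-envelope check using just the figures given above (RAL at 10 per cent, Fermilab at 20 per cent, roughly 1PB a week for CMS); the remaining centres are lumped into a single placeholder share, which is an assumption rather than a figure from the article.

```python
# Back-of-the-envelope check that the per-site fractions of the export add
# up to one complete second copy. Only the RAL and Fermilab fractions and
# the 1PB/week figure come from the article; "other Tier 1 sites" is an
# assumed remainder.

WEEKLY_VOLUME_PB = 1.0  # CMS quoted as taking roughly 1PB per week

tier1_shares = {
    "RAL": 0.10,
    "Fermilab": 0.20,
    "other Tier 1 sites": 0.70,  # assumed remainder, not from the article
}

for site, share in tier1_shares.items():
    print(f"{site:>20}: {share * WEEKLY_VOLUME_PB:.2f} PB/week")

total = sum(tier1_shares.values())
assert abs(total - 1.0) < 1e-9, "shares must sum to a full second copy"
print(f"total across Tier 1: {total * WEEKLY_VOLUME_PB:.2f} PB/week (complete copy)")
```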


TRIUMF Lab: higher-density storage enclosures


Innovative storage technologies
How the Tier 1 centres organise their storage is largely up to them, although there is a memorandum of understanding (MOU) in place ‘that has stringent requirements about up-times and what kind of services have to be provided,’ as Reda Tafirout, group leader, Atlas Tier 1 at Triumf, the Canadian particle physics laboratory in Vancouver, pointed out. Atlas is one of the other LHC experiments. Not every country can provide a Tier 1 centre for Atlas, he continued, because it is an expensive project, ‘but we had to do it to help the science from the experiment’.


‘Every year Atlas reviews its computing model, based on a projection of how the LHC will perform, and how much data acquisition time will be happening. What happened in 2012 with the Higgs discovery, we knew there was a new particle so the LHC run for that year was extended by a couple of months – which has consequences for the computing resources. At that time, Canada was one of the very few Tier 1 centres that had spare capacity that was pledged to Atlas.’


In Tafirout’s view, the Tier 1 centres are such a critical component of the project that they have to have reliable hardware and infrastructure. But while Atlas makes projections of its requirements for up to three years in advance: ‘It is dynamic; there is always room for revisions or variations. So it’s a continuous process.’ Triumf signed the MOU with Cern in 2006. ‘We needed really high-density systems at Triumf; we could not go with traditional servers that were being used at that time at Cern because they had such a large data centre that density was not so critical for them at that time. In our case, the real estate is really








