HPC facilities: HPC 2012

Dedicated to discovery
A behind-the-scenes look at the systems supporting scientific research


Michael E. Papka, director of the Argonne Leadership Computing Facility, gives a US perspective


Starting in 2004, the US Department of Energy’s (DOE) Office of Science began to add dramatically to the supercomputing capabilities in the DOE National Laboratory complex. This led the way in developing the nation’s supercomputing capabilities to enable researchers to model and simulate experiments that could never be performed in a laboratory, and for US industry to perform ‘virtual prototyping’ of complex systems and products.

The computer architecture of the Blue Gene series of machines, which began with the Blue Gene/L, is the result of a DOE and IBM Research co-design initiative involving Argonne National Laboratory, Lawrence Livermore National Laboratory and IBM Research. This unique multi-million R&D effort to enhance the capabilities of the fastest computer then in existence brought together an exceptional group of computer and software designers. The team’s vision for these systems directed the future architectural evolution of the product line in protocols, programming models and overall system design.

Argonne National Laboratory’s 10-petaflops IBM Blue Gene/Q supercomputer, Mira, is part of the Office of Science’s plan to meet mission-critical needs in HPC. It may only be nearing production use now, but it can already claim three standouts. The system ranks third on the June 2012 Top500 list of the world’s fastest supercomputers, tops the Graph 500 benchmark for data-intensive computing, and is among a bevy of world-leading energy-efficient IBM machines. From power and cooling upgrades to the phased process of installing and testing each of its 48 racks, Mira’s deployment at the Argonne Leadership Computing Facility (ALCF) spanned nearly two years.




ALCF staff are currently migrating 16 Early Science application codes from Mira’s testing and development racks to the full-scale system. These projects, submitted by research teams from across the country and assisted by computational scientists, cover the range of scientific fields, numerical methods, programming models and computational approaches expected to run on Mira.


“Since its founding in 2006, the ALCF has provided more than four billion core hours”


Leadership-class systems must support a workload that comprises many very large jobs. Mira’s predecessor Intrepid, an IBM Blue Gene/P, supports science runs in many different areas that employ 124,000 or more cores; Mira’s workload will include even larger jobs. These application codes frequently use most or all of the cores in the system, may involve intense communications among the nodes, and may generate massive I/O loads. Therefore, the system must have an interconnection network with low latency even for jobs that use the entire configuration; it must be sufficiently reliable that most long-running jobs will not be affected by system failures; its I/O speed and capacity must be high at the hardware level; and the file system software must have high performance. In addition, the system software must be able to support such a workload with high efficiency, and the software tools must be capable of processing and handling such demands.


The ALCF is committed to delivering 786 million core hours on Mira in 2013. In full production mode, starting in 2014, more than five billion core hours will be allotted to scientists each year. With Mira, scientists and researchers are gaining a powerful tool to pursue bigger and more complex energy problems that impact the entire planet.
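Those allocation figures track the size of the machine. A rough back-of-the-envelope check, assuming Mira’s published Blue Gene/Q configuration of 48 racks, 1,024 nodes per rack and 16 cores per node (illustrative arithmetic, not official ALCF figures):

# Sanity check of Mira's core-hour figures, assuming
# 48 racks x 1,024 nodes per rack x 16 cores per node.
racks, nodes_per_rack, cores_per_node = 48, 1024, 16
cores = racks * nodes_per_rack * cores_per_node        # 786,432 cores
hours_per_year = 365 * 24                              # 8,760 hours
max_core_hours = cores * hours_per_year                # ~6.9 billion core hours per year
print(f"cores: {cores:,}")
print(f"theoretical annual core hours: {max_core_hours / 1e9:.1f} billion")
print(f"5 billion core hours is {5e9 / max_core_hours:.0%} of that ceiling")

On those assumptions, the five-billion-core-hour production allocation amounts to roughly three quarters of the machine’s theoretical annual capacity.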


Michael E. Papka is the director of the ALCF and the deputy associate laboratory director for computing, environment and life sciences at Argonne National Laboratory. Dr Papka is also an associate professor of computer science at Northern Illinois University.


Dominik Ulmer, general manager at CSCS, discusses supercomputing in Switzerland


Science is a very competitive environment and the general role of CSCS, the Swiss National Supercomputing Centre, is to provide scientists with the ability not only to do world-class research, but to be the first in their fields. In addition to delivering the necessary computing capacity, we provide access to the most innovative technologies and computing methods. Switzerland is not the biggest country, and it is very clear that we will never be home to the largest supercomputers in Europe, but because Swiss universities rank highly by international standards, our researchers are used to an incredibly high standard of computing capabilities. Our role is to satisfy those demands.


The first step is to run very large systems. The infrastructure is critical to this and, in March 2012, a specially built computer room that had been in development since 2009 came into operation. The room can house machines that consume up to 20MW of total power, although there is the small issue of not being able to budget for that large an electricity bill! We don’t currently have systems approaching that level, but now that we are no longer limited by the infrastructure, we could do so in the future. At such high power consumption, however, energy efficiency becomes a necessity, as any energy overhead would cost millions of Swiss francs. CSCS uses free cooling, taking water from Lake Lugano at a depth where the temperature is a constant 7°C and pumping it to the centre nearly 3km away. While effective, this method of cooling brings its own special set of complications, the main one being the need to interact heavily with the environment and gain consensus among the local population.

The second step is ensuring that relationships within the industry are maintained. At CSCS, we have a history of gaining access to technology early on, and interfacing heavily with the industry is the best way not only of getting the hardware, but of keeping up with the methods for programming these new systems as they develop. In addition

