SCW_DEC14JAN15


code-named Sequana; a matching software stack, known as the bullx supercomputer suite; a new fast interconnect, code-named BXI; a range of servers with ultra-high memory capacity, known as the bullx S6000 series; and a set of services to assist customers to develop their applications and make the most of exascale.

The new generation of BullXI interconnect is intended to free the CPU from the overhead of handling communication – communication management is coded into the hardware, according to Derue. The ultra-high memory capacity servers, the bullx S6000, are intended to address applications – for example genomics – that require in-memory data processing. The first model to become available is fully scalable up to 16 CPUs and 24 TB of memory.

Sequana is deliberately designed to be compatible with successive generations of different technologies (CPUs and accelerators) and can scale to tens of thousands of nodes. It will take advantage of Bull’s liquid cooling systems in order to ensure energy efficiency,

Managing innovation – collaboration centre stage

How will innovation for Exascale be managed in future? Perhaps the most significant part of the announcement in November that a consortium of IBM, Nvidia, and Mellanox had won the orders for two new US supercomputers was that it was a consortium, rather than a single company, that had won the bid. This raises an interesting question: if a company the size of IBM cannot develop exascale technology by itself, can other computer companies offer credible exascale development paths unaided? IBM has decided that the key point in its strategy is to open up its Power architecture as a way of fast-tracking technological innovation – collaboratively, rather than by one company going it alone. IBM briefings during the week of SC14 understandably had an air not just of the cat having got the cream but rather the keys to the whole dairy. Together with Nvidia’s Volta GPU and Mellanox’s interconnect technologies, IBM’s Power architecture won the contracts to supply the next-generation supercomputers for the US Oak Ridge National Laboratory and the US Lawrence Livermore National Laboratory.

On the Friday before the US Supercomputing Conference, SC14, opened in New Orleans in late November, the US Government had announced it was to spend $325m on two new supercomputers, and a further $100m on technology development, to put the USA back on the road to Exascale computing (see page 20).

Although $325m is now coming the consortium’s way, Ken King, general manager for OpenPower alliances at IBM, stressed that: ‘From our perspective, more important than the money is the validation of our strategy – that’s what’s getting us excited.’

As Sumit Gupta, general manager of accelerated computing at Nvidia, put it in an interview: ‘IBM is back. They have a solid HPC roadmap.’

The decision marks a turn-around in IBM’s standing; its reputation was tarnished when, after four years of trying, it pulled out of a contract to build the Blue Waters systems at the US National Center for Supercomputing Applications (NCSA) at the University of Illinois in 2011. Originally awarded in 2007, the contract was reassigned to Cray, which fulfilled the order. At SC14, the consensus was that the announcement was an endorsement of IBM’s decision to open up its Power architecture to members of the OpenPower Foundation and thus build a broad ‘ecosystem’ to support the technology. Gupta pointed out that IBM could have tried to go it alone, but decided to partner with Nvidia and Mellanox via the OpenPower Foundation, and work with them on the bid. ‘Opening the Power architecture – this is the new roadmap and validates what we have done together. When given a fair choice, this is the preferred architecture.’

The fact that both Oak Ridge and Livermore chose the same architecture was seen as a powerful endorsement of this technology development path, particularly as the two laboratories were free to choose different systems because they are funded from different portions of the US Department of Energy (DoE) budget – Oak Ridge from the Office of Science and Livermore from the National Nuclear Security Administration.

David Turek, vice president of Technical Computing OpenPower at IBM, pointed out that Livermore has no accelerator-based applications but is now choosing heterogeneity and, he claimed, it was the application engineers at Oak Ridge who were pressing most strongly for the system. The third member of the Collaboration of Oak Ridge, Argonne, and Lawrence Livermore (Coral) project, Argonne National Laboratory, is also funded by the Office of Science within DoE and is therefore constrained to choose a different system from Oak Ridge’s. The Argonne announcement has been deferred into the New Year.

The delay has prompted speculation that Argonne too would have preferred the Power-based solution. After all, Argonne’s current machine is an IBM Blue Gene/Q – called ‘Mira’ – that already uses 16-core PowerPC A2 processors. But the laboratory was constrained by the purchasing rules to opt for another choice. Cray is not participating in the Coral bidding process, so it is not clear who the alternative provider might be to whom Argonne can turn. However, Paul Messina, director of science for the Argonne Leadership Computing Facility, said: ‘There were more than enough proposals to choose from.’ The Argonne machine will use a different architecture from the combined CPU–GPU approach and will almost certainly be like Argonne’s current IBM machine, which uses many small but identical processors networked together – an approach that has proved popular for biological simulations. While the Coral systems would perform at about 100 to 200 petaflops, Messina thought that their successors would be unlikely to be limited to 500 petaflops but that a true Exascale machine would be delivered by 2022, although full production level computing might start later than that.

Gupta’s view that opening up the Power architecture was the new roadmap was echoed by IBM’s David Turek.

He said: ‘We could not have bid for Coral without OpenPower. It would have cost hundreds of millions of dollars and taken us years. Why waste time and money if we could leverage OpenPower to within five per cent of its performance peak? We have lopped years off our plan.’ And in that accelerated development pathway, OpenPower ‘is critical to us’.

He cited the tie-up with Mellanox: although IBM has smart people in networking, he said, by itself it did not command enough expertise. Mellanox had unveiled its EDR 100Gb/s InfiniBand interconnect in June this year, at ISC’14 in Leipzig, and this will have a central role in the new Coral systems. However, Brian Sparks from Mellanox pointed out that the company intends to have a stronger interconnect available for Coral than EDR: ‘200G by 2017 is on our roadmap.’

IBM announced the ‘OpenPower Consortium’ in August 2013 and said it would: open up the technology surrounding its Power Architecture offerings, such as processor specifications, firmware, and software; offer these on a liberal licence; and use a collaborative development model. However, Turek said, IBM had not outsourced innovation to OpenPower: ‘The bulk of innovation is organic to IBM.’

@scwmagazine l www.scientific-computing.com

and the first version will be available in 2016. Derue said: ‘We are paving the way to Exascale. With our solution, 100 petaflops systems are possible.’

But in all this, Bull too has its eye on scaling in both directions. It is interested in providing powerful computing cheaply to the smaller enterprises.

Because Sequana is modular in concept, designed as a group of building blocks, customers will find it easy to deploy and to configure for their own needs, Derue said. But
