Memory HPC 2012

that exist and the current solutions. If the response is that budgetary constraints mean that HPC users will continue to invest in CPUs rather than memory, at least it will give us a clear indication of the best way to move forward. The industry is losing money right now and end markets aren't showing a lot of growth, or even predicted growth, but I don't think the HPC community is aware of these issues. With the very best

technology available today, for example, using 20 Pbytes of system memory – the minimum an exaflop computer could operate with – would consume 10MW, which represents 50 per cent of the allowed power consumption for exascale. This is unacceptable, and so we have to find ways of reducing that energy demand and enhancing overall system performance through faster memory. The I/O portion of a chip can often account for 40 per cent of its power consumption, so introducing optical I/Os could provide a solution in the future. It's very difficult to predict what will happen,

Chris Gottbrath, principal product manager at Rogue Wave Software, turns his attention to cache memory

“One issue that needs addressing today is the substantial scepticism that still exists where SSDs are concerned”

Although cache memory is a key piece of the architecture, users and programmers don't always think about it consciously. Providing a bridge between the fast processors and the generally slower memory where all the data is stored, the cache is vital to a system's performance because it reduces access time, which can otherwise become a bottleneck. Essentially, cache enables data to remain local and readily available to the processor. Writing cache-friendly code isn't difficult, but it is easy to neglect the cache and assume that the data will automatically be there for the processor. In any situation where the performance or power efficiency of the code matters, failing to give it that conscious thought can leave the architecture of the program hard on the cache. In terms of the hardware, different grades
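As a concrete illustration of what cache-friendly code means in practice, the sketch below (illustrative code, not taken from any vendor) sums the same row-major matrix twice. Only the loop order changes, but one version walks memory with stride-1 accesses that use every byte of each fetched cache line, while the other jumps a whole row ahead on every read:

```cpp
#include <vector>
#include <cstddef>

// Sum a row-major matrix (rows x cols, stored in one contiguous vector).
// Walking along rows touches adjacent addresses, so each cache line
// fetched from main memory is fully used before it is evicted.
double sum_row_major(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];   // stride-1 access
    return total;
}

// The same sum with the loops swapped: each read jumps `cols` elements
// ahead, so consecutive accesses land on different cache lines and far
// more traffic flows between main memory and the processor.
double sum_col_major(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];   // stride-`cols` access
    return total;
}
```

Both functions return the same answer; on matrices larger than the cache the row-major version typically runs several times faster, though the exact ratio depends on the machine's cache hierarchy.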

but one issue that needs addressing today is the substantial scepticism that still exists where solid state drives (SSDs) are concerned. The advantages of SSDs over hard disk drives (HDDs) are considerable. Not only will the performance of SSDs be approaching 100 times better than HDDs in the next few years, but SSDs have smart IDs that inform the user what the minimum life of the unit will be. HDDs are unpredictable, yet are still being given preference in the industry. We don't have an answer for this, but it could be that HPC centres are hesitant to open their systems for fear of warranty, liabilities or other legal concerns, and so are simply waiting for the next generation of memory to arrive. To stimulate a lively exchange in the community we plan to cooperate with HPC centres and universities and openly communicate the findings with users and producers to effectively address the often unsubstantiated concerns.

Further information

Altera Corporation

Rogue Wave Software


Texas Memory Systems

Viking Technology

of processors tend to have varying levels of cache memory; some have far more than others. A greater amount of cache is almost always better, because it makes performance less sensitive to how tightly the program has been structured around it. Of course, whether developers have been attentive to the cache or not, it will rarely be large enough to hold a program's entire data set. When our customers take the

time to look closely at cache-optimising their programs, small changes can achieve major improvements in runtime performance. This wouldn't be surprising in new or hastily written code, but opportunities for improvement have turned up even in benchmark code that had been repeatedly tuned. A step forward would be to ensure that everyone in the industry is aware of cache and the impact it has on end users in terms of the energy consumption and speed of their programs. On average, it is around 100 times slower to move data from main memory to the processor in a random-access manner than it is to do so from the cache. The power cost is also greater when repeatedly fetching data from main memory. This is a critical point, and even customers who don't necessarily think of themselves as part of the HPC community are acutely aware of the importance of lowering the power consumption of their systems. Efficiency is a necessity, and the scheduling of tasks is another area where cache can have a
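One classic example of such a small change is loop blocking (tiling). The illustrative sketch below transposes a matrix both naively and in small tiles; the results are identical, but the tiled version keeps both the source and destination cache lines resident while it works through each tile. The tile size of 32 is an arbitrary choice for illustration, not a tuned value:

```cpp
#include <vector>
#include <cstddef>

// Naive out-of-place transpose: one of the two arrays is necessarily
// walked with a large stride, so its cache lines are poorly reused.
std::vector<double> transpose_naive(const std::vector<double>& a,
                                    std::size_t rows, std::size_t cols) {
    std::vector<double> t(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            t[c * rows + r] = a[r * cols + c];
    return t;
}

// Blocked transpose: work on tiles small enough to fit in cache, so
// both source and destination lines stay resident within each tile.
std::vector<double> transpose_blocked(const std::vector<double>& a,
                                      std::size_t rows, std::size_t cols) {
    const std::size_t B = 32;   // tile edge in elements (illustrative)
    std::vector<double> t(rows * cols);
    for (std::size_t rb = 0; rb < rows; rb += B)
        for (std::size_t cb = 0; cb < cols; cb += B)
            for (std::size_t r = rb; r < rb + B && r < rows; ++r)
                for (std::size_t c = cb; c < cb + B && c < cols; ++c)
                    t[c * rows + r] = a[r * cols + c];
    return t;
}
```

The change is a few lines of loop restructuring, yet on matrices that overflow the cache it is exactly the kind of edit that can deliver the major runtime improvements described above.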


significant impact. With companies like Intel launching multi-core processors, the natural assumption is that scheduling tasks to all of those cores will be the most efficient way to run a program. If those tasks are all nicely cache-optimised, that might be true, but if they aren't then the cache becomes a limiting factor. Even though operators may be throwing away potential processor clock cycles, it may in some cases be better to run on fewer than the available number of cores. Bandwidth is another hindering factor, and
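A minimal way to experiment with this is to make the thread count a parameter rather than assuming one task per core. The sketch below (illustrative code using standard C++ threads; the function name is an assumption, not an established API) partitions a reduction across a caller-chosen number of workers:

```cpp
#include <vector>
#include <thread>
#include <numeric>
#include <algorithm>
#include <cstddef>

// Sum a vector using `nthreads` worker threads. Making the thread count
// a parameter (rather than always using hardware_concurrency()) lets the
// caller experiment: for memory-bound loops, fewer threads sometimes win
// because the cores stop competing for the same cache and bandwidth.
double parallel_sum(const std::vector<double>& data, unsigned nthreads) {
    if (nthreads == 0) nthreads = 1;
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + nthreads - 1) / nthreads;
    for (unsigned i = 0; i < nthreads; ++i) {
        workers.emplace_back([&, i] {
            const std::size_t lo = std::min<std::size_t>(i * chunk, data.size());
            const std::size_t hi = std::min<std::size_t>(lo + chunk, data.size());
            // Each worker scans one contiguous slice and performs a single
            // store into its own slot, keeping per-iteration state local.
            partial[i] = std::accumulate(data.begin() + lo, data.begin() + hi, 0.0);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Timing the same call with different `nthreads` values on a memory-bound loop shows directly whether the extra cores are helping or merely contending for cache and bandwidth.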

“Providing a bridge between the fast processors and the generally slower memory where all the data is stored, the cache is vital to a system’s performance”

hardware vendors are attempting to provide the right balance of bandwidth and cache availability for the cores on the machine. The difficulties are that bandwidth has its limits, and cache is both expensive and takes up considerable physical space on the chip. There are people who may want enough bandwidth to render the cache unnecessary, but we have yet to see any company deliver that. Again, the best solution is to give conscious thought to the cache. If programmers are more cognisant, they can ensure that the most effective structure is in place and that they read from memory in a predictable way. There is little the cache can do to help if, in a C++ program, developers need to read or write most (or all) of the elements of a data store that happens to be in a tree or list structure with lots of pointers. It may be faster to store the data in an array so that the cache and pre-fetching logic can pull the data in efficiently. There are also hybrid data structures that store the data

in a compact array, but support fast ways to look up individual elements. Using arrays won't work all the time: it could help or hurt depending on how often all the elements are accessed versus walking the tree to read, add or delete one element. Tools like ThreadSpotter provide guidance

on where opportunities to optimise the cache exist and where performance issues, such as bandwidth limitations, can be eliminated. Generally, all optimisations take effort to ensure that the reading of data is predictable and regular, and as often as possible from adjacent memory locations. If successful, programs will run several times faster and scalability will be improved, so it is well worth taking the time to ensure that cache memory is at the forefront of considerations.
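The pointer-versus-array trade-off discussed above can be sketched in a few lines. In this illustrative example (function names are assumptions), the same values are summed from a pointer-chasing std::list and from a contiguous array built from it once:

```cpp
#include <vector>
#include <list>
#include <numeric>

// A pointer-heavy container: every step of the traversal dereferences a
// node pointer, so the hardware prefetcher cannot see the next address
// coming and each element is likely to be a fresh cache miss.
double sum_linked(const std::list<double>& xs) {
    return std::accumulate(xs.begin(), xs.end(), 0.0);
}

// If most of the work is whole-structure scans, copy the elements into a
// contiguous array once and scan that instead: stride-1 reads let the
// cache and prefetch logic stream the data in ahead of the processor.
double sum_flattened(const std::list<double>& xs) {
    std::vector<double> flat(xs.begin(), xs.end());   // one-off flatten
    return std::accumulate(flat.begin(), flat.end(), 0.0);
}
```

As the text notes, whether the flatten pays off depends on how often the whole structure is scanned versus how often single elements are inserted or deleted; the copy itself costs a full traversal.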
