HIGH PERFORMANCE COMPUTING


Performance tuning by setting goals: how realism guides evolution vs revolution


James Reinders is a parallel programming and HPC expert with more than 27 years’ experience working for Intel until his retirement in 2017. In this article Reinders gives his take on the use of roofline estimation as a tool for code optimisation in HPC.


Roofline Analysis is a technique that projects a view of realism into optimisation targets. It lets us know when we’ve tuned all we can (assuming evolution of our code), which may uncover the unsettling fact that we need a new algorithm (revolution). As a long-time teacher of optimisation techniques, I can confidently say that Roofline analysis is a must-have for anyone optimising for performance. This has not always been the case. As I will explain, today it is an important technique to draw upon when doing performance optimisation.


When mentioning Roofline Analysis, I have been asked ‘Hasn’t that been around a while?’, usually followed by ‘What’s new?’ Excellent questions. The answers revolve around two factors: (1) complexities (latency hiding through parallelism and memory hierarchies) in optimising for today’s processing architectures – including CPUs, GPUs, and accelerators of all kinds, and (2) new tools, based on new research, to help us deal with these complexities. In the face of increasingly complicated systems, Roofline Analysis provides us with a step-by-step method to ascertain whether an algorithm has reached the end of its ability to provide more performance through continued optimisation work.


“Roofline analysis – a technique to know when we’ve tuned all we can (evolution) which may uncover the unsettling fact that we need a new algorithm (revolution)”


4 Scientific Computing World August/September 2017


Complexities in optimising for today’s systems


Today we are faced with a great diversity of compute devices, ranging from Intel Xeon Scalable processors and GPUs to more application-specific accelerators enabled by FPGAs and ASIC technologies. It’s not the diversity that demands Roofline analysis; it’s the complexity of the architectures of the individual devices. Specifically, it is their complex abilities to hide latencies, and the sophisticated parallel compute capabilities and multilevel memory subsystems that play critical roles in such latency hiding. Years ago, performance optimisation was successful if we could reduce the number of instructions being executed. Such optimisations were nearly always rewarded by performance improvements. That is not the case today. Fortunately, Roofline analysis addresses these complications in optimisation work.
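To make the idea of a realistic target concrete, here is a minimal sketch of the basic roofline bound as it is commonly stated: attainable performance is the lesser of the machine's peak compute rate and its peak memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The article does not spell out this formula, so treat it as background rather than the author's own method. The machine figures, the kernel figures, and every name in the code (`Machine`, `attainable_gflops`, `ridge_point`, `headroom`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine description; the numbers are illustrative only."""
    peak_gflops: float         # peak compute rate, GFLOP/s
    peak_bandwidth_gbs: float  # peak memory bandwidth, GB/s


def attainable_gflops(machine: Machine, arithmetic_intensity: float) -> float:
    """Basic roofline bound: min(compute roof, bandwidth * intensity).

    arithmetic_intensity is in FLOPs per byte of memory traffic.
    """
    memory_roof = machine.peak_bandwidth_gbs * arithmetic_intensity
    return min(machine.peak_gflops, memory_roof)


def ridge_point(machine: Machine) -> float:
    """Intensity (FLOPs/byte) at which the memory and compute roofs meet."""
    return machine.peak_gflops / machine.peak_bandwidth_gbs


def headroom(machine: Machine, arithmetic_intensity: float,
             measured_gflops: float) -> float:
    """Fraction of the roof not yet reached (0.0 means the kernel is at the roof)."""
    roof = attainable_gflops(machine, arithmetic_intensity)
    return 1.0 - measured_gflops / roof


if __name__ == "__main__":
    # Assumed figures: 1000 GFLOP/s peak compute, 100 GB/s peak bandwidth.
    machine = Machine(peak_gflops=1000.0, peak_bandwidth_gbs=100.0)

    # Assumed kernel: 0.25 FLOPs per byte, measured at 20 GFLOP/s.
    intensity, measured = 0.25, 20.0

    roof = attainable_gflops(machine, intensity)
    bound = "memory" if intensity < ridge_point(machine) else "compute"
    print(f"ridge point: {ridge_point(machine):.1f} FLOPs/byte")
    print(f"roof for this kernel: {roof:.1f} GFLOP/s ({bound}-bound)")
    print(f"headroom left: {headroom(machine, intensity, measured):.0%}")
```

Under these assumed numbers the kernel sits well left of the ridge point, so its roof is set by memory bandwidth, not by peak compute. Once the measured rate approaches that roof, further tuning of the same algorithm (evolution) has little left to give, and only a change that raises arithmetic intensity (revolution) can lift the ceiling.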


New tools, new research, how to cope


The technique of Roofline analysis has recently seen a surge in study, resulting in some interesting papers and tutorials. Throughput optimisation techniques tend to be effective everywhere. Therefore, tuning investments using roofline analysis done on an Intel Xeon Scalable processor-based server, where the development environments are rich and mature, will lead


@scwmagazine | www.scientific-computing.com


Anilinn/Shutterstock.com

