HIGH PERFORMANCE COMPUTING


to optimisations that help other compute devices. We can choose whichever environment we are most comfortable with, and wherever a tool happens to run best, to get the most important tuning work done and improve throughput.


When Roofline confirms our fears (but reduces futile optimisation attempts)

Roofline analysis can hint that we should find a new algorithm in two ways: (1) It reveals that the arithmetic intensity (AI) is low, so the machine’s peak capabilities are not well utilised. When optimisations to the current approach are not possible in critical parts of our application, we may need to find an algorithm that can get closer to peak performance.


(2) It reveals that AI is high, but performance falls short of what we need, want, or believe should be possible. If we are already close to a machine’s peak performance, only an algorithmic change can give us better performance on that machine.


If this seems a bit circular, you are right. When we have low-AI, we seek to make it high-AI, through algorithmic change if optimisation is not possible. No matter how we reach high-AI, we are faced with the need for algorithm change to go further. Being told we need to rewrite using a new algorithm is not necessarily welcome news. The good news about the Roofline analysis technique is that it clarifies for us whether these needs are truly present. Knowing that can prevent a lot of time vainly spent seeking optimisations that simply do not exist. An example of this is ‘reducing cache misses’. Specific ‘stall’ event-monitoring counters (emon counters) added to Intel processors, with Intel Xeon Scalable processors offering the greatest support in quantity and diversity, allow tools to find the cache misses that are actually causing delays (stalls) and therefore causing lower-AI.


Roofline analysis can incorporate stall information into its technique, helping us avoid chasing optimisations that do not improve performance. I cannot overstate how valuable this is!
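The two cases above can be made concrete with the basic Roofline bound, in which attainable performance is the smaller of peak compute and AI multiplied by peak memory bandwidth. What follows is a minimal sketch rather than how any particular tool works: the `Machine` class, the `diagnose` helper, the 80% "close to the roof" threshold and the machine figures are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine description (illustrative numbers only)."""
    peak_gflops: float     # peak compute rate in GFLOP/s
    peak_bandwidth: float  # peak memory bandwidth in GB/s


def roofline_bound(machine: Machine, ai: float) -> float:
    """Attainable GFLOP/s for a kernel with arithmetic intensity ai (FLOP/byte)."""
    return min(machine.peak_gflops, ai * machine.peak_bandwidth)


def ridge_point(machine: Machine) -> float:
    """AI at which the memory roof meets the compute roof."""
    return machine.peak_gflops / machine.peak_bandwidth


def diagnose(machine: Machine, ai: float, measured_gflops: float,
             near: float = 0.8) -> str:
    """Classify a kernel; `near` is an assumed threshold for 'close to the roof'."""
    bound = roofline_bound(machine, ai)
    regime = "low-AI" if ai < ridge_point(machine) else "high-AI"
    if measured_gflops >= near * bound:
        advice = "close to the roof, so going further needs a change of algorithm"
    else:
        advice = "well below the roof, so tuning the current approach may still pay off"
    return f"{regime}: {advice}"


# Illustrative machine: 1000 GFLOP/s peak, 100 GB/s bandwidth.
machine = Machine(peak_gflops=1000.0, peak_bandwidth=100.0)
print(roofline_bound(machine, 2.0))
print(diagnose(machine, 2.0, 190.0))
print(diagnose(machine, 50.0, 400.0))
```

A kernel that is already near its bound gains nothing from further tuning of the same approach, which is the situation the Roofline chart makes visible before the effort is spent.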


Intel automated much of the tedious work in doing a Roofline analysis

Intel has implemented Roofline analysis as a feature of its Intel Advisor tool (free versions available) so we can explore our own applications and get concrete feedback on application-specific bottlenecks. The instrumentation is sophisticated and easy to use; it relies on strong support for stall accounting present in Intel processors, with the broadest capabilities being in the Intel Xeon Scalable processors found in servers and supercomputers. I highly recommend the variety of reading material from Berkeley Labs, as well as the Intel Advisor tool, including some excellent tutorials on its usage.


James Reinders is a parallel programming and HPC expert with more than 27 years’ experience working for Intel until his retirement in 2017. Reinders is the author of eight books in the HPC field, in addition to numerous papers and blogs.



