HIGH PERFORMANCE COMPUTING
Vectorisation of a stencil-based code
Intel's Cedric Andreolli, Jim Cownie and Henry Gabb describe the use of Intel Advisor to vectorise stencil-based code
In a previous article, we introduced a wave propagation kernel called Iso3DFD. This kernel suffered from inefficient memory accesses that were detected and solved using Intel Advisor. The problem arose from suboptimal loop ordering, and reordering the loops resulted in a fourfold speedup. The previous article focused on the Roofline Model and Memory Access Pattern Analysis (MAP) available in Intel Advisor. In this article, we will continue to use Intel Advisor, this time to improve vectorisation. Intel Advisor checks the level of vectorisation in every loop and offers advice (hence the name Advisor) to improve performance.

Verifying vectorisation with Intel Advisor

The Survey analysis can be run through the GUI or directly from the command line. It uses sampling to time the loops and functions, and extracts information about vectorisation. Advisor can provide more details when the application is compiled with Intel compilers (see Figure 1). The code of Iso3DFD, after reordering the loops, is presented in panel 1.

Panel 1

    #define HALF_LENGTH 8

    #pragma omp parallel for
    for (int iz = HALF_LENGTH; iz < nz - HALF_LENGTH; iz++) {
        for (int iy = HALF_LENGTH; iy < ny - HALF_LENGTH; iy++) {
            for (int ix = HALF_LENGTH; ix < nx - HALF_LENGTH; ix++) {
                int offset = iz*dimnXnY + iy*nx + ix;
                float value = 0.0;
                value += ptr_prev[offset]*coeff[0];
                for (int ir = 1; ir <= HALF_LENGTH; ir++) {
                    /* neighbours along x, y and z respectively */
                    value += coeff[ir] * (ptr_prev[offset + ir] + ptr_prev[offset - ir]);
                    value += coeff[ir] * (ptr_prev[offset + ir*nx] + ptr_prev[offset - ir*nx]);
                    value += coeff[ir] * (ptr_prev[offset + ir*dimnXnY] + ptr_prev[offset - ir*dimnXnY]);
                }
                ptr_next[offset] = 2.0f*ptr_prev[offset] - ptr_next[offset] + value*ptr_vel[offset];
            }
        }
    }
After running the survey, Advisor reports that our two main hotspots are scalar loops: all the CPU time is spent in scalar code. This indicates that the compiler was not able to auto-vectorise these loops. This behaviour happens because of potential pointer aliasing. For example, if our pointers refer to the same memory addresses, vectorisation might change the results of the computation. The compiler has no way of knowing whether the pointers refer to the same memory location, so it must be conservative and assume that it is unsafe to vectorise the code. To help the developer decide whether it’s safe to vectorise, Advisor
Figure 1
Figure 2
Figure 3
14 Scientific Computing World June/July 2019
@scwmagazine | www.scientific-computing.com