Artificial Intelligence Technology
Figure 1 Frequency vs area
Frequency and area:
Figure 1 shows how frequency varies with area. It shows three factors to consider in the memory subsystem:
There is a large variation in memory area – the largest memory (which is constructed from a large number of small capacity SRAM instances) has almost twice the area of the smallest.
The variation in frequency is much smaller – the fastest is only 18 per cent greater than the slowest.
In general, faster memory systems are bigger than slower ones. This is because the faster ones are constructed using large numbers of small, fast SRAM instances, which have a higher relative overhead for peripheral circuits. However, there is a significant amount of variation: for a given speed there is more than one SRAM instance that could be used, and the choice of which one to use can make a significant difference to total area. For example, for a 10 per cent speedup over the slowest memory, there is a 30 per cent area difference between the largest and smallest possible areas. Alternatively, for a given memory area, there can be a 10 per cent difference in speed between the slowest and fastest SRAM options.
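The selection problem described above, picking the smallest memory that still meets a speed target, can be sketched in a few lines of Python. This is only an illustration: the `MemoryOption` type, the option names and all of the numbers are hypothetical and are not the data plotted in Figure 1.

```python
from typing import List, NamedTuple, Optional


class MemoryOption(NamedTuple):
    name: str        # hypothetical label for one way of building the memory
    rel_area: float  # area relative to the smallest option (1.0 = smallest)
    rel_freq: float  # frequency relative to the slowest option (1.0 = slowest)


# Made-up illustrative values, NOT the data behind Figure 1.
OPTIONS = [
    MemoryOption("A", 1.00, 1.00),
    MemoryOption("B", 1.10, 1.10),
    MemoryOption("C", 1.40, 1.10),
    MemoryOption("D", 1.30, 1.15),
    MemoryOption("E", 1.90, 1.18),
]


def smallest_meeting_speed(
    options: List[MemoryOption], min_rel_freq: float
) -> Optional[MemoryOption]:
    """Return the lowest-area option whose frequency meets the target, or None."""
    candidates = [o for o in options if o.rel_freq >= min_rel_freq]
    if not candidates:
        return None
    return min(candidates, key=lambda o: o.rel_area)


choice = smallest_meeting_speed(OPTIONS, 1.10)
print(choice.name if choice else "no option is fast enough")
```

In this made-up set, options B and C run at the same speed but differ noticeably in area, which is the kind of spread the text describes.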
Figure 2 Frequency vs read current
Frequency, read current and area:
Figure 2 compares frequency and read current. The underlying data shows a strong correlation between read and write currents, and so, in the rest of this section, we will simply refer to active power without distinguishing between reads and writes. Figure 2 shows four outliers with significantly higher currents than all the others. These four all use SRAM instances with relatively narrow output words and wide column multiplexers. This means that they have multiple instances running in parallel, each with long internal wires (and therefore large internal active capacitances), leading to the high total current. Even ignoring these outliers, there is still a large range in active power: there is a 3x difference between highest and lowest currents, and therefore scope for significant power optimisation.
Note that in Figure 2, some points are coloured blue and some orange. These correspond to the points in the bottom half and top half of Figure 1 respectively (i.e., points corresponding to an area more than 125 per cent of the minimum are orange and those below this limit are blue). This indicates that there is not a strong correlation between active power and area. In fact, the option with the lowest power is an orange dot (larger area), and of the 13 lowest power options (below the 150 per cent relative read current line) 7 are blue dots and 6 are orange. The implication of this is that there is scope for some independent optimisation of area and active power.
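The colour split and the count of low-power options can be expressed as a short Python sketch. The two thresholds come from the text (125 per cent of minimum area, 150 per cent relative read current); the function names and the sample points are hypothetical and are not the data behind Figure 2. The sketch also assumes both quantities are given as ratios, so 1.25 stands for 125 per cent.

```python
AREA_SPLIT = 1.25     # from the text: above 125 per cent of minimum area is orange
CURRENT_LIMIT = 1.50  # from the text: the 150 per cent relative read current line


def colour(rel_area: float) -> str:
    """Colour a point by area, as in Figure 2."""
    return "orange" if rel_area > AREA_SPLIT else "blue"


def count_low_power_by_colour(points):
    """Count blue and orange points that fall below the read current limit.

    points is an iterable of (rel_area, rel_read_current) pairs.
    """
    counts = {"blue": 0, "orange": 0}
    for rel_area, rel_current in points:
        if rel_current < CURRENT_LIMIT:
            counts[colour(rel_area)] += 1
    return counts


# Made-up points for illustration only.
sample = [(1.00, 1.20), (1.10, 1.60), (1.30, 1.00), (1.80, 1.40), (1.20, 2.90)]
print(count_low_power_by_colour(sample))
```

A roughly even split between the two colours among the low-power points, as the article reports (7 blue, 6 orange), is what suggests that area and active power are only weakly linked.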
Summary:
This section has highlighted that there is a wide variation in area and active power between the options for constructing our example 1 Mbyte, 128-bit word memory. The data indicates strong correlation between leakage and area, but not between active power and either area or frequency, so it is possible to find options that optimise all three together.
Going beyond the standard compiler
In this section, we consider whether it is possible to do better than the best options from our existing compiler. sureCore has previously developed application-specific memories for particular customers – does it look possible to do the same for AI?
Read power and write power:
It was mentioned above that there was strong correlation between read and write power. This was a design choice in the compiler, as many applications perform similar numbers of reads and writes. However, AI applications can show a heavy skew towards reads. This means that changes that optimise read power at the expense of write power are worth exploring in future.
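To see why a read-heavy workload changes the balance, a simple access-weighted average is enough. The Python sketch below assumes active power is just the read/write mix multiplied by the per-access figures. The two designs and their relative power numbers are hypothetical, chosen only to show the shape of the trade-off.

```python
def mean_access_power(read_power: float, write_power: float,
                      read_fraction: float) -> float:
    """Access-weighted active power for a workload with the given share of reads."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    return read_fraction * read_power + (1.0 - read_fraction) * write_power


def break_even_read_fraction(base, tuned) -> float:
    """Read share above which the read-tuned design has the lower mean power.

    base and tuned are (read_power, write_power) pairs. Assumes the tuned
    design saves read power and does not reduce write power.
    """
    read_saving = base[0] - tuned[0]
    write_cost = tuned[1] - base[1]
    if read_saving <= 0 or write_cost < 0:
        raise ValueError("tuned design must trade write power for read power")
    return write_cost / (read_saving + write_cost)


# Hypothetical relative figures: (read power, write power).
BALANCED = (1.0, 1.0)    # reads and writes cost the same
READ_TUNED = (0.8, 1.3)  # cheaper reads, more expensive writes

for share in (0.5, 0.9):
    print(share,
          mean_access_power(*BALANCED, share),
          mean_access_power(*READ_TUNED, share))
print(break_even_read_fraction(BALANCED, READ_TUNED))
```

With these made-up numbers the read-tuned design loses at an even read/write mix but wins once reads are about 60 per cent or more of accesses.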
Capacity and word length:
The memories that were identified above as having the best active power and area all have wide output words and narrow multiplexers (i.e., the opposite of the narrow words and wide multiplexers identified as having high active power). Would wider words do even better? The current compiler is limited to 128-bit words by RC effects that impact frequency. However, for AI, where it is bandwidth that matters rather than frequency, a frequency reduction to increase the word length could be a good option.
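The bandwidth argument can be made concrete with a little arithmetic. The Python sketch below assumes one word is transferred per clock cycle, so bandwidth is word width multiplied by frequency. The 256-bit width and the clock figures are hypothetical examples, not options the compiler offers.

```python
def bandwidth_bits_per_second(word_bits: int, frequency_hz: float) -> float:
    """Peak bandwidth, assuming one word is transferred per clock cycle."""
    return word_bits * frequency_hz


def min_frequency_ratio(old_word_bits: int, new_word_bits: int) -> float:
    """Fraction of the original frequency a wider memory must keep to match bandwidth."""
    return old_word_bits / new_word_bits


# Hypothetical figures: a 128-bit memory at 1 GHz versus a 256-bit memory
# that has lost 30 per cent of its frequency to RC effects.
current = bandwidth_bits_per_second(128, 1.0e9)
wider = bandwidth_bits_per_second(256, 0.7e9)
print(current, wider, min_frequency_ratio(128, 256))
```

Under that assumption, doubling the word length pays off in bandwidth as long as the frequency falls to no less than half of its original value.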
www.sure-core.com
www.cieonline.co.uk Components in Electronics December/January 2023 31