POWER


How to cut the thirsty demands of AI


Research by academics at the University of California has highlighted the huge amount of water used to cool the computer clusters that power ChatGPT.




Figure 1 Comparison of traditional AI architecture to Semidynamics’ new All-In-One integrated solution


According to the Times, the study, entitled “Making AI less thirsty”, says that writing a 100-word email using ChatGPT consumes 140Wh of energy and requires 500ml of water for cooling. All this heat in the computers essentially comes from moving data around on the chips. The greater the performance, the more waste heat has to be dealt with.
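Taken at face value, those per-email figures scale up quickly. The sketch below is nothing more than illustrative arithmetic based on the numbers quoted above; the one-email-a-day scenario is an assumption for the example, not a figure from the study.

```python
# Illustrative arithmetic only, using the per-email figures quoted above
# (140 Wh of energy, 500 ml of cooling water per 100-word email).
ENERGY_WH_PER_EMAIL = 140   # energy per 100-word ChatGPT email (from the article)
WATER_ML_PER_EMAIL = 500    # cooling water per 100-word email (from the article)

def footprint(num_emails: int) -> tuple[float, float]:
    """Return (energy in kWh, water in litres) for num_emails emails."""
    return (num_emails * ENERGY_WH_PER_EMAIL / 1000,
            num_emails * WATER_ML_PER_EMAIL / 1000)

# Hypothetical scenario: one such email per day for a year.
kwh, litres = footprint(365)
print(f"{kwh:.0f} kWh and {litres:.0f} litres per year")  # 51 kWh and 182 litres per year
```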


This brings home the correlation between the intensive processing required by AI and the resulting waste heat. The same issue applies at the level of an individual AI chip: as the performance demands of AI chips continue to increase rapidly, so does the waste heat they produce. Designers also want to keep the power budget down and avoid the need for dedicated chip-cooling systems.


Semidynamics has solved this issue by creating a new architecture for AI chips that cuts data movement on a chip and therefore reduces the power required and the waste heat produced.


The traditional AI design uses three separate computing elements: a CPU, a GPU (Graphics Processing Unit) and an NPU (Neural Processing Unit) connected through a bus. This traditional architecture requires the energy-hungry moving of data to and from the bus. In contrast, Semidynamics has re-invented AI architecture and integrated the three elements into a single, scalable processing element. This combines a RISC-V core, a Tensor Unit that handles matrix multiplication (playing the role of the NPU) and a Vector Unit that handles activation-like computations (playing the role of the GPU) into a fully integrated, all-in-one compute element, as shown in Figure 1.


This means that the previous, energy-hungry data movements to and from the bus are no longer required, as the data is moved around within the processing element with zero energy cost. The chip’s power use is therefore reduced compared to the traditional architecture and, with it, the production of waste heat. The company has found main memory bandwidth reductions of up to 30 percent, depending on the application and batch size, compared to a comparable traditional architecture, with corresponding power savings.
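As a rough intuition for where those savings come from, consider a toy byte-count model of a chain of layers. The two-crossings assumption (each intermediate tensor is written over the bus by one unit and read back by the next) and the tensor sizes are illustrative assumptions for this sketch, not Semidynamics’ measurements.

```python
# Toy model of bus traffic for a chain of layers. Assumption (illustrative
# only): in the split CPU/GPU/NPU design every intermediate tensor crosses
# the bus twice (written by one unit, read back by the next); in the fused
# design intermediates stay in the shared vector registers.

def bus_bytes_split(tensor_bytes: list[int]) -> int:
    """Bytes crossing the bus when each unit hands results over the bus."""
    inputs, outputs = tensor_bytes[0], tensor_bytes[-1]
    intermediates = tensor_bytes[1:-1]
    return inputs + outputs + 2 * sum(intermediates)

def bus_bytes_fused(tensor_bytes: list[int]) -> int:
    """Bytes crossing the bus when only the chain's input and output touch memory."""
    return tensor_bytes[0] + tensor_bytes[-1]

# Example: input, two intermediate tensors, output (sizes in bytes)
sizes = [4096, 4096, 4096, 4096]
print(bus_bytes_split(sizes))  # 24576
print(bus_bytes_fused(sizes))  # 8192 -- a 3x reduction in this toy case
```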


Large Language Models (LLMs) have emerged as a key element of AI applications. LLMs are computationally dominated by self-attention layers, which involve several matrix multiplications (MatMul), a matrix Transpose and a SoftMax activation function, as shown in detail in Figure 2. In Semidynamics’ All-In-One solution, the Tensor Unit (TU) takes care of the matrix multiplications, whereas the Vector Unit (VU) handles the Transpose and SoftMax.
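To make the MatMul, Transpose and SoftMax steps concrete, here is textbook scaled dot-product attention in NumPy. The TU/VU mapping in the comments follows the division of labour described above; the code itself is a generic reference implementation, not Semidynamics’ kernel.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis (Vector Unit work)."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: MatMul -> Transpose -> SoftMax -> MatMul."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # Transpose of K, then MatMul (Tensor Unit)
    weights = softmax(scores)       # activation-like step (Vector Unit)
    return weights @ V              # second MatMul (Tensor Unit)

# Toy example: sequence of 4 tokens, head dimension 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(self_attention(Q, K, V).shape)  # (4, 8)
```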




Since the Tensor and Vector Units share the vector registers, expensive memory copies can be largely avoided. Hence, there is zero latency and zero energy spent in transferring data from the MatMul layers to the activation layers and vice versa. In addition, to keep the TU and the VU continuously busy, weights and inputs must be streamed continuously into the vector registers. To this end, Semidynamics’ Gazzillion Misses technology provides unprecedented ability to move data. By tolerating a large number of in-flight cache misses, data can be fetched ahead of time, yielding high resource utilisation. Furthermore, Semidynamics’ custom tensor extension includes new vector instructions optimised for fetching and transposing 2D tiles, greatly improving tensor processing.
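A classic software analogue of fetching data ahead of time is double buffering: while one tile is being consumed, the next is already being requested. The sketch below only illustrates that overlap structure; the tile walk and the placeholder compute are assumptions for the example, not the actual Gazzillion hardware.

```python
import numpy as np

def process_tiles(matrix: np.ndarray, tile_rows: int) -> float:
    """Double-buffered tile walk: fetch tile i+1 while 'computing' on tile i.

    In hardware the fetch would be an asynchronous memory request kept in
    flight alongside many others; here it is just a slice, to show the
    overlap structure rather than real latency hiding.
    """
    n = matrix.shape[0]
    total = 0.0
    next_tile = matrix[0:tile_rows]                   # prefetch the first tile
    for start in range(0, n, tile_rows):
        tile = next_tile                              # consume the prefetched tile
        nxt = start + tile_rows
        if nxt < n:
            next_tile = matrix[nxt:nxt + tile_rows]   # issue the next fetch early
        total += float(tile.sum())                    # placeholder compute
    return total

a = np.arange(16, dtype=np.float64).reshape(4, 4)
print(process_tiles(a, 2))  # 120.0
```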


Figure 2 Attention Layer in LLM

