Feature: AI


spot combining mixed precisions to be most efficient, such as 8-bit for the first five layers, 4-bit for the next five and 1-bit for the last two. The last part, and probably the most critical one requiring hardware adaptability, is a custom memory hierarchy. Constantly feeding data to a powerful compute engine to keep it busy is crucial, and a customised memory hierarchy, from internal memory out to external DDR/HBM, is needed to keep up with layer-to-layer memory-transfer requirements.
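As a rough illustration of that kind of layer-wise mixed precision, the sketch below quantises each layer's weights to a different bit width. The twelve-layer split and the simple uniform (and, for 1-bit, sign-based) quantiser are illustrative assumptions, not any particular DSA's scheme.

```python
import numpy as np

def quantize(weights, bits):
    """Simulate quantising a weight tensor to the given bit width."""
    if bits == 1:
        # Binary weights: sign times the mean magnitude.
        return np.sign(weights) * np.abs(weights).mean()
    levels = 2 ** (bits - 1) - 1                  # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.abs(weights).max() / levels
    q = np.clip(np.round(weights / scale), -levels, levels)
    return q * scale                              # de-quantised values for simulation

# Hypothetical 12-layer model with the mixed-precision plan from the text:
# 8-bit for the first five layers, 4-bit for the next five, 1-bit for the last two.
layer_weights = [np.random.randn(64, 64) for _ in range(12)]
bit_plan = [8] * 5 + [4] * 5 + [1] * 2

quantized = [quantize(w, b) for w, b in zip(layer_weights, bit_plan)]
```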


The rise of AI productisation
Harnessing DSA to make AI models more efficient fuels the growth of AI applications: classification, object detection, segmentation, speech recognition and recommendation engines are just some examples of what is being productised, with new ones emerging every day. In addition, there is a second dimension to this growth: within each application, more models keep being invented, either to improve accuracy or to make the model less cumbersome. In classification, AlexNet was the first deep-learning breakthrough in 2012, yet it was a fairly simple feed-forward network. Hot on its heels, in 2014 Google introduced GoogLeNet, with a map-and-reduce topology. Modern networks such as DenseNet and MobileNet now use depthwise convolution and skip connections, where data from one layer is fed to layers much further ahead, as sketched below.
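As a loose illustration of that dense skip connectivity, the sketch below feeds every layer's output forward to all later layers by concatenation, in the spirit of DenseNet; the channel counts and layer depth are arbitrary.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenated outputs of all earlier layers."""
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth, growth, kernel_size=3, padding=1)
            for i in range(n_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Skip connections: the input is everything produced so far,
            # which is exactly the layer-to-layer traffic that stresses memory.
            out = torch.relu(layer(torch.cat(features, dim=1)))
            features.append(out)
        return torch.cat(features, dim=1)

block = DenseBlock(in_ch=16, growth=12, n_layers=4)
y = block(torch.randn(1, 16, 32, 32))   # output shape: (1, 16 + 4*12, 32, 32)
```

Keeping all of those earlier outputs live until the final layer consumes them is one reason such topologies strain a fixed memory hierarchy.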


DSA trends
This level of innovation puts constant pressure on existing hardware, requiring chip vendors to innovate fast. Here are a few recent trends pushing the need for new DSAs.

Depthwise convolution is an emerging layer that requires large memory bandwidth and specialised internal-memory caching to run efficiently; typical AI chips and GPUs have a fixed L1/L2/L3 cache architecture and limited internal memory bandwidth, so they execute it with very low efficiency (a minimal example follows below). Researchers are also constantly inventing new custom layers that today's chips simply cannot support natively; these layers then have to run on the host CPU without acceleration, often becoming the performance bottleneck.
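For reference, the sketch below contrasts a depthwise convolution (one filter per input channel, expressed here through PyTorch's groups argument) with a standard convolution of the same shape; the channel count and input size are arbitrary.

```python
import torch
import torch.nn as nn

channels = 32

# Depthwise convolution: groups == channels, so each input channel gets its
# own 3x3 filter. Far fewer multiplies are done per activation fetched,
# which is why memory bandwidth, not arithmetic, becomes the limiter.
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)

# Standard convolution over the same tensor, for comparison.
standard = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

x = torch.randn(1, channels, 56, 56)
print(depthwise(x).shape, standard(x).shape)          # same output shape

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(depthwise), params(standard))            # 320 vs 9248 weights
```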


Sparse neural networks are another promising optimisation, where networks are heavily pruned, sometimes by as much as 99%, by trimming edges, removing fine-grained matrix values in convolutions, and so on. However, to run this
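As a rough illustration of the pruning step itself, the sketch below zeroes the smallest-magnitude weights until the requested sparsity is reached; magnitude thresholding is only one of several possible pruning criteria and is assumed here for simplicity.

```python
import numpy as np

def prune(weights, sparsity):
    """Zero out the smallest-magnitude entries until `sparsity` of them are gone."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

w = np.random.randn(512, 512)
w_sparse = prune(w, sparsity=0.99)                      # ~99% of the values removed
print(1.0 - np.count_nonzero(w_sparse) / w_sparse.size) # ≈ 0.99
```

The benefit only materialises if the hardware can skip the zeros rather than multiplying by them.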

