FEATURE | MCUs & MPUs | High-performance edge processing

Jeff VanWashenova, director of automotive segment marketing at CEVA, investigates building efficient deep-learning networks for autonomous vehicles


Vehicle makers are implementing increasingly advanced driver assistance systems (ADAS), such as lane-departure warning, autonomous braking and parking assistance, as they race towards being able to offer fully autonomous driving. Many of today’s ADAS use machine-vision strategies, implemented using conventional signal-processing techniques, to detect, identify and classify objects. Now the use of deep-learning neural networks is being explored to achieve faster and better scene and object recognition. This is a computationally onerous task, so while established computing architectures are being used to explore deep-learning neural networks, their widespread adoption in vehicles will demand optimised computing architectures.

THREE STEPS TO EMBEDDED NEURAL NETWORKS

Many of the new vision analysis systems use convolutional neural networks (CNNs). First, a generic CNN is ‘trained’ to achieve the desired image-processing result, such as recognising objects, by exposing it to images that have been tagged with identifiers for each type of object in the scene: trees, pedestrians, road signs and so on.
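The shape of this training step can be sketched with a toy classifier: labelled examples drive a gradient-descent loop that establishes the weights. A real CNN adds convolutional, pooling and activation layers on top, but the loop has the same structure. All data, dimensions and the learning rate here are synthetic and purely illustrative.

```python
# Minimal sketch of supervised training: a single-layer softmax classifier
# learns weights from labelled examples by gradient descent. Synthetic data
# stands in for tagged images ('tree', 'pedestrian', 'road sign', ...).
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 100 samples of 16 features, each tagged with one of 3 classes.
X = rng.normal(size=(100, 16))
true_w = rng.normal(size=(16, 3))
y = np.argmax(X @ true_w, axis=1)           # ground-truth tags

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    return -np.log(p[np.arange(len(y)), y]).mean()

w = np.zeros((16, 3))                       # weights to be 'trained'
losses = []
for step in range(200):
    p = softmax(X @ w)
    losses.append(cross_entropy(p, y))
    onehot = np.eye(3)[y]
    grad = X.T @ (p - onehot) / len(y)      # gradient of the loss
    w -= 0.5 * grad                         # gradient-descent update

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The trained weight matrix `w` is what the next step must translate onto the target system's arithmetic.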

The resultant network, whose weights have been established during this phase, is translated to run on the computing resources available in the target system. Once the network has been optimised in this way, the target system can use the CNN to draw inferences about the scene it is viewing.

Training is often done offline on large server farms equipped with standard CPUs, GPUs or even FPGAs. Once the network is achieving the required recognition performance, the conventional next step is to run the network in the target system on CPUs, GPUs or FPGAs like those used during training, typically using floating-point arithmetic to ensure high precision. This may help get to market quickly, but it is not the best way to build systems for use in high-volume end applications such as vehicles, where cost and power budget constraints make other approaches a better choice.

14 NOVEMBER 2018 | ELECTRONICS

Figure 3: NeuPro AI processor includes NeuPro Engine and NeuPro VPU

NeuPro consists of a NeuPro Engine, which has specialised engines for matrix multiplication, fully connected, activation and pooling layers, and a NeuPro VPU, a fully programmable vector processor unit that can be used to support future development, as well as running the CDNN real-time software framework. The AI processor family supports 8bit or 16bit fixed-point representations to enable overall high accuracy performance.
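To make the layer types named above concrete, here is a sketch of how a fixed-point engine might execute a fully connected (matrix-multiply) layer, an activation and a pooling stage with 8-bit operands and wide accumulators. The Q-format, scale factor and saturation rules are illustrative choices, not CEVA's actual scheme.

```python
# Hedged sketch of fixed-point layer execution: int8 x int8 MACs accumulate
# into int32, then results are rescaled and saturated back to int8.
import numpy as np

SCALE = 64  # treat int8 value v as the real number v / SCALE

def fc_layer(x_q, w_q):
    """Fully connected layer: wide accumulation, rescale, saturate."""
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)   # int32 accumulator
    out = acc // SCALE                                  # undo one scale factor
    return np.clip(out, -128, 127).astype(np.int8)      # saturate to int8

def relu(x_q):
    """Activation layer: ReLU is exact in fixed point."""
    return np.maximum(x_q, 0)

def max_pool_1d(x_q, k=2):
    """Pooling layer: max over non-overlapping windows of k elements."""
    return x_q.reshape(-1, k).max(axis=1)

# Quantise a small float input and weight matrix, then run the pipeline.
x = np.array([0.5, -0.25, 1.0, 0.75])
w = np.full((4, 4), 0.5)
x_q = np.clip(np.round(x * SCALE), -128, 127).astype(np.int8)
w_q = np.clip(np.round(w * SCALE), -128, 127).astype(np.int8)

y_q = max_pool_1d(relu(fc_layer(x_q, w_q)))
print(y_q / SCALE)   # convert back to real values for inspection
```

Here the fixed-point result matches the float computation exactly (each output is 1.0); in general, the 8bit or 16bit representation trades a small, bounded rounding error for much cheaper MAC hardware.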

Figure 1: CDNN software framework optimises the PC-trained network to run efficiently on CEVA-XM and NeuPro AI Processors

Figure 2: NeuPro AI Processor

TARGETED TRANSLATION AND OPTIMISED COMPUTATION

CEVA has developed an alternative approach that does away with the need to use floating-point arithmetic in the target, enabling real-time performance at a power level that is more acceptable for automotive use. It uses a combination of an advanced translation mechanism and an optimised target architecture to enable users to build more efficient CNN implementations. The CDNN (CEVA Deep Neural Network) software framework automatically converts the floating-point trained network into a fixed-point network and, with additional sophisticated optimisations, makes it suitable for running on CEVA’s embedded AI inference processors.

CDNN supports the most advanced neural network layers and topologies. Since AI technology is continuing to evolve, companies are trying to add their own ‘secret sauce’ for differentiation, so a flexible solution is a mandatory requirement; CDNN was developed as an adaptive solution to cope with these future trends.

The company offers its NeuPro family, which includes four self-contained, specialised AI processors that are complementary with CDNN.

The NeuPro processors support up to 4K 8×8 MACs, scaling in performance for a broad range of end markets. Ranging from 2 TOPS for the entry-level processor up to 12.5 TOPS for the advanced configuration, the family is designed to handle the complete gamut of deep neural network workloads on-device and to cope with future needs.

Machine-learning neural networks are developing rapidly, enabling systems that, in some cases, are better at recognising objects than people are. Although this approach is enabling exciting new capabilities, as always in IC design, it’s vital to make good trade-offs between power and performance. CEVA’s AI processors, together with the CDNN software framework, will help to bring these techniques to a wide variety of embedded applications quickly.
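As a back-of-envelope check on the peak-throughput figures quoted above: TOPS follows directly from MAC count and clock rate. The MAC count (4K, taken as 4,096) is from the text; the clock frequency below is an assumption for illustration only, since vendor figures depend on process node and configuration.

```python
# Rough peak-throughput arithmetic: MACs x clock x 2 ops per MAC.
macs = 4096                      # 8x8 MACs in the largest configuration
clock_hz = 1.5e9                 # ASSUMED 1.5 GHz clock, for illustration
ops_per_mac = 2                  # one multiply + one accumulate
tops = macs * clock_hz * ops_per_mac / 1e12
print(f"{tops:.1f} TOPS")
```

At the assumed clock this lands near 12.3 TOPS, in the same ballpark as the 12.5 TOPS quoted for the advanced configuration.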

CEVA T: +1 650 417 7900
