NEURAL NETWORKS
Embedded vision gains open standards boost
Khronos Group president Neil Trevett discusses open API standards for applications using machine learning and embedded vision, ahead of addressing the topic at the Embedded World trade fair
Technologists in the machine learning industry are finding new opportunities to deploy devices and applications that leverage neural network inferencing, now with previously unseen levels of vision-based functionality and accuracy. But this rapid evolution brings a dense and confusing landscape of processors, accelerators and libraries. This article assesses the role open interoperability standards play in reducing the costs of, and barriers to, using inferencing and vision acceleration in real-world products. Every industry needs open standards to
reduce costs and time-to-market through increased interoperability between ecosystem elements. Open standards and proprietary technologies have complex and interdependent relationships. Proprietary APIs and interfaces are often the Darwinian testing ground and can remain dominant in the hands of a smart market leader; that is as it should be. Strong open standards result from a wider industry need for a proven technology and can provide healthy, motivating competition. In the long view, an open standard not controlled by, or dependent on, any single company can often be the thread of continuity for progress as technologies, platforms and market positions swirl and evolve.

The Khronos Group is a non-profit standards consortium open to any company, with more than 150 members. All standards organisations exist to provide a safe place for competitors to co-operate for the good of all. The Khronos Group creates open, royalty-free API standards that enable software applications, libraries and engines to harness the power of silicon acceleration for demanding use cases, such as 3D graphics, parallel computation, vision processing and inferencing.
Embedded machine learning

Many interoperating pieces need to work together to train a neural network and deploy it successfully on an embedded, accelerated inferencing platform. Effective neural network training typically requires large datasets, uses floating-point precision and runs on powerful GPU-accelerated desktop machines or in the cloud. Once trained, the neural network is combined with an inferencing run-time engine optimised for fast tensor operations, or with a machine learning compiler that transforms the network description into executable code. Whether an engine or a compiler is used, the final step is to accelerate the inferencing code on one of a diverse range of accelerator architectures, from GPUs through to dedicated tensor processors.

‘There is increasing interest in these standards as... parallel programming becomes the most effective way to deliver performance’

How can industry open standards help streamline this process? Figure 1 illustrates the Khronos standards being used in the field of vision and inferencing acceleration. In general, there is increasing interest in all these standards, as processor frequency scaling gives way to parallel programming as the most effective way to deliver performance at acceptable levels of cost and power. Broadly, these standards divide into two groups: high-level and low-level.
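At the heart of the deployment flow described above is an inferencing run-time executing fast tensor operations. As a minimal, toolkit-free sketch (all names hypothetical, not from any Khronos API), here is the kind of kernel such an engine would offload to an accelerator: a dense layer followed by a ReLU activation, written in plain C++.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal inference kernel: a dense (fully connected) layer
// followed by ReLU. 'weights' is an out_dim x in_dim matrix in row-major
// order. An inferencing engine would run this tensor operation on an
// accelerator rather than in a plain host loop.
std::vector<float> dense_relu(const std::vector<float>& input,
                              const std::vector<float>& weights,
                              const std::vector<float>& bias,
                              std::size_t out_dim)
{
    const std::size_t in_dim = input.size();
    std::vector<float> output(out_dim);
    for (std::size_t o = 0; o < out_dim; ++o) {
        float acc = bias[o];                         // start from the bias term
        for (std::size_t i = 0; i < in_dim; ++i)
            acc += weights[o * in_dim + i] * input[i]; // dot product of row o with input
        output[o] = std::max(acc, 0.0f);             // ReLU activation
    }
    return output;
}
```

A trained network is, in essence, a long pipeline of such operations with learned weights; the engine-versus-compiler choice is about how that pipeline is scheduled onto the accelerator.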
20 IMAGING AND MACHINE VISION EUROPE FEBRUARY/MARCH 2020
The high-level APIs focus on ease of programming, with effective performance portability across multiple hardware architectures. In contrast, the low-level APIs provide direct, explicit access to hardware resources for maximum flexibility and control. It is important that each project understands which level of API will best suit its development needs. Note, too, that the high-level APIs are often implemented using the lower-level ones. Let’s take a look at some of these Khronos standards in more detail.
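To make the high-level/low-level distinction concrete, here is a toy C++ sketch (every name is hypothetical, not from any Khronos API). The low-level interface exposes explicit buffer management and kernel launches; the high-level function hides those steps behind one portable call, much as a high-level standard can be layered over a low-level one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// --- Hypothetical low-level API: explicit control over every step ---
namespace lowlevel {
    struct Buffer { std::vector<float> data; };
    Buffer create_buffer(std::size_t n)          { return Buffer{std::vector<float>(n)}; }
    void   upload(Buffer& b, const float* src)   { std::copy(src, src + b.data.size(), b.data.begin()); }
    void   launch_scale(Buffer& b, float k)      { for (float& x : b.data) x *= k; } // the "kernel"
    void   download(const Buffer& b, float* dst) { std::copy(b.data.begin(), b.data.end(), dst); }
}

// --- Hypothetical high-level API: one portable call, implemented on top ---
std::vector<float> scale(const std::vector<float>& in, float k)
{
    lowlevel::Buffer buf = lowlevel::create_buffer(in.size());
    lowlevel::upload(buf, in.data());   // explicit data movement, hidden from the caller
    lowlevel::launch_scale(buf, k);
    std::vector<float> out(in.size());
    lowlevel::download(buf, out.data());
    return out;
}
```

A project that needs fine-grained control of memory movement and scheduling would program against the low-level interface directly; one that values portability and productivity would stay at the high level.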
SYCL: C++ single-source heterogeneous programming

SYCL (pronounced ‘sickle’) uses C++ template libraries to dispatch selected parts of a standard ISO C++ application to offload processors. SYCL enables complex C++ machine learning frameworks and libraries to be compiled and accelerated to performance levels that, in many cases, outperform hand-tuned code. By default, SYCL is implemented over the lower-level OpenCL standard API, feeding code for acceleration into OpenCL and the remaining host code through the system’s default CPU compiler.

There is an increasing number of SYCL implementations, some of which use proprietary back-ends, such as Nvidia’s CUDA, for accelerated code. Significantly, Intel’s new oneAPI initiative contains a parallel C++ compiler called DPC++ that is a conformant SYCL implementation over OpenCL.
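The key idea in ‘single-source’ programming is that the kernel is an ordinary C++ lambda living in the same source file as the host code. Real SYCL requires a SYCL toolchain (such as DPC++), so the following is only a toolchain-free, plain-C++ sketch of the idiom, with a hypothetical Queue type standing in for a SYCL queue; a genuine SYCL runtime would dispatch the lambda to an OpenCL (or other) back-end device instead of a host loop.

```cpp
#include <cstddef>
#include <vector>

// Host-only stand-in for a SYCL-style queue: the "kernel" lambda is
// submitted together with an index range, as in SYCL's parallel_for.
// Here the dispatch is a plain loop; a real runtime would offload it.
struct Queue {
    template <typename Kernel>
    void parallel_for(std::size_t n, Kernel k) {
        for (std::size_t i = 0; i < n; ++i) k(i);
    }
};

std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b)
{
    std::vector<float> c(a.size());
    Queue q;
    // Kernel and host code share one source file and one type system:
    // this is the single-source property that SYCL standardises.
    q.parallel_for(a.size(), [&](std::size_t i) { c[i] = a[i] + b[i]; });
    return c;
}
```

Because the kernel is just C++, template libraries and whole frameworks can be written once and retargeted across back-ends by swapping the implementation underneath the queue.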
Neural network exchange format

There are dozens of neural network training frameworks in use today, including Torch, Caffe, TensorFlow, Theano, Chainer, Caffe2, PyTorch, MXNet and many more – and all use proprietary formats to describe their trained networks. There are also dozens, maybe even hundreds, of embedded inferencing processors hitting the market. Forcing that many hardware vendors to understand and import so many formats is a classic fragmentation problem that can be solved with an open standard.

The neural network exchange format (NNEF) is targeted at providing an effective bridge between the worlds of network