Embedded special

is another. YouTube stores 500 hours of video each minute, a figure Tusch quoted in his presentation; around 3,000 YouTubes would be needed to store the video created by 120 million new IP cameras. ‘If you look at the largest hosters of video, they are not Facebook or YouTube, they are mid-sized security video storage companies,’ Tusch said.

So, who will watch all this video? The answer is machines and artificial intelligence. ‘That gives us an opportunity to deal with the problems of transmission and storage,’ Tusch added. Deep learning algorithms, which are trained to find patterns based on lots of data, are now able to categorise scenes in video, as shown by Google and others that use the technology to make web searches more effective. Classifying a video of a fainting goat is not funny in the same way as watching the video is, but describing a still from a security camera feed in a few bytes of data is really all that is needed for surveillance – watching the video is not necessary at all. ‘There’s really no useful information a pixel can convey; they’re completely redundant,’ Tusch said. ‘This gives us an opportunity,’ he continued. ‘Maybe we don’t need 3,000 YouTubes a year to store security footage; if we can only categorise scenes successfully at the edge [i.e. at the device, rather than on a server or in the cloud], and get rid of all the pixels at that point, we don’t even have to transmit the video.’

However, to make sense of security footage, it’s not just a case of classifying the scene, but also identifying the behaviour of objects – normally people – within those frames. There is the potential to do face recognition, for instance, but this kind of processing, if done on a server, is extremely expensive, Tusch noted. To find people at different scales and different locations would require the equivalent of at least 300 full classifications per frame. Tusch calculated – again, just to illustrate the problem – that with 120 million new IP cameras this would cost $132 million per hour running on Amazon Web Services. Tusch said that an argument could be made that the systems don’t need to process at full frame; they could rely on motion detection and trigger uploads when needed, but he said that in his experience full-frame, full-resolution processing is necessary to achieve the required accuracy. ‘You can see that the problem of doing computer vision on servers is a serious one, even if you can get the video into the cloud and store it,’ he commented. ‘These problems of transmission, storage and cloud compute are so great that when people talk about the need to optimise or balance computer vision between servers and the edge, what they really mean is that all of the computer vision has to happen on the device, at the edge, and that the conversion from pixels to objects has to happen there, otherwise you run into scalability issues at the very first stage.’

Can recognition and localisation be computed on the device today? Tusch said that traditional embedded computing architectures are not fast enough for the kind of complex scene analysis needed for security applications. Running neural networks for deep learning applications means many gigabits per second of pixel data going to and from memory, which puts a limit on the computational performance. ‘I would argue that traditional architectures are not remotely good enough and many GPUs and DSPs – certainly not CPUs – can’t get anywhere near the kind of performance needed,’ he said. There are other architectures, and Apical developed an engine called Spirit, now owned by ARM, that has low power consumption and could potentially be used for this. ‘I’m not familiar with anything today that can do this, but I’m sure that within a year or two this problem will be solved,’ Tusch remarked.

He said that by doing complex analysis at the edge, on the device, and reducing pixels to metadata, the results could be streamed to the cloud and analysed there. Take iris recognition, which requires around 100 pixels and can be done at a distance of 1.7 metres with a 25 megapixel sensor. ‘You can’t even think about encoding video at that resolution at the edge, let alone transmitting it and storing it,’ Tusch said, ‘but if we have an embedded vision engine running close to the sensor that can process every pixel, we have a viable approach. We can analyse every frame, we can find faces, we can crop irises, and then we could process locally or send the cropped encoded JPEG up to a server to do the recognition. By taking video out of the problem, and replacing it with pure metadata or at the very least regions of interest, you get something practical and cheap.

‘By 2022 all new IP cameras in the world will have some kind of embedded convolutional neural network or equivalent inside them,’ he continued, adding that at some point in the future – he gave 2030 as a possible timeline – the vast majority of cameras connected to the internet will not produce any video whatsoever.

For that to happen, processing at the edge will have to be solved, as well as other challenges like real-time scene analysis. The task of replacing pixels completely with object detection at the edge requires very high accuracy, which doesn’t exist yet, Tusch said. He also added that, for surveillance applications, all kinds of events have to be detected, and that without deep learning it would be hard to throw away information at the source. For object recognition and object detection, deep learning has strong advantages over traditional approaches.
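As a quick sanity check on the scale Tusch described, the ‘3,000 YouTubes’ comparison can be reproduced in a couple of lines of Python. The continuous-recording assumption below is mine, not the article’s; under it the naive ratio comes out nearer 4,000 – the same order of magnitude as the quoted figure, which presumably folds in bitrate differences between security footage and typical YouTube uploads.

```python
# Rough check of the "3,000 YouTubes" comparison, assuming every camera
# records continuously (an assumption; the article does not say).

YOUTUBE_HOURS_PER_MIN = 500   # hours of video ingested per minute, from the article
CAMERAS = 120e6               # new IP cameras, from the article

youtube_hours_per_hour = YOUTUBE_HOURS_PER_MIN * 60  # 30,000 hours of video per hour
camera_hours_per_hour = CAMERAS * 1                  # one hour of footage each, per hour
ratio = camera_hours_per_hour / youtube_hours_per_hour
print(f"~{ratio:.0f}x YouTube's ingest rate")
```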
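Tusch’s $132-million-per-hour figure can be unpacked the same way. The article gives only the totals (300 classifications per frame, 120 million cameras, the hourly cost), so the frame rate below is an assumption, and the script simply derives the per-classification cost those totals would imply.

```python
# Back-of-envelope: what per-classification cost does Tusch's
# illustration imply? Only the three constants marked "from the
# article" are his; the frame rate is an assumed value.

CAMERAS = 120e6                  # from the article
CLASSIFICATIONS_PER_FRAME = 300  # from the article
TOTAL_COST_PER_HOUR = 132e6      # dollars per hour on AWS, from the article
FPS = 10                         # assumed frame rate, not stated in the article

classifications_per_hour = CAMERAS * CLASSIFICATIONS_PER_FRAME * FPS * 3600
implied_cost = TOTAL_COST_PER_HOUR / classifications_per_hour
print(f"{classifications_per_hour:.3e} classifications/hour")
print(f"implied cost: ${implied_cost:.2e} per classification")
```

At an assumed 10 fps this works out to roughly a tenth of a microcent per classification, which is why the aggregate bill balloons across 120 million always-on cameras.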
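The ‘many gigabits per second’ of pixel traffic and the 25-megapixel iris example can also be made concrete with similar arithmetic. The sensor resolution is from the article; the frame rate, bit depth, crop size and compression ratio are illustrative assumptions, not figures from the talk.

```python
# Data-rate arithmetic for the iris-recognition example: raw video from
# a 25 MP sensor versus a single cropped region of interest.

SENSOR_PIXELS = 25e6   # 25 megapixel sensor, from the article
FPS = 30               # assumed frame rate
BYTES_PER_PIXEL = 1.5  # assumed 12-bit raw, packed

raw_rate_gbit = SENSOR_PIXELS * FPS * BYTES_PER_PIXEL * 8 / 1e9
print(f"raw video: {raw_rate_gbit:.0f} Gbit/s")  # far too much to encode or stream

# A cropped iris region instead: the iris spans ~100 pixels, so take an
# assumed 200x200 crop, JPEG-compressed at an assumed 10:1 ratio.
crop_bytes = 200 * 200 * BYTES_PER_PIXEL / 10
print(f"one iris crop: ~{crop_bytes / 1e3:.0f} kB per event")
```

Under these assumptions the raw stream is on the order of 9 Gbit/s, while a cropped, encoded iris is a few kilobytes – the gap that makes ‘pixels to objects at the edge’ the only scalable option in Tusch’s argument.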

June/July 2017 • Imaging and Machine Vision Europe 27
