Embedded Design

architecture can quickly become bandwidth limited. H.264 cores for FPGAs have been available for some time now from a number of vendors, although none have been fast or small enough to convert 1080p30 video and still fit in a small-footprint device. Targeting an all-programmable SoC has allowed the A2e micro-footprint H.264 core to overcome this constraint. A single core is capable of encoding and decoding multiple streams of video at resolutions from 720p HD up to 4K Ultra HD (UHD), at frame rates from 15fps to 60fps. Further advantages include simplified hardware design and easier OSD data insertion. Moreover, engineers using these cores can manage system latency effectively and meet the demands of real-time control applications.
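To give a feel for the throughput range quoted above, the following is an illustrative back-of-the-envelope calculation (not from the article) comparing the raw pixel rates an encoder must sustain at these resolutions and frame rates. The pixel counts are nominal active-video figures; nothing here describes the core's internal architecture.

```python
# Illustrative arithmetic: raw active-pixel throughput at the resolutions
# and frame rates named in the article. Nominal active-video dimensions.

def pixel_rate(width, height, fps):
    """Active pixels per second the encoder must process."""
    return width * height * fps

rates = {
    "720p30":  pixel_rate(1280, 720, 30),    # ~27.6 Mpixel/s
    "1080p30": pixel_rate(1920, 1080, 30),   # ~62.2 Mpixel/s
    "4Kp60":   pixel_rate(3840, 2160, 60),   # ~497.7 Mpixel/s
}

for name, r in rates.items():
    print(f"{name}: {r / 1e6:.1f} Mpixel/s")
```

A 4Kp60 stream carries eight times the pixel throughput of 1080p30, which is why a single core spanning that whole range is a notable claim.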

Secure and simplified OSD

In many types of equipment a common requirement is to overlay graphics or data, such as a timestamp or GPS coordinates, onto the video. In applications such as video surveillance, it can be important that this On-Screen Display (OSD) data is immune to tampering. This can be achieved effectively by adding the OSD information before video compression. Off-the-shelf video SoCs, however, are typically designed to insert packetised OSD information on the decoded-video side. This is less secure, and also complicates the design by forcing designers either to send the OSD information as metadata or to use the processing core to time video frames and programmatically write the OSD data to the video buffer.

Building the encoder in an FPGA allows OSD to be added pre-compression simply by dropping in a block of IP. Similarly, additional processing blocks such as fisheye lens correction can be inserted before the compression engine if required.

Managing latency

To achieve latency in the 10-30ms range necessary for control applications, designers must consider the performance of both the encoding and decoding subsystems. The time between the data leaving the sensor and the decoded image appearing on the screen – referred to as "glass-to-glass" latency – is the sum of the video-processing time (including any algorithms such as fisheye lens correction), the time to fill the frame buffer, the video-compression time, the software delays associated with transmitting the packets, any network delays, and finally the time to receive the packets and decode the video.

It is worth noting that many systems, despite using a custom hardware-based encoder, employ a standard PC-based media player such as VLC to decode the video. These players tend to have large buffer delays, which can be as much as 500-1000ms. To truly control and minimise latency, the system should perform both encoding and decoding in hardware. In addition, care should be taken to minimise the delays introduced by buffers, network stacks and streaming servers/clients on both the encoding and decoding sides.

To help designers satisfy a variety of applications, including real-time control, A2e has used its H.264 IP to produce both encoder-only and encoder/decoder cores, and has also built special low-latency versions of these cores as well as a low-latency Real-Time Streaming Protocol (RTSP) server. The 1080p30 encode-only core uses 10,000 Look-Up Tables (LUTs), or roughly 25% of the Zynq Z7020 FPGA fabric, while the encoder/decoder uses 11,000 LUTs.

In the low-latency encoder, only 16 video lines need to be buffered before compression starts. For a 1080p30 video stream the latency is less than 500µs; for a 480p30 video stream it is less than 1ms. Hence this encoder allows designers to construct systems with low and predictable latency.

The low-latency RTSP server differs from a standard RTSP server in two key respects. Firstly, it is removed from the forwarding path, yet continues to maintain statistics using the Real-Time Control Protocol (RTCP) and asynchronously updates the kernel driver with any changes in the destination IP or MAC address. Secondly, the kernel driver attaches the necessary headers, based on information from the RTSP server, and injects each packet directly into the network driver to be forwarded immediately. This eliminates the time needed to execute a memory copy to or from user space.

Figure 2. Block diagram of complete encode-decode system

Figure 2 shows a complete H.264 IP-based video subsystem comprising the A2e low-latency encoder/decoder and low-latency RTSP server implemented in a Zynq 7000 SoC. The system has a combined latency of less than 50ms, while allowing a small PCB footprint and simplified hardware design.
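The 16-line buffering figure can be sanity-checked with a short calculation. This is a sketch under stated assumptions: the total line counts per frame (active plus blanking) are taken as the nominal 1125 lines for 1080p and 525 lines for 480p, which are not given in the article.

```python
# Back-of-the-envelope check of the low-latency encoder's buffering delay:
# 16 video lines must arrive before compression starts, so the delay is
# 16 line times. Total lines per frame (active + blanking) are assumed
# nominal values: 1125 for 1080p, 525 for 480p.

def buffer_latency(lines_buffered, total_lines, fps):
    """Seconds spent filling the line buffer before encoding can begin."""
    line_time = 1.0 / (fps * total_lines)   # duration of one video line
    return lines_buffered * line_time

lat_1080p30 = buffer_latency(16, 1125, 30)  # ~474 us
lat_480p30  = buffer_latency(16, 525, 30)   # ~1.02 ms

print(f"1080p30: {lat_1080p30 * 1e6:.0f} us")
print(f"480p30:  {lat_480p30 * 1e3:.2f} ms")
```

The results land close to the article's figures: under 500µs for 1080p30 and roughly 1ms for 480p30, confirming that the buffering delay scales with line time rather than frame time.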

Allen Vexler is CTO, A2e Technologies

Components in Electronics, December 2013/January 2014
