Thinking Highways North America Vol 11 No 4

AUTONOMOUS VEHICLES

each different input variable and an output node for each possible outcome. For binary classifiers (eg, pedestrian present or not), there might be just two output nodes. For character recognition, there might be over thirty, one for each alphanumeric character and symbol. Deep neural networks have many hidden layers and many nodes at each layer. They require an extremely large set of training cases, and therefore huge amounts of data and processing time. In addition, if they are not initialized properly, they may not be trainable to produce the correct output. They have become extremely popular and widely used in, for example, image and speech recognition thanks to three developments: Big Data providing the necessary sets of input data, Graphics Processing Units (see below) providing the computing resources to dramatically reduce the required training time, and research discoveries guiding how to set the initial conditions so that they can be trained.

One of the strengths of machine learning techniques, including deep neural networks, is that they can generalize, applying the trained and tuned algorithm to handle cases that they have never seen before, just as humans do. This is invaluable in handling a task with the nearly infinite variations encountered in driving. In addition, they provide an estimate of their confidence in the result, which could, for example, be used to trigger a need to return control to a human driver. A significant shortcoming of neural networks in particular is that while they can be incredibly accurate, they do not provide information on how they reach a decision (other than a series of weighted formulas for the links shown in Figure 1, which typically do not have the meaningful logic that, for example, a series of if-then rules would have).
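As a rough illustration of how a confidence estimate could feed a handover decision, the Python sketch below turns the raw scores of two output nodes into probabilities with a softmax function and flags a handover when the winning probability is low. The softmax step, the function names and the 0.9 threshold are assumptions made for illustration; they are not taken from any particular vehicle system, and a softmax value is only an approximate measure of confidence.

```python
import math


def softmax(scores):
    """Convert raw output-node scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the largest score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels, min_confidence=0.9):
    """Return (label, confidence, handover) for one set of output-node scores.

    min_confidence is a hypothetical threshold chosen for illustration only.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    # Request a handover to the human driver when the network is unsure.
    return labels[best], confidence, confidence < min_confidence


labels = ["no pedestrian", "pedestrian"]
print(classify([0.2, 4.1], labels))  # clear-cut scores: no handover requested
print(classify([1.0, 1.3], labels))  # ambiguous scores: handover requested
```

In practice the threshold would have to be tuned and validated against real driving data rather than picked by hand.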

OBJECT DETECTION AND SIGN READING

Algorithms for detecting objects such as pedestrians or signs in the camera image often use a concept called a sliding window. The system is trained to recognize objects of interest in a fixed window. Each frame of the image from a camera or a lidar is scanned, looking at one window at a time. After a window is examined, the window is slid over slightly to again determine if an object is present, and this process is repeated until the entire image is scanned. The step size for moving the window is an adjustable parameter. The most accurate but computationally expensive option is to use a step size of just one pixel. Usually a larger step size is used; however, if it is too large, there is an increased risk of splitting the object between windows and therefore failing to detect it. Of course, this will only find objects of a certain apparent size. Therefore, the process is repeated using larger windows with the same aspect ratio, which are then scaled down to the base size and run through the detection algorithm.

Figure 3: bounding boxes on two detected pedestrians

Although both cameras and lidar provide sufficient resolution to determine the borders of objects, only camera images can be used to read traffic signs. To find signs and read their text, a similar process can be used as a first step, which is determining whether or not a sliding window contains an element of text. Once this is done for the entire image, an algorithm is used to identify adjacent windows of text, and those windows are combined into a larger box that is assumed to contain text from the same sign. Once the bounding box for the text has been determined, the image from within the box is processed using an algorithm that has been trained to detect splits between characters, again by looking at a small window that is slid over, with the window size scaled to the size of the text bounding box. This is called character segmentation. Once this is done, character classification can be applied to recognize the individual character found in each window between the separating spaces, and then the recognized characters are combined into words.

Of course, this just touches the basics. Some algorithms, for example, model a pedestrian as the sum of major body parts, and search for these parts as part of the object detection algorithm, which can be useful for detecting partially obscured pedestrians.
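The sliding-window scan and the merging of adjacent text windows described in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the base window size, step size, scale factors and all function names are hypothetical, "adjacent" is taken to mean that two windows touch or overlap, and the trained classifier that would examine each window is left out entirely.

```python
def sliding_windows(width, height, base=(64, 128), step=8, scales=(1.0, 1.5, 2.0)):
    """Yield (x, y, w, h) windows that fit inside a width x height image.

    Each scale enlarges the base window while keeping its aspect ratio; a
    detector would shrink the crop back to the base size before classifying it.
    """
    for scale in scales:
        w, h = int(base[0] * scale), int(base[1] * scale)
        for y in range(0, height - h + 1, step):
            for x in range(0, width - w + 1, step):
                yield x, y, w, h


def touches(a, b):
    """True if boxes a and b (x, y, w, h) overlap or share an edge."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax <= bx + bw and bx <= ax + aw and ay <= by + bh and by <= ay + ah


def enclose(boxes):
    """Return the smallest box (x, y, w, h) that contains every given box."""
    left = min(x for x, _, _, _ in boxes)
    top = min(y for _, y, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)
    return left, top, right - left, bottom - top


def group_adjacent(boxes):
    """Merge windows that touch, directly or through a chain, into larger boxes."""
    groups = []
    for box in boxes:
        linked, rest = [], []
        for group in groups:
            hit = any(touches(box, other) for other in group)
            (linked if hit else rest).append(group)
        # Fuse the new box with every group it touches into a single group.
        groups = rest + [[box] + [b for group in linked for b in group]]
    return [enclose(group) for group in groups]


# Count the windows examined in a 640 x 480 frame at the base scale only.
print(sum(1 for _ in sliding_windows(640, 480, scales=(1.0,))))

# Windows flagged as containing text (made-up coordinates) merged into sign boxes.
text_hits = [(0, 0, 16, 16), (16, 0, 16, 16), (32, 0, 16, 16), (200, 120, 16, 16)]
print(group_adjacent(text_hits))
```

Reducing `step` toward one pixel multiplies the number of windows to examine, which is the trade-off between accuracy and computing cost noted above.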

GRAPHICS PROCESSING UNITS (GPUS)

It would take far too long to sequentially do all this processing for various windows in every captured video frame. This is one of the processing areas where Graphics Processing Units (GPUs) excel. GPUs were originally designed to offload graphics processing from the CPU in desktop computers. Rather than just a few general purpose processors, they possess thousands of special purpose processors that are optimized for mathematical operations. Their use has spread from graphics to machine learning, including use in vehicle automation systems. They are well suited for what are called “embarrassingly parallel” problems, where little or no interaction is required between the separable parts. Processing each window in an image to detect objects is a perfect example, as the results for each window are independent of any processing of other windows. GPUs such as Nvidia’s DRIVE GPUs, designed for prototyping automated vehicles, have over 3000 processing cores, so that, for example, thousands of image windows can be processed in parallel, potentially yielding an improvement of orders of magnitude in processing speed.
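The independence between windows is what makes the problem embarrassingly parallel, and it can be illustrated even without a GPU. The Python sketch below scores every window with a made-up stand-in for a trained detector, first one at a time and then through a pool of worker threads, and checks that both give identical results. The scoring function and all names are hypothetical, and Python threads are only a stand-in for GPU cores: the sketch shows that no window depends on another, not the speed-up a real GPU would deliver.

```python
from concurrent.futures import ThreadPoolExecutor


def score_window(window):
    """Hypothetical stand-in for a trained detector: one score per window."""
    x, y, w, h = window
    return (x * 31 + y * 17 + w + h) % 100 / 100.0


def score_sequential(windows):
    """Score the windows one after another."""
    return [score_window(win) for win in windows]


def score_parallel(windows, workers=8):
    """Score each window independently; map() preserves the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_window, windows))


# All 64 x 128 windows in a 640 x 480 frame with a step size of 8 pixels.
windows = [(x, y, 64, 128) for y in range(0, 353, 8) for x in range(0, 577, 8)]

# Because no window needs another window's result, the order of work is irrelevant.
assert score_parallel(windows) == score_sequential(windows)
print(len(windows), "windows scored independently")
```

On a GPU, the same pattern would typically assign one window, or one part of a window, to each processing core.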

www.thinkinghighways.com

