TechWatch

Parallel Programming Made Easy

By Larry Hardesty, MIT News Office
In theory, a program on a 64-core machine would be 64 times as fast as it would be on a single-core machine. But it rarely works out that way. Most computer programs are sequential, and splitting them up so that chunks of them can run in parallel causes all kinds of complications.

In the May/June issue of the Institute of Electrical and Electronics Engineers’ journal Micro, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) presented a new chip design they call Swarm, which should make parallel programs not only much more efficient but easier to write.

In simulations, the researchers compared Swarm versions of six common algorithms with the best existing parallel versions, which had been individually engineered by seasoned software developers. The Swarm versions were between three and 18 times as fast, but generally required only one-tenth as much code, or even less. And in one case, Swarm achieved a 75-fold speedup on a program that computer scientists had so far failed to parallelize.

“Multicore systems are really hard to program,” says Daniel Sanchez, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science, who led the project. “You have to explicitly divide the work that you’re doing into tasks, and then you need to enforce some synchronization between tasks accessing shared data. What this architecture does, essentially, is to remove all sorts of explicit synchronization, to make parallel programming much easier. There’s an especially hard set of applications that have resisted parallelization for many, many years, and those are the kinds of applications we’ve focused on in this paper.”

Many of those applications involve the exploration of what computer scientists call graphs. A graph consists of nodes, typically depicted as circles, and edges, typically depicted as line segments connecting the nodes. Frequently, the edges have associated numbers called “weights,” which might represent, say, the strength of correlations between data points in a data set, or the distances between cities.

Graphs crop up in a wide range of computer science problems, but their most intuitive use may be to describe geographic relationships. Indeed, one of the algorithms that the CSAIL researchers evaluated is the standard algorithm for finding the fastest driving route between two points.
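As an illustration of the kind of data structure the article is describing, here is a minimal sketch in Python of a weighted graph representing road distances between a few cities; the city names, the distances, and the road_map variable are all invented for the example.

```python
# A weighted graph stored as an adjacency list: each node maps to its
# neighbors, and each edge carries a weight (here, a distance in miles).
# The cities and distances are made up purely for illustration.
road_map = {
    "Springfield": {"Riverton": 40, "Lakeside": 90},
    "Riverton":    {"Springfield": 40, "Lakeside": 35, "Hillcrest": 60},
    "Lakeside":    {"Springfield": 90, "Riverton": 35, "Hillcrest": 20},
    "Hillcrest":   {"Riverton": 60, "Lakeside": 20},
}

# Nodes are the dictionary keys; edges are the (node, neighbor) pairs.
print(sorted(road_map))                   # the nodes
print(road_map["Riverton"]["Lakeside"])   # the weight of one edge: 35
```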
Setting Priorities

In principle, exploring graphs would seem to be something that could be parallelized: Different cores could analyze different regions of a graph or different paths through the graph at the same time. The problem is that with most graph-exploring algorithms, it gradually becomes clear that whole regions of the graph are irrelevant to the problem at hand. If, right off the bat, cores are tasked with exploring those regions, their exertions end up being fruitless.

Of course, fruitless analysis of irrelevant regions is a problem for sequential graph-exploring algorithms, too, not just parallel ones. So computer scientists have developed a host of application-specific techniques for prioritizing graph exploration. An algorithm might begin by exploring just those paths whose edges have the lowest weights, for instance, or it might look first at those nodes with the lowest number of edges.
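The fastest-route algorithm mentioned above is a familiar example of this kind of prioritized exploration: it always expands the node with the smallest tentative distance first. Below is a minimal sequential sketch in Python using a priority queue over the hypothetical road_map graph from the earlier example; the shortest_distances function is invented here to show the prioritization pattern, not anything specific to Swarm.

```python
import heapq

def shortest_distances(graph, start):
    """Prioritized graph exploration: always expand the node with the
    smallest tentative distance first (Dijkstra's algorithm)."""
    best = {start: 0}
    frontier = [(0, start)]               # priority queue keyed by distance
    while frontier:
        dist, node = heapq.heappop(frontier)
        if dist > best.get(node, float("inf")):
            continue                      # stale entry; a shorter path was already found
        for neighbor, weight in graph[node].items():
            new_dist = dist + weight
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(frontier, (new_dist, neighbor))
    return best

# Using the road_map defined above:
# shortest_distances(road_map, "Springfield")
# -> {'Springfield': 0, 'Riverton': 40, 'Lakeside': 75, 'Hillcrest': 95}
```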
What distinguishes Swarm from other multicore chips is that it has extra circuitry for handling that type of prioritization. It time-stamps tasks according to their priorities and begins working on the highest-priority tasks in parallel. Higher-priority tasks may engender their own lower-priority tasks, but Swarm slots those into its queue of tasks automatically.

Occasionally, tasks running in parallel may come into conflict. For instance, a task with a lower priority may write data to a particular memory location before a higher-priority task has read the same location. In those cases, Swarm automatically backs out the results of the lower-priority tasks. It thus maintains the synchronization between cores accessing the same data that programmers previously had to worry about themselves.
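The article does not describe the hardware in programmer-visible detail, but the back-out behavior can be pictured with a rough software analogy: each speculative task keeps an undo log of the values it overwrites, and a later-timestamped task that conflicts with an earlier one is rolled back. Everything below (the SpeculativeTask class, the memory dictionary, the resolve_conflict helper) is invented for illustration and is not Swarm’s actual mechanism.

```python
memory = {"x": 0, "y": 0}      # stand-in for shared memory

class SpeculativeTask:
    def __init__(self, timestamp):
        self.timestamp = timestamp
        self.undo_log = {}     # address -> value before this task's write
        self.writes = set()

    def write(self, addr, value):
        # Remember the old value so the write can be undone later.
        self.undo_log.setdefault(addr, memory[addr])
        memory[addr] = value
        self.writes.add(addr)

    def rollback(self):
        # Restore every location this task touched, then forget its work.
        for addr, old in self.undo_log.items():
            memory[addr] = old
        self.undo_log.clear()
        self.writes.clear()

def resolve_conflict(earlier, later, addr):
    # If a later (lower-priority) task already wrote an address that an
    # earlier (higher-priority) task needs, squash the later task.
    if addr in later.writes and earlier.timestamp < later.timestamp:
        later.rollback()

high = SpeculativeTask(timestamp=1)   # higher priority: earlier timestamp
low = SpeculativeTask(timestamp=5)    # lower priority, running speculatively
low.write("x", 99)                    # speculative write
resolve_conflict(high, low, "x")      # the high-priority task now needs "x"
print(memory["x"])                    # 0: the speculative write was undone
```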
Indeed, from the programmer’s perspective, using Swarm is pretty painless. When the programmer defines a function, he or she simply adds a line of code that loads the function into Swarm’s queue of tasks. The programmer does have to specify the metric, such as edge weight or number of edges, that the program uses to prioritize tasks, but that would be necessary anyway. Usually, adapting an existing sequential algorithm to Swarm requires the addition of only a few lines of code.
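The article does not show Swarm’s actual programming interface, so the sketch below is only a hypothetical picture of what “loading a function into a queue of prioritized tasks” might look like in plain Python; the swarm_enqueue, run_all, and visit names are invented, and the example reuses the road_map dictionary from the earlier sketch to recompute the same shortest distances as self-prioritizing tasks.

```python
import heapq

task_queue = []   # entries: (priority timestamp, insertion counter, function, args)
_counter = 0
best = {}         # best distance found so far for each node

def swarm_enqueue(priority, fn, *args):
    """Hypothetical helper: register a task, keyed by its priority metric
    (here, a tentative driving distance)."""
    global _counter
    heapq.heappush(task_queue, (priority, _counter, fn, args))
    _counter += 1

def run_all():
    """Sequential stand-in for the chip: always run the task with the
    lowest timestamp next; tasks may enqueue further, lower-priority tasks."""
    while task_queue:
        _, _, fn, args = heapq.heappop(task_queue)
        fn(*args)

def visit(node, distance):
    # Skip work that a higher-priority (shorter-distance) task already did.
    if distance >= best.get(node, float("inf")):
        return
    best[node] = distance
    for neighbor, weight in road_map[node].items():
        swarm_enqueue(distance + weight, visit, neighbor, distance + weight)

swarm_enqueue(0, visit, "Springfield", 0)
run_all()
print(best)   # {'Springfield': 0, 'Riverton': 40, 'Lakeside': 75, 'Hillcrest': 95}
```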
Keeping Tabs

The hard work falls to the chip itself, which Sanchez designed in collaboration with Mark Jeffrey and Suvinay Subramanian, both MIT graduate students in electrical engineering and computer science; Cong Yan, who did her master’s as a member of Sanchez’s group and is now a PhD student at the University of Washington; and Joel Emer, a professor of the practice in MIT’s Department of Electrical Engineering and Computer Science, and a senior distinguished research scientist at the chip manufacturer NVidia.

The Swarm chip has extra circuitry to store and manage its queue of tasks.
It also has a circuit that records the memory addresses of all the data its cores are currently working on. That circuit implements something called a Bloom filter, which crams data into a fixed allotment of space and answers yes/no questions about its contents. If too many addresses are loaded into the filter, it will occasionally yield false positives, indicating “Yes, I’m storing that address,” but it will never yield false negatives.
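For readers unfamiliar with the data structure, here is a minimal Bloom filter sketch in Python, independent of Swarm’s hardware implementation; the BloomFilter class, the bit-array size, and the number of hash positions are arbitrary choices made for the example.

```python
import hashlib

class BloomFilter:
    """A tiny Bloom filter: membership answers may be falsely positive,
    but never falsely negative."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive several bit positions from a single hash of the item.
        digest = hashlib.sha256(str(item).encode()).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i : 4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not present"; True means "possibly present."
        return all(self.bits[pos] for pos in self._positions(item))

addresses = BloomFilter()
addresses.add(0x7FFE1234)
print(addresses.might_contain(0x7FFE1234))   # True
print(addresses.might_contain(0xDEADBEEF))   # almost certainly False
```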
The Bloom filter is one of several circuits that help Swarm identify memory access conflicts. The researchers were able to show that time-stamping makes synchronization between cores easier to enforce. For instance, each data item is labeled with the time stamp of the last task that updated it, so tasks with later time-stamps know they can read that data without bothering to determine who else is using it.

Finally, all the cores occasionally report the time stamps of the highest-priority tasks they’re still executing. If a core has finished tasks that have earlier time stamps than any of those reported by its fellows, it knows it can write its results to memory without causing any conflicts.
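That commit rule can be pictured with a rough software analogy, again invented for illustration rather than taken from the Swarm design: a finished task may commit once its timestamp is earlier than every timestamp still being executed on any core. The can_commit helper and the per-core reports below are hypothetical.

```python
def can_commit(finished_task_timestamp, executing_timestamps):
    """Rough analogy of the commit check: a finished task may write its
    results to memory once no core is still running anything with an
    earlier (higher-priority) timestamp."""
    return all(finished_task_timestamp < ts for ts in executing_timestamps)

# Each core periodically reports the earliest timestamp it is still executing.
reports = [12, 17, 9]           # hypothetical per-core reports
print(can_commit(7, reports))   # True: timestamp 7 precedes everything still running
print(can_commit(10, reports))  # False: one core is still on timestamp 9
```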
“I think their architecture has just the right aspects of past work on transactional memory and thread-level speculation,” says Luis Ceze, an associate professor of computer science and engineering at the University of Washington.

“‘Transactional memory’ refers to a mechanism to make sure that multiple processors working in parallel don’t step on each other’s toes. It guarantees that updates to shared memory locations occur in an orderly way. Thread-level speculation is a related technique that uses transactional-memory ideas for parallelization: Do it without being sure the task is parallel, and if it’s not, undo and re-execute serially. Sanchez’s architecture uses many good pieces of those ideas and technologies in a creative way.”
Reprinted with permission of MIT News <http://news.mit.edu/2016/parallel-programming-easy-0620>