The issues surrounding ‘big data’ (courtesy of the Pacific Northwest National Laboratory)


extremely large file, and do so quickly. The most efficient way is to load as much of the file into a global memory as possible, reducing the need to move large amounts of data into and out of local memory just to find the one piece you need. For such applications, PNNL is developing a test bed to evaluate special-purpose pattern-matching and flow-solving hardware. It goes by the name of CASS-MT (Center for Adaptive Supercomputing Software – Multithreaded Architectures). The design is based on a Cray XMT system whose blades utilise AMD Torrenza Innovation Socket technology, populated with Cray ThreadStorm processors developed for multithreaded operation.

This processor is Cray's answer to unstructured problems. The company says that the real computational challenges stem from the algorithms best suited to managing and analysing unstructured data. To perform well, these algorithms need thousands of computational threads interacting, with fine-grained synchronisation, across a large globally shared memory. Commercial systems built from commodity hardware are adequate for cache-friendly algorithms, but not for irregularly structured problems: when the necessary data is not available, the processor must stall or spin until it arrives. When dealing with irregularly structured problems you instead need a globally shared memory that can perform fetch-and-add type operations on every word of memory. The problem for unstructured data analysis lies in creating a large, shared memory


26 SCIENTIFIC COMPUTING WORLD


with the same access characteristics as a local memory attached to a processor.

For such tasks, Cray developed the ThreadStorm processor. It can sustain 128 simultaneous threads and supports more than a thousand outstanding memory requests; while any one thread might stall, others are likely ready to issue instructions, so the ThreadStorm stays busy. An XMT system can have 512 ThreadStorms, with independent threads all simultaneously issuing memory requests across a network.


'Why choose a faster CPU if it's just sitting there waiting for data? The Atom is slowish, but it's as fast as it needs to be. We'd rather spend our money on disks than on CPUs that are disproportionately powerful for the problems we want to solve'


For shared memory architectures like the XMT, all processors can access the shared memory, eliminating the need for message passing and its associated latencies. Each ThreadStorm can have eight Gbytes of memory physically attached to it, but any processor can directly address any of this memory, which today gives a maximum address space of 128 Tbytes. Cray is also working on the XMT2 system, which it intends to deliver later this year and which will allow even more memory in a cabinet, helping to further reduce footprints.


Large memory with commercial processors
Another company with an emphasis on supporting large local memories is SGI, but that firm opts for commercial processors. Its latest Altix UV systems use Intel's Xeon chip and the Linux operating system. The UV scales up to 2,048 cores (256 sockets, 4,096 threads) with architectural support for 262,144 cores, and it supports up to 16 Tbytes of global shared memory in a single system image (the limit of the Xeon today). Within a rack it uses SGI's NUMAlink (non-uniform memory access) interconnect; within a server node, the Intel QPI interconnect delivers traffic from the Xeon processors to the NUMAlink hub. From a programming standpoint, says Bill Mannel, VP of product marketing, it looks just like a gigantic PC; users don't have to worry about MPI or other message-passing schemes. The system sets up the memory in a cache-coherent fashion, ensuring there are no race conditions or clashes in memory.

SGI recently deployed an Altix UV system at the Institute of Cancer Research in Reading, England, which now has a massively scalable shared memory system to process its growing data requirements, including hundreds of Tbytes of data for biological networks, MRI imaging, mass spectrometry, phenotyping, genetics and



