GPU PROCESSING
In an interesting announcement, the first GPU cluster to break into the Top 500 list of supercomputers is the TSUBAME supercomputer from the Tokyo Institute of Technology. It uses 170 S1070s to deliver nearly 170 Tflops of peak SP performance as well as 77.48 Tflops of measured Linpack performance.

AMD tunes GPU for numerics

While Nvidia has market momentum in general-purpose GPUs, AMD has been intensifying its efforts in this arena, based primarily on the technology it got with the acquisition of graphics-chip maker ATI in 2006. AMD, however, has only recently started addressing scientific/engineering markets even though it was the first to market with a double-precision GPU board. The company's boards consist of the Radeon line for consumer graphics products, the FirePro line for professional graphics and workstations, and most recently the FireStream line for compute-specific applications.

Scientists will be most interested in the FireStream family, which is based on the RV770 chip architecture with 800 stream processors. Here two boards are available. The single-slot FireStream 9250 offers 1 Tflops SP/200 Gflops DP performance, and slated for availability this quarter is the FireStream 9270, a 2-slot board that increases the GPU's clock speed to achieve 1.2 Tflops SP/240 Gflops DP performance.

[Figure: The AMD FireStream 9250 (bottom right) plugged into an HP 385 server.]

AMD is starting to enter the box market and has demonstrated its boards working in the CA8000 from Aprius Inc, a 4U box that holds eight FireStream 9270s. It is also working with HP's HPC Accelerator Program to ensure that ATI Stream technologies are validated for use on a selection of HP ProLiant servers.

Software catching up to hardware

All this hardware does no good without tools that help programmers write code to run on GPUs. Indeed, software tools are the Achilles heel of accelerators, comments Ryan Schneider, CTO at Acceleware, who adds that this is why other accelerator approaches such as FPGAs have had trouble getting traction. As for GPUs, he says that most ISVs have enough trouble squeezing performance out of a quad-core CPU without adding a different beast into the mix. However, recent tools from GPU vendors, such as Cuda, have gone a long way in solving this challenge.

Indeed, Cuda (originally Compute Unified Device Architecture) is one of the driving forces behind Nvidia's GPUs. This parallel-computing GPU/software architecture can be programmed using standard languages. Free Cuda software tools include standard C for the development of parallel applications, while Fortran and Java wrappers are also available. Numeric libraries consist of Cufft for fast Fourier transforms and Cublas (Compute Unified Basic Linear Algebra Subprograms), and also free is a Cuda driver for fast data transfers between the CPU and GPU.

[Figure: Nvidia's T10 GPU comes packaged as a PCI Express board in the Tesla C1060 and as a 1U server product, the Tesla S1070.]

While using the Cuda APIs alone can result in a performance boost, to get spectacular improvements such as a 100x acceleration, users must know the hardware very well, specifically memory issues, says David Yip, new technology business manager for HPC integrator OCF. The GPU has a very small memory, but the datasets being worked on are very large, with some servers holding 32 Gbytes or more. It's important to determine which data goes on the GPU, when, and how to move it.

For development tools, AMD prefers the open systems approach. Free tools start with CAL, which gives low-level access to the hardware similar to assembly programming. Next is Brook+, based on an open-source project from Stanford University, and it is a C-like environment where programmers create a back end that talks to CAL.

The new ATI Stream SDK 1.3 includes a re-architected version of Brook+ that enhances runtime performance. In addition, users can download the GPU-compatible ACML (AMD Core Math Libraries). The company also offers a free download of the ATI Catalyst driver that unlocks ATI Stream acceleration capabilities already built into FireStream boards plus ATI Radeon graphics cards.

One thing that will open up the market, says AMD's director of stream computing, Patti Harrell, is an industry-standard API, and she sees the answer in OpenCL, which is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs and other processors. OpenCL was initially conceived by Apple, which submitted an initial proposal to the Khronos Group, which in turn is a member-funded industry consortium focused on the creation of open-standard, royalty-free APIs. Nvidia has also pledged support for OpenCL; in fact, the standard was developed on Nvidia GPUs, so it's clear that it will have broad industry support. OpenCL Ver 1.0 was approved for public release at the end of last year, and users can look for a developer release in the first half of this year.

Tools for programmers

Meanwhile, programmers doing their own coding can take advantage of new tools that seem to be sprouting up like mushrooms. The Portland Group this quarter expects to add support for Nvidia GPU accelerators to its PGF95 Fortran and PGCC ANSI C99 compilers and has also entered into an agreement with AMD to develop compiler technology for FireStream boards, but this technology preview supports only integer and single-precision floating-point operations. The argument for using a GPU isn't nearly so convincing with GPU DP performance as it is today, although several researchers, such as Jack Dongarra at the University of Tennessee, are working on mixed-precision and iterative solvers that use SP for most of their computation and only use DP to converge on the results. Technically, supporting DP is not costly, and the Portland
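The mixed-precision idea mentioned above, doing most of the work in SP and using DP only to converge, can be illustrated with a small sketch of classical iterative refinement. This is not the researchers' code and not a GPU program: it is plain Python in which single precision is emulated by rounding every intermediate result to 32 bits. The function names (f32, solve_sp, residual_dp, refine, max_error) and the 3x3 test system are assumptions made purely for illustration, and the matrix is assumed to be nonsingular and well conditioned.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_sp(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting,
    rounding every intermediate result to single precision (the 'fast' part)."""
    n = len(b)
    # Augmented matrix [A | b], stored in emulated SP.
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    # Back substitution, also in emulated SP.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x


def residual_dp(A, b, x):
    """Compute r = b - A x in double precision (the 'accurate' part)."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(A, b)]


def refine(A, b, iters=5):
    """Iterative refinement: SP solves, DP residuals and DP updates.
    A real solver would factor A once and reuse the factors; this sketch
    simply re-solves each time to stay short."""
    x = solve_sp(A, b)
    for _ in range(iters):
        r = residual_dp(A, b, x)
        d = solve_sp(A, r)          # correction computed in SP
        x = [xi + di for xi, di in zip(x, d)]  # accumulated in DP
    return x


def max_error(x, x_true):
    """Largest absolute component-wise error."""
    return max(abs(a - t) for a, t in zip(x, x_true))


if __name__ == "__main__":
    # Hypothetical, well-conditioned test system with a known solution.
    A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
    x_true = [0.1, 0.2, 0.3]
    b = [sum(a * x for a, x in zip(row, x_true)) for row in A]
    print("SP-only error:", max_error(solve_sp(A, b), x_true))
    print("refined error:", max_error(refine(A, b), x_true))
```

On a GPU of the kind described here, the SP solve is the part that would run on the accelerator, while the residual and the update are comparatively cheap DP operations. The scheme only pays off when the matrix is conditioned well enough for the SP corrections to converge.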
www.scientific-computing.com SCIENTIFIC COMPUTING WORLD february/march 2009