storage
that include servers to run the file system and associated software, a high-speed interconnect, connection to a RAID controller, and a high-density storage system housing the disks. Each subsystem in this hardware and software combination adds complexity and potential bottlenecks in terms of balancing I/O, reliability and system scalability. Xyratex's ClusterStor 3000 appliance is
based on what the company calls a scalable storage unit (SSU). Each SSU supports two x86 embedded server modules (ESMs), which connect directly through a common midplane to all drives in the SSU and share a redundant high-speed interconnect across the midplane for failover services. The ESMs run industry-standard Linux distributions, and each module has its own dedicated CPU, memory, network and storage connectivity. When new SSUs are added to the cluster, performance scales linearly, as incremental processing, network connectivity and storage media are added with each unit. This modular design removes the performance limitation of traditional scale-out models, in which servers or RAID heads quickly become the bottleneck as more drives are added to the cluster. The unit comes with a multilayer software
stack, including the ClusterStor Manager, Lustre file system, data-protection layer (RAID), system management software (GEM) and the Linux OS. The principal hardware components are the cluster management unit that manages file system configuration and metadata, SSUs, network fabric switches and a management switch. The system design provides throughput of 2.5GB/s per SSU for reads and writes. The SSU platform is a 5U, 84-drive storage enclosure, and each SSU holds up to 192TB of Lustre OSS (object storage server) file system capacity and includes redundant server modules that operate and manage the SSU as a discrete scale-out component. Depending on the number of SSUs, ClusterStor 3000 can scale from terabytes to tens of petabytes, and from 2.5GB/s to 1TB/s.
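To put that scaling claim in concrete terms, the minimal sketch below simply multiplies the per-SSU figures quoted above (2.5GB/s and 192TB) by the number of SSUs. The assumption of perfectly linear scaling, and the cluster_figures helper itself, are illustrative rather than anything Xyratex provides; real deployments give up some headroom to metadata traffic, RAID rebuilds and network contention.

```python
# Back-of-the-envelope scaling for a ClusterStor-style cluster, assuming the
# per-SSU figures quoted in the article and purely linear scaling.

THROUGHPUT_PER_SSU_GBS = 2.5   # GB/s per scalable storage unit (SSU)
CAPACITY_PER_SSU_TB = 192      # TB of OSS file system capacity per SSU

def cluster_figures(num_ssus: int) -> tuple[float, float]:
    """Return (aggregate throughput in GB/s, raw capacity in TB) for a cluster."""
    return num_ssus * THROUGHPUT_PER_SSU_GBS, num_ssus * CAPACITY_PER_SSU_TB

if __name__ == "__main__":
    for n in (1, 10, 100, 400):
        gbs, tb = cluster_figures(n)
        print(f"{n:4d} SSUs -> {gbs:8.1f} GB/s, {tb / 1000:7.1f} PB")
```

Under this linear model, roughly 400 SSUs would reach the quoted 1TB/s aggregate throughput, with capacity in the tens of petabytes, consistent with the range given above.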
Typical storage I/O bottlenecks (courtesy of Xyratex):
Interconnect fabric (IB/10GbE/FC): under-provisioned networks; unbalanced fabrics; SDR or DDR InfiniBand; Gigabit Ethernet.
File system servers: old and slow systems; lack of memory; old software versions; too few servers for the underlying storage systems.
RAID controller(s), hardware or software: too many disks behind each controller; slow disk connectivity (3Gb SAS, 4Gb FC, SATA).
Disk system (SATA/SAS): too many disks per expander; too little bandwidth available to each drive; SATA drives on SAS dongles.
HUMANS CAN NO LONGER MANAGE STORAGE THE WAY IT HAS TRADITIONALLY BEEN MANAGED. IT IS JUST TOO BIG…
Offering an alternative to spindle-based storage at the Tier 0 level, close to the servers themselves, NextIO has adapted the flash-memory-based technology from Fusion-io for its VSTOR S200, which takes up 4U in a standard cabinet. Using this PCI-Express I/O technology, the unit offers capacity of up to 19TB with a performance level of four million IOPS (I/O operations/sec), which, according to CTO Chris Potter, represents an improvement of two orders of magnitude over spindle memory in terms of $/IOP and a bandwidth improvement of 10 times, depending on the application. It does so with as many as 16 modules and eight server connections. The on-board controller supports on-the-fly switching of storage pools between hosts, and it also runs the nControl management software.
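For a rough feel of what those aggregate numbers imply per device, the sketch below divides the quoted totals across the 16 flash modules and eight server connections. The even split is an illustrative assumption, not a NextIO specification.

```python
# Rough per-module arithmetic for the VSTOR S200, assuming the quoted totals
# (19 TB, four million IOPS) are spread evenly across modules and hosts.

TOTAL_CAPACITY_TB = 19
TOTAL_IOPS = 4_000_000
MODULES = 16
SERVER_CONNECTIONS = 8

per_module_tb = TOTAL_CAPACITY_TB / MODULES        # ~1.2 TB per flash module
per_module_iops = TOTAL_IOPS / MODULES             # 250,000 IOPS per module
iops_per_host = TOTAL_IOPS / SERVER_CONNECTIONS    # 500,000 IOPS per attached host

print(f"per module: {per_module_tb:.2f} TB, {per_module_iops:,.0f} IOPS")
print(f"per host  : {iops_per_host:,.0f} IOPS if load is spread evenly")
```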
Indeed, embedding file systems directly in storage appliances has become very common. For instance, in 2009 Hewlett-Packard purchased the storage-software company IBRIX, and recently it incorporated that segmented file system and its management utilities into the X9000 storage appliance. The metadata is distributed between the two redundant servers, and this parallel metadata approach allows direct access to the files. The appliance also has a switch and power in the rack. Two versions are available: for the highest performance, you get SAS drives of up to 600GB, and for capacity you get either 1TB or 2TB midline SAS drives. By adding extra racks, it is possible to scale up to more than 1,000 nodes. A single namespace can