

Flash can be implemented as a tier on a traditional disk array, as a NAS system, or in the appliance server itself. Server-side flash [PCIe cards] has gained popularity as it provides the lowest latency and offers a quick and easy way to get started. Flash arrays connected by InfiniBand, FC or PCIe offer significantly greater capacity for those needing a scalable architecture, with performance of 1 million IOPS or more and latencies as low as a few hundred microseconds. These high-end solutions are available from most of the major storage players, such as XtremIO [EMC], while a number of smaller vendors offer greater variety and longer track records, including the likes of Tintri, Tegile, Violin Memory, Pure Storage and Whiptail.


The storage challenges for asynchronous big data use cases concern capacity, scalability, predictable performance and cost. The latency of tape-based systems will generally rule them out, and traditional ‘scale-up’ disk storage architectures are generally too expensive. Consequently, the type of storage system required to support these applications will often be a scale-out or clustered NAS product. This is file-access shared storage that can scale out to meet capacity and increased compute requirements, using parallel file systems distributed across many storage nodes that can handle billions of files without the kind of performance degradation that ordinary file systems suffer as they grow. For some time, scale-out or clustered NAS was a distinct product category, with specialist suppliers such as Isilon and BlueArc. A measure of the increasing importance of such systems is that both have been bought by big storage vendors over the past few years – EMC and HDS respectively. Others include Dell EqualLogic, HP StoreAll and NetApp clustered Data ONTAP.
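
To give a feel for how parallel file systems spread files across nodes, here is a minimal, purely illustrative sketch of hash-based placement (the node names and the placement scheme are invented for the example; real scale-out NAS systems use far more sophisticated distribution and rebalancing):

    import hashlib

    # Hypothetical storage nodes; a real clustered NAS manages placement itself.
    NODES = [f"node-{i:02d}" for i in range(8)]

    def place(path):
        # Hash the file path to pick a node, spreading files evenly
        # across the cluster rather than through a single controller.
        digest = hashlib.md5(path.encode()).hexdigest()
        return NODES[int(digest, 16) % len(NODES)]

    # Deterministic: the same path always maps to the same node.
    print(place("/data/logs/2014-05-01.log"))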


These systems, combined with Hadoop, enable users to construct their own highly scalable storage systems from low-cost hardware, providing maximum flexibility. But Hadoop, specifically HDFS, requires that three copies of data be created to support the high-availability environments it was designed for. That’s fine for data sets in the terabytes, but when capacity reaches petabytes HDFS can make storage very expensive. Scale-out storage systems suffer too, as many use RAID to provide data protection at the volume level and replication at the system level. Object-based storage technologies can offer an alternative solution for larger environments that may run into data redundancy problems.
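
The capacity penalty is simple arithmetic; a quick sketch, assuming HDFS’s default replication factor of 3 and an illustrative 5 PB dataset:

    # Raw capacity needed under HDFS-style triple replication.
    # The 5 PB dataset size is illustrative; the factor is configurable.
    usable_pb = 5
    replication_factor = 3
    raw_pb = usable_pb * replication_factor
    print(f"{usable_pb} PB of data requires {raw_pb} PB of raw storage")  # 15 PB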


Object-based storage systems greatly enhance the benefits of scale-out storage by replacing the hierarchical storage architecture that many use with flexible data objects and a simple index. This enables almost unlimited scaling and further improves performance. Object-based storage that includes erasure coding doesn’t need to use RAID or replication for data protection, resulting in dramatic increases in storage efficiency. There are many object-based storage systems on the market, including amongst others EMC Atmos, DataDirect Networks, NetApp StorageGRID, Quantum Lattus and Cleversafe.
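
For comparison, a back-of-the-envelope sketch of the efficiency gain, assuming a hypothetical 10+4 erasure code (data is split into 10 fragments plus 4 parity fragments, and any 4 of the 14 can be lost):

    # Storage overhead: 3x replication versus a hypothetical 10+4 erasure code.
    data_pb = 5
    replicated_pb = data_pb * 3           # 15 PB raw, 200% overhead
    erasure_pb = data_pb * (10 + 4) / 10  # 7 PB raw, 40% overhead
    print(f"replication: {replicated_pb} PB raw, erasure coding: {erasure_pb} PB raw")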


Q: Are there any special requirements when it comes to the servers?


A: Big Data analytic workloads are becoming increasingly compute-intensive. The amount of data and processing involved requires clusters of systems running highly parallel code to handle the workload at a reasonable cost and within a reasonable timeframe.
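
As a single-machine toy illustration of that highly parallel pattern (Python’s multiprocessing stands in here for a cluster scheduler; a real deployment would fan the work out across many nodes):

    from multiprocessing import Pool

    def score(record):
        # Stand-in for a compute-heavy analytic function.
        return record * record

    if __name__ == "__main__":
        with Pool() as pool:  # one worker process per CPU core by default
            results = pool.map(score, range(100_000))
        print(f"processed {len(results)} records")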




Enterprise-grade servers that are well suited to Big Data analytics workloads have:
• Higher compute intensity [high ratio of operations to I/O]
• Increased parallel processing capabilities
• Increased VMs per core
• Advanced virtualization capabilities
• Modular systems design
• Elastic scaling capacity
• Enhancements for security and compliance, and hardware-assisted encryption
• Increased memory and processor utilization


Superior enterprise-grade servers also offer a built-in resiliency that comes from integration and optimization across the full stack of hardware, firmware, hypervisor, operating system, databases and middleware. These systems are often designed, built, tuned and supported together – and are easier to scale and manage.


A completely new computing platform is, however, on the horizon – the Microserver [ARM server] – which will bring serious innovation and a new generation of server computing to the market. The Microserver is a server based on ‘system on a chip’ [SoC] technology, where the CPU, memory and system I/O are all on a single chip rather than multiple components on a system board. This means SoC servers are small, very energy efficient, reliable, scalable and incredibly well suited to tasks involving large numbers of users, data and applications. They will use about 1/10th of the power, and less than 1/10th of the rack space, of a traditional rack-mounted server, at about half the price of a current system.
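
Taking those claimed ratios at face value, the arithmetic is straightforward (the baseline figures below are invented purely for illustration):

    # Microserver versus traditional rack server, using the article's
    # claimed ratios: ~1/10 power, ~1/10 rack space, ~1/2 price.
    # Baseline figures for the traditional server are hypothetical.
    trad_watts, trad_rack_u, trad_price = 400, 1, 5000
    micro_watts = trad_watts / 10
    micro_rack_u = trad_rack_u / 10
    micro_price = trad_price / 2
    print(f"{micro_watts:.0f} W, {micro_rack_u:.1f} U, ${micro_price:.0f}")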


Q: What about the network – what characteristics does this require to help optimize the Big Data environment?


A: Big Data environments change the way data flows in the network. Big Data generates far more east-west [server-to-server] traffic than north-south [server-to-client] traffic, and for every client interaction there may be hundreds or thousands of server and data node interactions. Application architecture has evolved correspondingly from a centralized to a distributed model, which runs counter to the traditional client/server network architecture built over the past 20 years.


Pulling data from a variety of sources, big data systems run on server clusters distributed over multiple network nodes. These clusters run tasks in a parallel, scale-out fashion, so traffic patterns can vary considerably and dramatically – from single streams to thousands of concurrent node-to-node flows, plus the traffic of intermediate storage staging. Big Data solutions therefore require high-performance network equipment to ensure appropriate levels of performance and capacity. In addition, big data services should be logically and physically segmented from the rest of the network environment to improve performance.


Q: Are there any other hardware considerations for Big Data?


A: Whilst not necessarily hardware considerations or requirements, there are three other vendors which have interesting products

