HPC 2013-14 | Big data


increasingly a lot of data, for example natural text or images, is not suited to a traditional database structure,' Jenkins commented. 'Instead, organisations are looking at big data technologies such as Hadoop, which allow data to be processed without requiring a predefined structure, and which can scale while still maintaining a coherent process. Hadoop addresses these pain points with a distributed framework that combines data storage and compute to handle high volumes of disparate data.'

Jenkins remarked that, in some cases, organisations would rather rent processing capacity only when needed, or have the ability to scale on demand from a compute cloud such as Amazon or Google, because traditional IT systems built on large mainframes and relational databases are at a disadvantage compared with these more cost-effective and flexible approaches.
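The MapReduce model that Jenkins describes – shipping the computation to the nodes that hold the data, rather than moving the data into a central database – is easiest to see in a small example. Below is a minimal sketch in Java of the standard Hadoop word-count job, essentially the Apache Hadoop tutorial example; the input and output paths are placeholders rather than details of any particular deployment.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map tasks run on the nodes holding each block of input text,
    // emitting a (word, 1) pair for every token they see.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce tasks aggregate the per-word counts produced across the cluster.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Placeholder paths: the raw text already sits on HDFS alongside the compute nodes.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because storage and compute are co-located, adding nodes grows capacity and throughput together, and the raw text needs no predefined schema before it can be processed.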


Organisations that choose instead to focus on internal resources face the challenge of what Sissons terms 'cluster sprawl'. He defines this as multiple, underutilised big data infrastructures that are expensive to deploy and maintain.


IBM believes that making big data software components coexist on a common set of infrastructure is a necessity for organisations undertaking this type of project – otherwise the costs will spiral out of control and undermine the very economic benefits that organisations are hoping to realise from their big data projects.

'The whole point of big data is to be able to pull results from separate information silos together to form a cohesive picture of events,' noted Sissons, adding that, because of this, the company believes the future lies in 'big data platforms' – suites of integrated capabilities in which components are available individually, but are interoperable and can feed downstream tools used for predictive analysis and visualisation.


'This approach offers customers the best of both worlds. They can choose the specific technology components that are best suited to addressing their big data use case, but they can avoid the cost, complexity and risk that come with trying to integrate these technologies themselves,' Sissons commented.

IBM is making big data environments easier to deploy and use. For example, rather than learning Hadoop-specific scripting languages, analysts can use tools such as BigSheets (essentially a spreadsheet for analysing big data) or Big SQL, which uses the familiar query language to make vast amounts of unstructured or semi-structured data more accessible. According to Sissons, the company is delivering infrastructure advantages by applying advances made in HPC, so that big data problems can run faster on less infrastructure and be more responsive to time-critical demands. This is being done with low-latency cluster and workload management technology from Platform Computing.
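As an illustration of the kind of access Sissons describes, the sketch below shows how an analyst might run a familiar SQL query over semi-structured log data held in a Hadoop cluster through a JDBC interface such as Big SQL. The connection URL, port, credentials and the web_logs table are assumptions made for the example, not details taken from IBM's documentation, and would be replaced by the values for a given installation.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BigSqlQueryExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details: the exact JDBC URL and driver depend
        // on the Big SQL (or other SQL-on-Hadoop) installation in use.
        String url = "jdbc:bigsql://bigdata-head-node:7052/default";

        try (Connection conn = DriverManager.getConnection(url, "analyst", "password");
             Statement stmt = conn.createStatement();
             // Ordinary SQL over semi-structured web-log data stored in HDFS;
             // web_logs is a placeholder table name for illustration.
             ResultSet rs = stmt.executeQuery(
                 "SELECT status_code, COUNT(*) AS hits " +
                 "FROM web_logs GROUP BY status_code ORDER BY hits DESC")) {
            while (rs.next()) {
                System.out.printf("%s\t%d%n",
                    rs.getString("status_code"), rs.getLong("hits"));
            }
        }
    }
}

The analyst works entirely in SQL; how the query is distributed across the cluster is handled underneath, which is the accessibility argument being made for tools of this kind.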


Big data storage
'The problems of shifting algorithms from running on large workstations to clusters

