SCW_JUNJUL13

data-intensive computing

Barbara Murphy, chief marketing officer at Panasas

The message that we have tried to get across is that Hadoop is an emerging application for non-SQL workloads – which is the vast majority of what is being created as unstructured data. There needs to be a way of taking petabytes of unstructured data and putting it into a format that is semi-structured and can be used by the traditional business intelligence tools out there. Organisations have such huge silos of data that they don’t even know what they have any more, and so many are beginning to look at Hadoop as a way of categorising that content. Rather than making specific, deterministic queries, Hadoop is about clustering common elements together and bringing out patterns. In that first case, people are attempting to prove an answer, while in the second case it’s about looking for commonality in order to find an answer.

What’s interesting about Hadoop is the fact


that the technology is still in its infancy, so the industry hasn’t quite figured out how it fits in. People are experimenting with it, and many more are using it across numerous fields, but it’s a limited set of people who have actually figured out how it fully fits into their workloads. The approach everyone has been taking is to make Hadoop a standalone workflow, whereas it should be viewed as a piece of the puzzle.

Somehow, the Hadoop workloads became meshed with the Hadoop hardware platform in people’s minds and there’s the idea that it’s purely local storage and compute together. But that’s not a requisite – that’s a coincidence born out of the way Google built out its original system. The difficulty for users is that they end up with a dedicated system for

Bill Mannel, vice president of server product management at SGI

Big data is truly in the eyes of the beholder, but can essentially be defined as data that is significantly larger in volume than people are accustomed to, that needs to be accessed faster than was previously necessary, or that is comprised of multiple different types. In terms of coping with the big data shift, Hadoop is one of the first and more successful instantiations. Born out of the internet space, Hadoop spanned out to other industries rather quickly as people witnessed its effectiveness. That being said, it is important to note that it’s not the only big data method out there – but it is arguably the most popular.

The main advantage of Hadoop is that it’s open source, which has allowed an active community to be built around it. This community has lowered the financial and technical barriers, and ensured that innovations are occurring at a pace. Typical software companies have a major release once a year, and a minor release once a year. Now we are seeing a lot of companies within the Hadoop space having one major release per quarter. Hadoop really is right on the forefront of the big data explosion. It effectively flips the paradigm because it brings compute to the

36 SCIENTIFIC COMPUTING WORLD

data, whereas data has generally been stored in an array before being brought across to the server when needed. With Hadoop, you have data nodes and assign the processing to the data nodes directly. It’s a unique proposition when it comes to big data, and it’s also very scalable and easy to add nodes to as the data grows. This combination is very attractive. However, as the number of nodes increases, there are limitations. Latency becomes a big issue but this can sometimes be addressed with higher-bandwidth interconnects. There are also software challenges that come from increasing node sizes. I do expect to see more broadly based solutions as the 1,000- and 10,000-node Hadoop clusters become more popular.

The key point is that users have the opportunity to begin Hadoop projects and discover the big data in their midst. SGI, for example, has been in existence for more than 20 years and has lots of data lying around in a multitude of different places. We don’t know where much of it is or, indeed, how relevant it is. This is not unique to SGI and many companies are seeking ways to explore and take advantage of that historical data.

Our back office consists of a standard type


of database and, if I want to look at different data, I have to write a request to our IT department. It could be some time before I get a new form written, enabling me to access the data differently. But if you look at what can be done with some of the tools that sit on top of Hadoop, you realise that you can start to build your own applications and view the data how you want to. There are challenges in terms of using the environment and ensuring that everyone is up to speed with the technology, but based upon what was possible previously, we can now access more data, faster. There are many ways of becoming familiar with Hadoop, such as meetings worldwide and online training – we’ve partnered with Cloudera on training, for example.
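The idea of bringing compute to the data, rather than pulling data across to a server, can be illustrated with a small sketch. This is not Hadoop code and uses no Hadoop API; it is a plain Python toy under stated assumptions. The node names, the sample log lines and the functions `map_on_node` and `run_job` are all hypothetical, invented for illustration. Each "data node" counts over its own lines, and only the small per-node summaries are merged centrally.

```python
from collections import Counter
from functools import reduce

# Hypothetical cluster: each "data node" holds its own shard of log lines.
NODES = {
    "node-a": ["error disk full", "info job started"],
    "node-b": ["error disk full", "warn slow link"],
    "node-c": ["info job finished", "error timeout"],
}


def map_on_node(lines):
    """Work done where the data lives: count the first word of each local line."""
    return Counter(line.split()[0] for line in lines)


def run_job(nodes):
    """Send the function to every node, then merge the small partial results."""
    # In a real cluster this loop would run in parallel on the nodes themselves;
    # here it is sequential to keep the sketch self-contained.
    partials = [map_on_node(lines) for lines in nodes.values()]
    # Only the per-node Counters are combined centrally, not the raw lines.
    return reduce(lambda left, right: left + right, partials, Counter())


totals = run_job(NODES)
print(dict(totals))
```

Adding another entry to `NODES` grows the dataset without changing `run_job`, which is a loose analogue of adding nodes as the data grows.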

www.scientific-computing.com

these workloads, but then have to move that data somewhere else in order to make it more usable. The compelling message is that the Hadoop workload itself is not tied to the hardware that’s become so prevalent.

Companies don’t need to invest in new or alternative technologies, as Hadoop can be run on existing infrastructure as long as it’s a solid scale-out system that can deliver the performance required. So, it’s important that people aren’t put off by the idea that another piece of hardware is necessary. The focus too often is on what piece of storage is needed to use Hadoop, whereas the real question is how to get value out of using it. The point we want to get across is that the physical storage shouldn’t be connected to the decision of how to use Hadoop. The technology will be taking off within 24 months, but the shortage will be of people who know how to program to the languages. That’s the next hurdle.
