HPC for oil and gas

All the easy oil has already been extracted, so companies have to look in geologically complex environments. The high price of oil makes deep-water, sub-salt environments viable – including not only the Gulf of Mexico, but also offshore Brazil and West Africa. Brazil has been able to double its estimated reserves by drilling through the salt and finding new hydrocarbon reserves. But such complex geologies present geoscientists with problems as they try to interpret the information. There is further scope for computing in this area: to help with the sophisticated interpretation of geological structures.


Unsurprisingly, these pressures are driving companies to use accelerators, such as GPUs, and ever-larger clusters. According to Guy Gueritz, Oil and Gas business development director for Bull, companies used to buy commodity servers to run, for example, the Kirchhoff seismic migration algorithm, ‘which has served its purpose well. But now they are having to use particular imaging methods for the sub-salt and this has driven changes in the HPC architecture.’ Some oil companies are investing in petaflop systems costing upwards of 30 million euros. They need ‘very large systems to derive much more accurate velocity models, so they get images that are focused and interpretable,’ Gueritz said.


In the view of Oliver Pell, vice-president of Engineering at Maxeler Technologies, both data handling and compute speed are required to meet the needs of the oil industry.


The major utilisation of HPC is on the exploration side – to locate the optimum position for drilling and avoid dry wells – and this generates many dozens of terabytes of data that need to be crunched many times to produce an image of the subsurface field. Because of the costs of running a drilling rig, it is important to all oil companies to turn a job round in a few days rather than in weeks of computing time.


Because oil exploration is a ‘Big Data’ problem, much of the run-time is not actually in the computation itself but rather in the burden of I/O – loading all that data in and getting the results out efficiently. He said: ‘We work on crunching down the run time by putting parts of the application into custom dataflow hardware, and also optimising parts in software, and having the two talk to each other very rapidly. We can build a custom cluster that has a mix of dataflow compute engines and conventional CPUs, where the work passes back and forward between them.’ Although the parts of the problem that are put through the dataflow engines may not be the most computationally intense, they are the ones that use the most time – due to the data handling requirements – and the customised dataflow solution can be 10 to 100 times faster than before.

Pell continued: ‘You can build a cluster that is optimised for your workloads. Instead of just balancing network, disk, and CPUs, you add the dataflow engines as another resource that you can balance with these as well. But, on an application-by-application basis, you can create custom chip configurations that map the application into a dataflow and that execute it in hardware – and you can change these every second. So, when an application starts up, it will acquire some dataflow engines, configure them to do part of that application and then run calculations on them by streaming data through the chip. There are no instructions, so it is an efficient computational paradigm. You don’t have dependencies or have to worry about caching.’ Possibly 90 per cent of the program will be executed elsewhere, on conventional CPUs, but this may represent only one per cent of the run-time.

Maxeler recently launched its MPC-X series, which puts dataflow engines on the network as a shared resource, a bit like a disk.


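As a rough illustration of the split Pell describes, the sketch below mimics the pattern in ordinary Python: the CPU side keeps control of the application, and only the hot kernel is expressed as a fixed pipeline of stages that data is streamed through. The DataflowEngine class, the acquire_engines helper and the pipeline stages are invented for illustration; this is not Maxeler’s MaxCompiler interface, where the pipeline would be compiled into an FPGA configuration rather than Python callables.

```python
import numpy as np


class DataflowEngine:
    """Stand-in for a hardware dataflow engine: a fixed pipeline, configured
    once, that transforms data as it streams through. There is no instruction
    stream on the engine side, just the wiring of the stages."""

    def __init__(self, stages):
        self.stages = stages                    # configured once, like loading a bitstream

    def stream(self, chunk):
        for stage in self.stages:               # data flows through every stage in order
            chunk = stage(chunk)
        return chunk


def acquire_engines(n, stages):
    """Pretend to allocate n engines and configure them for this application."""
    return [DataflowEngine(stages) for _ in range(n)]


# CPU side: control logic, I/O and the bulk of the program stay here; only the
# hot streaming kernel is expressed as a pipeline and handed to the engines.
stages = [
    lambda x: x - x.mean(),                     # remove the DC offset
    lambda x: np.clip(x, -3.0, 3.0),            # clip outliers
    lambda x: 0.5 * x,                          # scale
]
engines = acquire_engines(2, stages)

traces = np.random.randn(8, 500_000)            # stand-in for seismic traces
results = [engines[i % len(engines)].stream(t)  # stream each trace through an engine
           for i, t in enumerate(traces)]
print("processed", len(results), "traces of", traces.shape[1], "samples")
```

What matters here is the shape of the control flow: configure once, then stream data through, with no per-element instruction fetch on the engine side.
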
Unblocking the connections


No matter how sophisticated your applications, they are useless if your data is not in the appropriate form. So David Butler of Limit Point Systems has been looking at data exchange methods, leading to the development of the company’s Data Sheaf System. He cites the problem of how to interoperate different numerical representations on different meshes with each other – for example, moving properties such as porosity, density, velocity and temperature from one mesh to another. A user may start with a geological structure mesh and want to feed into a geomechanics mesh, but these can use different representations and different boundaries. In the current workflow, such ‘data munging’ is not an automatic process but takes a good deal of work on the user’s part. In the past, typical implementations have been on workstations, so the task of parallelising them – converting that data munging application – is big and complex.

Lots of man-years have gone into parallelising the kernels – the core elements of a program – he explained, but less attention has been given to the data munging problem of enabling the data to be used as input. But the reason an application can exploit that parallelism is because of parallelism somewhere in the problem. ‘Our mathematical data model allows us to express the parallelism and we can map that onto the hardware,’ he said. ‘The culture of HPC is that the heroes are the guys that write the central kernels,’ he added, and ‘their emphasis is on the computer performance. By and large they get the glory. The second-order connection problems – such as moving data from one application to another – are a bit like a sewage system: no one wants to think about them until they back up. We see an opportunity there for our data management technology.’

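As a minimal, made-up illustration of the data munging Butler describes, the snippet below moves a porosity profile from one depth mesh onto another mesh with different spacing and different bounds, using plain linear interpolation in NumPy. It is not the Data Sheaf System; the hard part in practice is reconciling the two models’ representations and boundary rules, which this toy simply ignores.

```python
import numpy as np

# Source mesh: depths (m) at which porosity is defined on the geological
# structure mesh (31 nodes, 100 m spacing; synthetic compaction trend).
geo_depths = np.linspace(0.0, 3000.0, 31)
geo_porosity = 0.35 * np.exp(-geo_depths / 2500.0)

# Target mesh: the geomechanics model uses a finer, differently bounded grid
# (20 m spacing over a narrower depth range).
mech_depths = np.linspace(500.0, 2800.0, 116)

# Linear interpolation onto the target nodes; values outside the source range
# would normally need an explicit boundary rule agreed between the two models.
mech_porosity = np.interp(mech_depths, geo_depths, geo_porosity)

print(f"transferred {mech_porosity.size} porosity values "
      f"({mech_porosity.min():.3f}..{mech_porosity.max():.3f})")
```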


According to Pell: ‘Any CPU can execute CPU code and at some point it can decide that it wants to execute a piece of dataflow code and it can pass that to another node. But the CPU node is always running the control of that application, so you can dynamically allocate as many dataflow nodes as you require and release them.’ The use case is essentially that of a coprocessor for the main application. The dataflow engines themselves map onto configurable hardware chips such as FPGAs. In Pell’s view, data movement is the main problem in high-performance computing, not flops. ‘There are relatively few applications that are purely limited by the computational performance of, say, the CPU. There are more that are limited by the memory, or the interconnect bandwidth, or the storage.’


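A rough sketch of that shared-resource idea, under invented names and with no relation to the real MPC-X interface, is a pool that CPU-side jobs draw dataflow engines from and hand back when they finish:

```python
import queue
import threading
import time


class SharedEnginePool:
    """A network-attached pool of dataflow engines, modelled as a queue."""

    def __init__(self, engine_names):
        self._free = queue.Queue()
        self._lock = threading.Lock()           # makes multi-engine requests atomic
        for name in engine_names:
            self._free.put(name)

    def acquire(self, n):
        """Claim n engines, blocking until all of them are available."""
        with self._lock:
            return [self._free.get() for _ in range(n)]

    def release(self, engines):
        """Hand the engines back so other CPU nodes can use them."""
        for e in engines:
            self._free.put(e)


pool = SharedEnginePool([f"dfe{i}" for i in range(4)])


def run_job(job_id, engines_needed=2):
    engines = pool.acquire(engines_needed)      # the CPU node stays in control
    try:
        time.sleep(0.1)                         # stand-in for streaming work to the engines
        print(f"job {job_id} ran on {engines}")
    finally:
        pool.release(engines)                   # engines return to the shared pool


threads = [threading.Thread(target=run_job, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
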
Moving data is also a focus for Bull’s Gueritz: ‘Poor I/O on the node is going to affect performance more than anything else,’ he said. Reverse time migration (RTM) needs a lot of scratch space: ‘The first process is to model the wavefields going downwards through the velocity model. The values generated for each cell at each depth have to be stored temporarily and, if there is no room in memory, then they have to be stored in scratch space – a local disk. The second stage is going back from the data received at the surface – to back-propagate through the velocity model. Those two values are then correlated together to produce the final result. That means that you have an I/O issue: if there is latency on your local disk, that will affect your overall performance.’


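The two-pass structure Gueritz describes can be sketched in a few dozen lines. The toy below runs the idea on a one-dimensional model: the forward pass propagates the source wavefield and spills every time step to a memory-mapped scratch file, the backward pass injects the recorded surface data in reverse time, and the two wavefields are cross-correlated to form the image. The grid, the Ricker wavelet and the single co-located source and receiver are illustrative assumptions; production RTM is three-dimensional and vastly larger, which is where the memory and local-disk latency issues he mentions come from.

```python
import os
import tempfile

import numpy as np

nx, nt = 301, 1600                     # grid points in depth, time steps
dx, dt = 5.0, 0.0005                   # 5 m cells, 0.5 ms steps
v = np.full(nx, 2000.0)                # velocity model (m/s)
v[150:] = 3000.0                       # one interface to generate a reflection
c2 = (v * dt / dx) ** 2                # squared Courant number per cell


def ricker(t, f0=25.0, t0=0.04):
    """Ricker wavelet used as the source signature."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


src, rec = 20, 20                      # source and receiver near the "surface"

# Forward pass: model the wavefield going down through the velocity model,
# storing every snapshot in scratch space (a memory-mapped file standing in
# for the local disk Gueritz mentions).
scratch_path = os.path.join(tempfile.gettempdir(), "rtm_scratch.dat")
scratch = np.memmap(scratch_path, dtype=np.float32, mode="w+", shape=(nt, nx))
u_prev, u_curr = np.zeros(nx), np.zeros(nx)
record = np.zeros(nt)                  # data received at the surface
for it in range(nt):
    lap = np.zeros(nx)
    lap[1:-1] = u_curr[2:] - 2.0 * u_curr[1:-1] + u_curr[:-2]
    u_next = 2.0 * u_curr - u_prev + c2 * lap
    u_next[src] += ricker(it * dt) * dt * dt
    scratch[it] = u_next               # spill this time step to scratch
    record[it] = u_next[rec]
    u_prev, u_curr = u_curr, u_next

# Backward pass: back-propagate the recorded data through the same model and
# correlate it with the stored forward wavefield to produce the image.
image = np.zeros(nx)
u_prev, u_curr = np.zeros(nx), np.zeros(nx)
for it in reversed(range(nt)):
    lap = np.zeros(nx)
    lap[1:-1] = u_curr[2:] - 2.0 * u_curr[1:-1] + u_curr[:-2]
    u_next = 2.0 * u_curr - u_prev + c2 * lap
    u_next[rec] += record[it] * dt * dt
    image += scratch[it] * u_next      # zero-lag cross-correlation
    u_prev, u_curr = u_curr, u_next

print("RTM image peak (cells 60+):", int(np.argmax(np.abs(image[60:]))) + 60)
```
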
Given the sheer amount of data that has to be read in and the constraints of the algorithms, he said, ‘a system has to be well balanced. Bull was an early supplier of hybrid systems. That gave us experience of designing and deploying these hybrid architectures and we have also

