SCW_OCTNOV13

Too big for its boots

The rise and rise of big data promises to drive increasingly imaginative scientific computing approaches, says Felix Grant

In a delightful way, from a writing perspective, the data analysis topics over the last few issues have first synergised with one another and then led naturally

on to the consideration of big data. What exactly is ‘big data’? The answer, it should come as no particular surprise to hear, is ‘it depends’. As a broad, rough-and-ready definition, it means data in sufficient volume, complexity and velocity to present practical problems in storage, management, curation and analysis within a reasonable time scale. In other words, it is data that becomes, or at least threatens to become in a specific context, too dense, too rapidly acquired and too various to handle. Clearly, specific contexts will vary from case to case and over time (technology continuously upgrades our ability to manage data as well as generate it in greater volume) but, broadly speaking, the gap remains – and seems likely to remain in the immediate future. The poster boys and girls of big data, in this respect, are the likes

A data-filtered image of the sun, from the UK Meteorological Office’s NASA feed

of genomics, social research, astronomy and the Large Hadron Collider (LHC), whose unmanaged gross sensor output would be around 50 zettabytes per day. There are other thorny issues besides the

Space science in real time

One great example of scientists taking a document-orientated database approach to big data is the space weather forecasting tool at the Met Office. The team has responsibility for space weather events like coronal mass ejections and solar flares, which impact performance of the electricity grid, satellites, GPS systems, aviation and mobile communications. They used our scalable document-orientated NoSQL database to analyse a large volume and wide variety of data types including solar flare imagery from NASA and live feeds from satellites tracking radiation

14 SCIENTIFIC COMPUTING WORLD

flux, magnetic field strength and solar wind. The system not only tracks security-critical events as they unfold, but also stores and monitors complex data for pattern analysis – just the type of challenge for which document-orientated databases are so well suited. This is just one example among many of how scientists are using different tools to interact with big data, delivering research that changes how we understand the universe.

Matt Asay, VP corporate strategy at MongoDB
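For readers unfamiliar with the document-orientated idea, the sketch below illustrates it in plain Python rather than in any particular database product. Each record is a self-describing document that carries whatever fields suit its data type, and a query simply matches on the fields that are present. The record contents, the field names and the `find` helper are all invented for illustration. They are assumptions, not the Met Office's schema or MongoDB's API.

```python
# Hypothetical records: every field name and value here is invented for illustration.
documents = [
    {"kind": "solar_wind", "time": "2013-10-01T00:00:00Z", "speed_km_s": 420.0},
    {"kind": "magnetic_field", "time": "2013-10-01T00:00:00Z", "strength_nT": 5.2},
    {"kind": "flare_image", "time": "2013-10-01T00:05:00Z", "source": "NASA",
     "pixels": [[0, 1], [1, 0]]},
    {"kind": "solar_wind", "time": "2013-10-01T00:10:00Z", "speed_km_s": 610.0},
]


def find(docs, **criteria):
    """Return the documents whose fields satisfy every criterion.

    A criterion is either a plain value (matched by equality) or a
    one-argument function (matched when it returns True). Documents
    that lack a named field never match.
    """
    matches = []
    for doc in docs:
        for field, wanted in criteria.items():
            if field not in doc:
                break
            value = doc[field]
            satisfied = wanted(value) if callable(wanted) else value == wanted
            if not satisfied:
                break
        else:
            # The inner loop finished without a break: all criteria held.
            matches.append(doc)
    return matches


# Records of different shapes live side by side; a query touches only
# the fields it names.
fast_wind = find(documents, kind="solar_wind", speed_km_s=lambda v: v > 500)
images = find(documents, kind="flare_image")
```

Note that no table schema has to be agreed in advance: the image record carries a nested pixel array while the wind records carry a single number, yet both can be stored and searched together. A real document database adds indexing, persistence and scaling across machines, none of which this toy attempts.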

technicalities of computing. Some of them concern research ethics: to what extent, for example, is it justifiable to use big data gathered for other purposes (for example, from health, telecommunications, credit card usage or social networking) in ways to which the subjects did not give consent? Janet Currie (to mention only one recent example amongst many) suggests a stark tightrope with her ‘big data vs. big brother’ consideration of large-scale paediatric studies. Others are more of a concern to statisticians like me: there is a tendency for the sheer density of data available to obscure the idea of a representative sample – and a billion unbalanced data points can actually give much less reliable results than 30 well-selected ones. Conversely, however, big data can also

be defined in terms not of problems but of opportunity. Big data approaches open up the opportunity to explore very small but crucial effects. They can be used to validate

@scwmagazine l www.scientific-computing.com
