big data


power, an important factor in good analysis is a live, grounded, intuitive human overview of the data under examination. The most effective means so far developed of maintaining that overview is, as Golden’s Sabrina Pearson eloquently points out (see box: A picture is worth a million data points), mapping of variables and relationships onto sensory metaphors – predominantly by graphic visualisation, though also, to a more restricted extent, by sonification. This was my theme in the August/September issue of Scientific Computing World[3], so I’ll stick to a big data case study here.


France’s CEA (Alternative Energies and Atomic Energy Commission) is a government-funded research organisation with numerous ramifications. One of those is a division of immune-virology at Fontenay-aux-Roses, which explores and develops vaccine strategies for treating chronic and emerging viral infections. A major bottleneck in this division’s work lay in the various reporting systems and processes handling large bodies (up to 50,000 fluorescence-tagged cells per second) of flow cytometry data. A disproportionate amount of expert time was tied up in managing the data flow rather than analysing and understanding it, then using it to feed experimental programmes.


Antonio Cosima, responsible for this aspect, happened across Tableau, a visual business data reporting product, and spotted the potential for his own situation. He installed a trial copy and tried it out on his data; the trial was a success, and he went on to integrate a full installation as a core component of the division’s data-handling structure. The flood of research data, instead of progressing through a serial chain of reporting and review steps that eventually feed back into process adjustment, now passes through a single visual analysis stage that informs immediate decision making on the fly. The system facilitates close, rapid control of instrument, material or participant selection, among other aspects. It places exploratory visualisation in the hands of each team member through quickly learned interactive dashboards, and supplies publishable report illustrations as part of its operation. Cosima estimates that switching to a visual data reporting system saves the division ‘days of work’, which can now be switched to other, more productive purposes.
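To give a flavour of what such an exploratory view involves (this is not Tableau, which is driven interactively rather than from code): a minimal Python sketch plotting a density view of synthetic flow-cytometry-style events on two fluorescence channels. The channel names, rates and distributions are invented for illustration.

```python
# Illustrative only: a quick exploratory view of synthetic flow-cytometry-like
# events on two fluorescence channels, standing in for the kind of at-a-glance
# plot an interactive dashboard provides. Not Tableau; names and data invented.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 50_000  # roughly one second of events at the rate quoted in the article
events = pd.DataFrame({
    "FL1": rng.lognormal(2.0, 0.6, n),   # hypothetical fluorescence channel 1
    "FL2": rng.lognormal(1.5, 0.8, n),   # hypothetical fluorescence channel 2
})

# A hexbin density view copes better than a raw scatter at this event rate.
plt.hexbin(events["FL1"], events["FL2"], gridsize=60, bins="log",
           xscale="log", yscale="log")
plt.xlabel("FL1 intensity")
plt.ylabel("FL2 intensity")
plt.colorbar(label="log10(events per bin)")
plt.show()
```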


Further information

CEA www.cea.fr


Golden Software www.goldensoftware.com

IDBS www.idbs.com


Informatica www.informatica.com


Maplesoft www.maplesoft.com


MongoDB www.mongodb.org


Qlucore www.qlucore.com


Tableau Software www.tableausoftware.com


Whisky galore

Big data analytics give insights into things and relationships we never knew existed. But R&D is based on good records, defending results by enabling others to repeat, validate and utilise findings in their own work. Unless analytic outputs are re-usable and consumable, captured alongside the context of how you got there, their usefulness is seriously diminished. It’s like whisky production: start with a huge vat of ingredients, cook them up, and after various process steps distil out and capture the very small final amount of valuable product. Advanced electronic laboratory notebooks, like our E-WorkBook, do just that with big data. Aggregation and analysis must not disrupt day-to-day tasks. Technology must be the bridge connecting big data analytics to experimental process management. Scientists can then capture the results they need at the bench.

Paul Denny-Gouldson, VP for translational medicine at IDBS
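As a purely illustrative aside (this is not E-WorkBook, and the field names are invented): a minimal sketch of what ‘capturing analytic outputs alongside the context of how you got there’ can look like in code, bundling a result with the parameters, an input fingerprint and a timestamp so it can be repeated and validated later.

```python
# Sketch: store an analysis result together with the context that produced it
# (parameters, a hash of the input, a timestamp). Illustrative only; this is
# not E-WorkBook, and all field names are invented.
import hashlib
import json
from datetime import datetime, timezone

def record_result(input_bytes: bytes, parameters: dict, result: dict, path: str) -> None:
    """Write the result plus its provenance as a plain JSON record."""
    record = {
        "result": result,
        "parameters": parameters,
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "w") as fh:
        json.dump(record, fh, indent=2)

# Hypothetical usage with made-up values.
record_result(b"raw assay export ...",
              {"threshold": 0.05, "normalisation": "quantile"},
              {"significant_genes": 42},
              "analysis_record.json")
```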


The challenge of large, complex, high-velocity data products that threaten to exceed available storage capacity is often met by applying reduction strategies inline. These are often based on quite traditional methods. Qlucore’s Omics Explorer, for example, uses principal component analysis and hierarchical clustering to fish out the most relevant informational strands from the data flood. Omics Explorer takes its name from its roots at Lund University in the big data sets of proteomics, genomics and the like, and the reduction of massive data sets to understandable results in a short time is central to its purpose. Study of gene expression in meningiomas, or in circulating blood cells in pursuit of the holy grail of individualised diabetes treatment, is one example.
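By way of illustration only (not Qlucore’s implementation): a minimal sketch of the two reduction techniques named above, principal component analysis followed by hierarchical clustering, applied to a synthetic samples-by-genes matrix using scikit-learn and SciPy. All shapes, group sizes and names are invented.

```python
# Minimal sketch of PCA plus hierarchical clustering on a synthetic
# samples-by-genes expression matrix. Illustrative only; not Qlucore's code.
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n_samples, n_genes = 60, 20_000
X = rng.normal(size=(n_samples, n_genes))
X[:30, :50] += 2.0   # give half the samples a shifted block of 50 genes

# 1. Project the high-dimensional data onto its leading principal components.
pca = PCA(n_components=10)
scores = pca.fit_transform(X)             # (60, 10) matrix of component scores
print("variance explained:", pca.explained_variance_ratio_[:3])

# 2. Cluster the samples hierarchically in the reduced space.
Z = linkage(scores, method="ward")                 # agglomerative tree
labels = fcluster(Z, t=2, criterion="maxclust")    # cut the tree into 2 groups
print("cluster sizes:", np.bincount(labels)[1:])
```

Reducing to a handful of components before clustering is what keeps the second step cheap on matrices this wide.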


NextGen RNA sequencing is cutting-edge, but the statistical tests involved may once again be even more traditional than PCA. F and t tests would have been familiar to statisticians of my grandparents’ time, never mind my own, but their application in a few seconds to sample profiling from immense microarray data sets would have defied belief not so very long ago.
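Again purely as a sketch, with synthetic data and invented group sizes: a per-gene two-sample t-test run across a microarray-sized matrix in one vectorised call, followed by a Benjamini-Hochberg correction for the many thousands of tests performed at once.

```python
# Sketch: per-gene two-sample t-tests over a synthetic microarray-sized matrix,
# computed in one vectorised call, then corrected for multiple testing.
# Illustrative only; data and group sizes are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes = 50_000
group_a = rng.normal(size=(20, n_genes))   # 20 control samples
group_b = rng.normal(size=(20, n_genes))   # 20 treated samples
group_b[:, :100] += 1.5                    # plant 100 truly changed genes

# One t statistic and p-value per gene (axis=0 runs the test down each column).
t_stat, p_val = stats.ttest_ind(group_a, group_b, axis=0)

# Benjamini-Hochberg false-discovery-rate correction across all genes.
order = np.argsort(p_val)
ranked = p_val[order] * n_genes / (np.arange(n_genes) + 1)
q_val = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
print("genes passing FDR < 0.05:", int((q_val < 0.05).sum()))
```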


Looking ahead, the inevitable rise and rise of big data promises to drive increasingly imaginative scientific computing approaches. There will, of course, be continuations of the high-performance route, testing the limits of processor and architecture development, but artificial neural networks (ANNs), to take just one example, are leading in other interesting directions. Researchers monitoring ecosystems off the Florida coast, for instance, analyse high-density data streams[4] from autonomous robot environmental sensor systems for significant patterns, using ANNs that explicitly mimic natural processes.
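A hedged sketch of the general idea rather than the Florida group’s actual system: a small feed-forward network trained with scikit-learn to flag ‘significant’ windows in a synthetic sensor stream. The window length, injected pattern and labels are all invented.

```python
# Sketch: train a small feed-forward ANN to flag anomalous windows in a
# synthetic sensor stream. Not the system described in the article; purely
# illustrative, with an invented window size, pattern and labels.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
window = 64
n_windows = 2_000

# Background noise, with a transient oscillation injected into half the windows.
X = rng.normal(size=(n_windows, window))
y = rng.integers(0, 2, size=n_windows)
t = np.arange(window)
X[y == 1] += 0.8 * np.sin(2 * np.pi * t / 8)   # the "pattern" to detect

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
net.fit(X_train, y_train)
print("held-out accuracy:", round(net.score(X_test, y_test), 3))
```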




It’s not a huge conceptual leap from developing ANNs to the simulation or even storage of a whole organism. This (usually focused on the capture of a functioning human intellect) has been a favourite science fiction theme at least since the invention of the electronic computer, and would involve immense big data issues – unfeasible as yet, but no longer laughably so.




The idea is beginning to get serious consideration and funding for small beginnings in that direction. The University of Waterloo’s Computational Neuroscience Research Group have built the largest functional model yet of a human brain[5] and made it control an arm using the simulation software MapleSim; only a relatively paltry two and a half million neurons as yet, but the principle is established. In the last few months, researchers at MIT and the Max Planck Institute have reconstructed[6] the neural wiring of a respectable chunk of a mouse retina. Never mind the storing of a human mind; if a realistic central nervous system analogue of any kind could one day be constructed on a human complexity scale, big data would have created its own best analytic engine.


References and Sources

For a full list of references and sources, visit www.scientific-computing.com/features/referencesoct13.php



