Grappling with the growth of scientific data
Metadata is key to mastering the volumes of data in science and engineering, argues
Bob Murphy, and tools are available to make it easier
It’s no surprise to readers of Scientific Computing World that scientific data is increasing exponentially. And ever-advancing storage technology is making
it easier and cheaper than ever to store all this data (vendors will soon be shipping 840TB in a single 4U enclosure). So what’s missing? How about: how to keep track of all that data? How to find what you are looking for in these multi-petabyte ‘haystacks’? How to share selected data with your colleagues for collaborative research, and then make it available to support the
mandate that published results must be reproducible? How to ensure the consistency and trustworthiness of scientific data, selective access, provenance, curation and availability in the future? How to find data that was created years or decades ago but is needed now? And how to identify and remove data that’s no longer needed, to avoid accumulating useless ‘data junkyards’?
Metadata is the key
The solution has been around for decades: it’s metadata. Metadata, or data about data,
Taking action on big data
Today the US National Consortium for Data Science is an active and growing organisation.
Stan Ahalt discusses how and why it came into being
Big data is such a hot topic it has finally outgrown the descriptor ‘big’. From scientific journals to the popular press, so much has been said about
big data and the challenges and opportunities it presents, that sorting through the data on big data has itself become a challenge. Discussing the issues is a good start, but action is even better. In 2013, a handful
of academic researchers and business professionals, located mainly in the Research Triangle Park area of North Carolina, put their heads together to develop a strategy for action and practical projects that used data for maximum impact in science, business, and education. That effort is now known as the National Consortium for Data Science, or NCDS.
Back in 2012 and early 2013, I was one
of the main proponents of the NCDS. I spent many hours talking to people who create data in their work; those who use it to develop products, conduct research, and understand their customers; and the technology experts who build tools for collecting, sharing, analysing, and managing data. My message was simple: making the most of our data-rich world will require more than focused, domain-specific research projects and isolated product development efforts. Barriers that separate science domains and isolate the worlds of research, business, and government must be toppled. Taming the data deluge and gleaning real knowledge from data must be a broad-
@scwmagazine
www.scientific-computing.com