high-performance computing
lets scientists find the valuable data they are looking for. Metadata especially helps find value in data that’s been created by others, no matter when or where. Without rich metadata, scientists increasingly risk spending their time just looking for data, or worse, losing it – instead of exploiting that data for analysis and discovery.
Physicists are the high priests of metadata, and astronomers their first disciples
In addition to inventing the World Wide Web to support its amazing work, big science physics pioneered the use of metadata to manage the moving, processing, sharing, tracking and storing of massive amounts of data among global collaborators. Physicists
have been using metadata to manage really big data for decades, developing their own bespoke metadata and data management tools with each new project. CERN actually developed three separate metadata systems to manage the two storage systems used in their ground-breaking LHC work that famously captured 1PB of detector data per second in search of the elusive Higgs boson. So when NASA needed to keep track of
based and strategic effort, and must address everything from education of data specialists to figuring out ways to quickly translate data research into breakthrough products and services. That message rang true for many people
who view data challenges from very different perspectives. As a result, today the NCDS is an active and growing organisation whose members include US research universities (including three University of North Carolina campuses, Drexel University, Texas A&M, and UNC General Administration), major corporations (Cisco, Deloitte, EMC, GE, and IBM), and government agencies and non-profit organisations (RTI International, MCNC, and the US Environmental Protection Agency). Our success probably has something to
do with our impatience. We didn’t have all the answers for harnessing big data, but we knew action was essential or the data
juggernaut would steamroll right over us. So the NCDS – led by a dedicated team at the University of North Carolina at Chapel Hill’s Renaissance Computing Institute (RENCI) – opted for action. In that first year, it was learn-by-doing in the extreme. Not everything turned out exactly as we
expected, but we did successfully stand up an organisation consisting of diverse members with different agendas. We also came to understand how the NCDS could make the most impact on important data challenges. Through hard work and development of programmes and events aimed at members, students, the data
workforce, and data researchers, we learned
the following:
• Finding solutions to the challenges of data sharing, analysis, management, and long-term curation requires recognising data science as a science on par with any other domain. The NCDS defines data science as the systematic study of the flow, curation, and analysis of digital data to enable research discoveries, decision-making, and a data-driven economy. And the need for data scientists is acute;
as Google chief economist Hal Varian said back in 2009: ‘The ability to take data – to be able to understand it, to process it, to extract value from it, to visualise it, to communicate it – that’s going to be a hugely important skill in the next decades.’ The McKinsey Global Institute estimates
that by 2018, the US alone could face a shortage of 140,000 to 190,000 people with deep analytical skills. But despite a
www.scientific-computing.com, APRIL/MAY 2015