data integration


planning committee for the foundation. He explained that an open ecosystem will provide a document standard, metadata guidelines and class libraries to use the standards: ‘The ecosystem will provide a document standard, which means a format in which primary analytical data can be stored in a non-proprietary way.’ In terms of metadata guidelines, the foundation is analysing the current metadata structure across its member companies in order to determine a standard vocabulary. Noelken acknowledged that many different vocabularies are already in place; however, none has been broadly adopted. To this end, the foundation will use existing vocabularies and information standards wherever possible. Founded in 2012, Allotrope Foundation


is relatively new but, in the past year, has managed to develop a precise problem definition, come up with a plan to resolve it, and is now in the process of finding software development partners. Noelken commented that, as a consortium, the key has been for members to leave their in-house mentality behind in order to come together and communicate something as abstract as a data standard. He hopes that using an open framework like Allotrope will revolutionise the way data is shared within companies, across companies and


Heterogeneous data


Ryan Sasaki, director of Global Strategy at Advanced Chemistry Development (ACD/Labs), discusses small data problems


Heterogeneous data distributed across different silos in different labs is a major obstacle to retrieving and re-using information and knowledge extracted from analytical data. The first issue is related to data access. Currently, there are more than 20 historically important analytical instrument manufacturers – many of which offer more than one instrument model – for chromatography, mass spectrometry, optical spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, thermal analysis, and x-ray powder diffraction. Throughout any given organisation there may be a variety of different instrument-selection strategies based on the complexity and business importance of the scientific problem. As a result, laboratory managers may seek a balance between ‘best-of-breed’ and ‘fit-for-purpose’ instrument solutions that inevitably compound the


12 SCIENTIFIC COMPUTING WORLD


problem described above by creating disparate laboratory environments that consist of a variety of instruments and data systems that don’t work together.


While the long-sought ‘holy grail’ solution is an industry-standard file format for all analytical instrument data, no such standard exists today. ACD/Labs is a software informatics company with long-standing partnerships with analytical instrument vendors, and it serves as a third-party technology resource to help organisations tackle the challenge of heterogeneous laboratory data formats. With file-handling support for more than 150 formats from multiple analytical techniques, this technology can help laboratory organisations provide their scientists with better access to ‘live’ analytical data.
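Supporting many vendor formats behind one interface is essentially the adapter pattern: each format gets a reader that converts its files into a common in-memory representation, and callers dispatch on the file type. The sketch below is purely illustrative – the `Spectrum` record, the reader registry, and the stub JCAMP-DX reader are assumptions for this example, not ACD/Labs’ actual API.

```python
# Hypothetical sketch of the adapter pattern behind multi-format support.
# The record layout and registry are illustrative, not a vendor's real API.
from dataclasses import dataclass, field

@dataclass
class Spectrum:
    """Common in-memory representation, independent of the vendor format."""
    technique: str                      # e.g. "NMR", "IR", "LC-MS"
    x: list = field(default_factory=list)
    y: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

READERS = {}                            # file extension -> reader function

def reader(extension):
    """Register a vendor-format reader under its file extension."""
    def register(func):
        READERS[extension] = func
        return func
    return register

@reader(".jdx")                         # JCAMP-DX, a real open exchange format
def read_jcamp(path):
    # A real reader would parse the file; this stub just returns a record.
    return Spectrum(technique="IR", metadata={"source_format": "JCAMP-DX"})

def load(path):
    """Dispatch on extension so callers never deal with vendor formats."""
    ext = path[path.rfind("."):].lower()
    if ext not in READERS:
        raise ValueError(f"unsupported format: {ext}")
    return READERS[ext](path)
```

Adding support for another instrument then means writing one more reader function, while everything downstream of `load` keeps working against the common `Spectrum` shape.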


The second major challenge, which sits on top of the heterogeneous data format challenge, is extracting and capturing knowledge effectively from analytical data. This will often require dynamic interaction between the analytical


data acquired, the chemistry being studied, and the scientist doing the analysis. While the optimisation and generalisation of an analytical measurement is constantly evolving, for most end-users the data is ultimately a means to an end. Traditional informatics systems such as LIMS, ELNs, and archiving systems do a fine job of handling the regulatory aspects of proving that data was generated in accordance with a sample, and of documenting scientific conclusions. However, these systems do not capture the key observations and interpretations that lead to a structure confirmation or characterisation. Through the adoption of a unified laboratory intelligence (ULI)¹ framework, organisations can collect data from different instruments across laboratories, convert heterogeneous data to homogeneous structured data with metadata, and store unified chemical, structural, and analytical information as ‘live’ data. This provides the ability to apply chemical context (why) to the vast amounts of analytical content (what).
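The idea of tying chemical context to analytical content can be pictured as a unified record that keeps the structure, the measurement, and the scientist’s interpretation together. The sketch below is an assumption for illustration only – ULI is a commercial framework and this is not its actual schema; the `UnifiedRecord` fields, `ingest`, and `by_structure` names are invented for this example.

```python
# Illustrative sketch only: this record layout is an assumption, not the
# actual schema of any commercial ULI product.
from dataclasses import dataclass, field

@dataclass
class UnifiedRecord:
    """One 'live' entry tying chemical context to analytical content."""
    structure: str            # chemical context ("why"), e.g. a SMILES string
    technique: str            # which measurement produced the data
    data: list                # analytical content ("what"), kept re-usable
    metadata: dict = field(default_factory=dict)
    interpretation: str = ""  # the observation traditional systems often drop

store = []                    # a trivial stand-in for a real data store

def ingest(structure, technique, data, **metadata):
    """Convert an instrument result into the homogeneous structured form."""
    record = UnifiedRecord(structure, technique, list(data), metadata)
    store.append(record)
    return record

def by_structure(structure):
    """Query analytical content by its chemical context."""
    return [r for r in store if r.structure == structure]
```

The point of the shape, rather than the names, is that a query by chemical structure returns the measurements *and* their metadata and interpretations together, instead of leaving them scattered across instrument-specific silos.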


¹ Ryan Sasaki and Bruce Pharr, Unified Laboratory Intelligence


between companies, CROs and regulators. While Allotrope focuses on building a


common laboratory information framework compliant with the regulatory environment, the Pistoia Alliance seeks to ‘lower barriers to innovation in life science research and development through optimising business processes in the pre-competitive domain’. John Wise, executive director of the Pistoia Alliance, explains that Pistoia enables pharma R&D scientists and informaticians to communicate with the technology community and commercial providers,


THE PROJECT IS


ATTEMPTING TO BUILD BRIDGES BETWEEN THESE HETEROGENEOUS DATA SOURCES


as well as academic or government organisations, such as the European Bioinformatics Institute (EBI). One of the many projects Pistoia


is currently involved in is the HELM (Hierarchical Editing Language for Macromolecules) project. Originally developed at Pfizer, HELM will enable a standard notation and software tools


for the examination and exchange of information about macromolecules. This consistency is much needed, Wise explains: ‘The industry is moving away from small chemical compounds into the larger, more complicated biological molecule space, and the way of describing macromolecules and the supporting software to manage macromolecules has yet to be well defined.’ The aim is to develop this once-internal technology into a universal industry standard. Tackling standards from yet another


perspective is the AnIML project. AnIML – the Analytical Information Markup Language – is a standardised data format for storing and sharing experimental data. Recognising that there are discrete silos of information within an organisation, the project is attempting to build bridges between these heterogeneous data sources. BSSN Software has been involved with the


project since 2003, and president Burkhard Schaefer says he hopes the project will push the core standard through the ASTM balloting process towards the end of 2013: ‘Of course, it’s one thing to have the de facto standard – which has been complete for almost two years now – but actually getting the seal of approval from ASTM is crucial for fostering adoption and ensuring that people aren’t running after a moving target when they actually deploy the standard.’
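Because AnIML is an XML markup language, an experiment can be read with any standard XML tooling. The fragment below follows the general shape of the AnIML core schema (samples, experiment steps, results, and value series), but it is a trimmed, illustrative sketch and is not guaranteed to validate against the ASTM schema.

```python
# A schematic AnIML-style document, parsed with Python's standard library.
# Element names follow the general shape of the AnIML core schema; this
# trimmed fragment is illustrative, not a schema-valid example.
import xml.etree.ElementTree as ET

DOC = """\
<AnIML version="0.90">
  <SampleSet>
    <Sample name="Caffeine standard" sampleID="S1"/>
  </SampleSet>
  <ExperimentStepSet>
    <ExperimentStep name="UV scan" experimentStepID="E1">
      <Result name="Absorbance">
        <SeriesSet name="Spectrum" length="3">
          <Series name="Wavelength" dependency="independent" seriesID="x">
            <IndividualValueSet><F>270</F><F>272</F><F>274</F></IndividualValueSet>
          </Series>
        </SeriesSet>
      </Result>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>
"""

root = ET.fromstring(DOC)
sample = root.find("./SampleSet/Sample")            # sample description
values = [float(f.text) for f in root.iter("F")]    # numeric series values
```

The appeal of this approach is exactly what the article describes: once results from different instruments are expressed in one markup language, the same generic parsing code works across techniques and vendors.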



