Text and data mining
grammar from scholarly literature and other sources.
Where this trade-off between simplicity and flexibility lies depends on the size of an organisation using the tools and the depth of their pockets, according to Virkar-Yates. He said that what Semantico provides, through TEMIS’ Luxid tool, is a ‘volume-based load, entry-level solution for small publishers.’ However, he said some publishers are not even able to do basic text and data mining. ‘I would suggest that many publishers are still playing catch-up and are being let down by their vendors. For example, you should be able to type two different spellings of a word and get the same results in a search,’ he explained. ‘There is a disconnect between what vendors talk about and what they do. The reality on the ground may be different because it’s expensive, or because they may not have the expertise. It comes down to money and is still in the realms of the people with deeper pockets.’
Accuracy and formats
Text and data mining faces other challenges today too.
‘Automated tagging of data within unstructured text is never 100 per cent accurate, and should always be checked by a subject-area specialist. This can be costly, but the time taken to identify false-positives is far less than tagging this data manually. This process of identifying unwanted matches can also be fed back into the data-mining service to improve the accuracy of subsequent imports,’ said Virkar-Yates.
Camilleri commented: ‘I’d say the biggest challenge is finding domain-specific thesauri or controlled vocabularies that we can use to develop annotators for our clients. Varying source data formats are also challenging because organisations hold data in many different formats, so an initial data conversion exercise may be required to normalise the data before we can run it through our enrichment engine.’ Hastings added: ‘There’s still an educational challenge too. People have very much changed in their attitudes to text mining but we still need to educate them in how mining is different from search.’ He also noted that the need for access to content is obvious with any search.
Despite these issues, however, text and data mining is becoming more firmly embedded in the processes and plans of big – and smaller – companies. ‘Text mining has a very bright future,’ said Hastings. ‘Clearly the amount of unstructured information is not going to go away, and we see customers expanding mining into more areas. When we first started Linguamatics, people questioned whether text mining could have an impact, but over the past five years in particular people have been asking us more and more about how and where they can use it, because they already understand that it can have a significant impact.’
Camilleri agreed on the potential of these techniques: ‘With increasing numbers of organisations “opening up” their research data for the greater scientific good, combining data from disparate data sources will become the norm.
‘Not only will there be an increase in the number of organisations using these tools, but it will be increasingly difficult to carry out effective research without them.’