- Information scarcity; information tied to locations, e.g. libraries; high variable costs of production and distribution.
- Beginning of the digital age; restricted to larger organisations; focus on data processing, e.g. payroll.
- Mass digital connectivity; information abundance; focus on finding relevant information, e.g. Google.
- Computers in our pockets; collecting, finding and sharing information wherever we go, e.g. social media.
- Computers creating information; computers acting independently, e.g. agentic AI.
a commodification of LLM products as the cost of using them falls and differences in performance narrow.
The implications of this for the business models of LLM developers and their investors could be severe, but for developers building applications on top of such models the future is much brighter. Ultimately, AI-powered applications need to solve real organisational problems and be designed around the needs of different industries and specific use cases. A key differentiator that offers competitive advantages over more generic models will be the data used in the pre- and post-training of LLMs, and the information then processed by AI applications that resides within organisations or that they have unique access to. As unique and proprietary data takes a more central role in AI deployment success, the professionals trained in managing that data will become more valuable.
Data quality is key
The old computing aphorism 'garbage in, garbage out' holds true for GenAI. An LLM is only as good as the data it is trained on, and it is becoming apparent that well-structured data optimised for models to ingest makes for better outputs. Many of the foundational models offered by OpenAI, Anthropic and Google were trained on a mix of structured and unstructured data harvested from the public web and other freely available sources.
Rewired 2025

The scale and scope of this training data has allowed these LLMs to generate impressive outputs based on the patterns observed during the ingestion and training phases. However, here lies one of the core weaknesses with such predictive models: they can generate results from users' prompts that look accurate, but which may be hallucinations based on patterns observed in the training data. A technique to reduce the risk of hallucinations and increase the relevance of results is called retrieval-augmented generation (RAG). This involves supplementing an LLM with additional data retrieved at the time of a request, rather than relying solely on what the model learned in training. These additional datasets can be added by users who have specific needs and who have access to information directly relevant to those needs. RAG is widely used in customer service scenarios, where a chatbot needs to call on company policies and product information to answer specific queries. It is also used in medical applications where accurate and detailed scientific answers are required, as well as a host of other scenarios. In these cases, having well-structured data that follows defined conventions makes for faster and more accurate results. Recent research into the use of RAG based on structured data to improve analysis in the financial services sector saw the accuracy of results improve by 23 per cent while reducing response times to user prompts by over a third.3
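As a rough illustration of the RAG pattern described above, the sketch below retrieves the most relevant snippet from a small in-memory corpus and prepends it to the user's question before it reaches the LLM. The word-overlap scoring is a deliberately simple stand-in for the semantic search a production system would use, and the corpus, function names and prompt wording are all hypothetical.

```python
# Minimal RAG sketch: retrieve the most relevant document for a query,
# then build an augmented prompt so the LLM grounds its answer in it.
# Word-overlap scoring stands in for real embedding-based semantic search.

# Hypothetical in-memory corpus of company policy snippets.
DOCUMENTS = [
    "Refund policy: customers may return items within 30 days of purchase.",
    "Shipping policy: standard delivery takes 3-5 working days.",
    "Warranty policy: electronics carry a 12-month manufacturer warranty.",
]

def retrieve(query: str, documents: list[str]) -> str:
    """Return the document sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(documents, key=lambda d: len(query_words & set(d.lower().split())))

def build_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved context so the model answers from it."""
    context = retrieve(query, documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How many days do customers have to return items?", DOCUMENTS)
print(prompt)
```

In a deployed chatbot the retrieval step would run against an indexed store of cleaned company documents, and the assembled prompt would be sent to the LLM rather than printed.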
The growing acceptance of RAG as a technique for organisations to implement GenAI initiatives focused on their specific needs presents a massive opportunity for information professionals. Using RAG alongside an LLM requires a range of technical skills on the programming side, but it also relies on well-prepared data that is appropriately structured and cleaned, a retrieval method such as semantic search, and regular data updates to ensure results are relevant and current. Knowledge and information management practitioners will be familiar with all of these requirements and are well placed to implement a RAG approach to LLM projects.
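The data-preparation step mentioned above maps directly onto familiar knowledge-management work: cleaning raw text and structuring it into consistent units before it is indexed for retrieval. The sketch below, with hypothetical data and parameters, normalises messy exported text and splits it into fixed-size chunks; a production pipeline would follow this with embedding and regular re-indexing to keep results current.

```python
import re

def clean(text: str) -> str:
    """Collapse runs of whitespace so indexed chunks are consistent."""
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, max_words: int = 50) -> list[str]:
    """Split cleaned text into fixed-size word chunks for a retrieval index."""
    words = clean(text).split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Hypothetical raw policy text with messy whitespace, as might be
# exported from a document management system.
raw = "Customers  may return\n items within 30 days of purchase for a full refund."
chunks = chunk(raw, max_words=8)
print(chunks)
```

The chunk size here is tiny for demonstration; in practice it is tuned so each chunk carries enough context to be useful on its own while staying within the model's context limits.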
Sourcing and combining data

RAG provides a practical approach to tailoring LLMs to specific organisational needs. It also offers a route to market differentiation for businesses seeking competitive advantage. As LLMs become increasingly commoditised, the data they use to generate outputs will become more important. When most organisations have access to the same GenAI tools, it will be the ways they are used and customised that offer the greatest advantages. Finding and preparing new data sources, both internally and from outside the organisation, will become an increasingly important task. These might be data assets already used for other purposes or generated by a business's routine activities, such as vehicle fleet management, building management systems, payroll data and customer feedback information. LLMs offer the potential to derive insights from datasets of all sizes at scale and speeds not previously possible. This might include sentiment analysis of customer reviews, creating knowledge bases for employees and customers from product catalogues, summarising human resources (HR) procedures for new employees or identifying bottlenecks in supply chains from inventory and delivery documentation. The benefits of such initiatives might be improved cus-
INFORMATION PROFESSIONAL DIGITAL 15