Pathology In Practice May 2026

STATISTICS

the software default, understand what a confidence interval contributes, and treat outliers as objects for investigation rather than automatic deletion.

Why this topic matters in laboratory medicine

In laboratory medicine, descriptive statistics are often treated as a preliminary stage that must be cleared before the ‘real’ analysis begins. In practice, they are frequently the stage at which the most consequential judgement calls are made. A local verification exercise may involve only a modest number of results, making the estimate unstable and vulnerable to the influence of one or two unusual observations. A service audit may summarise turnaround times using a mean even though the data are visibly right-skewed, giving a distorted impression of typical performance. A small local healthy volunteer dataset may be inspected too casually, with little attention paid to subgroup mixing or a handful of extreme values. In each case, the problem is not the lack of statistical output. The problem is that the wrong descriptive framework has been applied to the data.

Laboratory datasets are especially prone to this because several sources of variability and distortion may act at the same time. Some come from the biology of the analyte or population under study. Some come from the analytical method or measuring range. Some arise before the sample even reaches the analyser, through transport delays, collection issues, haemolysis, or contamination. Some are purely administrative, such as transcription errors or inclusion of cases that should not have been in the dataset at all. The shape of a distribution is therefore not just a mathematical curiosity. It may represent an important clue about the process that generated the data.

Confidence intervals matter for similarly practical reasons. A point estimate can be tidy, but tidy is not the same as informative. If a laboratory reports that the median turnaround time for a test was 58 minutes, or that the 95th percentile of an analyte in a local dataset was a particular value, that number alone says nothing about how uncertain the estimate is. A small or highly variable sample may produce a much less stable estimate than a larger and more consistent one. Without an interval, the reader cannot judge how much weight to place on the number. This is not limited to means. Percentiles, medians, and proportions are all estimated from samples and therefore all have uncertainty.

Outliers illustrate the practical importance of all this especially well. In routine work, a value may be labelled ‘an outlier’ and excluded. That can be a serious mistake. An extreme value may indeed be spurious, but it may also be the first sign of a pre-analytical problem, an analytical issue, subgroup contamination, or genuine rare biology.

Treating statistical unusualness as proof of invalidity is poor practice. The outlier literature in laboratory medicine does not support such a simplistic approach, and modern evidence has shown that different outlier handling strategies can materially alter results.
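As a minimal sketch of treating outliers as objects for investigation rather than automatic deletion, the Python example below flags extreme values and reports the summary statistics both with and without them. The data are hypothetical, and Tukey's 1.5 × IQR fence is only one common screening rule, chosen here for illustration; it is not a method prescribed by this article, and the function name `tukey_flags` is an assumption of the sketch.

```python
import statistics


def tukey_flags(values, k=1.5):
    """Return (lower_fence, upper_fence, flagged) using Tukey's IQR rule.

    Flagged values are returned with their positions so that they can be
    traced back to the original record; nothing is deleted here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [(i, v) for i, v in enumerate(values) if v < lower or v > upper]
    return lower, upper, flagged


# Hypothetical results (arbitrary units); the last value is deliberately extreme.
results = [4.1, 4.3, 4.4, 4.6, 4.7, 4.8, 5.0, 5.1, 5.3, 9.8]

lower, upper, flagged = tukey_flags(results)
kept = [v for v in results if lower <= v <= upper]

# Report both versions so the influence of the flagged value is visible.
summary = {
    "mean_all": statistics.mean(results),
    "mean_without_flagged": statistics.mean(kept),
    "median_all": statistics.median(results),
    "median_without_flagged": statistics.median(kept),
}

print("Flagged for investigation (index, value):", flagged)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this small example the single flagged value moves the mean far more than the median. Whether that value is then excluded should depend on what the investigation finds, not on the flag alone.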

Practical use cases in clinical laboratories

The concepts in this article arise in a wide range of laboratory activities. One obvious setting is local evaluation or verification work. Small datasets are common in these projects, which makes the choice of summary statistics and the treatment of unusual values especially important. A single extreme observation may alter the mean noticeably, widen the standard deviation, and change the overall interpretation.

Another setting is service audit. Operational variables such as turnaround time, transport delay, add-on request intervals, or referral delays are often strongly skewed. Reporting only means in such settings can be misleading, because a minority of very long delays may dominate the summary. Median and percentile-based summaries are often more informative if the aim is to describe

[Figure 1: three panels, A (approximately symmetric), B (right-skewed) and C (mixed / bimodal); x-axis: measurement value; y-axis: relative frequency.]

Fig 1. Stylised examples of common distribution patterns in laboratory data. Panel A shows an approximately symmetric distribution. Panel B shows a right-skewed distribution, in which a minority of larger values extend the upper tail. Panel C shows a mixed or bimodal distribution, which may arise when distinct populations are combined.
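The points made above about right-skewed operational data, and about the uncertainty attached to a median, can be illustrated with a short Python sketch. The turnaround times are hypothetical, and the percentile bootstrap is only one of several ways to obtain an interval for a median; it is used here as an assumption of the example, not as the method recommended by this article.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the median (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    low = medians[round(alpha * n_boot)]
    high = medians[round((1.0 - alpha) * n_boot) - 1]
    return low, high


# Hypothetical turnaround times in minutes, with a long upper tail.
tat = [38, 41, 44, 45, 47, 49, 50, 52, 53, 55, 56, 58,
       58, 60, 61, 63, 66, 70, 74, 81, 95, 120, 168, 240]

mean_tat = statistics.mean(tat)
median_tat = statistics.median(tat)
p90_tat = statistics.quantiles(tat, n=10)[-1]  # 90th percentile estimate
ci_low, ci_high = bootstrap_median_ci(tat)

print(f"Mean:   {mean_tat:.1f} min")
print(f"Median: {median_tat:.1f} min (95% bootstrap CI {ci_low:.1f} to {ci_high:.1f})")
print(f"90th percentile: {p90_tat:.1f} min")
```

In this example a few very long delays pull the mean well above the median, so the mean overstates the typical turnaround time. With only 24 observations, the interval around the median also shows how much uncertainty a single reported figure can hide.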
