What, then, to do? How should an organisation or individual best approach this reality – in which an essential tool of all scientific work is, for the most part, wielded by non-specialists? The answer, unsurprisingly, varies from case to case.

Publishers of data analytic and visualisation software are, of course, aware of all this. While they generally don't say it in so many words, they do, without exception, make great efforts to bridge the gap in practice. In various ways they provide responsible and (so far as is possible) robust support for inexpert users, and then back it up with the means for those users to gradually increase their expertise through experience. That is probably the biggest source of statistical education, and an important part of any serious attempt to place statistical work on a solidly productive foundation within an organisational environment.

Provision of facilities doesn't mean that they will necessarily be used, however. I recently took a straw poll of non-statisticians using Minitab, and none of them had ever looked at, or even been aware of, the power and sample size submenu. Sample design is a crucial aspect of good statistical work, but is notoriously under-considered by non-statisticians. Some years ago, an interesting lesson emerged when Statistical Solutions donated a copy of nQuery Adviser to a research project run by a group of mature research students; the statistical design quality of their work immediately improved. This seemed to be a psychological result: an unglamorous facet had been promoted in status by the arrival of a separate, dedicated program to service it. Subsequent evolution has seen nQuery Adviser, already an excellent tool, enhanced by combination with nTerim – a merger reported to be fully integrated in the forthcoming upgrade, due for market around the time you read this.

For people like Carla, my intuitive medical data jockey, there are two good approaches that happen to complement each other nicely. One (as Carl-Johan Ivarsson of Qlucore points out; see box: Five steps to Eden) is the use of visualisation to explore data for interesting features. All data analytic software offers this, of course, but for the non-statistician a dedicated plotting or visualisation package may be more accessible. The other is a set of clear guidelines, or even black boxes, for accept/maybe/reject decision-making zones on statistical values – the best-known example, and a good model for adaptation, being process control charts. The careful provision of sensible defaults in software – non-statisticians being the least likely to set these parameters for themselves – is a variation on this.

Exploring different visualisation

Exploration of a gynomorph variable pair in MagicPlot

Go lognormal! Peter Vijn, data consultant at BioInq

A rewarding generic step in data analysis is to apply an appropriate data transformation directly after data acquisition – a subject rarely covered in basic statistics courses or textbooks, and a frequent cause of communication gaps between statisticians and non-statisticians.

Of all possible transformations, the logarithmic is the true workhorse, effectively switching to the lognormal probability density function. While it only works for positive, non-zero data, most real data comply. If there are zero-valued points, replace them with the smallest non-zero value your instruments can detect. The lognormal distribution often gives a superior fit, especially when the coefficient of variation is large. Try it: build a dataset from the word counts of the email messages in your current mailbox, calculate the mean and standard deviation, and plot the resulting (mis)fit against the normal curve; then apply the logarithmic transformation, repeat, and note how much better the fit becomes. This simple recipe also addresses another

persistent and often overlooked problem: heteroscedasticity (unequal variances across groups), which muddies the interpretation of output from most standard statistical tests.
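The mailbox exercise above can be sketched numerically rather than graphically. A minimal Python sketch – with simulated lognormal data standing in for the word counts, and sample skewness standing in for the visual misfit against the normal curve:

```python
import math
import random
from statistics import mean, stdev

def skewness(xs):
    """Sample skewness: roughly 0 for symmetric, normal-like data."""
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

random.seed(1)
# simulated stand-in for the mailbox exercise: positive, right-skewed data
raw = [random.lognormvariate(4.0, 0.8) for _ in range(5000)]
logged = [math.log(x) for x in raw]

print(f"raw skewness:    {skewness(raw):.2f}")     # large and positive
print(f"logged skewness: {skewness(logged):.2f}")  # close to zero
```

The raw data are strongly right-skewed, so a normal curve fits badly; after the log transformation the skewness falls to near zero and the normal fit becomes respectable.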

The best thing about the log transformation is that it tames large-dynamic-range datasets to meet the smaller-range assumptions of standard methods. The only thing you have to get used to is that confidence regions transformed back to the original measurement scale are asymmetric – but even that is natural once you realise the data cannot cross the zero line.
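That asymmetry is easy to demonstrate. A minimal sketch with hypothetical data (a z-value of 1.96 stands in for the exact t quantile): compute the interval on the log scale, then back-transform each limit.

```python
import math
import random
from statistics import mean, stdev

random.seed(7)
# hypothetical positive-valued measurements (stand-ins for real data)
data = [random.lognormvariate(3.0, 0.6) for _ in range(40)]

# work on the log scale, where the data are approximately normal
logs = [math.log(x) for x in data]
m = mean(logs)
se = stdev(logs) / math.sqrt(len(logs))

# 95% interval on the log scale, back-transformed limit by limit
centre = math.exp(m)                          # geometric mean
ci = (math.exp(m - 1.96 * se), math.exp(m + 1.96 * se))

print(f"centre {centre:.1f}, 95% CI ({ci[0]:.1f}, {ci[1]:.1f})")
```

The gap from centre to lower limit is smaller than the gap to the upper limit – the interval is asymmetric – and the lower limit can never reach zero, exactly as the text describes.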

@scwmagazine | www.scientific-computing.com

approaches with non-specialists, from maths-phobic pre-degree students to highly skilled professionals, I've found that the

Structures and similarities in gene expression data, investigated using principal component analysis (PCA) in Qlucore. From Johansson et al[3]

THE BEST TOOL VARIES NOT ONLY WITH COMPETENCE BUT WITH TEMPERAMENT

best tool for the same task often varies not with the level of competence but with individual temperament. Preference for OriginPro versus SigmaPlot, for example, seems to correlate with different general mindsets – which shows how important it is for users to experiment with a variety of available options, making full use of the trial copies that are usually available. An interesting (and relatively recent) entrant to this market segment is MagicPlot, which blends commercial and free software to good effect.

This is especially effective in encouraging productive approaches in groups with both lateral and vertical structures: small teams of technical, secretarial and administrative staff, for example, or academic staff, technicians and students, working together in exploratory conversation. That ties in with the importance of support networks, to which I'll return later.
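Returning to that neglected power-and-sample-size submenu: the calculation behind it is not exotic. As a rough sketch – the textbook normal approximation for a two-sample comparison of means, not any vendor's actual implementation – it fits in a few lines of Python:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sample, two-sided comparison of
    means, via the standard normal approximation. effect_size is
    Cohen's d: the difference in means over the common std deviation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = z.inv_cdf(power)           # quantile giving the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_per_group(0.5))  # a 'medium' effect: 63 subjects per group
print(n_per_group(0.2))  # a 'small' effect: 393 per group
```

Dedicated tools such as nQuery Adviser refine this (exact t quantiles, unequal groups, dropout allowances), but the core trade-off between effect size, error rates and sample size is exactly what this sketch shows.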
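The accept/maybe/reject decision zones mentioned earlier can likewise be reduced to a few lines. A minimal Shewhart-style sketch, with hypothetical baseline data and the conventional 2-sigma warning and 3-sigma action limits:

```python
from statistics import mean, stdev

def zone(value, centre, sigma):
    """Accept/maybe/reject zones in the style of a Shewhart control
    chart: inside 2 sigma accept, 2-3 sigma maybe, beyond 3 reject."""
    d = abs(value - centre) / sigma
    if d < 2:
        return "accept"
    if d < 3:
        return "maybe"
    return "reject"

# hypothetical baseline measurements set the centre line and sigma
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
c, s = mean(baseline), stdev(baseline)   # here: 10.0 and 0.2

for x in (10.1, 10.5, 10.9):
    print(f"{x}: {zone(x, c, s)}")
```

The point is not the arithmetic but the packaging: a non-statistician never has to think about distributions, only about which zone a value lands in.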

