statistics for non-statisticians

Five steps to Eden

Carl-Johan Ivarsson, president of Qlucore

Sheer size of data sets is a common problem in science. The human brain is very good at detecting structures or patterns, and active use of visualisation techniques can enable even the non-statistician to identify them very quickly with instant feedback as results are generated. At Qlucore, we recommend a five-step approach to ensure repeatable and significant results.

High dimension data should first be reduced to lower dimensions for 3D plotting, usually using principal component analysis. Data colouring, filters, and tools to select and deselect parts of the data set also enhance information.

Step one: detect and remove the strongest signal in the dataset. This allows other obscured signals to be seen, and also usually reduces the number of active samples and/or variables.

Step two: measure strength of visually detected signals or patterns by examining variance in a 3D PCA-plot compared to what would be expected with random variables, giving a clear indication of the identified pattern’s reliability.

Step three: if there is significant signal-to-noise ratio, remove variables most likely contributing to the noise.

Step four: apply statistical tests to any or all of the other stages of the five-step process.

Step five: use graphs to refine the search for subgroups or clusters. Connecting samples in networks, for example, can move you into more than three dimensions, providing more insight into data structures.

Repeat all steps until no more structures are found. Used this way, visualisation can be a powerful tool for researchers, without having to rely on statistics or informatics specialists.

Further information
BioInq www.bioinq.com
MagicPlot www.magicplot.com
Minitab www.minitab.co.uk
OriginLab www.originlab.com
Qlucore www.qlucore.com
Statistical Solutions www.statistical-solutions-software.com
Statsoft www.statsoft.com
Systat Software www.sigmaplot.co.uk
Unistat www.unistat.com

Carla, as it happens, uses both OriginPro and SigmaPlot for initial visual exploration, then tests her discoveries progressively in the analytic facilities of both packages before progressing to Unistat (for what she describes as ‘its straightforward, no-nonsense structure’ and its particularly close integration with Excel in which her data is supplied for their extended facilities) and Minitab. Her recommendations are based on cross comparison of her results in all four environments. Her working methods, observed objectively, show interesting informal parallels with the more structured approach outlined by Qlucore’s Ivarsson.

Putting aside specific software considerations, what general approaches can be recommended? A researcher commented[2] last year in Advances in Nutrition that: ‘An important issue with epidemiological studies is the inaccessibility of the data to reader analysis. Data is so heavily processed through multiple layers of mathematical filters that results are intractable to non-statisticians; conclusions must be accepted on faith.’ This is widely true, with two aspects which need to be considered. From a consumer viewpoint, there is the snare which caught my mathematical physicist mentioned above: the lesson is to always work from raw, unmanipulated data (which percentages, for example, are not). At the producer end of the process, always supply the raw data from which your analysis was built, so that others can check your process and help you correct any misunderstandings.

Following from that is the old advice often abbreviated to KISS: ‘keep it simple, stupid’. Don’t apply more esoteric methods than you need, and stick with those you are confident you fully understand. One professional proudly brought me a good sixth power polynomial fit to his experimental data; there are cases where a sixth power fit is appropriate, but they need to be examined carefully for suitability. In this case, a high enough order curve was bound to fit eventually but the points in his small sample were actually random with no association whatsoever.

While raw data should be your ideal starting point, a simple transform may help you to see what is going on within it. Peter Vijn of BioInq consultancy suggests one likely candidate (see box: Go lognormal) and most data analysis software offers a selection which can be applied at the click of a mouse. Remembering my KISS advice, don’t apply transforms just for the sake of it – but do explore carefully whether one of them might be useful in your particular setting.

Try to find people who will be your personal support network. Not consultants like me who will charge you for advice, but colleagues, friends, acquaintances who are happy to discuss aspects of your work. Carla has a team leader who knows how to provide her with clear parameters for assessing the usefulness of exploratory discoveries to her department. She also has an informal support line to a statistics professor in another university who is willing to help her develop data analysis ideas through their keen shared amateur interest in archaeology.

Putting that all together brings me to development paths. Start with those techniques which you understand best, but then seek both to deepen and extend your understanding. Your statistical software has a wealth of guidance material hidden within its help system; some of it (perhaps even most of it) may seem like gibberish, but look at the sections which build upon what you already know, and gradually you will find increasing areas of fog becoming clear. Minitab, which originated as a teaching system, has served Carla particularly well in this respect; Statsoft (publishers of Statistica) offer an online statistics textbook as well. Try applying what you learn to old, already solved problems in your own area, to see how they respond without any pressure to get it right. Discuss what you are learning with your support network, and you’ll progress all the faster.

Keep it simple, start from what you know, look for ways to gradually build a carefully expanding repertoire of techniques. Welcome the willingness of others to help you. Seek reliable guidelines against which your results can be tested. Make the most of both your analytic software and the wealth of helpful support which it supplies. Follow all of those guidelines, and you should be able to do great statistical things without being a statistician.
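
The sixth power polynomial anecdote is easy to reproduce. What follows is a minimal Python sketch, not the professional's actual data or software: the seven evenly spaced x values, the uniform random y values and the function name r_squared_by_degree are all assumptions made for illustration. It fits polynomials of increasing degree to pure noise and reports the in-sample R-squared at each step.

```python
import random


def r_squared_by_degree(x, y, max_degree):
    """In-sample R^2 of least-squares polynomial fits of degree 0..max_degree.

    An orthonormal polynomial basis is built on the sample points by
    Gram-Schmidt, so each extra degree can only add explained variance.
    """
    n = len(x)
    if max_degree > n - 1:
        raise ValueError("degree must be at most n - 1")
    mean_y = sum(y) / n
    total_ss = sum((v - mean_y) ** 2 for v in y)
    basis, explained, results = [], 0.0, []
    for degree in range(max_degree + 1):
        col = [xi ** degree for xi in x]
        for q in basis:  # remove what lower degrees already span
            dot = sum(c * qi for c, qi in zip(col, q))
            col = [c - dot * qi for c, qi in zip(col, q)]
        norm = sum(c * c for c in col) ** 0.5
        q = [c / norm for c in col]
        basis.append(q)
        if degree > 0:  # the constant term only fits the mean
            explained += sum(yi * qi for yi, qi in zip(y, q)) ** 2
        results.append(explained / total_ss)
    return results


if __name__ == "__main__":
    rng = random.Random(2013)              # arbitrary seed, for repeatability
    x = [i / 3 - 1 for i in range(7)]      # hypothetical: seven points in [-1, 1]
    y = [rng.random() for _ in x]          # pure noise, unrelated to x
    for degree, r2 in enumerate(r_squared_by_degree(x, y, 6)):
        print(f"degree {degree}: R^2 = {r2:.3f}")
```

With seven points, a sixth-degree polynomial passes through every one of them, so the final R-squared is 1 (to rounding) even though y has nothing to do with x. A high R-squared on a small sample is therefore no evidence of association by itself.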
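
To see what a simple transform can do, here is a hedged sketch of taking logarithms of right-skewed data. The 'Go lognormal' box is not reproduced on this page, so this is an assumption about the kind of transform it has in mind rather than Peter Vijn's own procedure; the simulated sample and the helper names are invented for illustration.

```python
import math
import random


def skewness(values):
    """Sample skewness: mean cubed deviation over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def summarise(values):
    """Mean, median and skewness of a sample."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return {"mean": sum(values) / n, "median": median, "skewness": skewness(values)}


def log_transform(values):
    """Natural logarithm of strictly positive measurements."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(v) for v in values]


if __name__ == "__main__":
    rng = random.Random(7)  # arbitrary seed, for repeatability
    # Hypothetical measurements drawn from a lognormal distribution.
    raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
    for label, sample in (("raw", raw), ("log", log_transform(raw))):
        s = summarise(sample)
        print(f"{label}: mean={s['mean']:.2f} median={s['median']:.2f} "
              f"skewness={s['skewness']:.2f}")
```

On data of this kind the raw values are strongly right-skewed, with the mean pulled well above the median, while the logged values are roughly symmetric. In keeping with the KISS advice, the transform earns its place only if your own data shows the same behaviour.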
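
Two elements of the Qlucore five-step sidebar, reducing the data to three principal components and then asking whether the variance they capture exceeds what random data would give, can also be sketched in plain Python. This is not Qlucore's implementation: the synthetic two-group data set, the power-iteration eigen-solver and the column-shuffling baseline are all assumptions chosen to keep the example self-contained.

```python
import random


def center(rows):
    """Subtract each variable's mean; rows are samples, columns are variables."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centred data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def top_eigenpairs(matrix, k=3, iterations=200, seed=0):
    """Leading k eigenpairs of a covariance matrix: power iteration plus deflation."""
    rng = random.Random(seed)
    size = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [rng.gauss(0.0, 1.0) for _ in range(size)]
        for _ in range(iterations):
            nxt = [sum(w * v for w, v in zip(row, vec)) for row in work]
            norm = sum(c * c for c in nxt) ** 0.5
            if norm == 0.0:
                break
            vec = [c / norm for c in nxt]
        applied = [sum(w * v for w, v in zip(row, vec)) for row in work]
        value = sum(v * a for v, a in zip(vec, applied))
        pairs.append((value, vec))
        # Deflate: remove the component just found before looking for the next.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(size)]
                for i in range(size)]
    return pairs


def variance_in_top3(rows):
    """Fraction of total variance captured by the first three principal components."""
    cov = covariance(center(rows))
    total = sum(cov[i][i] for i in range(len(cov)))
    return sum(value for value, _ in top_eigenpairs(cov)) / total


def project_3d(rows):
    """Coordinates of every sample on the first three principal axes."""
    centered = center(rows)
    axes = [vec for _, vec in top_eigenpairs(covariance(centered))]
    return [[sum(v * a for v, a in zip(row, axis)) for axis in axes]
            for row in centered]


def shuffled_baseline(rows, repeats=20, seed=1):
    """The same statistic after shuffling each variable independently across samples."""
    rng = random.Random(seed)
    cols = [list(col) for col in zip(*rows)]
    fractions = []
    for _ in range(repeats):
        for col in cols:
            rng.shuffle(col)
        fractions.append(variance_in_top3([list(r) for r in zip(*cols)]))
    return fractions


def synthetic_data(samples=60, variables=12, seed=42):
    """Hypothetical data: two hidden groups shifted by +/-1.5 on every variable."""
    rng = random.Random(seed)
    rows = []
    for i in range(samples):
        shift = 1.5 if i % 2 else -1.5
        rows.append([shift + rng.gauss(0.0, 1.0) for _ in range(variables)])
    return rows


if __name__ == "__main__":
    data = synthetic_data()
    baseline = shuffled_baseline(data)
    print(f"variance in first three PCs, real data: {variance_in_top3(data):.2f}")
    print(f"largest value over {len(baseline)} shuffles: {max(baseline):.2f}")
    print("first sample in 3D:", [round(c, 2) for c in project_3d(data)[0]])
```

Shuffling each variable separately keeps its individual distribution but destroys any structure shared between variables, so a real value well above every shuffled value suggests the pattern seen in the 3D plot is more than chance. The coordinates from project_3d are what a plotting package would draw; a production analysis would normally use a tested linear algebra library rather than this hand-rolled solver.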


References and Sources
For a full list of references and sources, visit www.scientific-computing.com/features/referencesjun13.php


JUNE/JULY 2013 19

