
Review STATISTICAL SCIENCE

Solas 4.0

Felix Grant reviews the latest Solas release

Missing data values are the bane of a statistician’s life. At best, when the distribution of missing values is truly random, they reduce sample size, which has a knock-on effect on conclusions and confidence. At worst, when there is an extensive pattern of systematic omissions, they can render a whole data set completely useless. At the same time, attempting to fill in (or, in the language of the trade, impute) the missing values is a fraught business full of pitfalls. There is an ever-present danger of building assumptions into the data, which are then expressed in analyses. Very often, however, the benefits of imputation in retrieving at least part of a study for useful non-content-robust analysis outweigh purist qualms. The question is then how to go about the process in the best way, and this is where Solas comes in. Irish company Statistical Solutions has a good track record of focused provision for this type of important but under-addressed problem, and Solas is no exception.

A variety of methods are provided for both single (group means, hot deck, last value carried forward and predicted mean) and multiple (predictive model, propensity score, predictive mean matching, Mahalanobis and an option to combine the last three) imputations. Graphics, tabulation, and so on, back up all this. A full gamut of controls is available, and rightly so, though making good use of them is a matter not only of knowledge, understanding and experience, but of a degree of intuition as well. That being so, it’s good to see that sensible settings are in place and ready to run without touching anything.
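For readers unfamiliar with the simpler single-imputation ideas named above, here is a minimal Python sketch of two of them, group means and last value carried forward. It is a generic illustration only and says nothing about how Solas implements them; the function names and the small data lists are hypothetical, and missing values are assumed to be marked with `None`.

```python
from statistics import mean


def impute_group_means(values, groups):
    """Replace each None with the mean of the observed values in the same group."""
    observed = {}
    for v, g in zip(values, groups):
        if v is not None:
            observed.setdefault(g, []).append(v)
    group_mean = {g: mean(vs) for g, vs in observed.items()}
    # A group with no observed values at all stays missing (None).
    return [v if v is not None else group_mean.get(g)
            for v, g in zip(values, groups)]


def impute_lvcf(series):
    """Last value carried forward: fill each None with the latest observed value."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        # Leading gaps have nothing to carry forward and stay None.
        filled.append(last)
    return filled


# Hypothetical example data, not taken from the review.
values = [4.0, None, 6.0, 10.0, None]
groups = ["a", "a", "a", "b", "b"]
print(impute_group_means(values, groups))
print(impute_lvcf([1.2, None, None, 3.4, None]))
```

Both methods fill each gap with a single deterministic value, which is exactly why they risk building assumptions into the data: the filled-in cases carry no trace of the uncertainty about what was really there.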

The multiple methods default at start-up to five imputed data sets, but this can be set as high as 50. This is important because, while efficiency capture levels off rapidly above five, statistical power gains considerably as the number of sets grows. In an ideal world, perhaps it would be nice to see the upper limit expanded to 100, but that would involve a considerable expansion in computational load and 50 is a sensible compromise.

As a test of how useful Solas actually is in practice, I took an existing 14-variable data set of 65 cases, with no missing values, and calculated (in another data analysis package) a selection of standard descriptors, tests and regressions available within Solas. Progressively eliminating data values, first randomly and then systematically in a variety of ways, gave a selection of simulated sets with between 35 and 85 per cent remaining. After each elimination, the same descriptors, tests and regressions were calculated again. Solas then imputed the missing values in various ways, up to and including multiple imputation by combination methods over the maximum allowed 50 generated sets, and was then set to calculate the same descriptors, tests and regressions for the resulting imputed sets. I did some quick spot checking of intermediate steps and of final combination; not surprisingly, everything was right on the button except where I had myself made a mistake. The acid test, though, was how much the final analytic results for partially missing data sets were improved (that is, converged with the known true results from the complete set) and the answer was ‘significantly’ or, in the case of high power multiple imputations, ‘dramatically’.

www.solasmissingdata.com
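The point about efficiency levelling off above five imputed sets can be illustrated with Rubin's standard results for multiple imputation. The sketch below is not drawn from Solas itself: it uses the textbook relative efficiency 1 / (1 + gamma / m), where gamma is the fraction of missing information and m the number of imputed sets, together with Rubin's rules for pooling an estimate across sets. The value gamma = 0.5 and the function names are assumptions chosen purely for illustration.

```python
from statistics import mean, variance


def relative_efficiency(gamma, m):
    """Rubin's relative efficiency of m imputations versus infinitely many.

    gamma is the fraction of missing information (assumed, between 0 and 1).
    """
    return 1.0 / (1.0 + gamma / m)


def pool_rubin(estimates, variances):
    """Pool per-set estimates and their variances using Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-set variance
    and B is the between-set (sample) variance of the estimates.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Assumed fraction of missing information, for illustration only.
for m in (5, 20, 50):
    print(m, round(relative_efficiency(0.5, m), 3))
```

With gamma set to 0.5, five sets already give a relative efficiency of roughly 0.91, and 50 sets roughly 0.99, so the efficiency argument for going beyond five is modest. The case for more sets rests, as noted above, on statistical power rather than on efficiency.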

AdviseStats

Felix Grant reviews intelligent statistics and analytics advisor, AdviseStats

Open AdviseStats and you are confronted by an unusual sight: not a worksheet, not even a menu system, but an empty (apart from branding) grey rectangular window containing one word at the top left: ‘Start’. AdviseStats is a keyboard-free zone in which you work entirely by pointer selection, and it loves a tablet computer. Click on ‘Start’ and you take the first step into the decision tree; five choices of which two are administrative, two tutorial and one is ‘Access data’. Assuming that you choose to access data, you get the choice of local and web sources or randomly generated matter.

To keep things short, here, we’ll proceed directly to a local disk CSV file containing a mixture of continuous, discrete and qualitative variables and answer a question about missing data values (none, in this case). The next choice is about what we want to do: the options depend upon the data, but in this case we have eight from ‘anomalies’ to ‘view data’ (this last option opens a spreadsheet-style display grid), each of them offering a little more detail when the mouse hovers over them. Picking ‘Compare’ presents a list of variables, each with a little distribution graphic next to it: I’ll click on ‘Iron’ and ‘Titanium’, then from the grouping popup I’ll choose ‘Site’. The last task is to click compute; another list of fine tuning options is available, but I’ll click ‘Go’. One final question asks me whether I want to ‘transform variables if needed to satisfy assumptions’, then the test is run.

There’s a lot more than that, of course, but I’ve illustrated the central approach. Under the bonnet, so to speak, there is an impressive array of tools ranging, for example, from outlier handling to cluster analysis, composition descriptors to principal components. There is a facility for making very specific interrogations of the data on the basis of natural language constructs to answer particular questions in specific ways.

Is it perfect? Not entirely yet, but it’s getting there and works best for its intended audience. I encountered an ‘Execution error in assembling output’ message a couple of times, for example, but students making fewer assumptions about their own knowledge did not. In comparisons between student groups using AdviseStats and those making their own decisions, AdviseStats consistently produced more correct outcomes and higher quality results.

https://adviseanalytics.com

OCTOBER/NOVEMBER 2012 13

For regular product updates, please visit www.scientific-computing.com/products
