Date: Mon, 22 Sep 2008 21:56:13 +0100
From: David Hitchin
Sender: "SPSSX(r) Discussion"
Subject: Re: Opinions about validity of Predictive Analytics programs?
To: "Pirritano, Matthew"
In-Reply-To: <97D6F0A82A6E894DAF44B9F575305CC905913838@HCAMAIL03.ochca.com>

Quoting "Pirritano, Matthew" <MPirritano@ochca.com>:

> I know that SPSS has a predictive analytics module. I've also been
> exposed to predictive analytic programs that make use of actuarial
> data to predict risk in healthcare settings. etc
>
> All joking aside, does anyone have an opinion about this? As a lowly
> peon I'm not sure if my opinion is valid or if I'm missing something
> basic.

As always, Hector Maletta has written some very sensible words in reply, but I have a few comments to add.

The first is that predictive equations assume that the world has not changed between the time the model was formulated and the time it is used to calculate future risk.

Next, when a very large proportion of a population behaves in the same way, trivial predictions can appear to have great predictive power. For example, if only 1 person in 1000 gets a rare disease, then the trivial prediction that all 1000 people will remain healthy achieves an accuracy of 99.9%, but it is completely useless at identifying the one person who may need treatment. (The statistics for screening programmes which attempt to identify cancers show that they generally produce so many false positives that they are of questionable value.)
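A minimal Python sketch of the arithmetic behind this, using the hypothetical 1-in-1000 population from the example above. The screening figures passed to the hypothetical `positive_predictive_value` helper (90% sensitivity, 95% specificity) are illustrative assumptions, not data from any real programme; they only show how a low base rate lets false positives swamp the true ones.

```python
# Hypothetical population: 1000 people, 1 of whom has the disease.
n = 1000
truth = [1] + [0] * (n - 1)      # 1 = diseased, 0 = healthy
trivial = [0] * n                # trivial rule: predict "healthy" for everyone

# Accuracy counts every correct prediction, so the 999 healthy people dominate.
accuracy = sum(p == t for p, t in zip(trivial, truth)) / n

# Sensitivity asks: of the people who are ill, how many did we flag?
true_pos = sum(p == 1 and t == 1 for p, t in zip(trivial, truth))
trivial_sensitivity = true_pos / sum(truth)

print(accuracy)             # 0.999
print(trivial_sensitivity)  # 0.0


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive test results that are true positives (Bayes' rule)."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Assumed test characteristics at the 1-in-1000 prevalence used above.
print(round(positive_predictive_value(0.9, 0.95, 0.001), 4))  # 0.0177
```

Under these assumed figures fewer than 2 in 100 positive screens would be true cases, even though the test itself looks good on paper.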

Models are often fitted to limited data, with no subjects at all observed under some combinations of circumstances, so predictions are inappropriate for new individuals who turn up with those characteristics.
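One way to catch this before trusting a prediction is to count how many of the fitting observations fall in each combination of predictors, and to flag new cases that come from empty (or nearly empty) cells. A sketch, assuming categorical predictors; the data, the `unsupported` helper and its `min_count` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical training data: each subject described by a combination of
# categorical predictors (age band, sex, smoker).
training = [
    ("18-39", "F", "no"),
    ("18-39", "M", "yes"),
    ("40-64", "F", "no"),
    ("40-64", "M", "no"),
    ("65+", "F", "no"),
]


def cell_counts(rows):
    """Count how many training subjects fall in each combination (cell)."""
    return Counter(rows)


def unsupported(new_rows, counts, min_count=1):
    """Return new subjects whose covariate combination had fewer than
    min_count observations when the model was fitted."""
    # Counter returns 0 for combinations that never occurred.
    return [row for row in new_rows if counts[row] < min_count]


counts = cell_counts(training)
new_subjects = [("40-64", "F", "no"), ("65+", "M", "yes")]
print(unsupported(new_subjects, counts))  # [('65+', 'M', 'yes')]
```

Raising `min_count` above 1 would also flag cells that were observed only a handful of times, where any estimate is likely to be unstable.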

Finally, fitted models need proper validation. A common method is to construct a "hits and misses table": each observation is run through the model to see what outcome it predicts, and this prediction is compared with the known outcome. The flaw here is that the model is tested on the same data that was used to estimate its parameters, which makes it look better than it really is. Jack-knifing and bootstrapping methods can be used to reduce this bias, since they can be arranged so that no observation is used both for fitting the model and for testing it.
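A sketch of the difference, assuming simulated data and a 1-nearest-neighbour classifier as a hypothetical stand-in for whatever model was actually fitted. 1-NN makes the resubstitution bias extreme, because every fitting observation is its own nearest neighbour. The hits and misses table is tabulated three ways: on the fitting data, by jackknife (leave-one-out), and by bootstrap out-of-bag prediction. All names and the noise level are illustrative assumptions.

```python
import random
from collections import Counter


def nearest_neighbour_predict(train, x):
    """Predict the label of x as the label of the closest training point (1-NN).
    A hypothetical stand-in for whatever model was actually fitted."""
    best = min(train, key=lambda pt: abs(pt[0] - x))
    return best[1]


def hits_and_misses(pairs):
    """Tabulate (predicted, observed) pairs: the 'hits and misses table'."""
    return Counter(pairs)


def accuracy(table):
    hits = sum(n for (pred, obs), n in table.items() if pred == obs)
    return hits / sum(table.values())


# Hypothetical data: one predictor x, binary outcome y with label noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    y = 1 if x > 5 else 0
    if rng.random() < 0.2:  # flip each label with probability 0.2 (noise)
        y = 1 - y
    data.append((x, y))

# Resubstitution: test on the same data used to fit the model.
resub = hits_and_misses((nearest_neighbour_predict(data, x), y) for x, y in data)

# Jackknife (leave-one-out): each observation is predicted by a model
# fitted without it.
loo = hits_and_misses(
    (nearest_neighbour_predict(data[:i] + data[i + 1:], x), y)
    for i, (x, y) in enumerate(data)
)

# Bootstrap (out-of-bag): fit on a resample drawn with replacement and
# test only on observations that were not drawn into that resample.
oob_pairs = []
for _ in range(20):
    idx = [rng.randrange(len(data)) for _ in range(len(data))]
    drawn = set(idx)
    sample = [data[j] for j in idx]
    for j, (x, y) in enumerate(data):
        if j not in drawn:
            oob_pairs.append((nearest_neighbour_predict(sample, x), y))
oob = hits_and_misses(oob_pairs)

print("resubstitution accuracy:", accuracy(resub))
print("leave-one-out accuracy: ", accuracy(loo))
print("out-of-bag accuracy:    ", accuracy(oob))
```

With this model the resubstitution table reports every observation as a hit, while the leave-one-out and out-of-bag tables give a more honest picture of how the model would do on new cases.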

