| Date: | Mon, 22 Sep 2008 21:56:13 +0100 |
| Reply-To: | David Hitchin <d.h.hitchin@sussex.ac.uk> |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU> |
| From: | David Hitchin <d.h.hitchin@sussex.ac.uk> |
| Subject: | Re: Opinions about validity of Predictive Analytics programs? |
| In-Reply-To: | <97D6F0A82A6E894DAF44B9F575305CC905913838@HCAMAIL03.ochca.com> |
| Content-Type: | text/plain; charset=ISO-8859-1 |
Quoting "Pirritano, Matthew" <MPirritano@ochca.com>:
> I know that SPSS has a predictive analytics module. I've also been
> exposed to predictive analytic programs that make use of actuarial
> data to predict risk in healthcare settings. etc
>
> All joking aside, does anyone have an opinion about this? As a lowly
> peon I'm not sure if my opinion is valid or if I'm missing something
> basic.
>
As always, Hector Maletta has written some very sensible words in reply,
but I have a few comments to add.
The first is that predictive equations work on the assumption that the
world has not changed between the formulation of the model and the
calculation of risk in the future.
Next, when a very large proportion of a population behaves in a similar
way, even trivial predictions seem to have great predictive power. For
example, if only 1 person in 1000 gets a rare disease, then the trivial
prediction that all 1000 people will remain healthy achieves an
accuracy of 99.9%, but is completely useless at identifying the one
person who may need treatment. (Screening programmes which attempt to
identify cancers generally produce so many false positives that they
are of questionable value.)
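The arithmetic behind this point can be sketched in a few lines of
Python. The 1-in-1000 prevalence is the figure used above; the 90%
sensitivity and 95% specificity of the screening test are hypothetical
values chosen only for illustration, and the function names are invented
for this sketch.

```python
def trivial_classifier_metrics(n_people, n_sick):
    """Metrics for the rule 'predict that everyone stays healthy'."""
    true_negatives = n_people - n_sick   # healthy people, correctly called healthy
    accuracy = true_negatives / n_people
    sensitivity = 0 / n_sick             # no sick person is ever flagged
    return accuracy, sensitivity


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive test results that are true cases (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


accuracy, sensitivity = trivial_classifier_metrics(n_people=1000, n_sick=1)

# Hypothetical screening test: 90% sensitivity, 95% specificity (assumed values).
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.90, specificity=0.95)

print(f"trivial rule: accuracy={accuracy:.3f}, sensitivity={sensitivity:.3f}")
print(f"screening test: share of positives that are real cases={ppv:.3f}")
```

Under those assumed test characteristics, fewer than 2 in 100 positive
results would be real cases, which is why accuracy alone says little
when the outcome is rare.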
Often models are fitted on the basis of limited data, with no subjects
at all observed under some of the combinations of circumstances, so
predictions are inappropriate when new individuals are observed with
those characteristics.
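One rough way to check for this before trusting a prediction is to list
the combinations of predictor levels that never occur in the fitting
data. The sketch below is only an illustration: the variables (smoker,
age_band), their levels and the records are all made up.

```python
from collections import Counter
from itertools import product


def unobserved_combinations(records, levels):
    """Return every combination of predictor levels with no observed subject."""
    seen = Counter(tuple(r[name] for name in levels) for r in records)
    return [combo for combo in product(*levels.values()) if seen[combo] == 0]


# Hypothetical predictors and fitting data, for illustration only.
levels = {"smoker": ["no", "yes"], "age_band": ["<40", "40-64", "65+"]}
training = [
    {"smoker": "no", "age_band": "<40"},
    {"smoker": "no", "age_band": "40-64"},
    {"smoker": "no", "age_band": "65+"},
    {"smoker": "yes", "age_band": "<40"},
    {"smoker": "yes", "age_band": "40-64"},
]

missing = unobserved_combinations(training, levels)
print(missing)  # combinations the model has never seen
```

A new individual falling into one of the listed combinations is being
predicted purely by extrapolation from the model's functional form.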
Finally, fitted models need proper validation. A common method is to
construct a "hits and misses table", i.e. each observation is checked
against the model to see what outcome the model predicts, and this is
compared with the known outcome. The flaw here is testing the model on
the same data that were used to set its parameters, which makes the
model look more accurate than it really is. Jack-knifing and
bootstrapping methods can be used to reduce this bias, e.g. by arranging
that no observation is used both for fitting the model and for testing it.
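The difference between the two kinds of "hits and misses" count can be
shown with a small Python sketch. The classifier here is a
one-nearest-neighbour rule, chosen only because it makes the bias
obvious; it is not the model discussed in this thread, and the data are
invented.

```python
def nearest_neighbour_predict(train, x):
    """Predict the label of the training point closest to x (1-NN)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def resubstitution_accuracy(data):
    """Test every observation against a model fitted on ALL the data."""
    hits = sum(nearest_neighbour_predict(data, x) == y for x, y in data)
    return hits / len(data)


def leave_one_out_accuracy(data):
    """Test each observation against a model fitted WITHOUT that observation."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]   # drop the observation being tested
        hits += nearest_neighbour_predict(train, x) == y
    return hits / len(data)


# Invented (x, outcome) pairs with a noisy relationship, for illustration only.
data = [(1.0, 0), (1.5, 0), (2.0, 1), (2.5, 0), (3.0, 0), (3.5, 1),
        (4.0, 0), (4.5, 1), (5.0, 1), (5.5, 0), (6.0, 1), (6.5, 1)]

resub = resubstitution_accuracy(data)
loo = leave_one_out_accuracy(data)
print(f"tested on the fitting data: {resub:.2f}")
print(f"leave-one-out:              {loo:.2f}")
```

Tested on its own fitting data, this rule always scores 100%, because
every observation is its own nearest neighbour; the leave-one-out figure
is the more honest estimate of how it would do on new cases.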