Thanks Peter, that's exactly what I needed for the conversation with the business user.
-----Original Message-----
From: Peter Flom [mailto:peterflomconsulting@mindspring.com]
Sent: Friday, March 04, 2011 12:13 PM
To: Suzanne McCoy; SAS-L@LISTSERV.UGA.EDU
Subject: RE: std deviation question
Suzanne McCoy wrote
>>>The business user spec requests the value be calculated as std(median_nbr_trips). Do I question the spec or just calc the number and not worry about it? I know it is mathematically incorrect to average averages so was just curious if taking the standard deviation of an average was okay. This is only for estimation purposes across a group of similar shoppers.
Well, you could say "that's what they are paying for, that's what I'll give them" but it does seem an odd thing. I'm guessing nbr_trips is number of trips to, say, a supermarket. Now, suppose we have Joe, Bill and Sue. Over a period of 4 weeks, Joe takes 0, 0, 3 and 2 trips. Bill takes 5, 6, 6 and 7. Sue takes 1, 1, 1 and 1. This gives medians of 1, 6 and 1. Then you find the SD of those and ... what? How is this useful? Usually, if you want a median rather than a mean, you also want an interquartile range, or maybe the median absolute deviation, rather than an SD.
I *think* what the user might want is the overall median, and then the overall IQR.
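To make the comparison concrete, here is a rough Python sketch (not SAS, and not anything from the spec) that uses the Joe, Bill and Sue numbers above. It computes the SD of the per-shopper medians next to the pooled median and IQR. The variable names are made up for illustration, and the quartiles use Python's default "exclusive" method, which is an assumption; other packages may give slightly different quartiles on a sample this small.

```python
from statistics import median, quantiles, stdev

# Weekly trip counts from the example above (4 weeks per shopper).
trips = {
    "Joe": [0, 0, 3, 2],
    "Bill": [5, 6, 6, 7],
    "Sue": [1, 1, 1, 1],
}

# What the spec asks for: one median per shopper, then the SD of those medians.
per_shopper_median = {name: median(weeks) for name, weeks in trips.items()}
sd_of_medians = stdev(list(per_shopper_median.values()))

# The alternative: pool every shopper-week, then take the median and IQR.
pooled = [n for weeks in trips.values() for n in weeks]
overall_median = median(pooled)
q1, _, q3 = quantiles(pooled, n=4)  # default method is "exclusive" (assumption)
overall_iqr = q3 - q1

print("per-shopper medians:", per_shopper_median)
print("SD of medians:", round(sd_of_medians, 3))
print("overall median:", overall_median, "overall IQR:", overall_iqr)
```

The SD of three medians says how much the shoppers' typical weeks differ from each other; the pooled median and IQR describe a typical shopper-week. Which one is wanted is a question for the business user.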
Even worse, suppose each person has data for a different number of weeks ....
Also, it's not mathematically wrong to take the average of averages, it's just that the result is not what most people doing such a thing would want.
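A small Python sketch of that point, with two hypothetical shoppers (not from the data above) who have different numbers of weeks. The average of averages weights each shopper equally; the pooled average weights each week equally. Neither is wrong, they just answer different questions.

```python
from statistics import mean

# Hypothetical data: shopper A has 4 weeks on record, shopper B only 1.
weekly_trips = {
    "A": [1, 1, 1, 1],
    "B": [7],
}

# Average of averages: each shopper counts once, however many weeks they have.
mean_of_means = mean(mean(weeks) for weeks in weekly_trips.values())

# Pooled average: each shopper-week counts once.
pooled_mean = mean(n for weeks in weekly_trips.values() for n in weeks)

print("average of averages:", mean_of_means)
print("pooled average:", pooled_mean)
```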
Peter