Date: Fri, 4 Mar 2011 12:21:48 -0500
Reply-To: Suzanne McCoy
Sender: "SAS(r) Discussion"
From: Suzanne McCoy
Subject: Re: std deviation question
Comments: To: Peter Flom
In-Reply-To: <009301cbda8f$6e121b50$4a3651f0$@mindspring.com>
Content-Type: text/plain; charset="us-ascii"

Thanks, Peter, that's exactly what I needed for the conversation with the business user.

-----Original Message-----
From: Peter Flom [mailto:peterflomconsulting@mindspring.com]
Sent: Friday, March 04, 2011 12:13 PM
To: Suzanne McCoy; SAS-L@LISTSERV.UGA.EDU
Subject: RE: std deviation question

Suzanne McCoy wrote

>>> The business user spec requests the value be calculated as std(median_nbr_trips). Do I question the spec or just calc the number and not worry about it? I know it is mathematically incorrect to average averages so was just curious if taking the standard deviation of an average was okay. This is only for estimation purposes across a group of similar shoppers.

Well, you could say "that's what they are paying for, that's what I'll give them" but it does seem an odd thing. I'm guessing nbr_trips is number of trips to, say, a supermarket.

Now, suppose we have Joe, Bill and Sue. Over a period of 4 weeks, Joe takes 0, 0, 3 and 2 trips. Bill takes 5, 6, 6 and 7. Sue takes 1, 1, 1 and 1. This gives medians of 1, 6 and 1. Then you find the SD of those and ... what? How is this useful? Usually, if you want to take a median rather than a mean, you want an interquartile range, or maybe a median absolute deviation, rather than an SD.

I *think* what the user might want is the overall median, and then the overall IQR.
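To make the contrast concrete, here is a rough sketch in Python (not SAS, purely for illustration) that computes both versions on the Joe, Bill and Sue numbers: the SD of the per-shopper medians, as the spec reads, and the median and IQR over all shopper-weeks pooled together. The variable names are made up for this example, and the IQR depends on the quantile method; this uses the default "exclusive" method of `statistics.quantiles`, so other software may report slightly different quartiles.

```python
from statistics import median, stdev, quantiles

# Weekly trip counts from the example (4 weeks per shopper).
trips = {
    "Joe": [0, 0, 3, 2],
    "Bill": [5, 6, 6, 7],
    "Sue": [1, 1, 1, 1],
}

# What the spec asks for: one median per shopper, then the SD of those medians.
per_shopper_median = {name: median(weeks) for name, weeks in trips.items()}
sd_of_medians = stdev(list(per_shopper_median.values()))

# The alternative: pool every shopper-week, then take the median and the IQR.
pooled = [n for weeks in trips.values() for n in weeks]
overall_median = median(pooled)
q1, _, q3 = quantiles(pooled, n=4)  # default method is "exclusive"
overall_iqr = q3 - q1

print("per-shopper medians:", per_shopper_median)
print("SD of medians:", round(sd_of_medians, 3))
print("overall median:", overall_median)
print("overall IQR:", overall_iqr)
```

The two summaries answer different questions: the first describes how much shoppers differ from one another in their typical week, the second describes how much individual shopper-weeks vary.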

Even worse, suppose each person has data for a different number of weeks ....

Also, it's not mathematically wrong to take the average of averages, it's just that the result is not what most people doing such a thing would want.
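A small sketch of that last point, again in Python for illustration only. The data here are hypothetical (shoppers observed for different numbers of weeks, which is not from the original example), and the names are invented. It shows that the plain average of per-shopper averages differs from the mean over all shopper-weeks, and that weighting each shopper's average by their number of weeks recovers the pooled mean.

```python
from statistics import mean

# Hypothetical data: each shopper is observed for a different number of weeks.
trips = {
    "Joe": [0, 0, 3, 2],  # 4 weeks
    "Bill": [5, 7],       # 2 weeks
    "Sue": [1] * 6,       # 6 weeks
}

# Average of averages: every shopper counts equally, however many weeks they have.
mean_of_means = mean([mean(weeks) for weeks in trips.values()])

# Pooled mean: every shopper-week counts equally.
pooled_mean = mean([n for weeks in trips.values() for n in weeks])

# Weighting each shopper's mean by their week count gives the pooled mean back.
total_weeks = sum(len(weeks) for weeks in trips.values())
weighted_mean_of_means = (
    sum(mean(weeks) * len(weeks) for weeks in trips.values()) / total_weeks
)

print("mean of means:", mean_of_means)
print("pooled mean:", pooled_mean)
print("weighted mean of means:", weighted_mean_of_means)
```

Neither number is wrong; the unweighted version answers "what does a typical shopper average?" while the pooled version answers "what does a typical week look like?".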

Peter
