Date: Thu, 30 May 2002 18:38:26 -0400
Reply-To: "Huang, Ya" <ya.huang@PFIZER.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Huang, Ya" <ya.huang@PFIZER.COM>
Subject: Re: Avg subsets of values in a dataset
Content-Type: text/plain
David,
Thanks for the 'week.n' explanation.
In your code, you used "round(sum(value)/count(value)) as average",
which is actually equivalent to round(mean(value)), because mean()
dose not count missing. I don't think (may be Howard neither), this is
what the original post wanted. In his sample data:
1/1/01 55
1/3/01 58
1/4/01 59
1/6/01 57
1/7/01 56
he wanted the average of week 1 to be 41, which is (55+0+58+59+57+56)/7,
where 0 is for the missing day 1/2/01. If we use mean(value) or
sum(value)/count(value), it would be (55+0+58+59+57+56)/6=47!
The original post did ask to treat missing as zero, but he did not
say treat it as low. A data like below should result in a
avg=(55+0-59+59+57+56)/7=24, and low=-59, high=59. This is my understanding
to the original post, and also the basis of my solution which
involved merging an index data, and set missing to zero, which is not as
elegant as Howard's solution.
1/1/01 55
1/3/01 -59
1/4/01 59
1/6/01 57
1/7/01 56
Kind regards,
Ya Huang
-----Original Message-----
From: David L. Cassell [mailto:Cassell.David@EPAMAIL.EPA.GOV]
Sent: Thursday, May 30, 2002 3:13 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Avg subsets of values in a dataset
"Huang, Ya" <ya.huang@PFIZER.COM> replied [in part]:
> What is 'week.2', I couldn't find the explanation from Online Doc.
The only place I've seen it in the Online Docs is in a tiny
example under the INTCK() function. Look at example 5 there,
and you'll see the use of it with 'week2' instead of 'week'.
I actually learned of it from reading SAS-L instead of from the
docs. The 'week.n' form allows you to specify the day of the
week you use as the demarcation point. Sunday=1, Monday=2, etc.
> By the way, the way you calculated average is very clever,
> instead of using mean(), you used sum()/7, which assumed missing
> value to be zero.
Actually, that was the only thing about Howard's solution that
bothered me. This is clearly a case where missing values should
occur. [See below for my resolution to the problem.]
> But the way you find the min, might not be
> very robust, in case that a week has some missing days, and one of
> the value is a negative number, you'll get 0 for low, in stead of
> the real negative number. Of course, the original post dose
> not seem to have any negative value.
The original post *asked* for a missing value to get a min() of 0.
I thought that was a bad idea, and it will probably cause problems
for the poster somewhere down the line. I say, let missing values
be missing values! Don't make them stay in the fake-number closet!
:-)
So here's my version, based on Ya's always-impressive work
[warning - untested code]:
proc sql;
select intnx('week.2',date,0) as monday format=mmddyys8.,
calculated monday + 6 as sunday format=mmddyys8.,
round(sum(value)/count(value)) as average,
case when count(value)=7 then min(value) else 0 end as low,
max(value) as high
from xx
group by monday;
quit;
Although I still think the low when there are missing values
should not be set to zero...
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician