LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 2002, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 30 May 2002 18:38:26 -0400
Reply-To:     "Huang, Ya" <ya.huang@PFIZER.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Huang, Ya" <ya.huang@PFIZER.COM>
Subject:      Re: Avg subsets of values in a dataset
Comments: To: "Cassell.David@EPAMAIL.EPA.GOV" <Cassell.David@EPAMAIL.EPA.GOV>
Content-Type: text/plain

David,

Thanks for the 'week.n' explanation.

In your code, you used "round(sum(value)/count(value)) as average", which is actually equivalent to round(mean(value)), because mean() dose not count missing. I don't think (may be Howard neither), this is what the original post wanted. In his sample data:

1/1/01 55 1/3/01 58 1/4/01 59 1/6/01 57 1/7/01 56

he wanted the average of week 1 to be 41, which is (55+0+58+59+57+56)/7, where 0 is for the missing day 1/2/01. If we use mean(value) or sum(value)/count(value), it would be (55+0+58+59+57+56)/6=47!

The original post did ask to treat missing as zero, but he did not say treat it as low. A data like below should result in a avg=(55+0-59+59+57+56)/7=24, and low=-59, high=59. This is my understanding to the original post, and also the basis of my solution which involved merging an index data, and set missing to zero, which is not as elegant as Howard's solution.

1/1/01 55 1/3/01 -59 1/4/01 59 1/6/01 57 1/7/01 56

Kind regards,

Ya Huang -----Original Message----- From: David L. Cassell [mailto:Cassell.David@EPAMAIL.EPA.GOV] Sent: Thursday, May 30, 2002 3:13 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Avg subsets of values in a dataset

"Huang, Ya" <ya.huang@PFIZER.COM> replied [in part]: > What is 'week.2', I couldn't find the explanation from Online Doc.

The only place I've seen it in the Online Docs is in a tiny example under the INTCK() function. Look at example 5 there, and you'll see the use of it with 'week2' instead of 'week'. I actually learned of it from reading SAS-L instead of from the docs. The 'week.n' form allows you to specify the day of the week you use as the demarcation point. Sunday=1, Monday=2, etc.

> By the way, the way you calculated average is very clever, > instead of using mean(), you used sum()/7, which assumed missing > value to be zero.

Actually, that was the only thing about Howard's solution that bothered me. This is clearly a case where missing values should occur. [See below for my resolution to the problem.]

> But the way you find the min, might not be > very robust, in case that a week has some missing days, and one of > the value is a negative number, you'll get 0 for low, in stead of > the real negative number. Of course, the original post dose > not seem to have any negative value.

The original post *asked* for a missing value to get a min() of 0. I thought that was a bad idea, and it will probably cause problems for the poster somewhere down the line. I say, let missing values be missing values! Don't make them stay in the fake-number closet! :-)

So here's my version, based on Ya's always-impressive work [warning - untested code]:

proc sql; select intnx('week.2',date,0) as monday format=mmddyys8., calculated monday + 6 as sunday format=mmddyys8., round(sum(value)/count(value)) as average, case when count(value)=7 then min(value) else 0 end as low, max(value) as high from xx group by monday; quit;

Although I still think the low when there are missing values should not be set to zero...

David -- David Cassell, CSC Cassell.David@epa.gov Senior computing specialist mathematical statistician


Back to: Top of message | Previous page | Main SAS-L page