LISTSERV at the University of Georgia
Date:         Mon, 8 Jan 2007 09:43:44 -0600
Reply-To:     Yu Zhang <zhangyu05@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Yu Zhang <zhangyu05@GMAIL.COM>
Subject:      Re: Another algorithm to capture number of EPISODES of event
Comments: To: SK <skauchali@gmail.com>
In-Reply-To:  <1168252092.917351.313550@38g2000cwa.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

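Hi SK,

Here is one solution for your real dataset as you described it (min=0, max=270 days per child). The idea is to concatenate the di values for each childid into a single string, then, if that string contains a 1, count the occurrences of the pattern '000' and add 1 to get the number of episodes. Missing di values are treated as 0.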

HTH

Yu

data test;
   input childid :$6. Day di;
cards;
104701 1 0
104701 2 0
104701 3 0
104701 4 0
104701 5 0
104701 6 0
104701 7 0
104841 1 0
104841 2 0
104841 3 0
104841 4 0
104841 5 0
104841 6 0
104841 7 0
104901 1 0
104901 2 0
104901 3 0
104901 4 0
104901 5 0
104901 6 0
104901 7 0
104921 1 0
104921 2 1
104921 3 1
104921 4 0
104921 5 0
104921 6 0
104921 7 1
104991 1 1
104991 2 1
104991 3 1
104991 4 1
104991 5 1
104991 6 1
104991 7 1
105011 1 1
105011 2 1
105011 3 .
105011 4 .
105011 5 .
105011 6 1
105011 7 1
105041 1 1
105041 2 0
105041 3 1
105041 4 0
105041 5 0
105041 6 0
105041 7 1
;
run;

proc sort data=test; by childid day; run;

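data _null_;
   length childid $6 alldi $270;
   retain alldi ' ';

   /* Create the hash once: key is childid, data is the di string so far */
   if _n_=1 then do;
      declare hash h(hashexp: 4);
      rc = h.defineKey('childid');
      rc = h.defineData('childid','alldi');
      rc = h.defineDone();
   end;

   set test end=last;

   /* childid already in the hash: find() loads its alldi, then append di */
   if h.find()=0 then do;
      put 'here' _all_;                    /* debug trace to the log */
      if missing(di) then di=0;            /* missing day counts as no event */
      alldi=cats(alldi,put(di,8. -L));
      h.replace();
   end;
   /* first record for this childid: start a new string */
   else do;
      alldi=' ';
      if missing(di) then di=0;
      alldi=cats(alldi,put(di,8. -L));
      h.add();
   end;

   /* after the last record, write one row per childid */
   if last then h.output(dataset: "work.out");
run;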
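Because test is already sorted by childid and day, the same per-child string could probably also be built without a hash, using BY-group processing. The following is only a sketch under that assumption; the output dataset name out_by is made up for illustration and is not part of the program above.

```sas
/* Sketch: build one di string per child with BY-group processing.      */
/* Assumes work.test is sorted by childid and day (see PROC SORT above). */
/* out_by is a hypothetical dataset name used only for this example.     */
data out_by(keep=childid alldi);
   length alldi $270;
   retain alldi;
   set test;
   by childid;

   /* start a fresh string at the first record of each child */
   if first.childid then alldi = ' ';

   /* treat a missing di as 0, as the hash step does */
   alldi = cats(alldi, coalesce(di, 0));

   /* write one row per child once its last day has been appended */
   if last.childid then output;
run;
```

The hash version does not depend on the sort order for grouping, so it may still be the better choice if the input cannot be sorted.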

data out;
   set out;
   if index(alldi,'1') then do;
      diepis=count(alldi,'000')+1;
   end;
   else diepis=0;
run;
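One caveat on the count above: it gives the expected answers for the seven sample children, but it looks sensitive to where the zeros fall. A string such as 0001000 contains two non-overlapping '000' patterns around a single event, and a gap of six or more zeros between two events contains at least two, so in both cases the result would be too high. A possible alternative, sketched here under the assumption that an episode starts at any 1 that is the first event or follows at least 3 consecutive event-free days, is to scan the string one character at a time. The names out_scan, diepis_scan, zerorun and i are illustrative only.

```sas
/* Sketch: count episodes by scanning alldi character by character.     */
/* Assumes work.out holds one row per childid with the string alldi.     */
/* out_scan, diepis_scan, zerorun and i are hypothetical names.          */
data out_scan(drop=zerorun i);
   set out;
   diepis_scan = 0;
   zerorun = 3;   /* behave as if a 3-day gap precedes day 1 */

   do i = 1 to length(alldi);
      if substr(alldi, i, 1) = '1' then do;
         /* an event after 3 or more event-free days opens a new episode */
         if zerorun >= 3 then diepis_scan = diepis_scan + 1;
         zerorun = 0;
      end;
      else zerorun = zerorun + 1;
   end;
run;
```

With this rule, leading and trailing zeros never add episodes, and a long gap between two events counts as a single separation.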

On 1/8/07, SK <skauchali@gmail.com> wrote:
>
> Hi there; thanks for the help. Here is the data structure I have (child
> seen daily and daily records of event kept till child was 270 days old
> (maximum days seen)). The child could have been seen for any number of
> days (min=0 max=270).
>
> childid Day di
>
> 104701 1 0
> 104701 2 0
> 104701 3 0
> 104701 4 0
> 104701 5 0
> 104701 6 0
> 104701 7 0
> 104841 1 0
> 104841 2 0
> 104841 3 0
> 104841 4 0
> 104841 5 0
> 104841 6 0
> 104841 7 0
> 104901 1 0
> 104901 2 0
> 104901 3 0
> 104901 4 0
> 104901 5 0
> 104901 6 0
> 104901 7 0
> 104921 1 0
> 104921 2 1
> 104921 3 1
> 104921 4 0
> 104921 5 0
> 104921 6 0
> 104921 7 1
> 104991 1 1
> 104991 2 1
> 104991 3 1
> 104991 4 1
> 104991 5 1
> 104991 6 1
> 104991 7 1
> 105011 1 1
> 105011 2 1
> 105011 3 .
> 105011 4 .
> 105011 5 .
> 105011 6 1
> 105011 7 1
> 105041 1 1
> 105041 2 0
> 105041 3 1
> 105041 4 0
> 105041 5 0
> 105041 6 0
> 105041 7 1
>
>
> You will notice I have excerpted 7 child records, seen for the first 7
> days only (repeated daily records).
> 1 is an event
> 0 is no event
> I want the algorithm to define an EPISODE of di to be so:
> dieps (di EPISODE): an episode of di is when there is a 1 in the week
> (7 days) that is separated by 3 consecutive di free days (zeros).
>
> So for example this algorithm would produce a flat (one childid per
> row) output dataset like so:
> childid dieps
> 104701 0
> 104841 0
> 104901 0
> 104921 2
> 104991 1
> 105011 2
> 105041 2
>
> Notice that childid 104921 has 2 episodes of di in that week, one at the
> beginning and one at the end; episodes separated by at least 3 days of
> di free days. However, for childid 105041 there are also 2 di episodes.
> In this case, the first 3 days would be one episode (because they are
> not separated by at least 3 di free days), and the last day would be the
> second episode.
>
> I am also not sure how to take into account the missing days (see child
> 105011); if we assume it is the same episode, then we may be
> underestimating the number of episodes in the sample; if we assume
> there are 2 separate episodes then we may be overestimating total
> number of episodes in sample.
>
> I would appreciate if I could get some help doing this data
> preparation.
>
> Many thanks
> SK
>

