Date: Mon, 8 Jan 2007 09:43:44 -0600
Reply-To: Yu Zhang <zhangyu05@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Yu Zhang <zhangyu05@GMAIL.COM>
Subject: Re: Another algorithm to capture number of EPISODES of event
Hi SK,
Here is one solution for your real dataset as you described it (min=0, max=270).
The idea is to concatenate the di values for each childid into one string and,
if that string contains a 1, count the '000' gaps that separate episodes.
Missing days are treated as event-free (0).
HTH
Yu
data test;
input childid :$6. Day di;
cards;
104701 1 0
104701 2 0
104701 3 0
104701 4 0
104701 5 0
104701 6 0
104701 7 0
104841 1 0
104841 2 0
104841 3 0
104841 4 0
104841 5 0
104841 6 0
104841 7 0
104901 1 0
104901 2 0
104901 3 0
104901 4 0
104901 5 0
104901 6 0
104901 7 0
104921 1 0
104921 2 1
104921 3 1
104921 4 0
104921 5 0
104921 6 0
104921 7 1
104991 1 1
104991 2 1
104991 3 1
104991 4 1
104991 5 1
104991 6 1
104991 7 1
105011 1 1
105011 2 1
105011 3 .
105011 4 .
105011 5 .
105011 6 1
105011 7 1
105041 1 1
105041 2 0
105041 3 1
105041 4 0
105041 5 0
105041 6 0
105041 7 1
;
run;
proc sort data=test;
by childid day;
run;
data _null_;
  length childid $6 alldi $270;
  if _n_=1 then do;
    /* one hash entry per child; ordered:'a' writes work.out sorted by childid */
    declare hash h(hashexp: 4, ordered: 'a');
    rc = h.defineKey('childid');
    rc = h.defineData('childid','alldi');
    rc = h.defineDone();
  end;
  set test end=last;
  if missing(di) then di=0;          /* treat a missing day as event-free */
  if h.find() ne 0 then alldi=' ';   /* first record for this child: start a new string */
  alldi=cats(alldi,put(di,1.));      /* append today's 0/1 flag */
  rc = h.replace();                  /* replace() adds the key if it is not there yet */
  if last then rc = h.output(dataset: "work.out");
run;
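data out;
  set out;
  length s $270;
  if index(alldi,'1') then do;
    /* Collapse each run of 3 or more zeros into one blank, so that leading or */
    /* trailing gaps and gaps of 6+ days are not counted as extra episodes.    */
    s=prxchange('s/0{3,}/ /', -1, strip(alldi));
    diepis=count(strip(s),' ')+1;   /* interior gaps + 1 = number of episodes */
  end;
  else diepis=0;
  drop s;
run;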
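
If you would rather not build the string at all, the same rule can probably be applied in a single pass with BY-group processing. This is only a sketch, assuming test is already sorted by childid and day as above; out2, zrun, seen1, d and the macro variable misval are names I made up for illustration. Setting misval to 0 or 1 lets you compare the two ways of scoring the missing days.

```sas
%let misval = 0;   /* assumption: score a missing day as 0 (event-free) or 1 (event) */

data out2(keep=childid diepis);
  set test;
  by childid;
  retain zrun seen1 diepis;
  if first.childid then do;
    zrun   = 0;    /* length of the current run of event-free days */
    seen1  = 0;    /* becomes 1 once this child has had any event  */
    diepis = 0;    /* episode count for this child                 */
  end;
  d = coalesce(di, &misval);
  if d = 1 then do;
    /* a new episode starts at the first event, or after 3+ event-free days */
    if not seen1 or zrun >= 3 then diepis + 1;
    seen1 = 1;
    zrun  = 0;
  end;
  else zrun + 1;
  if last.childid then output;
run;
```

With misval=0, child 105011 comes out with 2 episodes; with misval=1 the missing days bridge the gap and it comes out with 1, so running both gives upper and lower counts for the sample.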
On 1/8/07, SK <skauchali@gmail.com> wrote:
>
> Hi there; thanks for the help. Here is the data structure I have: each
> child was seen daily, and daily records of the event were kept until the
> child was 270 days old (the maximum number of days seen). A child could
> have been seen for any number of days (min=0, max=270).
>
> childid Day di
>
> 104701 1 0
> 104701 2 0
> 104701 3 0
> 104701 4 0
> 104701 5 0
> 104701 6 0
> 104701 7 0
> 104841 1 0
> 104841 2 0
> 104841 3 0
> 104841 4 0
> 104841 5 0
> 104841 6 0
> 104841 7 0
> 104901 1 0
> 104901 2 0
> 104901 3 0
> 104901 4 0
> 104901 5 0
> 104901 6 0
> 104901 7 0
> 104921 1 0
> 104921 2 1
> 104921 3 1
> 104921 4 0
> 104921 5 0
> 104921 6 0
> 104921 7 1
> 104991 1 1
> 104991 2 1
> 104991 3 1
> 104991 4 1
> 104991 5 1
> 104991 6 1
> 104991 7 1
> 105011 1 1
> 105011 2 1
> 105011 3 .
> 105011 4 .
> 105011 5 .
> 105011 6 1
> 105011 7 1
> 105041 1 1
> 105041 2 0
> 105041 3 1
> 105041 4 0
> 105041 5 0
> 105041 6 0
> 105041 7 1
>
>
> You will notice I have excerpted 7 child records, seen for the first 7
> days only (repeated daily records).
> 1 is an event
> 0 is no event
> I want the algorithm to define an EPISODE of di as follows:
> dieps (di EPISODE): an episode of di is a 1 in the week (7 days) that is
> separated from other 1s by 3 consecutive di-free days (zeros).
>
> So for example this algorithm would produce a flat (one childid per
> row) output dataset like so:
> childid dieps
> 104701 0
> 104841 0
> 104901 0
> 104921 2
> 104991 1
> 105011 2
> 105041 2
>
> Notice that childid 104921 has 2 episodes of di in that week, one at the
> beginning and one at the end; the episodes are separated by at least 3
> di-free days. For childid 105041 there are also 2 di episodes.
> In this case, the first 3 days would be one episode (because they are
> not separated by at least 3 di-free days), and the last day would be the
> second episode.
>
> I am also not sure how to take into account the missing days (see child
> 105011); if we assume it is the same episode, then we may be
> underestimating the number of episodes in the sample; if we assume
> there are 2 separate episodes then we may be overestimating total
> number of episodes in sample.
>
> I would appreciate if I could get some help doing this data
> preparation.
>
> Many thanks
> SK
>