|
Michael:
1. Decide on a gap size such that:
- dates separated by more than that gap belong to different groups.
- dates separated by less than or equal to gap belong to the same group.
In your case it looks like a gap size of 10 will work.
2. Sort your data by date into dataset HAVE.
proc sort data=mydata out=have;
by date test;
run;
3. run a program like this, which assumes that the satisfactory GAP size is 10 days:
data want;
retain groupnum 0 gap 10;
set have;
lagdate=lag(date);
if _n_=1 or (date-lagdate) > gap then groupnum=groupnum+1;
run;
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Michael Vostrich
> Sent: Friday, August 27, 2010 3:12 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: How to recognize a pattern and assign a group
>
> Dear SAS-L experts,
>
> I would greatly appreciate you help in finding a solution to a problem
> of
> grouping the observations in the following situation. I have tests that
> were
> performed in several groups, but the group number is missing. I know
> that
> the tests in a group were performed on “seemingly consecutiveâ€
> days, while
> groups were separated by several weeks (at least, this is true within
> the
> same test number). I need to group the records by assigning the group
> number
> to a group of tests for the same test number.
> The example dataset is dataset A below. I calculated the duration,
> which
> gives me a very visual pattern (see the proc print), and I will
> appreciate
> your help with the code that would create GROUPID, which I just added
> manually myself to explain what I need (GROUPID serves to group the
> "similar
> tests" within the same test number).
>
> Thank you in advance!
> Michael
>
> data a;
> format testdate date.;
> input testno testdate date.;
> cards;
> 1 15mar2010
> 1 16mar2010
> 1 17mar2010
> 1 31mar2010
> 1 01apr2010
> 1 02apr2010
> 1 20apr2010
> 1 22apr2010
> 2 13mar2010
> 2 15mar2010
> 2 30mar2010
> 2 31mar2010
> 2 01apr2010
> 2 02apr2010
> 2 20apr2010
> 2 24apr2010
> ;
>
> data b (drop=_:);
> set a ;
> by testno testdate;
> _temp=lag(testdate);
> if first.testno then _temp=.;
> duration=testdate - _temp;
> run;
> proc print data=b; run;
>
> testdate testno duration GROUPID
> manually added
>
> 15MAR02 1 . 1
> 16MAR02 1 1 1
> 17MAR02 1 1 1
> 31MAR02 1 14 2
> 01APR02 1 1 2
> 02APR02 1 1 2
> 20APR02 1 18 3
> 22APR02 1 2 3
> 13MAR02 2 . 1
> 15MAR02 2 2 1
> 30MAR02 2 15 2
> 31MAR02 2 1 2
> 01APR02 2 1 2
> 02APR02 2 1 2
> 20APR02 2 18 3
> 24APR02 2 4 3
|