Date: Thu, 26 Mar 2009 23:59:22 -0400
Reply-To: dynamicpanel@YAHOO.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: dynamicpanel@YAHOO.COM
Subject: Re: Groupings
Okay, I understood it as to flag the record as 1 if *more than one value of
ID_A appears in a combined unit of ID_B, ID_C*.
anyway, it doens't matter, we all have fun coding
On Thu, 26 Mar 2009 22:22:48 -0400, Muthia Kachirayan
<muthia.kachirayan@GMAIL.COM> wrote:
>Randy,
>
>Dan's solution follows the OP's requirement:
>
>==> So if ID_B with ID_C as one unit appears in *more than one ID_A then
>the
>flag should be 1 or else the flag should be 0*. Please note that it is a
>huge dataset. <==
>
>A simple modification can eliminate the PROC SORT. Further the dataset can
>be scanned only twice as in code posted by dynamicpanel. The hash table can
>store only the unique key(ID_B and ID_C) with COUNT as data part -
hopefully
>the OP's huge dataset may fit into memory. Here is the code. As external
>sort is eliminated, this code is expected to be faster.
>
>data want(drop = count);
>if _n_ = 1 then do;
> declare hash h();
> h.definekey('id_b','id_c');
> h.definedata('count');
> h.definedone();
>
> do until(eof);
> set sample end = eof;
> if h.find() ne 0 then count = 0;
> count ++ 1;
> h.replace();
> end;
>end;
> set sample;
> h.find();
> flag = 0;
> if count > 1 then flag = 1;
>run;
>
>Muthia Kachirayan
>
>
>On Thu, Mar 26, 2009 at 9:12 PM, <dynamicpanel@yahoo.com> wrote:
>
>> My previous post has a error, please refer to this one which deals with
>> more generic situation. I also trickled the original sample data: I added
>> in a duplicate record for the first combination: 1 A XA
>>
>> /***/
>> data sample;
>> input ID_A ID_B $ ID_C $;
>> cards;
>> 1 A XA
>> 1 A XA
>> 1 B XA
>> 1 C XA
>> 2 A XA
>> 2 E XB
>> 2 F XA
>> 3 C XA
>> 3 F XC
>> 4 F XD
>> ;
>> run;
>>
>> proc sort data=sample; by ID_B ID_C ID_A; run;
>>
>>
>> data want;
>> set sample; by ID_B ID_C;
>> *retain last_g ;
>> retain flag 0;
>> *j=_n_;
>> if first.ID_C & last.ID_C then flag=0;
>> else do;
>> if flag=0 then do;
>> j=_n_+1; IDB=ID_B; IDC=ID_C; eog=0;
>> do while(eog=0);
>> set sample( rename=(ID_A=IDA ID_B=IDB ID_C=IDC))
>> point=j;
>> if IDB=ID_B & IDC=ID_C then do; if
>> ID_A^=IDA then flag=1; end;
>> else eog=1;
>> j+1;
>> end;
>> end;
>> end;
>> IDC1=first.ID_C; IDC2=last.ID_C;
>> drop IDA IDB IDC eog;
>> run;
>>
>> Final View:
>> ID_A ID_B ID_C flag
>> 1 A XA 1
>> 1 A XA 1
>> 2 A XA 1
>> 1 B XA 0
>> 1 C XA 1
>> 3 C XA 1
>> 2 E XB 0
>> 2 F XA 0
>> 3 F XC 0
>> 4 F XD 0
>>
>> /*******************/
>> *If we change the data into;
>>
>> data sample;
>> input ID_A ID_B $ ID_C $;
>> cards;
>> 1 A XA
>> 1 A XA
>> 1 B XA
>> 1 C XA
>> 2 E XB
>> 2 F XA
>> 3 C XA
>> 3 F XC
>> 4 F XD
>> ;
>> run;
>>
>> Pattern 1 A XA should have flag=0, but Dan's program assigns flag=1. My
>> program works correctly.
>>
|