| Date: | Fri, 22 Jul 2011 08:54:30 -0400 |
| Reply-To: | Jim Groeneveld <jim.1stat@YAHOO.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Jim Groeneveld <jim.1stat@YAHOO.COM> |
| Subject: | Re: Identifyiing groups |
|---|
Hi Randy,
Sorry, I meant to say: there is NO rule of thumb.
Regards - Jim.
>--
>Jim Groeneveld, Netherlands
>Statistician/SAS consultant
>http://jim.groeneveld.eu.tf
On Fri, 22 Jul 2011 08:48:41 -0400, Jim Groeneveld <jim.1stat@YAHOO.COM> wrote:
>Hi Randy,
>
>There is rule of thumb; it is dependent of many circumstances.
>You just have to test a few different approaches (algorithms)
>with a moderate set of data. A (small) data step approach is:
>
>DATA Flags;
> MERGE Events /* assumed sorted by Date, ID_A, ID_B, ID_C */
> Events (RENAME=(Event=Event_Flag) WHERE=(Event_Flag EQ 1));
> BY Date ID_A ID_B ID_C;
> IF (MISSING(Event_Flag)) THEN Event_Flag = 0;
>RUN;
>
>This merges Events with a (slightly modified) subset of itself,
>while initialising Event_Flag to missing for each new group,
>possibly reading its group value for the first record of a group
>and retaining Event_Flag values of 1 over a whole group
>(group made of unique values for Date, ID_A, ID_B and ID_C).
>
>This can also be interpreted and explained as merging a single
>(or a few) group record(s) multiple times with many group records.
>
>Regards - Jim.
>--
>Jim Groeneveld, Netherlands
>Statistician/SAS consultant
>http://jim.groeneveld.eu.tf
>
>
>On Fri, 22 Jul 2011 07:21:18 -0400, Randy <randistan69@HOTMAIL.COM> wrote:
>
>>Dear All:
>> The data set that I am working with is very large. Will a data step
>>solution be better?
>> Randy
>>
>>
>>On Fri, 22 Jul 2011 04:47:33 -0400, Jim Groeneveld <jim.1stat@YAHOO.COM>
wrote:
>>
>>>Hi Randy,
>>>
>>>There are many solutions possible; one using PROC SQL is:
>>>
>>>DATA Events;
>>> INFILE CARDS TRUNCOVER;
>>> INPUT Date $ ID_A $ ID_B $ ID_C $ Event; * (ID_A also read as character);
>>> CARDS;
>>>01JUL 1 X A
>>>01JUL 1 X A
>>>01JUL 1 X A
>>>01JUL 1 X B
>>>01JUL 1 X B 1
>>>01JUL 1 X B
>>>01JUL 2 X A
>>>01JUL 2 X A
>>>01JUL 2 X A
>>>02JUL 1 Y C
>>>02JUL 1 Y C
>>>02JUL 1 Y C 1
>>>;
>>>RUN;
>>>
>>>* Corrections of inconsistencies;
>>>* - O1JUL (letter O) changed to 01JUL (digit 0);
>>>* - removed Event=1 for 01Jul/2/X/A as in result;
>>>
>>>TITLE "Dataset = Events"; PROC PRINT DATA=Events; RUN;
>>>
>>>PROC SQL;
>>> CREATE TABLE Flags AS
>>> SELECT *,
>>> CASE MAX(Event)
>>> WHEN . THEN 0
>>> ELSE 1
>>> END AS Flag_Event
>>> FROM Events
>>> GROUP BY Date, ID_A, ID_B, ID_C;
>>>QUIT;
>>>
>>>TITLE "Dataset = Flags"; PROC PRINT DATA=Flags; RUN;
>>>
>>>Regards - Jim.
>>>--
>>>Jim Groeneveld, Netherlands
>>>Statistician, SAS consultant
>>>http://jim.groeneveld.eu.tf
>>>
>>>
>>>On Thu, 21 Jul 2011 23:12:25 -0400, Randy <randistan69@HOTMAIL.COM> wrote:
>>>
>>>>Dear all:
>>>>
>>>>My data is as follows
>>>>
>>>>Date ID_A ID_B ID_C Event
>>>>O1JUL 1 X A
>>>>01JUL 1 X A
>>>>01JUL 1 X A
>>>>01JUL 1 X B
>>>>01JUL 1 X B 1
>>>>01JUL 1 X B
>>>>01JUL 2 X A
>>>>01JUL 2 X A 1
>>>>01JUL 2 X A
>>>>02JUL 1 Y C
>>>>02JUL 1 Y C
>>>>02JUL 1 Y C 1
>>>>
>>>>The data that I want is as follows:
>>>>
>>>>Date ID_A ID_B ID_C Event Flag_Event
>>>>O1JUL 1 X A 0
>>>>01JUL 1 X A 0
>>>>01JUL 1 X A 0
>>>>01JUL 1 X B 1
>>>>01JUL 1 X B 1 1
>>>>01JUL 1 X B 1
>>>>01JUL 2 X A 0
>>>>01JUL 2 X A 0
>>>>01JUL 2 X A 0
>>>>02JUL 1 Y C 1
>>>>02JUL 1 Y C 1
>>>>02JUL 1 Y C 1 1
>>>>
>>>>So whenever an event has occurred, I want the event to be flagged for those
>>>ID's
>>>>
>>>> Thanks
>>>> Randy
|