Date: Mon, 6 Aug 2007 23:17:37 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: compare variables within same dataset
See comments, etc. inline.
On Mon, 6 Aug 2007 18:09:50 +0000, toby dunn <tobydunn@HOTMAIL.COM> wrote:
>Proc SQL ;
>Create Table Need As
>Select ID , Race
>From Have
> Group By ID
> Having Count( Distinct Race ) > 1 ;
>Quit ;
Including RACE leads to a remerge, so there will be a bunch of exra rows in
the result (if Sarah's request is taken literally, she only wants one
number; I'm assuming however that she really wants a list of distinct IDs).
>
>
>
>
>
>Toby Dunn
>
>
>From: Sarah Tran <st.tran@GMAIL.COM>
>Reply-To: Sarah Tran <st.tran@GMAIL.COM>
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: compare variables within same dataset
There are ways to compare a data set with itself. Typically it involves
subsetting on one or both sides of the comparison. But this is really a
counting problem more than a comparing problem.
>Date: Mon, 6 Aug 2007 10:55:29 -0700
>
>I have the following dataset where each observation represents a
>hospital visit, ID variable is a unique identifier for a patient, and
>RACE is their race. I want to count how many people had their race
>recorded inconsistently any time during their hospital visits. For
>example, only ID's #2 and 4 had their races recorded inconsistently in
>the visits they had. The full dataset has over 30,000 observations and
>a patient (e.g., a unique ID) can appear over 50 times.
>
>I tried using PROC COMPARE but can't get it to do what I want it to
>do.
PROC COMPARE indeed is not a good fit for the present problem.
>Does anyone have suggestions? Maybe with MACROS?
Don't be so eager for a macro. If anything, be reluctant. Don't think that
when a problem seems difficult, a macro must be the solution. The macro
facility is not an all-purpose supercharger for SAS.
>Thanks for any help.
Others have provided solutions.
For a counting problem, PROC FREQ should come to mind. In this case you need
back-to-back PROC FREQ steps.
proc freq data=have noprint;
tables id * race / out=halfbaked(drop = count percent);
run;
proc freq data=halfbaked noprint;
tables id / out=need(drop = percent where = (count>1) );
run;
>
>ID RACE
>1 A
>1 A
>2 A
>2 O
>2 A
>3 B
>3 B
>4 N
>4 N
>4 O
>4 O
>4 O
>..
>..
>..