Date: Fri, 18 Nov 2005 05:25:01 -0800
Reply-To: Dave <david.brewer@UC.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dave <david.brewer@UC.EDU>
Organization: http://groups.google.com
Subject: Re: PROC SQL Help -- SOLVED
In-Reply-To: <BAY103-F14A9120F4F7E069D140425B05E0@phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"
Howard,
Thanks so much for your solution...this is what I needed.
Also, much thanks to David Cassell, Arild Skogmo, chris@OVIEW.CO.UK,
Howard Schreier for their excellent suggestions and taking time out of
their busy schedule to assist me.
Dave
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Howard Schreier <hs AT dc-sug DOT org>
Sent: Thursday, November 17, 2005 11:12 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: PROC SQL Help
I think the essential problem can be solved with a rather simple DATA
step.
Test data:
data cases;
input begin_month case_data $ @@;
cards;
7 a
8 b
9 c
11 d 11 e 11 f 11 g
12 h
;
data controls;
input begin_month control_data $ @@;
cards;
7 A
8 B 8 C
9 D
11 E 11 F 11 G 11 H 11 I 11 J
12 K
14 L
;
I would use SQL to randomize the controls within each group. This is a
refinement, and is not critical for the technique used below.
proc sql;
create view rancontrols as
select * from controls
order by begin_month, ranuni(123);
quit;
Now do the merge. It's a many-to-many, but that's OK in this case.
data matches;
incases = 0;
merge cases(in=incases keep=begin_month) rancontrols;
by begin_month;
if incases;
run;
The explicit reset of the IN= variable INCASES makes the many-to-many
situations behave appropriately. It's the one unusual thing here.
I did not deal with the consequences of a group having fewer controls
than cases.
Results:
begin_ control_
Obs month data
1 7 A
2 8 C
3 9 D
4 11 I
5 11 J
6 11 F
7 11 E
8 12 K
On Thu, 17 Nov 2005 05:31:47 -0500, Dave Brewer <david.brewer@UC.EDU>
wrote:
>Greetings.
>
>I need to create a SAS data set via PROC SQL that will give me a one
>control for every case record I have and only keep the variables found
>in the control data set; I will be matching on begin_month.
>
>For example, my case data set contains 8 records consisting of one
>record for begin_month=7, one record for begin_month=8, one record for
>begin_month=9, four records for begin_month=11 and one record for
>begin_month=12; this data set contains 30 variables.
>
>My control data set contains 4000 records with various months and has
>30 variables. I need to wind up with a sas data set that contains
>eight control records with all 30 variables/columns from the control
>data set only (no variables from case, except begin_month) matching on
>begin_month found in the case file.
>
>I am using SAS v9 on Windows XP.
>
>Any help you can provide would be greatly appreciated.
>
>Thanks.
>Dave