LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2001, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 8 Jan 2001 10:49:41 -0800
Reply-To:     "Huang, Ya" <ya.huang@AGOURON.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Huang, Ya" <ya.huang@AGOURON.COM>
Subject:      Re: Efficiency in SQL
Comments: To: kviel <kviel@GMCF.ORG>
Content-Type: multipart/alternative;

Kevin,

I am not sure whether proc sql is the most efficient way for this case, even though I am a sql fan. Some data step and proc sort will do the job though:

data one; input id b_diag1 $ b_diag2 $ b_diag3 $; cards; 1 a b c 2 c d e 3 f g h ; run;

data two; input code $; cards; a h j ; run;

proc sort; by code;

data one (keep=id code); set one; array bd $ b_diag1-b_diag3; do over bd; code=bd; output; end;

proc sort; by code id;

data keep; merge one (in=one) two (in=two); by code; if one and two;

proc sort data=keep nodupkey; by id;

proc print; run;

----------------------- OBS ID CODE

1 1 a 2 3 h

regards,

Ya Huang

-----Original Message----- From: kviel [mailto:kviel@GMCF.ORG] Sent: Monday, January 08, 2001 10:22 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Efficiency in SQL

I have a dataset with multiple diagnoses for each observation, most of which are missing. I have a second dataset of codes meeting my case definition. I decided to use the SQL procedure to identify any observation with at least one diagnosis that meets the case definition. I have written the following, but wonder if I have created a resource intensive beast. Thankfully, there are only ten diagnosis fields per observation and no more than 500 case definition codes. However, there are millions of observations. I thought if there might be an equivalent to an ARRAY[] that I would not need the OR's and would only need to SELECT * FROM two once. Thanks in advance for any ideas. Below is a sample:

data one; input id b_diag1 $ b_diag2 $ b_diag3 $; cards; 1 a b c 2 c d e 3 f g h ; run;

data two; input code $; cards; a h j ; run;

proc sql; select id from one where b_diag1 in (select * from two) or b_diag2 in (select * from two) or b_diag3 in (select * from two) ; quit;

Kevin

Kevin Viel Georgia Medical Care Foundation 57 Executive Park South, NE suite 200 Atlanta, GA 30329-2224

------------------------ CONFIDENTIALITY NOTICE: This e-mail transmission, and any documents, files or previous e-mail messages attached to it may contain proprietary, privileged or confidential information. If you are not an intended recipient, or a person responsible for delivering it to the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of any of the information contained in or attached to this transmission is STRICTLY PROHIBITED. If you have received this transmission in error, please immediately notify me by reply e-mail and destroy the original transmission and its attachments without saving them in any manner. -----------------------


[text/html]


Back to: Top of message | Previous page | Main SAS-L page