LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2010, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 27 Jan 2010 09:13:03 -0500
Reply-To:     msz03@albany.edu
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mike Zdeb <msz03@ALBANY.EDU>
Subject:      Re: problems selecting subst
Content-Type: text/plain;charset=iso-8859-1

hi ... as others have pointed out, you have numeric data and are using a character function

so, since your diagnosis codes are numeric and you want to look at the 1st three characters here's an idea that allows you to look at the 1st three characters and not get any of those type conversion notes that you should see in the LOG if you used your original code

*** array prx (3) diag1-diag3; do i = 1 to 3; if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1; ***

the CAT function allows you to treat the array elements as character values and the ":" allows you to just look at the first three characters of various diagnoses ... you can stop the loop once you find a hit ... the equation

tag = (cat(prx(_n_)) in : ('171' '172'));

produces a 1 if you find a diagnosis and a 0 if you do not

* your data; data dx; input diag1-diag3; datalines; 172100 147790 900000 202509 900000 900000 184490 140190 144390 159900 207000 900000 172110 900000 900000 143590 900000 900000 141490 141390 900000 156400 145580 140190 147390 137230 149390 173530 147790 900000 149600 116200 900000 171900 100880 118500 ; run;

* a suggestion; data dxplus; set dx; array prx(3) diag1-diag3; do _n_ = 1 to 3 until (tag eq 1); tag = (cat(prx(_n_)) in : ('171' '172')); end; run;

no nasty LOG messages ... 408 data dxplus; 409 set dx; 410 array prx(3) diag1-diag3; 411 do _n_ = 1 to 3 until (tag eq 1); 412 tag = (cat(prx(_n_)) in : ('171' '172')); 413 end; 414 run; NOTE: There were 12 observations read from the data set WORK.DX. NOTE: The data set WORK.DXPLUS has 12 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds

and the data set... Obs diag1 diag2 diag3 tag 1 172100 147790 900000 1 2 202509 900000 900000 0 3 184490 140190 144390 0 4 159900 207000 900000 0 5 172110 900000 900000 1 6 143590 900000 900000 0 7 141490 141390 900000 0 8 156400 145580 140190 0 9 147390 137230 149390 0 10 173530 147790 900000 0 11 149600 116200 900000 0 12 171900 100880 118500 1

it also works fine with your 5-character strings you can leave the ":" in the tag equation or take it out since you are looking for all 5 characters (works either way)

data dxplus; set dx; array prx(3) diag1-diag3; do _n_ = 1 to 3 until (tag eq 1); tag = (cat(prx(_n_)) in ('172100', '172110')); end; run;

-- Mike Zdeb U@Albany School of Public Health One University Place Rensselaer, New York 12144-3456 P/518-402-6479 F/630-604-1475

> The data are numeric: > Hi All, > > I am having some problems selecting a substring of data, and I was hoping someone might have some insight on the problem. > > I am trying to select ICD-9 codes. My code works successfully in other datasets, but I am not able to select my observations in the below dataset and I am not sure why. > > My code that works on other datasets is as follows: > > > data z.NMSC; > set z.namc94; > tag = 0; > array prx (3) diag1-diag3; > do i = 1 to 3; > if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1; > end; > run; > > In the dataset described below, the code does not select any observations. However, if I change the code to the below, I select a few observations, but not the entire set I want: > > > data z.NMSC; > set z.namc94; > tag = 0; > array prx (3) diag1-diag3; > do i = 1 to 3; > if prx{i} in ('172100', '172110') then tag = 1; > end; > run; > > Any help on the problem? As a work around, I could list all possible codes in the (), but there has to be a better way. > > > Dataset: > > 26 DIAG1 Num 8 > 27 DIAG2 Num 8 > > Example of data: > > Obs DIAG1 DIAG2 DIAG3 > > 1 172100 147790 900000 2 202509 900000 900000 3 184490 140190 144390 4 159900 207000 900000 5 172110 900000 900000 6 143590 900000 900000 7 141490 141390 900000 8 156400 145580 140190 9 147390 137230 149390 > 10 173530 147790 900000 11 149600 116200 900000 12 171900 100880 118500 > > > Thank you! >


Back to: Top of message | Previous page | Main SAS-L page