Date: Wed, 27 Jan 2010 09:13:03 -0500
Reply-To: msz03@albany.edu
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mike Zdeb <msz03@ALBANY.EDU>
Subject: Re: problems selecting subst
Content-Type: text/plain;charset=iso-8859-1
hi ... as others have pointed out, you have numeric data and are using a character function
so, since your diagnosis codes are numeric and you want to look at the 1st three characters
here's an idea that allows you to look at the 1st three characters and not get any of those
type conversion notes that you should see in the LOG if you used your original code
***
array prx (3) diag1-diag3;
do i = 1 to 3;
if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1;
***
the CAT function allows you to treat the array elements as character values and the ":" allows
you to just look at the first three characters of various diagnoses ... you can stop the loop
once you find a hit ... the equation
tag = (cat(prx(_n_)) in : ('171' '172'));
produces a 1 if you find a diagnosis and a 0 if you do not
* your data;
data dx;
input diag1-diag3;
datalines;
172100 147790 900000
202509 900000 900000
184490 140190 144390
159900 207000 900000
172110 900000 900000
143590 900000 900000
141490 141390 900000
156400 145580 140190
147390 137230 149390
173530 147790 900000
149600 116200 900000
171900 100880 118500
;
run;
* a suggestion;
data dxplus;
set dx;
array prx(3) diag1-diag3;
do _n_ = 1 to 3 until (tag eq 1);
tag = (cat(prx(_n_)) in : ('171' '172'));
end;
run;
no nasty LOG messages ...
408 data dxplus;
409 set dx;
410 array prx(3) diag1-diag3;
411 do _n_ = 1 to 3 until (tag eq 1);
412 tag = (cat(prx(_n_)) in : ('171' '172'));
413 end;
414 run;
NOTE: There were 12 observations read from the data set WORK.DX.
NOTE: The data set WORK.DXPLUS has 12 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
and the data set...
Obs diag1 diag2 diag3 tag
1 172100 147790 900000 1
2 202509 900000 900000 0
3 184490 140190 144390 0
4 159900 207000 900000 0
5 172110 900000 900000 1
6 143590 900000 900000 0
7 141490 141390 900000 0
8 156400 145580 140190 0
9 147390 137230 149390 0
10 173530 147790 900000 0
11 149600 116200 900000 0
12 171900 100880 118500 1
it also works fine with your 5-character strings
you can leave the ":" in the tag equation or take it out since
you are looking for all 5 characters (works either way)
data dxplus;
set dx;
array prx(3) diag1-diag3;
do _n_ = 1 to 3 until (tag eq 1);
tag = (cat(prx(_n_)) in ('172100', '172110'));
end;
run;
--
Mike Zdeb
U@Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
> The data are numeric:
> Hi All,
>
> I am having some problems selecting a substring of data, and I was hoping someone might have some insight on the problem.
>
> I am trying to select ICD-9 codes. My code works successfully in other datasets, but I am not able to select my observations in the below dataset
and I am not sure why.
>
> My code that works on other datasets is as follows:
>
>
> data z.NMSC;
> set z.namc94;
> tag = 0;
> array prx (3) diag1-diag3;
> do i = 1 to 3;
> if substr (prx {i}, 1, 3) in ('171', '172') then tag = 1;
> end;
> run;
>
> In the dataset described below, the code does not select any observations. However, if I change the code to the below, I select a few
observations, but not the entire set I want:
>
>
> data z.NMSC;
> set z.namc94;
> tag = 0;
> array prx (3) diag1-diag3;
> do i = 1 to 3;
> if prx{i} in ('172100', '172110') then tag = 1;
> end;
> run;
>
> Any help on the problem? As a work around, I could list all possible codes in the (), but there has to be a better way.
>
>
> Dataset:
>
> 26 DIAG1 Num 8
> 27 DIAG2 Num 8
>
> Example of data:
>
> Obs DIAG1 DIAG2 DIAG3
>
> 1 172100 147790 900000 2 202509 900000 900000 3 184490 140190 144390 4 159900
207000 900000 5 172110 900000 900000 6 143590 900000 900000 7 141490 141390
900000 8 156400 145580 140190 9 147390 137230 149390
> 10 173530 147790 900000 11 149600 116200 900000 12 171900 100880 118500
>
>
> Thank you!
>