Date: Wed, 2 May 2007 08:15:32 -0700
Reply-To: z <gzuckier@SNAIL-MAIL.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: z <gzuckier@SNAIL-MAIL.NET>
Organization: http://groups.google.com
Subject: Re: Cleaning ICD9 codes
In-Reply-To: <1178049878.018281.194260@y5g2000hsa.googlegroups.com>
Content-Type: text/plain; charset="iso-8859-1"
On May 1, 4:04 pm, jofo <joey.fo...@gmail.com> wrote:
> Hi All,
>
> I have some ICD9 codes that I would like to standardize before I
> search through.
>
> Some have decimals, most do not.
>
> I would like to remove the decimal or space in place.
> I tried this...
>
> data nodot;
> set test;
> array dx{8} $ dx1-dx8;
> array dxcode{8};
>
> do i=1 to 8;
> if length(dx{i})then do;
> dxcode{i} = prxchange('s/(?:(.+)((\.)|(\s))(\d+))/$1$3/',0,
> dx{i});
> end;
> end;
> run;
>
> This fails miserably though.
>
> Anyone know of a way to remove a dot or space from the middle of the
> code?
> Anyone know if this is a bad idea and should be avoided?
>
> Thanks.
In my experience, you get pretty good results by keeping the codes as
character variables (so that they're left-aligned, and so that V codes,
etc. are handled), then using the COMPRESS function to eliminate the
period. In a character field, the period is redundant given the length
of the code; i.e., 001 is still distinct from 001.0 even after you
eliminate the period (001 vs. 0010).
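
A minimal sketch of that approach, reusing the dataset and variable names
from the quoted step. The dxcode1-dxcode8 names and the length of 8 are
assumptions for illustration, not something from the original data. Note
that the target array is declared with $ so the cleaned codes stay
character; without the $, SAS would create numeric variables.

```sas
data nodot;
  set test;
  array dx{8} $ dx1-dx8;                  /* existing character diagnosis codes */
  array dxcode{8} $ 8 dxcode1-dxcode8;    /* cleaned codes; names and length assumed */

  do i = 1 to 8;
    /* COMPRESS removes every character listed in the second argument, */
    /* here the period and the blank, wherever they occur in the code. */
    dxcode{i} = compress(dx{i}, '. ');
  end;

  drop i;
run;
```

Because all blanks are removed and the result is stored in a character
variable, the cleaned code ends up left-aligned, and a missing code simply
stays blank, so no separate length check is needed.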