Date: Fri, 1 May 1998 14:35:57 -0400
Reply-To: Paul Dorfman <sashole@EARTHLINK.NET>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Paul Dorfman <sashole@EARTHLINK.NET>
Organization: ...
Subject: Re: separate one field into many...
Content-Type: text/plain; charset=koi8-r
Andy Yuan, in particular, wrote:
>
> Hi, I have a SAS problem. Please help...
> data: field like this..
>
> OBS CT_STZIP
>
> 185 Pompano Beach FL 33064-1049
> 186 Boca Raton FL 33486-5649
> 187 Boca Raton FL 33428-2849
> 188 Deerfield Beach FL 33441-3291
> 189 Coconut Creek FL 33066-1206
> 190 Coconut Creek FL 33066-1233
> 191 Delray Beach FL 33446-2982
> 192 Boca Raton FL 33433-6770
> 193 Delray Beach FL 33484-1432
> 194 Deerfield Beach FL 33442-3364
> 195 Lighthouse Point FL 33064-7304
> 196 Oak Brook IL 60521
> 197 Bethesda MD 20814-4217
> 198 Hong Kong 00000
> 199 Los Angeles CA 90025
> 200 Arlington VA 22201
> 201 England NJ 00000
> 202 Alexandria VA 22304-4034
>
> goal: want separate it into 3 fields: city, state, zip
> problem: city can be one, two even more word, zip can be 5 digit or 10 digit,
> some cases (foreign) no state
> make it complicated
>
Andy,
In SAS, it is not as complicated as it may seem at the first glance (just
try, for the sake of comparison, to code it in Cobol!). Let's first
reproduce your data:
data all_rec;
input rec & $200.;
cards;
Pompano Beach FL 33064-1049
Boca Raton FL 33486-5649
Boca Raton FL 33428-2849
Deerfield Beach FL 33441-3291
Coconut Creek FL 33066-1206
Coconut Creek FL 33066-1233
Delray Beach FL 33446-2982
Boca Raton FL 33433-6770
Delray Beach FL 33484-1432
Deerfield Beach FL 33442-3364
Lighthouse Point FL 33064-7304
Oak Brook IL 60521
Bethesda MD 20814-4217
Hong Kong 00000
Los Angeles CA 90025
Arlington VA 22201
England NJ 00000
Alexandria VA 22304-4034
;
run;
Now, this simple DATA step should get you what you need:
data split(keep=city state zip);
set all_rec;
rev_rec = left(reverse(compbl(rec)));
zip = reverse(scan(rev_rec,1,' '));
if zip = '00000' then do;
state = ' ';
city = left(reverse(substr(rev_rec,length(zip)+2)));
end;
else do;
state = reverse(scan(rev_rec,2,' '));
city = left(reverse(substr(rev_rec,length(zip)+4)));
end;
run;
The output produced by the line of code
proc print data=split; run;
should speak for itself:
OBS ZIP STATE CITY
1 33064-1049 FL Pompano Beach
2 33486-5649 FL Boca Raton
3 33428-2849 FL Boca Raton
4 33441-3291 FL Deerfield Beach
5 33066-1206 FL Coconut Creek
6 33066-1233 FL Coconut Creek
7 33446-2982 FL Delray Beach
8 33433-6770 FL Boca Raton
9 33484-1432 FL Delray Beach
10 33442-3364 FL Deerfield Beach
11 33064-7304 FL Lighthouse Point
12 60521 IL Oak Brook
13 20814-4217 MD Bethesda
14 00000 Hong Kong
15 90025 CA Los Angeles
16 22201 VA Arlington
17 00000 England NJ
Note that if a need should arise to validate the input data, the line
if state ne zipstate(put(substr(zip,1,5),5.)) then output BAD_DATA;
would suffice. Of course, you would have to specify the BAD_DATA data set
on the DATA statement.
I hope it may help you to deal with the "SAS problem".
Most cordially,
Paul.