LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 1998, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 1 May 1998 14:35:57 -0400
Reply-To:     Paul Dorfman <sashole@EARTHLINK.NET>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Paul Dorfman <sashole@EARTHLINK.NET>
Organization: ...
Subject:      Re: separate one field into many...
Content-Type: text/plain; charset=koi8-r

Andy Yuan, in particular, wrote: > > Hi, I have a SAS problem. Please help... > data: field like this.. > > OBS CT_STZIP > > 185 Pompano Beach FL 33064-1049 > 186 Boca Raton FL 33486-5649 > 187 Boca Raton FL 33428-2849 > 188 Deerfield Beach FL 33441-3291 > 189 Coconut Creek FL 33066-1206 > 190 Coconut Creek FL 33066-1233 > 191 Delray Beach FL 33446-2982 > 192 Boca Raton FL 33433-6770 > 193 Delray Beach FL 33484-1432 > 194 Deerfield Beach FL 33442-3364 > 195 Lighthouse Point FL 33064-7304 > 196 Oak Brook IL 60521 > 197 Bethesda MD 20814-4217 > 198 Hong Kong 00000 > 199 Los Angeles CA 90025 > 200 Arlington VA 22201 > 201 England NJ 00000 > 202 Alexandria VA 22304-4034 > > goal: want separate it into 3 fields: city, state, zip > problem: city can be one, two even more word, zip can be 5 digit or 10 digit, > some cases (foreign) no state > make it complicated >

Andy,

In SAS, it is not as complicated as it may seem at the first glance (just try, for the sake of comparison, to code it in Cobol!). Let's first reproduce your data:

data all_rec; input rec & $200.; cards; Pompano Beach FL 33064-1049 Boca Raton FL 33486-5649 Boca Raton FL 33428-2849 Deerfield Beach FL 33441-3291 Coconut Creek FL 33066-1206 Coconut Creek FL 33066-1233 Delray Beach FL 33446-2982 Boca Raton FL 33433-6770 Delray Beach FL 33484-1432 Deerfield Beach FL 33442-3364 Lighthouse Point FL 33064-7304 Oak Brook IL 60521 Bethesda MD 20814-4217 Hong Kong 00000 Los Angeles CA 90025 Arlington VA 22201 England NJ 00000 Alexandria VA 22304-4034 ;

run;

Now, this simple DATA step should get you what you need:

data split(keep=city state zip); set all_rec; rev_rec = left(reverse(compbl(rec))); zip = reverse(scan(rev_rec,1,' ')); if zip = '00000' then do; state = ' '; city = left(reverse(substr(rev_rec,length(zip)+2))); end; else do; state = reverse(scan(rev_rec,2,' ')); city = left(reverse(substr(rev_rec,length(zip)+4))); end; run;

The output produced by the line of code

proc print data=split; run;

should speak for itself:

OBS ZIP STATE CITY

1 33064-1049 FL Pompano Beach 2 33486-5649 FL Boca Raton 3 33428-2849 FL Boca Raton 4 33441-3291 FL Deerfield Beach 5 33066-1206 FL Coconut Creek 6 33066-1233 FL Coconut Creek 7 33446-2982 FL Delray Beach 8 33433-6770 FL Boca Raton 9 33484-1432 FL Delray Beach 10 33442-3364 FL Deerfield Beach 11 33064-7304 FL Lighthouse Point 12 60521 IL Oak Brook 13 20814-4217 MD Bethesda 14 00000 Hong Kong 15 90025 CA Los Angeles 16 22201 VA Arlington 17 00000 England NJ

Note that if a need should arise to validate the input data, the line

if state ne zipstate(put(substr(zip,1,5),5.)) then output BAD_DATA;

would suffice. Of course, you would have to specify the BAD_DATA data set on the DATA statement.

I hope it may help you to deal with the "SAS problem".

Most cordially,

Paul.


Back to: Top of message | Previous page | Main SAS-L page