LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2004, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 28 Jan 2004 08:45:41 -0500
Reply-To:     jsl <nospam@NOSPAM.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         jsl <nospam@NOSPAM.COM>
Organization: University of Georgia
Subject:      Re: Challenging Data Parsing Program - Multiple records crammed
              into one

Awesome and thanks. A colleague of mine has already tested your code and says it works great -- much appreciated! Jim

"Chang Y. Chung" <chang_y_chung@HOTMAIL.COM> wrote in message news:200401272011.i0RKBnR06903@listserv.cc.uga.edu... > On Tue, 27 Jan 2004 13:01:14 -0500, jsl <nospam@NOSPAM.COM> wrote: > > >per request, the spreadsheet can be had here: > >http://www.terry.uga.edu/people/jlinck/files/TestData.xls > > > > > >"jsl" <nospam@nospam.com> wrote in message > >news:bv5r9k$pmf$1@cronkite.cc.uga.edu... > >> I have attached a small Excel spreadsheet with some data I am trying to > >> parse using SAS. Having no luck so far. The problem is that each Excel > >> "record" contains multiple records of data crammed into one cell. For > >> example, in the first record, for 1-800 Contacts, their are several > NAMES, > >> each with corresponding AGES, WAGES, and TITLES. The > different "records" > >> within each cell are separated by some sort of carriage return character > >so > >> I think I can properly line them up if I can read the data properly. My > >> objective is to parse out each record into several records, one for each > >> name, making sure the AGES, WAGES and TITLES go with the correct name. > I > >> have tried everything I can think of -- the only success I've add, is > >> "opening" each cell individually and then pasting into another sheet. > >This, > >> of course, is not a practical solution since the final dataset will have > >> about 25,000 of these records, with about 10 fields each that have this > >> problem. Even with an Excel macro, it'd be trouble since the file will > be > >> too large for Excel too handle (and way to slow anyway). > >> > >> Anybody have any suggestions? Thanks! > >> Jim > > Hi, Jim, > > If you have sas/access OLE DB, then try the following. Modify the length > and format as you see fit -- I just assumed that each name is at max 100 > characters long. It just imports each cell as a character variable and > seperate each line out by looking for '0A'x character within. In some > cases, you have different number of 'lines' of information for different > columns. I just keep generating observations as long as at least one > column has some values in it. Remember to close the excel workbook before > you submit this code -- otherwise sas will say something like "file not > found." Good luck. > > Cheers, > Chang > > <sasl:code> > libname e oledb > provider="Microsoft.Jet.OLEDB.4.0" > properties=('Data Source'='d:\chang\g\testData.xls') > provider_string="Excel 8.0;hdr=no" > preserve_tab_names=yes > ; > %put &sysdbmsg; /* any message returned? */ > > data testdata; > set e.'Sheet1$a2:f10'n( > dbmax_text=32767 > dbsastype=( > f1='char(10)' > f2='char(100)' > f3='char(32767)' > f4='char(32767)' > f5='char(32767)' > f6='char(32767)' > ) > ); > length code $10. coname $100. pId 8. name $100. > age 8. wage 8. title $100.; > format wage $12.0; > keep code--title; > code =f1; > coname =f2; > pId = 1; > name = scan(f3, pId, '0A'x); > age = input(scan(f4, pId, '0A'x),??best.); > wage = input(scan(f5, pId, '0A'x),??dollar12.); > title = scan(f6, pId, '0A'x); > do while (not (missing(name) & missing(age) > & missing(wage) & missing(title))); > output; > pId + 1; > name = scan(f3, pId, '0A'x); > age = input(scan(f4, pId, '0A'x),??best.); > wage = input(scan(f5, pId, '0A'x),??dollar12.); > title = scan(f6, pId, '0A'x); > end; > run; > proc print data=testData; > format code $10. coname $10. pId 8. name $10. > age 8. wage dollar12. title $10.; > run; > > libname e clear; /* prevent sas from locking the excel file up */ > /* on lst > Obs code coname pId name age wage title > > 1 C000033186 1-800 Cont 1 COON, JONA 33 $251,979 PRESIDENT, > 2 C000033186 1-800 Cont 2 NICHOLS, J 42 $215,691 VICE PRESI > 3 C000033186 1-800 Cont 3 YACKTMAN, 33 . DIRECTOR ( > 4 C000033186 1-800 Cont 4 KATZMAN, D 43 . DIRECTOR ( > 5 C000033186 1-800 Cont 5 BOGGS, THO 62 . DIRECTOR > 6 C000033186 1-800 Cont 6 BUTLER, ED 58 . DIRECTOR ( > 7 C000033186 1-800 Cont 7 KNIGHT, BR 44 . DIRECTOR > 8 C000033186 1-800 Cont 8 SUBOTKY, J 32 . DIRECTOR > 9 C000065081 1-800-Flow 1 MCCANN, JA 52 $1,210,000 CHAIRMAN O > 10 C000065081 1-800-Flow 2 MCCANN, CH 42 $423,500 PRESIDENT, > 11 C000065081 1-800-Flow 3 CALCANO, L 40 . DIRECTOR ( > 12 C000065081 1-800-Flow 4 CONEFRY, J 59 . DIRECTOR ( > 13 C000065081 1-800-Flow 5 ELMORE, LE 51 . DIRECTOR ( > ... > 50 C000020823 4Kids Ente 5 GOLDSTEIN, 61 . DIRECTOR ( > 51 C000020823 4Kids Ente 6 GARRITY, J 47 . CHIEF OPER > 52 C000004213 7-Eleven I 1 ITO, MASAT 70 . CHAIRMAN O > 53 C000004213 7-Eleven I 2 SUZUKI, TO 47 . VICE CHAIR > */ > </sasl:code>


Back to: Top of message | Previous page | Main SAS-L page