Date: Wed, 28 Jan 2004 08:45:41 -0500
Reply-To: jsl <nospam@NOSPAM.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: jsl <nospam@NOSPAM.COM>
Organization: University of Georgia
Subject: Re: Challenging Data Parsing Program - Multiple records crammed
into one
Awesome and thanks. A colleague of mine has already tested your code and
says it works great -- much appreciated!
Jim
"Chang Y. Chung" <chang_y_chung@HOTMAIL.COM> wrote in message
news:200401272011.i0RKBnR06903@listserv.cc.uga.edu...
> On Tue, 27 Jan 2004 13:01:14 -0500, jsl <nospam@NOSPAM.COM> wrote:
>
> >per request, the spreadsheet can be had here:
> >http://www.terry.uga.edu/people/jlinck/files/TestData.xls
> >
> >
> >"jsl" <nospam@nospam.com> wrote in message
> >news:bv5r9k$pmf$1@cronkite.cc.uga.edu...
> >> I have attached a small Excel spreadsheet with some data I am trying
to
> >> parse using SAS. Having no luck so far. The problem is that each
Excel
> >> "record" contains multiple records of data crammed into one cell. For
> >> example, in the first record, for 1-800 Contacts, their are several
> NAMES,
> >> each with corresponding AGES, WAGES, and TITLES. The
> different "records"
> >> within each cell are separated by some sort of carriage return
character
> >so
> >> I think I can properly line them up if I can read the data properly.
My
> >> objective is to parse out each record into several records, one for
each
> >> name, making sure the AGES, WAGES and TITLES go with the correct name.
> I
> >> have tried everything I can think of -- the only success I've add, is
> >> "opening" each cell individually and then pasting into another sheet.
> >This,
> >> of course, is not a practical solution since the final dataset will
have
> >> about 25,000 of these records, with about 10 fields each that have this
> >> problem. Even with an Excel macro, it'd be trouble since the file will
> be
> >> too large for Excel too handle (and way to slow anyway).
> >>
> >> Anybody have any suggestions? Thanks!
> >> Jim
>
> Hi, Jim,
>
> If you have sas/access OLE DB, then try the following. Modify the length
> and format as you see fit -- I just assumed that each name is at max 100
> characters long. It just imports each cell as a character variable and
> seperate each line out by looking for '0A'x character within. In some
> cases, you have different number of 'lines' of information for different
> columns. I just keep generating observations as long as at least one
> column has some values in it. Remember to close the excel workbook before
> you submit this code -- otherwise sas will say something like "file not
> found." Good luck.
>
> Cheers,
> Chang
>
> <sasl:code>
> libname e oledb
> provider="Microsoft.Jet.OLEDB.4.0"
> properties=('Data Source'='d:\chang\g\testData.xls')
> provider_string="Excel 8.0;hdr=no"
> preserve_tab_names=yes
> ;
> %put &sysdbmsg; /* any message returned? */
>
> data testdata;
> set e.'Sheet1$a2:f10'n(
> dbmax_text=32767
> dbsastype=(
> f1='char(10)'
> f2='char(100)'
> f3='char(32767)'
> f4='char(32767)'
> f5='char(32767)'
> f6='char(32767)'
> )
> );
> length code $10. coname $100. pId 8. name $100.
> age 8. wage 8. title $100.;
> format wage $12.0;
> keep code--title;
> code =f1;
> coname =f2;
> pId = 1;
> name = scan(f3, pId, '0A'x);
> age = input(scan(f4, pId, '0A'x),??best.);
> wage = input(scan(f5, pId, '0A'x),??dollar12.);
> title = scan(f6, pId, '0A'x);
> do while (not (missing(name) & missing(age)
> & missing(wage) & missing(title)));
> output;
> pId + 1;
> name = scan(f3, pId, '0A'x);
> age = input(scan(f4, pId, '0A'x),??best.);
> wage = input(scan(f5, pId, '0A'x),??dollar12.);
> title = scan(f6, pId, '0A'x);
> end;
> run;
> proc print data=testData;
> format code $10. coname $10. pId 8. name $10.
> age 8. wage dollar12. title $10.;
> run;
>
> libname e clear; /* prevent sas from locking the excel file up */
> /* on lst
> Obs code coname pId name age wage title
>
> 1 C000033186 1-800 Cont 1 COON, JONA 33 $251,979 PRESIDENT,
> 2 C000033186 1-800 Cont 2 NICHOLS, J 42 $215,691 VICE PRESI
> 3 C000033186 1-800 Cont 3 YACKTMAN, 33 . DIRECTOR (
> 4 C000033186 1-800 Cont 4 KATZMAN, D 43 . DIRECTOR (
> 5 C000033186 1-800 Cont 5 BOGGS, THO 62 . DIRECTOR
> 6 C000033186 1-800 Cont 6 BUTLER, ED 58 . DIRECTOR (
> 7 C000033186 1-800 Cont 7 KNIGHT, BR 44 . DIRECTOR
> 8 C000033186 1-800 Cont 8 SUBOTKY, J 32 . DIRECTOR
> 9 C000065081 1-800-Flow 1 MCCANN, JA 52 $1,210,000 CHAIRMAN O
> 10 C000065081 1-800-Flow 2 MCCANN, CH 42 $423,500 PRESIDENT,
> 11 C000065081 1-800-Flow 3 CALCANO, L 40 . DIRECTOR (
> 12 C000065081 1-800-Flow 4 CONEFRY, J 59 . DIRECTOR (
> 13 C000065081 1-800-Flow 5 ELMORE, LE 51 . DIRECTOR (
> ...
> 50 C000020823 4Kids Ente 5 GOLDSTEIN, 61 . DIRECTOR (
> 51 C000020823 4Kids Ente 6 GARRITY, J 47 . CHIEF OPER
> 52 C000004213 7-Eleven I 1 ITO, MASAT 70 . CHAIRMAN O
> 53 C000004213 7-Eleven I 2 SUZUKI, TO 47 . VICE CHAIR
> */
> </sasl:code>
|