LISTSERV at the University of Georgia
Date:   Fri, 7 Jul 2000 15:28:12 -0400
Reply-To:   Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Subject:   Re: file input statement

The code which Louis posted seems to come pretty close. I don't think it reassembles the title fragments however.

A one-pass solution does seem possible if we have an upper bound on the number of authors.
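
A minimal sketch of what such a one-pass step might look like, assuming an upper bound of five authors (the au1-au5 from Lynn's description). The names onepass, nau and eof are illustrative assumptions, as are the TRUNCOVER option and the use of CALL MISSING, which needs a SAS release newer than the ones current when this thread was written. Each explicit "AU" record writes out the item just completed and blanks the retained variables, so a single-author item does not inherit authors from the item before it.

```sas
data onepass;
   infile '*.aty' truncover end=eof;
   length rectype remember $ 2 rest $ 50 au1-au5 $ 50 title $ 200;
   array au {5} au1-au5;            /* assumed upper bound: 5 authors */
   retain remember au1-au5 title yr;
   keep idnum au1-au5 title yr;

   input rectype $ 1-2 rest:&$50.;

   if rectype = 'AU' then do;       /* an explicit AU starts a new item */
      if idnum > 0 then output;     /* write the item just completed */
      idnum + 1;
      nau = 0;
      call missing(of au1-au5, title, yr);
   end;
   else if rectype = ' ' then rectype = remember;  /* fill in implicit flag */
   remember = rectype;

   select (rectype);
      when ('AU') do;
         nau + 1;
         if nau <= dim(au) then au{nau} = rest;
         else put 'More authors than the assumed maximum' / _all_;
      end;
      when ('TI') title = left(trim(title) || ' ' || rest);
      when ('YR') yr = input(rest, 4.);
      otherwise put 'Unknown record type' / _all_;
   end;

   if eof then output;              /* write the final item */
run;
```

The trade-off is that item boundaries, fill-down, accumulation and exception checks all live in one step, which is the entanglement the multi-step approach below avoids.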

Aside: yesterday I suggested that Paula probably did not need a RETAIN statement in a DATA step featuring use of trailing "@" signs because she did not seem to be inheriting or accumulating data from record to record. Here is a contrasting example, since (1) the "AU" and "TI" flags implicitly apply to follow-on records and (2) the title fragments must be accumulated.

However, I suspect that the data volume here is pretty small, so I'm going to abandon concern with efficiency in favor of an approach which disentangles the problems at hand.

First I should mention that I took Lynn's illustrative input records and loaded them into a couple of text files with extension "aty" in my C:\SAS directory. The purpose was to allow a simple wildcard-based INFILE. If the real problem does not support this approach it may be necessary to enumerate the files in a list and use FILEVAR= to sweep through them.
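
If the wildcard is not an option, the FILEVAR= alternative could be sketched roughly as follows. The data set name filelist, the variable fname, the placeholder fileref dummy, the flag done and the two file paths are all hypothetical; TRUNCOVER is an added safeguard for short records. The body of the loop is the same fill-down logic as in the step shown next.

```sas
* Hypothetical list of input files: replace with the real paths;
data filelist;
   length fname $ 200;
   infile datalines truncover;
   input fname $200.;
   datalines;
C:\SAS\first.aty
C:\SAS\second.aty
;

data filldown;
   length remember $ 2;
   retain remember;
   set filelist;                       /* one iteration per file name */
   infile dummy filevar=fname end=done truncover;
   do while (not done);                /* read every record of this file */
      input rectype $ 1-2 rest:&$50.;
      select (rectype);
         when ('AU') idnum + 1;
         when (' ') rectype = remember;
         otherwise;
      end;
      remember = rectype;
      output;
   end;
run;
```

Because END= is reset each time FILEVAR= switches to a new file, the inner loop stops at the end of each file, and the step itself stops when SET runs out of file names.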

Anyway, my first objective was to simply fill in the implicit "AU" and "TI" indicators and also assign sequential ID numbers to the items:

   data filldown;
      infile '*.aty';
      * Declare REMEMBER as character before its first use;
      length remember $ 2;
      input rectype $ 1-2 rest:&$50.;
      select (rectype);
         when ('AU') idnum + 1;
         when (' ') rectype = remember;
         otherwise;
      end;
      * Read following 3 statements aloud 3 times quickly :-) ;
      remember = rectype;
      retain remember;
   run;

Now everything is in a SAS data set designed to support WHERE filtering and BY processing.

PROC TRANSPOSE can directly build the author arrays:

   proc transpose data=filldown(where=(rectype='AU'))
                  out=au(drop=_name_)
                  prefix=au;
      by idnum;
      var rest;
   run;

A DATA step with an explicit loop simplifies housekeeping for the process of reassembling the titles:

   data ti;
      keep idnum title;
      length title $ 200;
      do until (last.idnum);
         set filldown(where = (rectype='TI'));
         by idnum;
         title = left(trim(title)||' '||rest);
      end;
   run;

The years should be the simplest, just a character-to-numeric conversion. But presumably there should be just one year record per item, so it's good to check for exceptions:

   data yr;
      set filldown(where = (rectype='YR'));
      by idnum;
      keep idnum yr;
      if not first.idnum then do;
         put 'Unexpected follow-on to YR record' / _all_;
         delete;
      end;
      yr = input(rest,4.);
   run;

Another kind of exception is a record of unknown type. A separate step can do a check:

   data _null_;
      set filldown;
      where not (rectype in ('AU','TI','YR'));
      put 'Unknown record type' / _all_;
   run;

Finally, combine the components:

   data autiyr;
      merge au ti yr;
      by idnum;
   run;

Results:

   IDNUM  AU1      AU2      AU3      TITLE                                  YR

       1  Person1  person2  person3  What I know about SAS and procs        2000
       2  Person1                    What I don't know about SAS and procs  1999

On Thu, 6 Jul 2000 17:54:32 +0200, Laproi, L.G.E (Louis) <Louis.Laproi@REAAL.NL> wrote:

>HI,
>
>Would this source do?
>
>cull
>
>*=========================================================;
>
>DATA xxx;
>  INPUT label $ 1-2
>        value $ 4-30;
>
>  RETAIN booknum recnum lastlabl;
>
>  * Recnum = seq number within label;
>  * booknum = seq number of the book;
>  * lastlabl = last known label;
>
>  * First record?;
>  IF _N_ EQ 1 THEN
>  DO;
>    booknum = 1;
>  END;
>
>  * make sure you have labels;
>  IF label EQ ' ' THEN
>  DO;
>    * Use the last known label;
>    label = lastlabl;
>    recnum = recnum + 1;
>  END;
>  ELSE
>  DO;
>    * Remember this label for future use;
>    lastlabl = label;
>    recnum = 1;
>  END;
>
>  * Make new label;
>  nlabel = label || PUT(recnum,z2.);
>
>  * After every book add 1 to book number;
>  IF label EQ 'YR' THEN
>  DO;
>    OUTPUT;
>    booknum = booknum + 1;
>  END;
>  ELSE
>    OUTPUT;
>
>CARDS;
>AU Person1
>   person2
>   person3
>TI What I know about SAS
>   and procs
>YR 2000
>AU Person1
>TI What I don't know about SAS
>   and procs
>YR 1999
>;
>RUN;
>
>PROC PRINT;
>  RUN;
>
>PROC TRANSPOSE DATA=xxx OUT=yyy ;
>  BY booknum;
>  ID nlabel;
>  VAR value;
>  RUN;
>
>PROC PRINT;
>  RUN;
>
>-----Original message-----
>From: Foster-Johnson [mailto:Foster-Johnson@DARTMOUTH.EDU]
>Sent: Thursday, 6 July 2000 16:52
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: file input statement
>
>I'm trying to input a series of txt files (w/ hard returns)
>into a SAS dataset;
>
>the files are structured as follows
>
>AU Person1
>   person2
>   person3
>TI What I know about SAS
>   and procs
>YR 2000
>.
>.
>.
>AU Person1
>TI What I don't know about SAS
>   and procs
>YR 1999
>
>The files may have 1 to xx number of authors, so I'm trying
>to create code that will read in values where appropriate
>and leave blank otherwise.
>
>I have written some incredibly inelegant code using a
>combination of lags and conditional statements on the label
>var (AU, TI) to read in au1-au5 (for example see below) and
>the title.
>
>label au1 au2 au3 title1
>AU p1
>   p2
>   p3
>TI what I know..
>AU p1
>TI What I don't
>
>My difficulty is this: when I use a retain statement (so I
>can output the final record of each set of au-yr data), the
>records which only have 1 author retain values from previous
>records, thereby keeping values for other authors. I would
>like to have the code to reset au1-au5 to a blank value each
>time the label value equals a new au.
>
>Also, if anyone has any suggestions for a more efficient
>(and elegant) method to read in these vars.. I'd appreciate
>input--
>
>Thanks in advance
>
>Lynn Foster-Johnson

