| Date: | Fri, 7 Jul 2000 15:28:12 -0400 |
| Reply-To: | Howard Schreier <Howard_Schreier@ITA.DOC.GOV> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Howard Schreier <Howard_Schreier@ITA.DOC.GOV> |
| Subject: | Re: file input statement |
|---|
The code that Louis posted seems to come pretty close, but I don't think it
reassembles the title fragments.
A one-pass solution does seem possible if we have an upper bound on the
number of authors.
Aside: yesterday I suggested that Paula probably did not need a RETAIN
statement in a DATA step using trailing "@" signs, because she did not
seem to be inheriting or accumulating data from record to record. This
problem is a contrasting example: (1) the "AU" and "TI" flags implicitly
apply to follow-on records, and (2) the title fragments must be
accumulated.
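For comparison, here is a rough, untested sketch of what a one-pass step might look like. It assumes an upper bound of 5 authors (the figure Lynn used as an example), reads the value with column input from columns 3-52 under TRUNCOVER (my assumption, not the INPUT statement used elsewhere in this post), and assumes the files contain no completely blank lines. The counter NAU and the data set name ONEPASS are made up for illustration.

```sas
data onepass;
    infile '*.aty' truncover end=done;
    length rectype remember $ 2 rest $ 50 title $ 200;
    array au{5} $ 50 au1-au5;      /* assumed upper bound: 5 authors */
    retain remember au1-au5 title yr;
    keep idnum au1-au5 title yr;

    input rectype $ 1-2 rest $ 3-52;

    if rectype = ' ' then rectype = remember;  /* follow-on record */
    else if rectype = 'AU' then do;            /* flagged AU starts a new item */
        if idnum then output;                  /* write out the previous item */
        idnum + 1;
        nau = 0;
        do i = 1 to dim(au);                   /* reset so nothing carries over */
            au{i} = ' ';
        end;
        title = ' ';
        yr = .;
    end;
    remember = rectype;

    select (rectype);
        when ('AU') do;
            nau + 1;
            if nau <= dim(au) then au{nau} = rest;
            else put 'More authors than array slots: ' idnum= rest=;
        end;
        when ('TI') title = left(trim(title) || ' ' || rest);
        when ('YR') yr = input(rest, 4.);
        otherwise put 'Unknown record type: ' rectype= rest=;
    end;

    if done and idnum then output;             /* write out the last item */
run;
```

The explicit reset inside the flagged-AU branch is what keeps a one-author item from inheriting AU2-AU5 from the item before it, which was the difficulty Lynn described. Note how much bookkeeping ends up tangled together in a single step.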
However, I suspect that the data volume here is pretty small, so I'm
going to abandon concern with efficiency in favor of an approach which
disentangles the problems at hand.
First I should mention that I took Lynn's illustrative input records and
loaded them into a couple of text files with extension "aty" in my
C:\SAS directory. The purpose was to allow a simple wildcard-based
INFILE. If the real problem does not support this approach it may be
necessary to enumerate the files in a list and use FILEVAR= to sweep
through them.
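A minimal sketch of the FILEVAR= alternative follows, in case wildcards are not an option. The two file names are hypothetical, as are the data set names FILELIST and RAWRECS; the column-input layout (value in columns 3-52) is also an assumption. It has not been run against the real files.

```sas
/* Hypothetical list of input files, one path per line */
data filelist;
    length fname $ 200;
    input fname $;
    datalines;
C:\SAS\first.aty
C:\SAS\second.aty
;
run;

/* Open each listed file in turn and read all of its records */
data rawrecs;
    set filelist;
    length rectype $ 2 rest $ 50;
    infile dummy filevar=fname end=done truncover;
    do while (not done);          /* DONE is reset when the next file opens */
        input rectype $ 1-2 rest $ 3-52;
        output;
    end;
run;
```

The fill-down logic would then operate on RAWRECS with a SET statement rather than reading the files directly.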
Anyway, my first objective was to simply fill in the implicit "AU" and
"TI" indicators and also assign sequential ID numbers to the items:
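    data filldown;
        infile '*.aty';
        * Declare REMEMBER as character before its first use;
        length remember $ 2;
        input rectype $ 1-2 rest:&$50.;
        select (rectype);
            * An explicit AU flag starts a new item;
            when ('AU') idnum + 1;
            * A blank flag inherits the most recent one;
            when (' ') rectype = remember;
            otherwise;
        end;
        * Read following 3 statements aloud 3 times quickly :-) ;
        remember = rectype;
        retain remember;
    run;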
Now everything is in a SAS data set designed to support WHERE filtering
and BY processing.
PROC TRANSPOSE can directly build the author arrays:
    proc transpose data=filldown(where=(rectype='AU'))
                   out=au(drop=_name_) prefix=au;
        by idnum;
        var rest;
    run;
A DATA step with an explicit loop simplifies housekeeping for the
process of reassembling the titles:
    data ti;
        keep idnum title;
        length title $ 200;
        do until (last.idnum);
            set filldown(where = (rectype='TI'));
            by idnum;
            title = left(trim(title)||' '||rest);
        end;
    run;
The years should be the simplest, just a character-to-numeric
conversion. But presumably there should be just one year record per
item, so it's good to check for exceptions:
    data yr;
        set filldown(where = (rectype='YR'));
        by idnum;
        keep idnum yr;
        if not first.idnum then do;
            put 'Unexpected follow-on to YR record' / _all_;
            delete;
        end;
        yr = input(rest,4.);
    run;
Another kind of exception is a record of unknown type. A separate step
can do a check:
    data _null_;
        set filldown;
        where not (rectype in ('AU','TI','YR'));
        put 'Unknown record type' / _all_;
    run;
Finally, combine the components:
    data autiyr;
        merge au ti yr;
        by idnum;
    run;
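In the same spirit as the exception checks above, the merge itself could flag items that lack one of the three components. This is only a sketch: the IN= variable names and the output data set name AUTIYR_CHECKED are made up for illustration.

```sas
data autiyr_checked;
    merge au(in=has_au) ti(in=has_ti) yr(in=has_yr);
    by idnum;
    /* IN= flags are 1 when the data set contributed to this item */
    if not (has_au and has_ti and has_yr) then
        put 'Incomplete item: ' idnum= has_au= has_ti= has_yr=;
run;
```

IN= variables are temporary, so they do not end up in the output data set.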
Results:
    IDNUM  AU1      AU2      AU3      TITLE                                    YR
        1  Person1  person2  person3  What I know about SAS and procs        2000
        2  Person1                    What I don't know about SAS and procs  1999
On Thu, 6 Jul 2000 17:54:32 +0200, Laproi, L.G.E (Louis)
<Louis.Laproi@REAAL.NL> wrote:
>HI,
>
>Would this source do?
>
>
>cull
>
>*=========================================================;
>
>DATA xxx;
> INPUT label $ 1-2
> value $ 4-30;
>
> RETAIN booknum recnum lastlabl;
>
> * Recnum = seq number within label;
> * booknum = seq number of the book;
> * lastlabl = last known label;
>
>
> * First record?;
> IF _N_ EQ 1 THEN
> DO;
> booknum = 1;
> END;
>
> * make sure you have labels;
> IF label EQ ' ' THEN
> DO;
> * Use the last known label;
> label = lastlabl;
> recnum = recnum + 1;
> END;
> ELSE
> DO;
> * Remember this label for future use;
> lastlabl = label;
> recnum = 1;
> END;
>
> * Make new label;
> nlabel = label || PUT(recnum,z2.);
>
> * After every book add 1 to book number;
> IF label EQ 'YR' THEN
> DO;
> OUTPUT;
> booknum = booknum + 1;
> END;
> ELSE
> OUTPUT;
>
>CARDS;
>AU Person1
> person2
> person3
>TI What I know about SAS
> and procs
>YR 2000
>AU Person1
>TI What I don't know about SAS
> and procs
>YR 1999
>;
>RUN;
>
>
>PROC PRINT;
> RUN;
>
>
>
>
>PROC TRANSPOSE DATA=xxx OUT=yyy ;
> BY booknum;
> ID nlabel;
> VAR value;
> RUN;
>
>PROC PRINT;
> RUN;
>
>-----Original message-----
>From: Foster-Johnson [mailto:Foster-Johnson@DARTMOUTH.EDU]
>Sent: Thursday, 6 July 2000 16:52
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: file input statement
>
>
>I'm trying to input a series of txt files (w/ hard returns)
>into a SAS dataset;
>
>the files are structured as follows
>
>AU Person1
> person2
> person3
>TI What I know about SAS
> and procs
>YR 2000
>.
>.
>.
>AU Person1
>TI What I don't know about SAS
> and procs
>YR 1999
>
>
>The files may have 1 to xx number of authors, so I'm trying
>to create code that will read in values where appropriate
>and leave blank otherwise.
>
>I have written some incredibly inelegant code using a
>combination of lags and conditional statements on the label
>var (AU, TI) to read in au1-au5 (for example see below) and
>the title.
>
>label au1 au2 au3 title1
>AU p1
> p2
> p3
>TI what I know..
>AU p1
>TI What I don't
>
>My difficulty is this: when I use a retain statement (so I
>can output the final record of each set of au-yr data), the
>records which only have 1 author retain values from previous
>records, thereby keeping values for other authors. I would
>like to have the code to reset au1-au5 to a blank value each
>time the label value equals a new au.
>
>Also, if anyone has any suggestions for a more efficient
>(and elegant) method to read in these vars.. I'd appreciate
>input--
>
>Thanks in advance
>
>Lynn Foster-Johnson
|