Dear SAS-L-ers,
Roy Pardee posted this interesting problem:
> I've got a bunch of text files I need to read in, each of
> which has a 'header row' of var names that I need sas to skip
> over. So I wrote the
> following:
>
> * ================================== ;
> filename oic 'N:\oncology_infusion_center\IP401B-*.txt' ;
>
> data gnu ;
> infile oic firstobs = 2 ;
> input
> @1 consumno $char8.
> @12 acct $char9.
> @22 attending_provider $char30.
> @53 service_date date11.
> @113 quantity 3.0
> <etc.>
> ;
> run ;
> * ================================== ;
>
> This works just fine if I edit that filename statement so it
> only refers to a single file. But if I leave it as written,
> I see things like:
>
> NOTE: Invalid data for service_date in line 1985 53-63.
> NOTE: Invalid data for quantity in line 1985 113-115.
> NOTE: Invalid data errors for file OIC occurred outside
> the printed range.
> NOTE: Increase available buffer lines with the INFILE n= option.
>
> (I'm confused by the literal data that sas prints out around
> those NOTEs--some of it looks like actual data, and some of
> it looks like the header row.)
>
> I've tried removing the specific file complained about &
> re-running, only to have SAS start complaining about a
> different file. This leads me to the theory that the
> FIRSTOBS = 2 option is only being applied to the first file.
>
> Is that plausible? And more to the point--how do I get sas
> to skip line 1 of every file?
>
Roy, you are correct; FIRSTOBS= applies only to the first file in your
concatenation of files. However, all is not lost! Try the EOV= option
of the INFILE statement, which flags the first record of each
subsequent file in the concatenation.
Your example SAS program would look something like this:
filename oic 'N:\oncology_infusion_center\IP401B-*.txt' ;
data gnu ;
  /* FIRSTOBS=2 skips the header of the FIRST file only;       */
  /* EOV=FIRSTREC flags the first record of every later file   */
  infile oic FIRSTOBS=2 EOV=FIRSTREC;
  input
    @1 consumno $char8. @  /* trailing @ holds the record */
  ;
  if FIRSTREC = 1 then do; /* First record of the 2nd through nth file is
                              its header: reset the EOV= variable to zero
                              and discard the record */
    FIRSTREC = 0;
    delete;
  end;
  else do; /* Process the data records */
    input
      @12 acct $char9.
      @22 attending_provider $char30.
      @53 service_date date11.
      @113 quantity 3.0
      <etc.>
    ;
  end;
run ;
In the code above, FIRSTREC is the EOV= variable. SAS sets it to 1 when
it reads the first record of each file after the first one, but it does
not reset it to 0 on its own, so the program must do that itself.
We read the first data field on the record and test whether this is
the first record of a file. If it is (which can only happen for the
2nd through nth files in the concatenation, because FIRSTOBS=2 already
skips the header of the first file), then the record is discarded.
(Note: the trailing @ in the first INPUT statement holds the record so
that the second INPUT statement can read the rest of it.) If it is not
the first record of a file, then we grab the other fields that we need.
Simple, neat, and not a great departure from what you are doing!
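If you want to see the EOV= technique work end to end before pointing it
at the real files, here is a small self-contained sketch. It is an
illustration under stated assumptions, not a drop-in replacement: the
filerefs PART1, PART2 and BOTH, the variable NEWFILE, the data set DEMO,
and the file contents are all hypothetical. It concatenates two TEMP
files explicitly (rather than with a wildcard) so it runs anywhere, and
it uses a bare "input @;" to load each record instead of reading the
first field.

```sas
/* Hypothetical test data: two TEMP files, each starting with a header row */
filename part1 temp;
filename part2 temp;

data _null_;
  file part1;
  put 'ID NAME';
  put '1 alpha';
  put '2 beta';
run;

data _null_;
  file part2;
  put 'ID NAME';
  put '3 gamma';
  put '4 delta';
run;

/* Concatenate the two physical files under one fileref */
filename both ("%sysfunc(pathname(part1))" "%sysfunc(pathname(part2))");

data demo;
  infile both firstobs=2 eov=newfile; /* FIRSTOBS=2 covers the first file only */
  input @;                            /* load and hold the record; sets NEWFILE
                                         to 1 if a new file has just started   */
  if newfile = 1 then do;             /* header row of the 2nd..nth file       */
    newfile = 0;                      /* SAS does not reset EOV= by itself     */
    delete;
  end;
  input id name $;                    /* read the held data record             */
run;

proc print data=demo;
run;
```

If the technique behaves as described, PROC PRINT should list four
observations (IDs 1 through 4) and no header rows from either file.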
Roy, best of luck with reading your flat files!
I hope that this suggestion proves helpful now, and in the future!
Of course, all of these opinions and insights are my own, and do not
reflect those of my organization or my associates. All SAS code and/or
methodologies specified in this posting are for illustrative purposes
only and no warranty is stated or implied as to their accuracy or
applicability. People deciding to use information in this posting do so
at their own risk.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
"The man who wrote the book on performance"
E-mail: MichaelRaithel@westat.com
Author: Tuning SAS Applications in the MVS Environment
Author: Tuning SAS Applications in the OS/390 and z/OS Environments,
Second Edition
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172
Author: The Complete Guide to SAS Indexes
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Failure is not the only punishment for laziness; there is also the
success of others. - Jules Renard
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++