Date: Sun, 22 May 2005 21:51:10 -0400
Reply-To: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Subject: Re: I wish SAS would allow a *string* as a field delimiter
Scott Bass wrote:
> Hi,
>
> (for brevity, this post is a simplified example of a more complex
> problem...)
>
> Summary: Is there a way to make SAS accept a *string* of characters
> as a field delimiter when reading a flat file? From the docs, it
> looks like NO (rats!) but thought I'd ask.
>
> Details: I have to read in a file with embedded carriage returns in
> the fields. I note Richard DeVenezia's post on how to do this:
> http://listserv.uga.edu/cgi-bin/wa?A2=ind0206D&L=sas-l&P=R33076.
>
> Say this is the sample file:
>
> This is record 1, VAR1
>
> With one embedded carriage return
> =====
> This is record 1, VAR2
>
>
> With two embedded carriage returns
> Oops, and a =line= containing a single = character
> =====
> This is record 1, VAR3
>
>
>
> With three embedded carriage returns
> =====
> This is record 2, VAR1
>
> With one embedded carriage return
> =====
> =====
> This is record 2, VAR3
>
>
>
> With three embedded carriage returns
> =====
>
> What I want is two observations, three vars, embedded CRs in the
> vars. For obs 2, var2 is missing.
>
> I *wish* I could say:
>
> data foo;
> infile 'c:\temp\arbitrary.txt' recfm=n dsd dlm='====='; /* or
> perhaps "\n=====\n" in hex */
>
> length var1-var3 $5000;
>
> input var1-var3;
>
> ... any additional processing ... ;
> run;
>
> but dlm only accepts a single character for the delimiter. Bummer.
>
> Rationale: I'd like the input file to be "human readable". If I
> change the delimiter to Tab, #, etc, the file is hard to "human"
> read. If I could specify the exact *string* I want to delimit
> fields, it would be easier to accomplish this.
>
> (The actual "file" is a PIPE to a Perl script. Another (ugly) option
> is a command line switch to have two output types: one for SAS
> import, and the other for STDOUT)
>
> Any bright ideas, O gurus of SAS-L?
>
> Regards,
> Scott
Your input does have a consistency that can be taken advantage of: the
field separator always occurs by itself on a line. The sample code generates
a test file and uses ==== (four characters, where your file has =====) as a
boundary landmark.
----------------------------
filename inhuman temp;
%let seed = 161803399;
* write a test file: 2 rows of 3 fields, each field ended by ====;
data _null_;
  file inhuman;
  do row = 1 to 2;
    do col = 1 to 3;
      put row=/col=;
      do char = 1 to row*col;
        put char=;
      end;
      put '====';
    end;
  end;
run;
options noxwait noxsync xmin;
*x start "notepad" notepad %sysfunc(pathname(inhuman));
data fruit_rollup;
  infile inhuman;
  array myvar[3] $5000;
  col = 0;
  do until (col = 3);
    col+1;
    p = 1;
    input;
    do while (_infile_ ne '====');
      * append the just read line to the current column value;
      substr(myvar[col],p) = _infile_ ;
      p + length(_infile_);
      * tack on a newline;
      substr(myvar[col],p) = byte(10);
      p + 1;
      input;
    end;
    * untack the final newline;
    if p > 1 then
      substr (myvar[col],p-1) = ' ' ;
  end;
  * three columns have been got, and thus a row has been made;
  output;
  keep myvar1-myvar3;
run;
----------------------------
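One quick way to check the result is to count the lines held in each
variable. This is only a sketch: it assumes the fruit_rollup data set built
by the step above and a SAS release that has the COUNTC function. With the
generated test file, each value should hold row*col + 2 lines.
----------------------------
data _null_;
  set fruit_rollup;
  array myvar[3];
  do col = 1 to 3;
    * number of lines is the count of embedded newlines plus one;
    lines = countc(myvar[col], byte(10)) + 1;
    put _n_= col= lines=;
  end;
run;
----------------------------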
If the landmark were allowed to occur midline, the processing would be
more complicated, but the general approach would be the same.
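One possible way to handle a midline landmark, sketched here and not tested
against real data, is to rewrite the file first so that every ==== sits on
its own line. The fileref onaline and the variables rest, at and n are names
made up for this sketch. It treats the text on either side of a midline
landmark as belonging to separate fields, which is an assumption about what
would be wanted.
----------------------------
filename onaline temp;
data _null_;
  infile inhuman;
  file onaline;
  length rest $32767;
  input;
  at = index(_infile_, '====');
  * no landmark on this line, so copy it through unchanged;
  if at = 0 then put _infile_;
  else do;
    rest = _infile_;
    do while (at > 0);
      * write any text that precedes the landmark;
      if at > 1 then do;
        n = at - 1;
        put rest $varying32767. n;
      end;
      * the landmark gets a line to itself;
      put '====';
      rest = substr(rest, at + 4);
      at = index(rest, '====');
    end;
    * write any text that follows the last landmark;
    if rest ne ' ' then do;
      n = length(rest);
      put rest $varying32767. n;
    end;
  end;
run;
----------------------------
The reading step could then use "infile onaline" in place of "infile inhuman"
and otherwise stay as it is.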
Richard A. DeVenezia
http://www.devenezia.com/