LISTSERV at the University of Georgia
Date:         Sun, 22 May 2005 21:51:10 -0400
Reply-To:     "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Subject:      Re: I wish SAS would allow a *string* as a field delimiter
Comments: To: sas-l@uga.edu

Scott Bass wrote:
> Hi,
>
> (for brevity, this post is a simplified example of a more complex
> problem...)
>
> Summary: Is there a way to make SAS accept a *string* of characters
> as a field delimiter when reading a flat file? From the docs, it
> looks like NO (rats!) but thought I'd ask.
>
> Details: I have to read in a file with embedded carriage returns in
> the fields. I note Richard DeVenezia's post on how to do this:
> http://listserv.uga.edu/cgi-bin/wa?A2=ind0206D&L=sas-l&P=R33076.
>
> Say this is the sample file:
>
> This is record 1, VAR1
>
> With one embedded carriage return
> =====
> This is record 1, VAR2
>
>
> With two embedded carriage returns
> Oops, and a =line= containing a single = character
> =====
> This is record 1, VAR3
>
>
>
> With three embedded carriage returns
> =====
> This is record 2, VAR1
>
> With one embedded carriage return
> =====
> =====
> This is record 2, VAR3
>
>
>
> With three embedded carriage returns
> =====
>
> What I want is two observations, three vars, embedded CRs in the
> vars. For obs 2, var2 is missing.
>
> I *wish* I could say:
>
> data foo;
>    infile 'c:\temp\arbitrary.txt' recfm=n dsd dlm='====='; /* or
> perhaps "\n=====\n" in hex */
>
>    length var1-var3 $5000;
>
>    input var1-var3;
>
>    ... any additional processing ... ;
> run;
>
> but dlm only accepts a single character for the delimiter. Bummer.
>
> Rationale: I'd like the input file to be "human readable". If I
> change the delimiter to Tab, #, etc, the file is hard to "human"
> read. If I could specify the exact *string* I want to delimit
> fields, it would be easier to accomplish this.
>
> (The actual "file" is a PIPE to a Perl script. Another (ugly) option
> is a command line switch to have two output types: one for SAS
> import, and the other for STDOUT)
>
> Any bright ideas, O gurus of SAS-L?
>
> Regards,
> Scott

Your input does have a consistency that can be taken advantage of: the field separator occurs by itself on a line. The sample code uses ==== as the boundary landmark (your file uses =====, so adjust the literal to match).

----------------------------
filename inhuman temp;

%let seed = 161803399;

data _null_;
  file inhuman;
  do row = 1 to 2;
    do col = 1 to 3;
      put row=/col=;
      do char = 1 to row*col;
        put char=;
      end;
      put '====';
    end;
  end;
run;

options noxwait noxsync xmin;

*x start "notepad" notepad %sysfunc(pathname(inhuman));

data fruit_rollup;

  infile inhuman;

  array myvar[3] $5000;

  col = 0;

  do until (col = 3);
    col+1;
    p = 1;
    input;
    do while (_infile_ ne '====');
      * append the just read line to the current column value;
      substr(myvar[col],p) = _infile_ ;
      p + length(_infile_);
      * tack on a newline;
      substr(myvar[col],p) = byte(10);
      p + 1;
      input;
    end;
    * untack the final newline;
    if p > 1 then substr (myvar[col],p-1) = ' ' ;
  end;

  * three columns have been got, and thus a row has been made;
  output;

  keep myvar1-myvar3;
run;
----------------------------

If the landmark were allowed to occur mid-line, the processing would be more complicated, but the general approach would be the same.
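For the mid-line case, one possible shape is sketched here. It is untested and rests on several assumptions: the landmark is still the four-character ====, the data is read through the same inhuman fileref, and the names fruit_rollup_midline, mark, rest, at and sawmark are invented for illustration. Each line is scanned with INDEX; text ahead of a landmark closes the current column, and whatever follows the landmark starts the next one.

```sas
data fruit_rollup_midline;          * hypothetical data set name;

  infile inhuman;

  array myvar[3] $5000;
  length rest $32767;

  * state is carried from one input line to the next;
  retain myvar1-myvar3;
  retain col 1 p 1;
  retain mark '====';               * assumed landmark text;

  input;
  rest = _infile_;
  at = index(rest, mark);

  do while (at > 0);
    if at > 1 then do;
      * text ahead of the landmark finishes the current column;
      substr(myvar[col],p) = substr(rest, 1, at-1);
      p + (at-1);
    end;
    else if p > 1 then do;
      * landmark starts the line, so untack the newline added earlier;
      substr(myvar[col],p-1) = ' ';
    end;

    col + 1;
    p = 1;

    if col > 3 then do;
      * three columns have been got, and thus a row has been made;
      output;
      call missing(of myvar[*]);
      col = 1;
    end;

    * carry on with the text that follows the landmark;
    rest = substr(rest, at + length(mark));
    at = index(rest, mark);
    sawmark = 1;
  end;

  * a blank remainder after a landmark adds nothing to the next column;
  if not (sawmark and rest = ' ') then do;
    substr(myvar[col],p) = rest;
    p + length(rest);
    * tack on a newline;
    substr(myvar[col],p) = byte(10);
    p + 1;
  end;

  keep myvar1-myvar3;
run;
```

When every landmark sits on its own line, this should give the same rows as the line-oriented version. Like that version, it assumes no column value exceeds 5000 characters, and a row that is not closed by a final landmark is never output.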

Richard A. DeVenezia http://www.devenezia.com/

