Date: Thu, 31 Mar 2011 07:06:34 -0400
Reply-To: "Kirby, Ted" <ted.kirby@LEWIN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Kirby, Ted" <ted.kirby@LEWIN.COM>
Subject: Re: Replacing Double Tabs with <tab> <space. <tab>
In-Reply-To: A<4CE1D9E650D14720A60FE036A98929EF@D1871RB1>
Content-Type: text/plain; charset="us-ascii"
I am using 9.1.3
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Nat
Wooding
Sent: Thursday, March 31, 2011 6:20 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Replacing Double Tabs with <tab> <space. <tab>
Ted
Are you using SAS 9.2 or 9.1.3 or, horrors, V8?
If you have 9.2, you could read the file directly using the Excel
Libname engine and avoid these problems. The Excel sheet would appear to
SAS as if it were a SAS data set.
Nat Wooding
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Kirby, Ted
Sent: Thursday, March 31, 2011 6:11 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Replacing Double Tabs with <tab> <space. <tab>
I have a text file that is generated by creating a tab-delimited file
from an Excel(r) worksheet. The cells of this worksheet can contain
paragraphs of text, complete with periods, commas, and even some special
characters such as asterisks. To create a SAS dataset from this
Excel(r) worksheet, I use the _INFILE_ variable and the SCAN function as
follows:
data UploadChk;
infile InData length=RLength lrecl=32767 truncover firstobs=2;
input ;
Line = _infile_;
Award_Number=scan(line,1,'09'x);
Award_Description=scan(line,2,'09'x);
Project_Name_or_Program_Title=scan(line,3,'09'x);
Project_Description=scan(line,4,'09'x);
Time_Stamp_of_Last_Update=scan(line,5,'09'x);
run;
However, since the scan function treats consecutive delimiters as a
single delimiter and there may be empty columns, I thought I could use
CALL SCAN and Perl(r) Regular Expressions to change occurrences of
consecutive tab characters to a tab character, a space, and a tab
character as needed as follows:
%let NumCols=5;
data UploadChk;
if _n_=1 then do;
retain dbltab_re;
dbltab_re = prxparse('s/\t\t/\t \t/');
end;
infile InData length=RLength lrecl=32767 truncover firstobs=2;
input ;
Line = _infile_;
/* When there are less than 5 variables in the observation, then CALL
SCAN
will return zeros for the position and length parameters. This is
the
"signal" that we need to correct the double-delimiter problem. */
call scan(Line,&NumCols.,position,length,'09'x);
if position=0 and length=0 then do;
call prxchange(dbltab_re, -1, line);
end;
Award_Number=scan(line,1,'09'x);
Award_Description=scan(line,2,'09'x);
Project_Name_or_Program_Title=scan(line,3,'09'x);
Project_Description=scan(line,4,'09'x);
Time_Stamp_of_Last_Update=scan(line,5,'09'x);
run;
The problem I have is that the code above replaces the double tabs with
"\t \t" string literal, not <tab> <space> <tab> as I thought it would.
Thus it appears that "\t" can *find* tabs, but is not the correct code
to *write* tabs.
This is my first foray into Perl(r) and CALL PRXCHANGE. The basis for
my code was the code sample in Example 2 of the "Pattern Matching Using
SAS Regular Expressions (RX) and Perl Regular Expressions (PRX)" section
of the SAS documentation accessed from the Help Menu in SAS 9.1.3.
Obviously, I have much to learn.
Kwperl kwprxparse kwprxchange kwscan kw_infile_
************* IMPORTANT - PLEASE READ ********************
This e-mail, including attachments, may include confidential and/or
proprietary information, and may be used only by the person or entity to
which it is addressed. If the reader of this e-mail is not the intended
recipient or his or her authorized agent, the reader is hereby notified
that any dissemination, distribution or copying of this e-mail is
prohibited. If you have received this e-mail in error, please notify the
sender by replying to this message and delete this e-mail immediately.
************* IMPORTANT - PLEASE READ ********************
This e-mail, including attachments, may include confidential and/or proprietary information,
and may be used only by the person or entity to which it is addressed. If the reader of this
e-mail is not the intended recipient or his or her authorized agent, the reader is hereby
notified that any dissemination, distribution or copying of this e-mail is prohibited. If you
have received this e-mail in error, please notify the sender by replying to this message
and delete this e-mail immediately.