Praise indeed from Master Ian! Many thanks.
For those who have had to write data out and then re-read it,
this may show that the updatable _infile_ offers much
potential for shorter solutions than used to be necessary.
INPUT statements provide so many text-parsing capabilities!
With an updatable _infile_, we can take advantage of
them without having to go to external text files.
(Sounds like it might be one for Coders' Corner!)
(...but maybe a bit late, with v9 around the corner.)
Kind Regards
Peter Crawford
For those who received this window in a uselessly wrapped state, here
(in 70-column width) is the most important area: the 9 parts extracted
from the input strings.
+FSVIEW: WORK.NEW_SCAN (B)-------------------------------------------+
| Obs  part1 part2   part3 part4 part5 part6 part7 part8 part9       |
|                                                                    |
|   1  AAA   BBB     CC    DD    EE    FF    GG    HHH               |
|   2  AAA                                         HHH               |
|   3  AAA                                   GGG                     |
|   4                                                    HHH         |
|   5  dlm   BBB CCC /     ,     FF    GGG   HHH                     |
|   6  AAA                                         H                 |
|                                                                    |
+--------------------------------------------------------------------+
Date: 01/05/2002 15:14
To: Peter Crawford/Zentrale/DeuBaExt@Zentrale
Subject: RE: Function SCAN and delimiter question
Message text:
Peter,
Really neat trick!
IanWhitlock@westat.com
-----Original Message-----
From: Peter Crawford [mailto:peter.crawford@DB.COM]
Sent: Wednesday, May 01, 2002 10:56 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Function SCAN and delimiter question
My suggestion about using an infile buffer was
intended to tolerate the features DSD supports,
such as more than one type of delimiter,
and delimiters embedded in the data and masked by quotes.
Having given it a try, it seems to be controllable.
Here is a log for a revision of Mike Zdeb's data (more delimiters and some
quotes), with my _infile_ scan() process:
357 data old;
358 infile datalines truncover ;
359 input string $char40.;
360 list;datalines;
RULE: ----+----1----+----2----+----3----+----4----+----5----+--
361 AAA/BBB/CC/DD,EE/FF/GG/HHH
362 AAA///,,//HHH
363 AAA//////GGG/
364 ////////HHH
365 dlm/BBB CCC/"/"/","/FF/GGG/HHH
366 AAA/////,/H
NOTE: The data set WORK.OLD has 6 observations and 1 variables.
NOTE: DATA statement used:
real time 0.13 seconds
cpu time 0.04 seconds
367 ;
368 data new_scan;
369 infile './*' dsd dlm=dlm truncover; /* infile used for scan */
370 dlm = ",/" || "09"x ;
371 set old;
372 input @1 @@; /* establish a _infile_ buffer */
373 _infile_=string; /* replace _infile_ with string to scan with DSD */
374 input (part1 - part9)($) @@ ;
375 run;
NOTE: The infile './*' is: /* the first file in the current directory */
File Name=e:\users\crawford\sas8\addoptn.sas,
File List=e:\users\crawford\sas8\*,RECFM=V,
LRECL=256
NOTE: 1 record was read from the infile './*'.
The minimum record length was 37.
The maximum record length was 37.
NOTE: There were 6 observations read from the data set WORK.OLD.
NOTE: The data set WORK.NEW_SCAN has 6 observations and 10 variables.
NOTE: DATA statement used:
real time 0.73 seconds
cpu time 0.03 seconds
376 dm log 'fsv;formula ' fsv; /* produces this window (because I have customised the view formula) */
+FSVIEW: WORK.NEW_SCAN (B)---------------------------------------------------------------------+
| Obs  string                          part1 part2   part3 part4 part5 part6 part7 part8 part9 |
|                                                                                              |
|   1  AAA/BBB/CC/DD,EE/FF/GG/HHH      AAA   BBB     CC    DD    EE    FF    GG    HHH         |
|   2  AAA///,,//HHH                   AAA                                         HHH         |
|   3  AAA//////GGG/                   AAA                                   GGG               |
|   4  ////////HHH                                                                       HHH   |
|   5  dlm/BBB CCC/"/"/","/FF/GGG/HHH  dlm   BBB CCC /     ,     FF    GGG   HHH               |
|   6  AAA/////,/H                     AAA                                         H           |
|                                                                                              |
+----------------------------------------------------------------------------------------------+
Of course, it would be more interesting to implement on _real_ data !
Regards
Peter Crawford
Date: 01/05/2002 13:51
To: Peter Crawford/Zentrale/DeuBaExt@Zentrale
Subject: Re: Function SCAN and delimiter question
Message text:
Hi. I think this works OK...TRANWRD then SCAN
data old;
infile datalines truncover;
input string $50.;
datalines;
AAA/BBB///EEE/FFF//HHH
AAA/BBB/CCC/DDD/EEE/FFF/GGG/HHH
AAA///////HHH
AAA//////GGG/
////////HHH
AAA///////
;
run;
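data new;
length temp $100;
array part(10) $20;
set old;
/* put a ~ marker after every slash so that empty fields survive SCAN */
temp = tranwrd(string,'/','/~');
/* bounded loop: the subscript cannot run past the end of the array */
do j = 1 to dim(part);
part(j) = scan(temp,j,'/');
if part(j) eq '' then leave; /* no more words */
part(j) = compress(part(j),'~'); /* take the marker out again */
end;
drop j temp;
run;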
proc print data=new;
var string part1-part8;
run;
Mike Zdeb
New York State Department of Health
ESP Tower - Room 1811
Albany, NY 12237
P/518-473-2855 F/630-604-1475
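To see why the marker helps, here is a small sketch of what TRANWRD and SCAN do to a single value. The literal and the variable names are made up for illustration, and it assumes that ~ never occurs in the real data. The ~ placed after each slash stops adjacent slashes from forming one run of delimiters, and COMPRESS removes it afterwards. One caveat: SCAN still skips a delimiter at the very start of the value, so a leading empty field is not preserved by this approach.

```sas
data _null_;
  length temp $40 word2 word3 $8;
  /* every / becomes /~, so temp holds AAA/~/~CCC */
  temp  = tranwrd('AAA//CCC', '/', '/~');
  /* second word is just the marker, which COMPRESS reduces to blank */
  word2 = compress(scan(temp, 2, '/'), '~');
  /* third word is ~CCC, which COMPRESS reduces to CCC */
  word3 = compress(scan(temp, 3, '/'), '~');
  put temp= word2= word3=;
run;
```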
From: Peter Crawford <peter.crawford@DB.COM>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
Sent: 05/01/02 06:06 AM
Please respond to: Peter Crawford
To: SAS-L@LISTSERV.UGA.EDU
cc:
Subject: Re: Function SCAN and delimiter question
The behaviour wanted by RICH0850 <rich0850@AOL.COM>
is probably available "real-soon-now" in v9!
But here is an interim alternative.
It requires only SAS v8 (needed to update _infile_).
I haven't tried this... yet, but
why not use the DSD option on our text string?
Prepare a spare INFILE statement which you use only
for this purpose. It should have the DSD option and
DLM=<varname> (where the variable <varname>
holds the delimiters you need to use).
When you want to use scan() with DSD behaviour:
issue an INFILE statement for your special infile,
place the text you want to scan into _infile_,
use INPUT statements as necessary to extract from your string
(but keep a trailing @ or @@).
If the rest of the data step is using infile/inputs, then you
need to take care of the current _infile_ buffer and column position
before switching to the special infile.
Has anyone tried this kind of thing?
Regards
Peter Crawford
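Written out as a DATA step, the recipe above might look like the following sketch. It mirrors the step shown in the log of Peter's follow-up message in this thread, and the two sample strings are taken from that log. The './*' file reference only supplies an input buffer, so it assumes the current directory holds at least one readable file; nothing from that file is actually parsed.

```sas
data old;
  infile datalines truncover;
  input string $char40.;
  datalines;
AAA///,,//HHH
dlm/BBB CCC/"/"/","/FF/GGG/HHH
;

data new_scan;
  /* spare infile, used only to obtain a buffer with DSD parsing rules */
  infile './*' dsd dlm=dlm truncover;
  dlm = ",/" || "09"x;            /* comma, slash and tab as delimiters */
  set old;
  input @1 @@;                    /* establish the _infile_ buffer and hold it */
  _infile_ = string;              /* replace the buffer with the text to parse */
  input (part1 - part9) ($) @@;   /* list input now follows the DSD rules */
run;
```

The SET statement ends the step when OLD is exhausted, and the trailing @@ keeps the same buffer held across iterations, so only one record of the donor file is ever read.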
Date: 01/05/2002 01:16
To: SAS-L@LISTSERV.UGA.EDU
Reply to: RICH0850 <rich0850@AOL.COM>
Subject: Function SCAN and delimiter question
Message text:
Dear Group:
I want to use the SCAN function as defined in the documentation.
Unfortunately, if the delimiter you have specified occurs several times
in sequence, SCAN treats the run as a single occurrence. I want SCAN to
return results similar to the DSD option of the INFILE statement, where
consecutive delimiters are interpreted as missing values between
delimiters.
Any ideas or workarounds?
Thanks.
--Richard
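A minimal sketch of the behaviour described above, using one of the test strings that appears elsewhere in this thread; the variable names are illustrative only. With both / and , given as delimiters, SCAN skips the whole run of delimiters, so the second word is HHH rather than a missing value.

```sas
data _null_;
  length string $40 second $8;
  string = 'AAA///,,//HHH';
  /* SCAN treats the run ///,,// as a single separator */
  second = scan(string, 2, '/,');
  put second=;   /* writes second=HHH to the log */
run;
```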