| Date: | Mon, 9 May 2005 17:14:46 -0400 |
| Reply-To: | Peter Crawford <peter.crawford@BLUEYONDER.CO.UK> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Peter Crawford <peter.crawford@BLUEYONDER.CO.UK> |
| Subject: | Re: Processing multiple CSV files from a directory |
On Mon, 9 May 2005 16:19:01 -0000, sa polo <solouga2@REDIFFMAIL.COM> wrote:
>Hi All,
>I am attempting to write a program to read CSV files from several
>sub-directories and process them by adding a date field and part of the
>file name to each record, writing a new file, again in CSV format.
>Directory: c:\
>Sub-dirs : 20010101, 20010102 ...
>CSV files in each directory: abc_a.csv, abc_ab.csv, abc_se.csv, abc_po.csv,
>abc_**** ...
>Extract the data from each file and append the date (obtained from the
>sub-directory name) in date9. format and whatever appears after the _
>(underscore) in the file name.
>Example
>File abc_a in directory 20010101:
>1,se,oo,poll
>Output:
>1,se,oo,poll,01JAN2001,a
>The program should read all the sub-directories in the directory (in this
>case c:\), process all files and write them to a separate location, say on
>another drive or directory, in the same CSV format.
>All assistance is much appreciated, as usual.
>Sa
I think we've seen this question before, so reading the archives may be
a better option than the following....
This, written from memory, gets close (*** beware: untested code ***)
%let pathRoot = c:\ ;
%let pathMask = &pathroot.200 ; * filter paths with this prefix ;
%let fileMask = abc_ ; * only filenames beginning like this ;
* assumes all files share the same column names, lengths and types ;
%macro the_data( len_defn ) ;
length &len_defn ;
input %scan( &len_defn, 1 ) /* first column name */
-- %scan( &len_defn,-2 ) /* name before the last length */
/* no semicolon here: the one after the macro call ends the INPUT */
%mend;
* get filenames & paths from command pipe ;
filename collect pipe "dir /b/s ""&pathRoot"" " lrecl= 1000 ;
* get data ;
filename dum1 '.' ; * dummy fileref for filevar infile ;
data results( drop= filen ) ;
length pathName filen $1000 file $60 ;
retain filen ;
infile collect ;
input ;
pathName = _infile_ ;
* take the whole line from the pipe via _infile_
to avoid problems with blanks embedded in path names ;
if upcase(pathname) =: "%upcase(&pathMask)" ;
filen = scan( pathName, -1, '\' );
if filen =: "&fileMask" ;
* the 8 characters after the root are the sub-directory name ;
dir_date = input( substr( pathName, length( "&pathRoot" ) + 1, 8 ), ?? yymmdd8. ) ;
if dir_date = . then delete ; * skip paths that do not start with a valid date ;
format dir_date date9. ;
* keep the text between the _ and the .csv extension ;
file = scan( substr( filen, 1 + index( filen, '_' )), 1, '.' ) ;
* now point into file to be read ;
infile dum1 filevar= pathName lrecl= 30000 DSD TRUNCOVER end= eofD ;
input ; * skips the column-names heading line ;
* remove that INPUT if the files have no heading line ;
do while( not eofD );
%the_data( id 8 cat1 $2 cat2 $2 poll $8 ) ;
output ;
end;
* reset the end-of-data flag ready to read the next file ;
eofD= 0;
run;
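The dir_date and file assignments can be checked on a single path before pointing the step at real directories. A minimal sketch, assuming a path shaped like c:\20010101\abc_a.csv: it takes the date from the parent folder name with SCAN(..., -2, '\') rather than counting characters from the start of the path (a swapped-in variation, not the code above), and drops the .csv extension, since everything after the underscore would otherwise be "a.csv" rather than the wanted "a". The data set name PARSED is hypothetical.

```sas
/* Hypothetical single path, as one line of the dir /b/s output might look */
data parsed;
  length pathName $1000 filen $200 file $60;
  pathName = 'c:\20010101\abc_a.csv';
  filen    = scan(pathName, -1, '\');                         /* abc_a.csv */
  dir_date = input(scan(pathName, -2, '\'), ?? yymmdd8.);     /* parent folder as a SAS date */
  file     = scan(substr(filen, index(filen, '_') + 1), 1, '.'); /* text between _ and .csv */
  format dir_date date9.;
  put pathName= dir_date= file=;
run;
```

Run as is, the log line should show dir_date=01JAN2001 and file=a.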
* on Unix, the piped command would be different (for example find
instead of dir /b/s) and the parsing would split on / not \ .
Little else would need to change ;
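The step above builds the RESULTS data set but stops short of the question's last requirement: CSV output in a separate location. A minimal sketch of that step, assuming RESULTS holds id, cat1, cat2, poll, dir_date and file as declared above. The first DATA step only fakes one row so the sketch runs on its own; the output folder (outRoot, here the WORK library path) and the yyyymmdd_ file-name prefix are hypothetical choices. Point outRoot at the real target, for example another drive; the folder must already exist.

```sas
/* Hypothetical stand-in for RESULTS: one row from c:\20010101\abc_a.csv */
data results;
  length pathName $1000 file $60 cat1 cat2 $2 poll $8;
  pathName = 'c:\20010101\abc_a.csv';
  dir_date = '01JAN2001'd;
  file = 'a';
  id = 1; cat1 = 'se'; cat2 = 'oo'; poll = 'poll';
  format dir_date date9.;
run;

/* Hypothetical output location; replace with e.g. d:\csvout */
%let outRoot = %sysfunc(pathname(work));

data _null_;
  set results;
  length outName $1000;
  /* one output file per input file: yyyymmdd_<original file name> */
  outName = cats("&outRoot/", put(dir_date, yymmddn8.), '_',
                 scan(pathName, -1, '\'));
  /* FILEVAR= switches to a new output file whenever outName changes */
  file out filevar= outName dsd lrecl= 30000;
  /* DSD writes comma-separated values; dir_date uses its date9. format */
  put id cat1 cat2 poll dir_date file;
run;
```

For the row above this should write 1,se,oo,poll,01JAN2001,a to 20010101_abc_a.csv. The date prefix keeps abc_a.csv files from different sub-directories apart. Rows need to stay grouped by input file, as they are when RESULTS is built file by file; returning to a name already written would likely start that file again unless the MOD option is added.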
Does that cover the customisation needed?
Peter Crawford