Date: Thu, 2 Nov 2006 04:59:28 -0500
Reply-To: Peter Crawford <Peter.crawford@BLUEYONDER.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Crawford <Peter.crawford@BLUEYONDER.CO.UK>
Subject: Re: reading in multple csv files and exporting to excel via macro
On Wed, 1 Nov 2006 14:19:39 -0800, Neerav Monga <neerav.monga@GMAIL.COM>
>I have about 50 files that are in csv format labelled file1.csv ,
>file2.csv etc... (however it does skip numbers, eg.. 1-5, then 20-25,
>they aren't necessarily consecutive numbers). I saved all of them in
>one folder in windows xp using SAS 9.1.3.
>I know i can put the proc import/export into a macro and feed the macro
>the input filename and export file name, however, is there a way to
>automate this somehow? It would save me a lot of time.
>I know the non consecutive numbers might be an issue, but they have to
>stay as labelled. After the file is imported, I need to remove the
>first four characters in a field in the file (which I can easily do
>with the substring command), drop the old field with extra characters
>and then export it.
>Perhaps there's a way to import / export an entire folder/library at
>once? That might be the easiest way to go?
>Any help would be appreciated,
why load all these 50 files into SAS ?
why assume proc import is appropriate ?
Proc Import is limited in what it can achieve for one .csv file
>>>> for example,
determining date data formats 03APR03, 12/12/12
determining that a column is mixed and not numeric
When you are loading 50 files, your problems escalate enormously!
If you want these .csv files for your own processing, consider using
SAS Enterprise Guide which is renowned for what it does in this area.
When you need to provide such data for someone else, probably
you already know what data formats to expect. In that situation
proc import is not your best approach, because you cannot give
it the information about datatypes like dates.
Instead of using proc import, use one datastep for each differing
data layout. Data of the same layout can be read in one datastep
even when there are many files with the same layout.
The reason for using this approach is that it combines the
different problems (directory list, one-at-a-time macro, datatypes)
into one manageable step.
The infile statement can refer to a quoted file name with
* and ? global characters in the file name. The effect is to
read through all files conforming to that name pattern.
Alternatively, it could refer to a fileref associated in a filename
statement with multiple files - called file concatenation. Again
all lines of all the files listed in the concatenation, get read.
In a datastep, not only can you use substr() type processing
to modify the data as it is loaded, you can also use
something similar to identify the file-type. Given that, you
can execute the input and output statements relevant to
that file and column layout.
Here is a demo of reading all file*.csv files from a folder
where there are three different layouts, determined by the
first field on the line. I have used specific column names
and simple table names.
data data_layoutA( keep= columns of that layout filename )
data_layoutB( keep= column for structureB filename )
data_layoutC( keep= more_columns used by C filename ) ;
attrib columns length= $3
of length= 8 informat= date9.
that length= $20 layout= $100
column length= $8 for length= 4 structureB length=$20
used informat= yymmdd10. format=date9.
by length= $4 C length= 6
filename filenam length= $200 first $20 ;
retain filename ;
infile "&path\file*.csv" truncover dsd lrecl= 10000
filename= filenam ;
input first @ ;
/* capture filename and drop heading line */
if filename ne filenam then do;
* change of filename implies line 1 of new csv file ;
filename = filenam ;
select (first) ;
when ( "typeA" ) do;
input columns of that layout ;
output data_layoutA ;
when ( "typeB" ) do;
input column for structureB ;
column = substr( column, 5 ); * remove first4 characters;
output data_layoutB ;
when ( "typeC" ) do;
input more_columns used by C ;
output data_layoutc ;
PUT 'problem: unknown data type in' filename= _infile_ ;
end; /* of select group */
It may seem to be more syntax than a proc import, but consider
writing 50 proc imports and then the clean-up you will need.
The statements, like: attrib, informat, infile, select, ...
are all worth reviewing in the SAS doc.