Date: Tue, 28 Aug 2007 20:20:57 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Processing Multiple data files
Content-Type: text/plain; format=flowed
randistan69@HOTMAIL.COM wrote back:
>On Thu, 23 Aug 2007 23:40:00 -0400, Howard Schreier <hs AT dc-sug DOT org> wrote:
> >On Thu, 23 Aug 2007 13:24:03 -0400, SUBSCRIBE SAS-L Anonymous
> ><randistan69@HOTMAIL.COM> wrote:
> >>On Thu, 23 Aug 2007 03:32:44 -0400, SUBSCRIBE SAS-L Anonymous
> >><randistan69@HOTMAIL.COM> wrote:
> >>>Dear All:
> > >>> I have multiple files (about 500) in txt format. They are named file1,
> > >>>file2 and so on. I am using the following INFILE statement to read the
> >>>Infile 'C:\Documents and Settings\AAA\Desktop\file1.txt' DLM = ','
> >>>Lrecl = 32000 DSD Truncover;
> > >>>Finally I want to save the output as Output1, Output2, etc. in
> >>>So the last line of the code is:
> >>>proc sort data = example out = mylibrary.output1 ; by VarA VarB VarC ;
> > >>>Can I process the data using a BAT file, where I need a wildcard for the
> > >>>file1.txt statement and also for the statement out = mylibrary.output1?
> > >>>Or would a macro be preferred?
> > >>>Thanks in advance for the help.
> >>> Randy
> > >> Every time I read the txt files in, I have to type test1, test2 and so on
> > >>in the INFILE statement. Besides, the code for each file takes about 30
> >>minutes to run. That is why I wanted help to determine if I could use a
> >>Macro or use batch processing for these files and save the output as
> >>output1, output2 etc.
> >> Please help.
> >> Randy
> >That's 30 minutes, not 30 seconds?
> >You are in a performance-tuning situation.
> >Perhaps you should explain a bit more. How large is each file? What is the
> >dimension distinguishing the 500 from one another? What is the nature of
> >data and the tasks to be performed after you have stored all of the data?
> Each file has approximately 3.5 million to 4 million observations. To
>start there are about 50 variables across, and after manipulation of
>the data there are about 120 variables. There is no difference between
>these 500 files: All have the same dimensions and variables.
> I cannot use a set command because it is much better to manipulate
>individual files than handle one single huge file. It takes about 30-45
>minutes to run the code on each individual file, and I need to find a more
>efficient way to run the data. Perhaps one way is to reduce the number of
I'm going to disagree.
I think Howard is right. (Well, duh. Howard's always right.)
You are going to have a miserable time accessing all 500 tables separately,
over and over and over. You are going to have a miserable time storing
all that extra crud from those 70 extra variables. (I have to wonder if
they are really needed.)
You would be way better off using a data step view here. Create a
view that has the new variables (if you insist; I would hesitate on that)
and stacks the tables into a single tall-and-thin view.
SAS works better with BY-processing and tall-and-thin tables.
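A minimal sketch of the kind of DATA step view described above, under several assumptions not stated in the thread: the 500 comma-delimited files are named file1.txt through file500.txt in the Desktop folder from Randy's INFILE statement, VarA, VarB and VarC stand in for the real ~50 input columns, the libref mylibrary is already assigned, and fileno is a hypothetical variable recording which file each row came from. The FILEVAR= option lets one INFILE statement walk through all the files, so the 500 inputs become one tall-and-thin source that later steps can handle with BY fileno instead of 500 separate tables.

```sas
/* Sketch only: folder, file names, VarA-VarC and fileno are assumptions. */
data work.allfiles / view=work.allfiles;
   length fname $260;                          /* holds the current file path */
   do fileno = 1 to 500;                       /* hypothetical file counter, kept in the output */
      fname = cats('C:\Documents and Settings\AAA\Desktop\file', fileno, '.txt');
      /* FILEVAR= closes the previous file and opens a new one when fname changes; */
      /* END= flags the last record of the file currently open                     */
      infile rawdata filevar=fname end=lastrec
             dlm=',' dsd truncover lrecl=32000;
      do while (not lastrec);
         input VarA VarB VarC;                 /* placeholder input columns */
         /* any derived variables could be computed here, on the fly */
         output;
      end;
   end;
   stop;   /* every file is read inside the loops, so end the step explicitly */
run;

/* A single PROC SORT on the stacked data; fileno keeps the files distinct. */
proc sort data=work.allfiles out=mylibrary.alldata;
   by fileno VarA VarB VarC;
run;

/* Example of BY-processing: one pass produces a summary row per file. */
proc means data=mylibrary.alldata noprint;
   by fileno;
   var VarA;
   output out=work.filestats mean=mean_VarA;
run;
```

Because a view stores only the program, not the data, every step that reads work.allfiles re-reads the text files; that is the trade-off for not storing the extra variables on disk.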
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330