| Date: | Tue, 28 Aug 2007 20:20:57 -0700 |
| Reply-To: | David L Cassell <davidlcassell@MSN.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | David L Cassell <davidlcassell@MSN.COM> |
| Subject: | Re: Processing Multiple data files |
| In-Reply-To: | <200708261926.l7QAklKT001283@malibu.cc.uga.edu> |
| Content-Type: | text/plain; format=flowed |
|---|
randistan69@HOTMAIL.COM wrote back:
>
>On Thu, 23 Aug 2007 23:40:00 -0400, Howard Schreier <hs AT dc-sug DOT org>
><nospam@HOWLES.COM> wrote:
>
> >On Thu, 23 Aug 2007 13:24:03 -0400, SUBSCRIBE SAS-L Anonymous
> ><randistan69@HOTMAIL.COM> wrote:
> >
> >>On Thu, 23 Aug 2007 03:32:44 -0400, SUBSCRIBE SAS-L Anonymous
> >><randistan69@HOTMAIL.COM> wrote:
> >>
> >>>Dear All:
> >>> I have multiple files (about 500) in txt format. They are named file1,
> >>>file2 and so on. I am using the following INFILE statement to read the
> >>>data:
> >>>
> >>>Infile 'C:\Documents and Settings\AAA\Desktop\file1.txt' DLM = ','
> >>>Lrecl = 32000 DSD Truncover;
> >>>
> >>>Finally I want to save the output as Output1, Output2, etc. in Mylibrary.
> >>>
> >>>So the last line of the code is:
> >>>
> >>>proc sort data = example out = mylibrary.output1 ; by VarA VarB VarC ;
> >>>run;
> >>>
> >>>Can I process the data using a BAT file where I need a wildcard for the
> >>>file1.txt statement and also for the statement out = mylibrary.output1?
> >>>
> >>>Or will a Macro be preferred?
> >>>
> >>>Thanx for the help in advance
> >>> Randy
> >>
> >>All:
> >> Every time I read the txt files in I have to type test1, test2 and so on
> >>in the INFILE statement. Besides, the code for each file takes about 30
> >>minutes to run. That is why I wanted help to determine if I could use a
> >>Macro or use batch processing for these files and save the output as
> >>output1, output2 etc.
> >> Please help.
> >> Randy
> >
> >That's 30 minutes, not 30 seconds?
> >
> >You are in a performance-tuning situation.
> >
> >Perhaps you should explain a bit more. How large is each file? What is the
> >dimension distinguishing the 500 from one another? What is the nature of the
> >data and the tasks to be performed after you have stored all of the data?
>
>Howard:
> Each file has approximately 3.5 million to 4 million observations. To
>start there are about 50 variables across and after manipulation of the
>data there are about 120 variables. There is no difference between
>these 500 files: All have the same dimensions and variables.
> I cannot use a set command because it is much better to manipulate
>individual files than handle one single huge file. It takes about 30-45
>minutes to run the code on each individual file and I need to find a more
>efficient way to run the data. Perhaps one way is to reduce the number of
>Proc Sorts.
> Randy
I'm going to disagree.
I think Howard is right. (Well, duh. Howard's always right.)
You are going to have a miserable time accessing all 500 tables separately,
over and over and over. You are going to have a miserable time storing
all that extra crud from those 70 extra variables. (I have to wonder if
they are really needed.)
You would be way better off using a DATA step view here. Create a
view that has the new variables (if you insist - I would hesitate on that)
and stacks the tables into a single tall-and-thin view.
SAS works better with by-processing and tall-and-thin tables.
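A minimal sketch of the view idea, not tested against the real data. It assumes the files sit in the Desktop folder from the original INFILE statement, match the pattern file*.txt, and have no header rows. The INPUT layout and the names ALLFILES, FNAME, SRCFILE and OUTPUT_ALL are made up for illustration; only VarA, VarB and VarC come from the PROC SORT quoted above.

```sas
/* Sketch only: the INPUT layout and the names ALLFILES, FNAME, SRCFILE
   and OUTPUT_ALL are assumptions, not taken from the original code. */
data work.allfiles / view=work.allfiles;
   length fname srcfile $ 260;
   /* The wildcard reads file1.txt, file2.txt, ... in one pass.
      FILENAME= stores the path of the file currently being read in FNAME. */
   infile 'C:\Documents and Settings\AAA\Desktop\file*.txt'
          dlm=',' dsd truncover lrecl=32000 filename=fname;
   input VarA VarB VarC;   /* hypothetical: replace with the real ~50 variables */
   srcfile = fname;        /* FNAME itself is not written out, so copy it */
run;

/* One sort with SRCFILE as the leading BY variable instead of 500 sorts. */
proc sort data=work.allfiles out=mylibrary.output_all;
   by srcfile VarA VarB VarC;
run;
```

Nothing is read until a step uses the view, and later steps can then run with BY srcfile rather than looping over 500 separate tables.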
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330