Date: Mon, 26 Apr 1999 17:43:15 +0100
Reply-To: Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Subject: Re: How to trick Proc Freq?
I would like you to consider using data steps as acceptable substitute
for proc summary in these and certain other conditions.
738 proc summary data=policy10;
739 class fldr_id;
740 output out=b;
NOTE: The data set WORK.B has 631850 observations and 3 variables.
NOTE: The PROCEDURE SUMMARY used 8 minutes 20.8 seconds.
Correct me if I'm wrong, but I assume you are proposing Alex use this
kind of solution - this proc summary, followed by some processing to
generate the cumulative frequencies...
The summary step taking 8 minutes above was the foreground task on my
win95 machine. There is ample memory, and disk space.
It would be followed by some more processing run time.
You wish to ban my suggestion of a datastep based process, which took
less than one minute to complete the whole task !
I thought you might be wrong, but others may convince me my proc summary
test above was not eight wasted minutes...
david pider <dpider@HOTMAIL.COM> writes
>Alex Martchenko wrote:
>> I have a credit policy SAS data set with almost 100 million
>>observations. It has a numeric variable FLDR_ID that can be any
>>integer number from -500,000 to +500,000. So on the average there are
>>about 100 observations with the same FLDR_ID on the file. The data
>>set is sorted by a different variable (account number) and so it's
>>unsorted by FLDR_ID. I'm trying to create a data set with
>>frequencies, cumulative frequencies, percents and cumulative percents
>>for all values of FLDR_ID. I'm trying to use proc freq like that:
>> proc freq data=folders(keep=fldr_id);
>> tables fldr_id / missing out=fldr_fq;
>>But I have 2 problems: 1) proc freq runs out of memory and abends
>>the job 2) if I run it on a much smaller subset so it runs OK, it
>>doesn't create cumulative variables in the OUT= file. I could live
>>with problem 2 because I can read the output data set with frequency
>>and then calculate cumulatives in a data step.
>> Real pain is problem 1. I run it in batch on OS/390, 6.09E. A job
>>can only use 50M of memory (that's all system people say we can have
>>for a single SAS job). But proc freq wants more, so if I specify
>>MEMSIZE=100M it doesn't matter because after 50M limit is exceeded,
>>the job abends anyway.
>If you read SAS manuals you wouldn't be in trouble. About PROC FREQ,
>the manual says 'the maximum number of levels allowed for any one
>variable is 32,767. If you have a variable with more than 32,767
>levels, use the SUMMARY procedure'.
>>I know I can sort it by FLDR_ID and then use first. and last.
>>variables to do what I need. But in our system sorting 100 million
>>records is a big problem itself. Does anyone have suggestions? I will
>>greatly appreciate any input.
>Here is one: if some of self appointed 'gurus' offer you a 'datastep
>algorithm' claiming it is 'more efficient' that SAS procs do yourself
>a favor and use 'delete' button. Trust my 15 years in SAS, none of
>thses 'solutions' run faster or 'more efficiently' than properly
>applied SAS procs. I've just fired one 'expert' as soon as I saw that
>he tried to get smart and summarized in datastep instead of proc
>summary. If I can't ban them 'gurus' from SAS profession, at least I
>can keep them away from my shop. And it's getting worse. Now this
>'datastep approach' have even sneaked into SUGI. I almost fainted at
>seeing a paper by one of them 'gurus' (who I bet have no idea of real
>world programing) trying to replace proc sort with his 'quick sort'
>monstrosity. I wonder why he didn't call it 'blast sort' or something?
>BTW, try SQL too. Personally, I don't like it (it shouldn't of been in
>SAS in the first place) but at least it's a SAS proc and at least will
>beat any 'datastep algorithm' hands down.
>15 years of SAS experience
>Get Free Email and Do More On The Web. Visit http://www.msn.com
Peter Crawford (_knowledge_ is a poor substitute for *real* experience,
but they make a great team)