| Date: | Wed, 20 Jan 2010 11:10:32 -0500 |
| Reply-To: | T J <tj_noreply@YAHOO.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | T J <tj_noreply@YAHOO.COM> |
| Subject: | Re: Hash Question |
| Content-Type: | text/plain; charset=ISO-8859-1 |
Another observation:
if the source datasets happened to be sorted, then SQL would use similar
cpu times as the hash. You can experiment this by sorting and unsorting
your datasets. SAS has made SQL run mush faster from v8 to v9.
Happy merging!
-TJ
On Wed, 20 Jan 2010 04:14:11 -0800, Eli Y. Kling <eli.kling@GMAIL.COM>
wrote:
>I have also experienced similar performances. First I would like to
>point out that in certain situations proc sql uses built in hashing.
>Secondly, you might want to try and speed up the hashing by
>experimenting with the number of bins. And thirdly, I think you will
>find that in your situation the fasted solution is the good old �join
>by Formant and put�.
>
>I expect that if your small table were bigger by a factor of 10 or 100
>than the hash might be faster than the proc sql.
>
>Eli
>
>On 20 Jan, 11:29, ted.ki...@LEWIN.COM ("Kirby, Ted") wrote:
>> I am starting to use Hash Objects more and more and thought I would do a
=
>> little test. I ran code that will generate a dataset of all patient =
>> numbers where the DRG assigned to the case was one in a list of top 75 =
>> DRGs. The log entries are below.
>> =20
>> What surprised me is that the hash solution took longer in real time =
>> than the PROC SQL solution did. Although the hash solution did take =
>> less CPU time. Any thoughts?
>> =20
>> PROC SQL SOLUTION:
>> 85 proc sql;
>> 86 create table out.Top75CRBIPatients as
>> 87 select o.PatientNum,V27DRG from out.FullPAR10GrprOut o
>> 88 right join out.CRBITop75 t on GrouperDRG=3DV27DRG;
>> NOTE: Table OUT.TOP75CRBIPATIENTS created, with 3887478 rows and 2 =
>> columns.
>> 89 quit;
>> NOTE: PROCEDURE SQL used (Total process time):
>> real time 47.06 seconds
>> cpu time 28.53 seconds
>> =20
>> =20
>> HASH SOLUTION:
>> 91 data out.Top75CRBIPatients;
>> 92 if _n_=3D1 then do;
>> 93 declare hash CRBIDRGs(dataset: 'out.CRBITop75');
>> 94 rc=3DCRBIDRGs.defineKey('V27DRG');
>> 95 rc=3DCRBIDRGs.defineDone();
>> 96 end;
>> 97 set out.FullPAR10GrprOut;
>> 98 V27DRG=3DGrouperDRG;
>> 99 rc=3DCRBIDRGs.find();
>> 100 if rc=3D0 then output;
>> 101 run;
>> NOTE: There were 75 observations read from the data set OUT.CRBITOP75.
>> NOTE: There were 13122131 observations read from the data set =
>> OUT.FULLPAR10GRPROUT.
>> NOTE: The data set OUT.TOP75CRBIPATIENTS has 3887478 observations and 31
=
>> variables.
>> NOTE: DATA statement used (Total process time):
>> real time 1:22.95
>> cpu time 22.10 seconds
>>
>> =20
>> Ted Kirby
>> Consultant,
>> The Lewin Group, Inc.
>> 3130 Fairview Park Drive, Suite 800
>> Falls Church, VA 22042
>> Phone: (703)269-5507
>> Fax: (703)269-5501
>> e-mail: ted.ki...@lewin.com <mailto:ted.ki...@lewin.com> =20
>>
>> ************* IMPORTANT - PLEASE READ ********************
>>
>> This e-mail, including attachments, may include confidential and/or
propr=
>> ietary information, and may be used only by the person or entity to
which=
>> it is addressed. If the reader of this e-mail is not the intended
recipi=
>> ent or his or her authorized agent, the reader is hereby notified that
an=
>> y dissemination, distribution or copying of this e-mail is prohibited.
If=
>> you have received this e-mail in error, please notify the sender by
repl=
>> ying to this message and delete this e-mail immediately.
>> =0D
|