LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2011, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 11 Aug 2011 12:58:29 +0000
Reply-To:     Michael Raithel <michaelraithel@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Michael Raithel <michaelraithel@WESTAT.COM>
Subject:      Re: Architectural approach for high performance transpose
In-Reply-To:  <201108110000.p7AKDvmV020461@waikiki.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"

Dear SAS-L-ers,

In this interesting discussion, Scott posted, in part, the following:

> Are there any tricks to make PROC TRANSPOSE run faster? (Our server is > virtualized - I've tested SPDE but it didn't make any performance > difference).

<<Scott's entire original post can be found beneath the Sig line>>

Scott, yea, I was struck by the fact that you essentially asked that question early on in this discussion and that you stated that a hardware upgrade was not an option. When reading your OP, I thought that maybe using a series of data views to chunk up your data through several PROC TRANSPOSEs might work. My reasoning is that perhaps transposing the HUMONGO data set is overwhelming your server's resources. Yes, it can do the work, but it takes longer to flip all of that data at one time.

How about trying the following:

1. Create 4 views, each of which has a WHERE statement that surfaces succeeding 1/4's of your HUMONGO data set. (See my reply to Ravinder on Wed 8/3/2011 3:06 PM).

2. Run 4 separate transposes, each of which inputs one of the four views and outputs a transposed data set.

3. Use PROC APPEND to append output datasets 2 - 4 to the first output data set.

With any luck, the four smaller transposes + the append will run faster than transposing the HUMONGOUS SAS data set. A thought for sure!

Scott, best of luck in all your SAS endeavors!

I hope that this suggestion proves helpful now, and in the future!

Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates. All SAS code and/or methodologies specified in this posting are for illustrative purposes only and no warranty is stated or implied as to their accuracy or applicability. People deciding to use information in this posting do so at their own risk. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Michael A. Raithel "The man who wrote the book on performance" E-mail: MichaelRaithel@westat.com

SAS Press Author Page: http://support.sas.com/publishing/authors/raithel.html

Author: Tuning SAS Applications in the MVS Environment

Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second Edition http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172

Author: The Complete Guide to SAS Indexes http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Expecting the world to treat you fairly because you are good is like expecting the bull not to charge because you are a vegetarian. - Dennis Wholey +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

<<Scott's entire original post>>

> > Thanks Mr. _null_, much appreciated. > > While I like your tighter, less "convoluted" approach than my macro > code > generation, it doesn't perform as well as my macro approach. > > Below is log output from my "real" data, although this data is only > about > 1.5M records vs. the 50M+ in the final program. > > Are there any tricks to make PROC TRANSPOSE run faster? (Our server is > virtualized - I've tested SPDE but it didn't make any performance > difference). > > I started a PROC FREQ/data step transpose approach, but the code was > even > messier than my original approach. Plus it didn't perform well; I > think the > line: > > if scan(vname(_sdAncBen{_i}),2,"_")=deNormLevel then > _sdAncBen{_i}=sdAncBen; > > is the culprit, since I'm calling multiple functions per array element > * # > of observations. > > Perhaps I can use PROC FREQ + the hash object to code up a > transpose/"flip > flop" algorithm??? > > ================================================ > > 39662 * version 1 ; > 39663 * set by and var variables for proc transpose ; > 39664 * these will be used later for post-processing ; > 39665 %bench(start); > 39666 > 39667 %let by=covno servdate; > 39668 %let var=sdAncBen sdAncClm; > 39669 %let id=deNormLevel; > 39670 > 39671 * the bulk of the run time is in the transpose ; > 39672 proc transpose data=source > 39673 out=transposed; > 39674 by &by; > 39675 var &var; > 39676 id &id; > 39677 run; > > NOTE: There were 1017664 observations read from the data set > WORK.SOURCE. > NOTE: The data set WORK.TRANSPOSED has 1811184 observations and 10 > variables. > NOTE: PROCEDURE TRANSPOSE used (Total process time): > real time 12.56 seconds > user cpu time 10.82 seconds > system cpu time 0.73 seconds > Memory 2170k > OS Memory 17276k > Timestamp 11-Aug-11 09:52:07 > > > 39678 > 39679 * fast ; > 39680 proc contents data=transposed out=columns (keep=name) noprint; > 39681 run; > > NOTE: The data set WORK.COLUMNS has 10 observations and 1 variables. > NOTE: PROCEDURE CONTENTS used (Total process time): > real time 0.01 seconds > user cpu time 0.00 seconds > system cpu time 0.01 seconds > Memory 174k > OS Memory 15224k > Timestamp 11-Aug-11 09:52:07 > > > 39682 > 39683 * fast ; > 39684 proc sql noprint; > 39685 select name into :columns separated by " " > 39686 from columns > 39687 where upcase(name) not in (%seplist(%upcase(&by) _NAME_ > _LABEL_,nest=QQ)) > MPRINT(SEPLIST): "COVNO","SERVDATE","_NAME_","_LABEL_" > 39688 ; > 39689 quit; > NOTE: PROCEDURE SQL used (Total process time): > real time 0.01 seconds > user cpu time 0.01 seconds > system cpu time 0.00 seconds > Memory 91k > OS Memory 15224k > Timestamp 11-Aug-11 09:52:07 > > > 39690 %put &columns; > itmChiro itmGenDental itmMajDental itmOptical itmPhysio itmPrvDental > itm_Other > 39691 > 39692 * macro to build rename list (parses &columns) ; > 39693 %macro rename(prefix); > 39694 &word=&prefix._&word > 39695 %mend; > 39696 > 39697 * macro to build dataset list (parses &var) ; > 39698 %macro datasets; > 39699 transposed (where=(_name_="&word") > 39700 rename=(%loop(&columns,mname=rename(&word)))) > 39701 %mend; > 39702 > 39703 * this is actually pretty fast. ; > 39704 data merged; > 39705 merge > 39706 %loop(&var,mname=datasets) > MPRINT(DATASETS): transposed (where=(_name_="sdAncBen") rename > MPRINT(DATASETS): =( itmChiro=sdAncBen_itmChiro > itmGenDental=sdAncBen_itmGenDental itmMajDental=sdAncBen_itmMajDental > itmOptical=sdAncBen_itmOptical itmPhysio=sdAncBen_itmPhysio > itmPrvDental=sdAncBen_itmPrvDental > itm_Other=sdAncBen_itm_Other)) > MPRINT(DATASETS): transposed (where=(_name_="sdAncClm") rename > MPRINT(DATASETS): =( itmChiro=sdAncClm_itmChiro > itmGenDental=sdAncClm_itmGenDental itmMajDental=sdAncClm_itmMajDental > itmOptical=sdAncClm_itmOptical itmPhysio=sdAncClm_itmPhysio > itmPrvDental=sdAncClm_itmPrvDental > itm_Other=sdAncClm_itm_Other)) > 39707 ; > 39708 by &by; > 39709 drop _name_; > 39710 run; > > NOTE: There were 905592 observations read from the data set > WORK.TRANSPOSED. > WHERE _name_='sdAncBen'; > NOTE: There were 905592 observations read from the data set > WORK.TRANSPOSED. > WHERE _name_='sdAncClm'; > NOTE: The data set WORK.MERGED has 905592 observations and 16 > variables. > NOTE: DATA statement used (Total process time): > real time 3.04 seconds > user cpu time 1.15 seconds > system cpu time 0.42 seconds > Memory 324k > OS Memory 15224k > Timestamp 11-Aug-11 09:52:10 > > > 39711 > 39712 %bench(end); > > Benchmark END: > > Elapsed seconds = 00 hours, 00 minutes, 16 seconds 15.6560001373291 > Total seconds = 00 hours, 00 minutes, 16 seconds 15.6560001373291 > > 39713 > 39714 * version 2 ; > 39715 %bench(start); > 39716 > 39717 proc transpose data=source out=flip; > 39718 by &by &id; > 39719 var &var; > 39720 run; > > NOTE: There were 1017664 observations read from the data set > WORK.SOURCE. > NOTE: The data set WORK.FLIP has 2035328 observations and 5 variables. > NOTE: PROCEDURE TRANSPOSE used (Total process time): > real time 14.37 seconds > user cpu time 7.82 seconds > system cpu time 0.95 seconds > Memory 2164k > OS Memory 17276k > Timestamp 11-Aug-11 09:52:24 > > > 39721 > 39722 proc transpose data=flip out=flop (drop=_name_) DELIMITER=_; > 39723 by &by; > 39724 id _name_ &id; > 39725 var col1; > 39726 run; > > NOTE: There were 2035328 observations read from the data set WORK.FLIP. > NOTE: The data set WORK.FLOP has 905592 observations and 16 variables. > NOTE: PROCEDURE TRANSPOSE used (Total process time): > real time 21.65 seconds > user cpu time 21.17 seconds > system cpu time 0.57 seconds > Memory 2178k > OS Memory 17276k > Timestamp 11-Aug-11 09:52:46 > > > 39727 > 39728 %bench(end); > > Benchmark END: > > Elapsed seconds = 00 hours, 00 minutes, 36 seconds 36.0629999637603 > Total seconds = 00 hours, 00 minutes, 36 seconds 36.0629999637603 > > 39729 > 39730 proc compare base=merged compare=flop note list; > 39731 run; > > NOTE: No unequal values were found. All values compared are exactly > equal. > NOTE: The data sets WORK.MERGED and WORK.FLOP are equal in all > respects. > NOTE: There were 905592 observations read from the data set > WORK.MERGED. > NOTE: There were 905592 observations read from the data set WORK.FLOP. > NOTE: PROCEDURE COMPARE used (Total process time): > real time 2.04 seconds > user cpu time 1.64 seconds > system cpu time 0.28 seconds > Memory 269k > OS Memory 15224k > Timestamp 11-Aug-11 09:52:48


Back to: Top of message | Previous page | Main SAS-L page