LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 1998, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 5 Jun 1998 04:07:29 -0400
Reply-To:     "Kott, Matt" <MKott@MOBILNET.GTE.COM>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         "Kott, Matt" <MKott@MOBILNET.GTE.COM>
Subject:      Re: Re[5]: sort by format

SAS-Lr's

I'm not much of a SAS person (yet), but I have some experience with sql, and thought I might offer another suggestion to what I think was the original issue - being able to sort by state name when the data only included a state code of some sort. I haven't tested this, but I'm sure you'll all have a good idea if it would work any more efficiently than the slower running sql code already offered by a few.

What about creating (if not already available somewhere) another "lookup" dataset that would contain an obs for each state and contain the variables "state_cd" and "state_nm" that could then be used by way of a join for sorting on the name and not code.

Example (states is lookup ds):

proc sql; create table new1 as select test.* from test, states where test.state_cd = states.state_cd order by states.state_nm;

This, of course wouldn't include the name of the state in the result, but could include it if needed. If I'm correct in what I'm thinking, wouldn't this then leave you with a sorted table in state name order? And - more importantly - wouldn't this then run much quicker since there is no reformatting of any vars?

I'm interested to know what you all think.

Matt Kott mkott@mobilnet.gte.com

> -----Original Message----- > From: HERMANS1 [SMTP:HERMANS1@WESTAT.COM] > Sent: Thursday, June 04, 1998 8:11 PM > To: SAS-L@UGA.CC.UGA.EDU > Subject: Re: Re[5]: sort by format > > John: > > In my tests of a SQL "formatted" sort against a data step/proc sort, > the latter runs about 30% faster on both elapsed time and CPU time. > Not as great a difference as you found, but it shows a substantial > difference nonetheless. The Westat SQL side begrudgingly concedes one > to the Mediscience Datasteppers. Sig > > 22 proc sql; > 23 create table new1 as > 24 select * > 25 from test > 26 order by put(x, tryfmt.) > 27 ; > NOTE: The query as specified involves ordering by an item that doesn't > appear in its SELECT clause. > NOTE: Table WORK.NEW1 created, with 1000000 rows and 2 columns. > > 28 quit; > NOTE: PROCEDURE SQL used: > real time 3:30.62 > cpu time 2:04.22 > > > 29 data test2 ; > 30 set test ; > 31 y = put(x, tryfmt.) ; > 32 run ; > > NOTE: The data set WORK.TEST2 has 1000000 observations and 3 > variables. > NOTE: DATA statement used: > real time 28.64 seconds > cpu time 20.42 seconds > > > 33 proc sort data = test2 out = new2 (drop = y) ; > 34 by y ; > 35 run ; > > NOTE: The data set WORK.NEW2 has 1000000 observations and 2 variables. > NOTE: PROCEDURE SORT used: > real time 1:58.44 > cpu time 1:07.42 > > > ______________________________ thread _______________________________ > At 16:10 04/06/98 -0400, HERMANS1 responded to my earlier post with: > > >Please remind us why the simple and ordinary SQL solution to this > >problem of sorting by formatted values does not suffice. > > Sig, I'm sure it probably does suffice; at the time I wrote the message > to > which you are responding, no-one involved in the thread had suggested the > SQL method! > > >Although > >this long thread has produced some interesting observations on > >formatted values in Base SAS, I for one have lost track of the main > >topic. > > Are you sure that you are not confusing different threads? Unless I've > been > missing posts, this thread has not been at all 'long' (not by SSAS-L > standards, anyway!) and hasn't significantly strayed from the initial > question - which was the one sentence query ... " Is there a simple way to > sort a dataset by a specified variable's format instead of it's value? " > > >Anyone jumping into the discussion at this latest posting would > >think that the SAS System has a critical defect. > > I'm not quite clear why you say that. In response to Jack's comment that > he > was surprised that PROC SORT didn't have an option to sort by formatted > values, I was merely suggesting one possible reason why such an option may > not have been implemented as yet. Nothing in my post to which you are > responding mentioned, or even implied, a 'defect', let alone a 'critical > one'. > > >But why would anyone > >actually need either a sort procedure that works on formatted values > >(other than a SQL ORDER BY clause) or a user-written program? > > I'm not sure what you're asking here. If you want to know the specific > need > that resulted in the initial query, you'd obviously have to ask the person > who wrote it. If you are asking why anyone would need this option when > they > can achieve the functionality using SQL, the simply answer is obviously > that > they wouldn't 'need' it - but, come to that, the same argument says that > they don't really need PROC SORT at all! I would also wonderabout the > performance of the SQL method (see below). > > I can but presume that you're not asking why anyone would ever want to > sort > by formatted values, since it's something that I find myself wanting to do > all the time - for example, I often want to PRINT variables ordered by > formatted values, when they primarily exist as numerical codes. The > approach I normally take is to create a second variable containing the > formatted value, and then sort by that. > > As for performance, the following log shows the comparison of the SQL and > DATA/SORT methods for a data set of 1 million observations, using a simple > 3-level format: > > > 707 proc sql; > 708 create table new1 as > 709 select * > 710 from test > 711 order by put(x, tryfmt.) > 712 ; > NOTE: The query as specified involves ordering by an item that doesn't > appear in its > SELECT clause. > NOTE: Table WORK.NEW1 created, with 1000000 rows and 2 columns. > > 713 quit; > NOTE: The PROCEDURE SQL used 3 minutes 36.24 seconds. > > 715 data test2 ; > 716 set test ; > 717 y = put(x, tryfmt.) ; > 718 run ; > > NOTE: The data set WORK.TEST2 has 1000000 observations and 3 variables. > NOTE: The DATA statement used 31.64 seconds. > > 719 proc sort data = test2 out = new2 (drop = y) ; > 720 by y ; > 721 run ; > > NOTE: The data set WORK.NEW2 has 1000000 observations and 2 variables. > NOTE: The PROCEDURE SORT used 47.07 seconds. > > ... so SQL took around 216 seconds and the DATA/SORT about 79 seconds - in > both cases with 7 'lines' of SAS code. As I said in the message to which > you're replying, I would suspect that the reduced speed of the SQL method > (as would be the case for a PROC SORT option which did format lookups 'on > the fly') is that internal->formatted conversions have to be done every > time > a value is 'looked at' during the ordering/sorting process, whereas with > the > DATA/SORT approach, the conversion is only undertake once for each > observation. > > Kind Regards, > > John > > ---------------------------------------------------------------- > Dr John Whittington, Voice: +44 (0) 1296 730225 > Mediscience Services Fax: +44 (0) 1296 738893 > Twyford Manor, Twyford, E-mail: medisci@powernet.com > Buckingham MK18 4EL, UK mediscience@compuserve.com > ----------------------------------------------------------------


Back to: Top of message | Previous page | Main SAS-L page