Date: Tue, 7 Apr 2009 05:54:26 -0700
Reply-To: Jack Hamilton <jfh@STANFORDALUMNI.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jack Hamilton <jfh@STANFORDALUMNI.ORG>
Subject: Re: proc sql- distinct, macros, etc.
Content-Type: text/plain; charset="iso-8859-1"
The answer to 1) and 2) is that order isn't guaranteed unless you use an ORDER BY clause.
I would use DISTINCT in 3), because I'm not sure SQL would add it for me.
What problem do you think will occur if the 'order by' and 'distinct' fields are not the same?
From: Mark S. <marksmolkin@GMAIL.COM>
Sent: Monday, April 06, 2009 10:30 PM
Subject: [SAS-L] proc sql- distinct, macros, etc.
I am hoping to get some clarification regarding proc sql behavior for
which the manuals ain't cuttin' it. Any advice on helpful references would
be greatly apprecated too. Here are some questions:
1) If you use 'select distinct [vars]', does the outputted result
automatically get sorted by the values of columns in [vars]?
2) When creating macro variables in proc sql using the 'separated by'
clause, is the order of the individual values in the incoming dataset
maintained when 'strung out' to form the values of the macro variable. For
example, if you have a dataset called 'mydata' sorted on the ID field,
and you submit,
select name into :macrovar_name separated by ' ' from mydata;
will proc sql always respect the order of the data on input to give me
'Joe Goober Zeke' as the value of the macro variable, or, will it possibly
change it? For instance, might it instead reorder the values based on
sorted values of the 'name' field (ie. 'Goober Joe Zeke'), or, obey
an 'order by' clause that perhaps tells it to sort by some other field not
shown? I guess I am not to sure about timing in proc sql, especially with
macro variable creation and the 'order by' clause.
3) My spidey sense tells me that this 'distinct' statement might be more
trouble than it is worth for somebody like me without much sql experience
outside of SAS. Is it a no-no to use distinct statements in more
complicated expressions like below (just a nonsense example), where the
result of a 'distinct' statement is used as part of a join?
from mydata1 as A INNER JOIN (select distinct ID,... from mydata2) as B on
Also, what about a similar example with something like 'order by A.ID'
added to it? I know that 'order by' can be a problem with 'distinct' in a
simple sql expression if the 'order by' and 'distinct' fields are not the
same. Is a 'distinct' field/'order by' field discrepancy a problem also
when the 'order by' clause is one-step removed from the 'distinct'
statement? Sorry about the brutal misuse of terminology, I don't know what
I should be calling these things!....
Any help would be greatly appreciated! Thanks!