Date: Wed, 1 May 2002 13:30:51 -0400
Reply-To: "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Subject: Re: [DATA STEP]
Venky,
Indeed, _iorc_ (a "genuinely" retained variable) can be used this way. In this
case, though, placing the sum statement before the DOW (where it logically
belongs) removes the need to initialize _iorc_:
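data w ;
  _iorc_ ++ 1 ;                  /* sum statement: starts at 0, so the first group gets 1 */
  do until (last.col) ;          /* DOW: one pass of the implicit loop per by-group */
    set q ;
    by col notsorted ;
    colid = trim(col)||"_"||put(_iorc_,best.-L) ;
    output ;
  end ;
run ;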
Using an explicit file-reading loop renders all RETAIN issues irrelevant.
Within such a loop, the only variables ever changed by a hidden instruction
are those set to missing before a fresh by-group, and even that can be
thought of as a result of the BY statement. Thus, with an explicit loop, _n_
can be used as the group counter as well:
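data w ;
  do _n_ = 1 by 1 ;              /* outer loop: one iteration per by-group */
    do until (last.col) ;        /* inner DOW loop reads one by-group */
      set q ;
      by col notsorted ;
      colid = trim(col)||"_"||put(_n_,best.-L) ;
      output ;
    end ;
  end ;                          /* step halts when SET reads past the end of Q */
run ;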
Note that even though the outer loop appears to be infinite, it stops as
soon as the input from Q is exhausted, because the step halts when SET
attempts to read past the end of the file. Coding an explicit EOF test is
cleaner, but not necessary in this case.
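For comparison, here is a minimal sketch of the cleaner EOF variant, assuming the grouped data from the original question. The END= variable (named EOF here, an arbitrary choice) becomes 1 when SET reads the last observation, so the outer loop terminates by itself instead of relying on SET to halt the step; STOP then ends the step explicitly.

```sas
/* Grouped data from the original question */
data q ;
  input col $ ;
  cards ;
one
one
one
two
two
two
one
one
three
three
;
run ;

/* Sketch: explicit end-of-file test via the END= option of SET */
data w ;
  do _n_ = 1 by 1 until (eof) ;  /* outer loop: one iteration per by-group */
    do until (last.col) ;        /* inner DOW loop reads one by-group */
      set q end = eof ;          /* EOF = 1 once the last record is read */
      by col notsorted ;
      colid = trim(col) || '_' || put(_n_, best.-l) ;
      output ;
    end ;
  end ;
  stop ;                         /* all groups processed; end the step explicitly */
run ;
```

The END= variable is not written to W, so the output has the same columns as the version above.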
An interesting question is how one would approach the problem if the
records were not grouped and the user still wanted to keep their original
order without double sorting. Then we must somehow memorize the keys we
have already encountered. In V9, the perfect tool for this is, of course,
the hash table. However, the situation is simple enough that a hand-coded
hash table hardly requires more lines of code even in the current version.
For the sake of simplicity, let us limit ourselves to a maximum of 100,000
distinct keys in the file, so that a table of 200,003 slots is never more
than half full:
data q ;
input col $ ;
cards ;
one
two
one
three
two
one
two
one
three
one
run ;
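%let h = 200003 ;                  /* table size: about twice the maximum number of keys */
data w ( drop = j n ) ;
  array c (0:&h) $ _temporary_ ;   /* key stored in each slot */
  array x (0:&h)   _temporary_ ;   /* ID assigned to the key in each slot */
  set q ;
  /* linear probing from the home slot until the slot holding COL is found */
  do j = mod(input(col,pib6.), &h) by 1 until ( c(j) = col ) ;
    if j = &h then j = 0 ;         /* wrap around to the start of the table */
    if x(j) = . then do ;          /* empty slot: new key, assign the next ID */
      n ++ 1 ;
      x(j) = n ;
      c(j) = col ;
    end ;
  end ;
  colid = trim(col) || '_' || put(x(j), best.-l) ;
run ;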
Here is the output:
Obs   col     colid
  1   one     one_1
  2   two     two_2
  3   one     one_1
  4   three   three_3
  5   two     two_2
  6   one     one_1
  7   two     two_2
  8   one     one_1
  9   three   three_3
 10   one     one_1
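For comparison, a minimal sketch of the same lookup with the V9 hash object mentioned above. The object name IDS and the variables ID and RC are illustrative choices, not part of the original post. FIND() returns 0 and retrieves ID when COL has been seen before; otherwise the next ID is assigned and the key is stored with ADD().

```sas
/* Ungrouped data, as above */
data q ;
  input col $ ;
  cards ;
one
two
one
three
two
one
two
one
three
one
;
run ;

/* Sketch: V9 hash object in place of the hand-coded table */
data w ( drop = id rc n ) ;
  if _n_ = 1 then do ;
    length id 8 ;
    declare hash ids() ;           /* key: COL; data: ID assigned at first sight */
    ids.defineKey('col') ;
    ids.defineData('id') ;
    ids.defineDone() ;
  end ;
  set q ;
  rc = ids.find() ;                /* rc = 0: key seen before, ID retrieved */
  if rc ne 0 then do ;             /* new key: assign the next ID and store it */
    n + 1 ;
    id = n ;
    rc = ids.add() ;
  end ;
  colid = trim(col) || '_' || put(id, best.-l) ;
run ;
```

Unlike the hand-coded table, the hash object grows as items are added, so no maximum number of keys has to be fixed in advance.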
Kind regards,
================
Paul M. Dorfman
Jacksonville, FL
================
> -----Original Message-----
> From: Chakravarthy, Venky [mailto:Venky.Chakravarthy@PFIZER.COM]
> Sent: Wednesday, May 01, 2002 12:14 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: [DATA STEP]
>
>
> Zubrowska,
>
> The underlying theme in all the replies is to use the NOTSORTED option.
> Mine uses the same, but I am providing a continuation to yesterday's theme
> on RETAIN. This solution merely demonstrates that an automatically
> retained variable (with the exception of _n_ and _error_) can be
> initialized with a value in a RETAIN statement and put to good use in the
> ubiquitous DOW:
>
> data q ;
> input col $ ;
> cards ;
> one
> one
> one
> two
> two
> two
> one
> one
> three
> three
> run ;
>
> data w ;
> retain _iorc_ 1 ;
> do until (last.col) ;
> set q ;
> by col notsorted ;
> colid = trim(col)||"_"||put(_iorc_,best.-L) ;
> output ;
> end ;
> _iorc_ + 1 ;
> run ;
>
> Kind Regards,
>
> Venky
> #****************************************#
> # E-mail: swovcc@hotmail.com #
> # Phone: (734) 622-1963 #
> #****************************************#
>
>
> -----Original Message-----
> From: zubrowka [mailto:zubrowka@gmx.net]
> Sent: Wednesday, May 01, 2002 11:39 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: [DATA STEP]
>
>
> Hi all,
>
> here is my small problem.
> I have a table like that.
>
> obs col
> 1 one
> 2 one
> 3 one
> 4 two
> 5 two
> 6 two
> 7 one
> 8 one
> 9 three
> 10 three
>
> I want to obtain this
>
>
> obs col colid
> 1 one one_1
> 2 one one_1
> 3 one one_1
> 4 two two_2
> 5 two two_2
> 6 two two_2
> 7 one one_3
> 8 one one_3
> 9 three three_4
> 10 three three_4
> etc
>
> Obviously I can't do a PROC SORT by col because I would lose the order
> of the data, which is important. I didn't manage to find a solution. How
> can I solve this?
>
> Thanks in advance for replying.
>
>
> Zubrowka
>