Date: Tue, 28 Sep 1999 11:39:49 +0200
Reply-To: Winston Groenewald <jacov@ABSA.CO.ZA>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Winston Groenewald <jacov@ABSA.CO.ZA>
Subject: Re: Interesting problem
Content-Type: text/plain; charset="iso-8859-1"
Thanks to everyone who set aside time to develop code for my tree data set,
specifically the two Bobs (Bullock and Krajcik) who provided the best
solutions.
This list is a valuable source of information - I just wish that I had more
time at my disposal to contribute to some of the questions posed.
Winston Groenewald
SAS Software & Data Mining Consultant
USKO Project House
Rivonia, Sandton
South Africa.
-----Original Message-----
From: R P Bullock <{$news$}@NOSPAM.DEMON.CO.UK>
Newsgroups: bit.listserv.sas-l
To: SAS-L@LISTSERV.UGA.EDU <SAS-L@LISTSERV.UGA.EDU>
Date: Friday, September 24, 1999 3:03 PM
Subject: Re: Interesting problem
>In article <000f01bf04f9$6df231a0$b103460a@winstong.absa.co.za>, Winston
>Groenewald <jacov@ABSA.CO.ZA> writes
>>I have a data set for which I have to produce an expanded version showing
>>all the possible hierarchies that may exist between two fields. My data set
>>looks like this
>>
><snip>
>>
>>Can anyone provide a more elegant solution?
>>
>>Regards
>>Winston Groenewald
>>Johannesburg, South Africa
>
>Winston,
>
>This is one of those little nuggets that you think you've solved, and
>then it comes back and bites you where it hurts.
>
>I've had many attempts at dealing with this, and it seems to me that you
>need to know your data. The easy one is where you have a true hierarchy
>and hence can be confident that each chain will resolve cleanly. If the
>following occurs:
>
>3209 APTY
>APTY 3209
>
>you have a problem.
>
>I have used variations on the following with some success where the data
>is clean. It uses a format to map var1 onto var2 and repeatedly resolves
>the mapping until it reaches the end of the chain. The size of the
>format required will be a limiting factor.
>
>data source;
> length var1 $4
> var2 $4
> ;
> input var1 var2;
> cards;
>3209 APTY
>3210 APTY
>3212 APTY
>3213 ACOP
>APTY AIMS
>ARAF AIMS
>;
>
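>data cntlin;
>  length fmtname $8
>         type    $1
>         hlo     $1
>         ;
>  retain fmtname 'MAPSTO'
>         type    'C'
>         hlo     ' '
>         ;
>  /* one format range per source row: var1 (start) maps to var2 (label) */
>  set source(rename=(var1=start var2=label))
>      end=last
>      ;
>  output;
>  /* after the last row, add the 'other' range that marks end of chain */
>  if last;
>  start=' ';
>  hlo='O';
>  label='ff'x;
>  output;
>  run;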
>
>proc format cntlin=cntlin;
> run;
>
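>data result;
>  set source(keep=var1);
>  length answer $200;
>  mapsto=var1;
>  /* follow the chain until the 'other' marker says there is no parent */
>  do until (mapsto='ff'x);
>    mapsto=put(mapsto,$mapsto.);
>    /* left() drops the leading blanks left by trim() on an empty answer */
>    if mapsto^='ff'x then answer=left(trim(answer)||' '||mapsto);
>  end;
>  run;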
>
>
>Notes:
>
>. a format control data set is created from the source data as
>input to the format procedure.
>
>. the 'other' value of 'ff'x is to stop processing when
>encountering orphans.
>
>. there is no trap to stop looping if the chain resolves to
>itself, or worse, resolves to another 'self' in the middle of the chain.
>You may want to put a counter in the loop as an alternative escape (plus
>message).
>
>. the code is not particularly pretty and suffers from the double
>condition of the 'until' and the 'if'.
>
>. I'm not sure what you want as output - this code just writes the
>character variable 'answer'. It may be more appropriate to write the
>answer to an external file, or write 'var1' and 'mapsto' to a SAS data
>set for further manipulation or presentation.
>
>Worth a dabble?
>
>Bob
>
>--
>R P Bullock
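
A minimal sketch of the counter escape suggested in Bob's notes; it is not part of his original code. It assumes the SOURCE data set and the $MAPSTO format above already exist. The macro variable MAXHOPS and the variables hops and looped are illustrative names, and the limit of 20 lookups is an arbitrary assumption.

```sas
/* Sketch only: the same lookup loop with a hop counter as a second exit. */
/* Assumes SOURCE and the $MAPSTO format have been created as above.      */
%let maxhops = 20;   /* assumed upper bound on chain length */

data result;
   set source(keep=var1);
   length answer $200 mapsto $4;
   looped = 0;                      /* illustrative flag variable */
   mapsto = var1;
   /* stop at the 'other' marker or after &maxhops lookups, whichever is first */
   do hops = 1 to &maxhops until (mapsto = 'ff'x);
      mapsto = put(mapsto, $mapsto.);
      if mapsto ne 'ff'x then answer = left(trim(answer) || ' ' || mapsto);
   end;
   /* leaving the loop without the marker means the counter ran out */
   if mapsto ne 'ff'x then do;
      looped = 1;
      put "WARNING: chain not resolved after &maxhops lookups: " var1=;
   end;
   drop hops;
run;
```

If the data contained a loop such as the 3209/APTY pair above, this step should stop after 20 lookups and flag the row instead of running forever. A legitimate chain that needs more lookups than the limit would be flagged as well, so the limit has to exceed the deepest hierarchy expected in the data.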