Date: Tue, 18 Mar 2008 20:53:35 -0500
Reply-To: "Richard A. DeVenezia" <rdevenezia@WILDBLUE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Richard A. DeVenezia" <rdevenezia@WILDBLUE.NET>
Subject: Re: How to link all children in a family to the parent?
Content-Type: text/plain; charset="iso-8859-15"
Yang Xiang wrote:
> I have a dataset in which every parent has a missing "linkto" ID, but
> every child has a "linkto" ID pointed to the previous sibling. Only
> the oldest child in the family has a "linkto" ID pointed to the
> parent. How do I assign the parent ID to every child in the family?
The terms parent and child threw me at first; I was thinking genealogy, where
a child has one or more parents.
This problem is clearer to me when the terms 'trails' (or paths) and
'segments' are used. Each record in the data defines a segment.
Here is a cleaner hash solution that should run very fast. Note that
there are no guards in place to check whether the segment data defines a
circular loop. A loop never reaches a root id, so the step would run
forever. You mentioned a very long run time on your first attempt; that
might indicate that you do indeed have loops in your data.
input id pid;
data _null_ / debug;
  retain id pid rootid .;
  declare hash segments (ordered:'a');
  segments.definekey ('id');
  segments.definedata ('id', 'pid', 'rootid');
  segments.definedone ();

  * load every segment into the hash;
  do until (lastrow);
    set segments end=lastrow;
    segments.add();
  end;

  declare hiter hi('segments');
  do while (hi.next()=0);
    id0 = id;
    pid0 = pid;
    * loop until rootid found or invalid link;
    * there are no checks for looping;
    do while (1);
      if pid = . then do;
        rootid = id;
        leave;
      end;
      if rootid ne . then
        leave;
      id = pid;
      * look for earlier segments;
      if segments.find() ne 0 then do;
        rootid = -1; *unknown;
        leave;
      end;
    end;
    * restore the key of the current segment and store its rootid;
    id = id0;
    pid = pid0;
    segments.replace();
  end;
  stop;
run;
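
If loops are suspected, one possible guard is to cap the number of hops at
the number of segments, since a loop-free trail can visit each segment at
most once. What follows is only a sketch under assumptions of my own: the
sample data is made up for illustration, the output data set name (roots)
is arbitrary, and the rootid code -2 for "loop detected" is a convention
invented here, alongside the -1 used above for a broken link.

```sas
* hypothetical sample data: 1 to 4 form a trail, 10 and 11 form a loop;
data segments;
  input id pid;
datalines;
1 .
2 1
3 2
4 3
10 11
11 10
20 99
;
run;

data roots (keep=id pid rootid);
  declare hash segments (dataset:'segments');
  segments.definekey ('id');
  segments.definedata ('id', 'pid');
  segments.definedone ();
  nsegments = segments.num_items;

  do until (lastrow);
    set segments end=lastrow;
    id0 = id;
    pid0 = pid;
    rootid = .;
    * a loop-free trail needs at most nsegments hops;
    do hops = 1 to nsegments while (rootid = .);
      if pid = . then rootid = id;                 * reached the root;
      else do;
        id = pid;
        if segments.find() ne 0 then rootid = -1;  * broken link;
      end;
    end;
    if rootid = . then rootid = -2;                * hop limit reached, so a loop;
    id = id0;
    pid = pid0;
    output;
  end;
  stop;
run;
```

With this sample data, ids 1 through 4 should get rootid 1, ids 10 and 11
should get -2, and id 20 should get -1. This version walks each trail from
scratch instead of reusing root ids found earlier, so it trades some speed
for the guarantee that the step terminates.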
Richard A. DeVenezia