LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2007, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Thu, 12 Jul 2007 03:54:20 -0700
Reply-To:   Hari <excel_hari@YAHOO.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Hari <excel_hari@YAHOO.COM>
Organization:   http://groups.google.com
Subject:   Re: Finding parent-child relationship within ID variable
Comments:   To: sas-l@uga.edu
In-Reply-To:   <941871A13165C2418EC144ACB212BDB04E132B@dshsmxoly1504g.dshs.wa.lcl>
Content-Type:   text/plain; charset="us-ascii"

On Jul 12, 3:38 am, Nord...@DSHS.WA.GOV ("Nordlund, Dan (DSHS/RDA)") wrote: > > -----Original Message----- > > From: SAS(r) Discussion [mailto:S...@LISTSERV.UGA.EDU] On > > Behalf Of Hari > > Sent: Wednesday, July 11, 2007 2:39 PM > > To: S...@LISTSERV.UGA.EDU > > Subject: Finding parent-child relationship within ID variable > > > Hi, > > > I have a SAS data set A with 2 fields. Both are ID fields and all the > > digits in the ID are "numbers" and stored in $16 format. > > > The data set A was originally created by using a join of a data set B > > with itself (B*B) and only those records which met certain conditions > > where only retained in the output. The original data set B had 80 > > million obs and right now the final output data set A has close to 90 > > Million observations. > > > By doing the join operation Im trying to see as to which ID's are > > related to each other. (basically, the data B is customer level data > > which was house-holded using address logic. Presently I want to do > > additonal house-holding using email/name etc...hence the join was done > > to see which customers have same email/name but diff house-hold level > > ID's) > > > *21 sample records (for simplicity sake im putting the length of > > fields as just 2); > > > ID_1 ID_2 > > 11 67 > > 11 55 > > 11 98 > > 67 55 > > 55 67 > > 98 11 > > 98 55 > > 55 98 > > 67 98 > > 55 11 > > 98 67 > > 67 11 > > 23 45 > > 92 35 > > 45 23 > > 92 29 > > 29 35 > > 35 92 > > 29 92 > > 35 29 > > 29 35 > > > Now, for me distinction between ID_1 and ID_2 is meaningless. As long > > as ID 11 is related to ID 67, it means the same as another record > > which says ID 67 is related to ID 11. Hence, I want to reduce the > > above data set to the below format, such that only unique information > > is retained > > > ID_modified1 ID_modified_2 > > 11 67 > > 11 55 > > 11 98 > > 23 45 > > 29 35 > > 29 92 > > > (In above output am retaining the minimum ID in left and the rest > > higher ID's in right, but I can keep it anyway as long as information > > isnt duplicated) > > > Whats an efficient way to arrive at this output? (I hope to learn some > > nice techniques!) > > > Btw, I do know one method which is to Transpose data set A by ID_1 > > (and do Var ID_2) and after transposing rearranging the ID values in > > each record individually such that minimum ID is in left most variable > > and maximum in right most and so on.Finally, do a SQL distinct on the > > rearranged data set and this distinct data set can again be transposed > > by the left most ID variable to get the final output. > > > The steps that I dont really like in my method is rearranging because: > > - > > a) I really dont know in advance as to how many new transposed > > variables will be created. It can be even as big as 200 or even 400! I > > can probably do a proc contents and store the number of resulting > > fields in a macro variable > > > b) More bigger problem is rearrangng itself. Since, num of fields > > might even be 400 or so, I will usually create that many dummy data > > set variables, but again I dont know an efficient way to sort all > > variables within each record individually. > > > Please guide me. > > > regards, > > HP > > India > > Hari, > > You could try something like: > > data temp / view = temp; > set have; > if id_1 LT ID_2 then do; > t=id_1; > id_1=id_2; > id_2=t; > end; > run; > proc sql ; > create table want as > select distinct id_1, id_2 > from temp > order by id_1 > ; > quit; > > Hope this is helpful, > > Dan > > Daniel J. Nordlund > Research and Data Analysis > Washington State Department of Social and Health Services > Olympia, WA 98504-5204- Hide quoted text - > > - Show quoted text -

Daniel,

That wouldnt work in the present case, because after the first data (view) step, the temp would like (I have dropped the variable t as it is unneccesary)

ID_1 ID_2 11 67 11 55 11 98 55 67 55 67 11 98 55 98 55 98 67 98 11 55 67 98 11 67 23 45 35 92 23 45 29 92 29 35 35 92 29 92 29 35 29 35

Now when we do distinct using the "Want" data set result would like

ID_1 ID_2 11 67 11 55 11 98 23 45 29 35 29 92 35 92 55 67 55 98 67 98

Which is not equivalent to the result Im looking for. if you notice above there is a "information" duplication above wherein 11 is mapped to 67 and 55 and in a later record we have 55 mapped to 67, which is unnecessary for my case.

hp


Back to: Top of message | Previous page | Main SAS-L page