LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2010, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 18 Jan 2010 09:35:00 -0800
Reply-To:     Ai Hua Wang <aihuawang@YAHOO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ai Hua Wang <aihuawang@YAHOO.COM>
Subject:      Re: assign a unique random integer to each unique id - thank you
Comments: To: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
In-Reply-To:  <2fc7f3341001180859y5abd0a80x960f4147292f9e9d@mail.gmail.com>
Content-Type: text/plain; charset=iso-8859-1

Hi Everybody who helped me:   Please accept my big thank you.   I have tried every solution. They all work very well. You guys are very good at SAS.   Thank you very much again. Aihua

--- On Mon, 1/18/10, Muthia Kachirayan <muthia.kachirayan@GMAIL.COM> wrote:

From: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM> Subject: Re: assign a unique random integer to each unique id To: SAS-L@LISTSERV.UGA.EDU Received: Monday, January 18, 2010, 4:59 PM

Aihua,

Your clarification to Dan makes your need understood.  My earlier array solution permutes the observation numbers  in random order to get the urands. There is some chance that both of them may be same. The urands will suggest  the similarity between observation numbers and urands. This can be rectified by choosing a multiplier to the number of observations to get a random sequence of urands.

The test data set is:

data have; do key = 1 to 10;    sat = put(key * 100 + key, z4.);    output; end; run;

The data set, HAVE, has KEY as the primary key in ascending order with some SAT data.

                                           key    sat                                              1    0101                                              2    0202                                              3    0303                                              4    0404                                              5    0505                                              6    0606                                              7    0707                                              8    0808                                              9    0909                                             10    1010

The following program creates new UNIQUE IDs(uid) based on the number of observations in the data set and note that the KEY is not used in the process. Let us use a number, NUM_FOLD, to multiply the observation number, say 1000.

%let num_fold = 1000;

data need; if _n_ = 1 then do;    declare hash h(hashexp:16);    h.definekey('uid');    h.definedata('RID','key','uid');    h.definedone(); end; do RID = 1 to num;    set have nobs = num ;    uid = ceil(ranuni(123) * num * &num_fold);    do rc = h.check() by 0 while (rc = 0);       uid = ceil(ranuni(123) * num * &num_fold);  ** Try another random number ;       rc = h.check();    end;    h.add();    output; end; h.output(dataset:'LOOKUP'); stop; drop rc key; run;

proc print data = need; run;

The data set, NEED, gives the UID for the corresponding Record ID(RID) and the KEY is dropped to keep the secrecy of data set.

                                       RID    sat      uid                                          1    0101    7504                                          2    0202    3210                                          3    0303    1784                                          4    0404    9061                                          5    0505    3572                                          6    0606    2212                                          7    0707    7865                                          8    0808    3981                                          9    0909    1247                                         10    1010    1877

My earlier array solution is a special case when NUM_FOLD = 1.

This program also gives another data set, LOOKUP, giving links to RID/KEY. When it is sorted by UID, the reverse process of getting RID/KEY becomes easy.

There is another possiblity of creating UIDs based on the KEYs and not based on the observation number. The following program does it based on MOD() function. However, KEYs, have to nemeric but this restriction can be removed in some circumstances when the character-type KEYs can be changed to numeric by the use of functions like, PIBw. For further details refer to Dorfman(Key indexing, Bitmapping and Hashing).

data need; if _n_ = 1 then do;    declare hash h(hashexp:16);    h.definekey('uid');    h.definedata('RID','key','uid');    h.definedone(); end; do RID = 1 to num;    set have nobs = num ;    uid = mod(key, num) + 1;    do rc = h.check() by 0 while (rc = 0);    ** Try another random number ;       uid = uid + 1;       if uid > num then uid = 1;       rc = h.check();    end;    h.add();    output; end; h.output(dataset:'LOOKUP'); stop; drop rc key; run;

Do you find this program useful to solve your issue ? Your feedback will be useful to SAS-Lers to give alternate solutions.

Kind regards, Muthia Kachirayan

On Sat, Jan 16, 2010 at 5:30 PM, Ai Hua Wang <aihuawang@yahoo.com> wrote:

> Hi Dan: > > Thank you very much for your thoughtful follow up. Please see my answers > below. > > Why does your multiplier need to be proportional to dataset size? > That is just my thought after I tried. Because when I use the smaller > multiplier I got much more duplicates. When I increase it I got less. > Eventually I found that it should be at least propotional to the size of the > data set. > > Why do you want random integers assigned to your data? > I need to use the assigned random integers as the unique id to allow the > data users to identify each unique record. I thought it is better to use the > integer than the decimal numbers. > > And why do they need to be unique? > See above description and plus: > It is used as the replacement of the sensitive information (unique id) for > the privacy and confidentialiy concern. > > I hope this is helpful when you provide more insightful answers. > > Best Regards, > Aihua > > __________________________________________________________________ Yahoo! Canada Toolbar: Search from anywhere on the web, and bookmark your favourite sites. Download it now http://ca.toolbar.yahoo.com.


Back to: Top of message | Previous page | Main SAS-L page