LISTSERV at the University of Georgia
Date:   Thu, 24 Apr 2003 11:07:12 -0400
Reply-To:   "Gerstle, John" <yzg9@CDC.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Gerstle, John" <yzg9@CDC.GOV>
Subject:   Re: Non-sequential unique numbers
Comments:   To: Mike Rhoads <RHOADSM1@WESTAT.com>
Content-Type:   text/plain

Mike,

I understand the method you were suggesting and agree that it would work. I was just thinking that, given a large enough sample, even an event with a very small probability of occurring has a decent chance of being observed. I'm extrapolating loosely from the way that, with a large enough sample, you'll almost always find 'significant' results, regardless of whether they are truly meaningful findings.
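
As a rough check on the duplicate concern, the standard birthday-problem approximation can be applied. This is only a sketch under stated assumptions: n independent draws, each equally likely to be any of N possible values, with N = 10^8 (every 8-digit number) chosen purely for illustration.

```latex
P(\text{at least one duplicate})
  = 1 - \prod_{k=0}^{n-1}\left(1 - \frac{k}{N}\right)
  \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right)

% Illustrative values: n = 90{,}000 records, N = 10^{8} possible IDs
\frac{n(n-1)}{2N} = \frac{90{,}000 \times 89{,}999}{2 \times 10^{8}} \approx 40.5,
\qquad
P \approx 1 - e^{-40.5} \approx 1
```

Under those assumptions a duplicate is all but certain when 8-digit IDs are drawn with replacement, which is why drawing without replacement matters here.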

John Gerstle
CDC Information Technological Support Contract (CITS)
Biostatistician

>> -----Original Message-----
>> From: Mike Rhoads [mailto:RHOADSM1@WESTAT.com]
>> Sent: Thursday, April 24, 2003 10:59 AM
>> To: 'Gerstle, John'; SAS-L@LISTSERV.UGA.EDU
>> Subject: RE: Non-sequential unique numbers
>>
>> John,
>>
>> Given that the random numbers generated are floating point, I'm not sure
>> what the probability of duplication is. Note that I was not using the
>> random numbers themselves as the ID, but was just sorting by the random
>> number and then assigning the record number of the re-sorted file as the
>> ID. For that approach, it doesn't matter whether there are duplicates
>> (although it turned out that I had misunderstood what Ralph was really
>> asking for).
>>
>> Mike Rhoads
>> Westat
>> RhoadsM1@Westat.com
>>
>> -----Original Message-----
>> From: Gerstle, John [mailto:yzg9@cdc.gov]
>> Sent: Thursday, April 24, 2003 9:33 AM
>> To: Mike Rhoads; SAS-L@LISTSERV.UGA.EDU
>> Subject: RE: Non-sequential unique numbers
>>
>>
>> Mike,
>>
>> Wouldn't you agree, though, that even if you've created 90,000 random
>> values, each with equal probability, you have a good probability of
>> creating at least one pair of duplicate ID numbers? Seems you'd want to
>> create a list of 90,000 unique random numbers and then assign each,
>> without replacement, to each of the records in the dataset.
>>
>> Just a thought...
>>
>> John Gerstle
>> CDC Information Technological Support Contract (CITS)
>> Biostatistician
>>
>>
>> >> -----Original Message-----
>> >> From: Mike Rhoads [mailto:RHOADSM1@WESTAT.COM]
>> >> Sent: Wednesday, April 23, 2003 6:14 PM
>> >> To: SAS-L@LISTSERV.UGA.EDU
>> >> Subject: Re: Non-sequential unique numbers
>> >>
>> >> Ralph,
>> >>
>> >> If by "non-sequential" you mean that it "loses" the original order of
>> >> the records, I would just assign a random number to each record in a
>> >> DATA step, sort the output by the random number, then read the sorted
>> >> file back in and assign the record number as the ID. Something like
>> >> (untested),
>> >>
>> >> DATA Temp;
>> >> SET OriginalFile;
>> >> RandomNumber = RANUNI(12345);
>> >> RUN;
>> >>
>> >> PROC SORT DATA=Temp;
>> >> BY RandomNumber;
>> >> RUN;
>> >>
>> >> DATA Final;
>> >> SET Temp;
>> >> IDVAR = _N_;
>> >> DROP RandomNumber; * Or don't ...;
>> >> RUN;
>> >>
>> >> Mike Rhoads
>> >> Westat
>> >> RhoadsM1@Westat.com
>> >>
>> >> -----Original Message-----
>> >> From: Ralph [mailto:rpk0524@YAHOO.COM]
>> >> Sent: Wednesday, April 23, 2003 5:15 PM
>> >> To: SAS-L@LISTSERV.UGA.EDU
>> >> Subject: Non-sequential unique numbers
>> >>
>> >>
>> >> I need to create a unique identifier for 90,000 records that is
>> >> non-sequential. So far, the best solution I have come up with is:
>> >>
>> >> a = ranuni(345)+ (ranuni(123)+ int(time()));
>> >> b = int(reverse(ar_seqnum))*a;
>> >>
>> >> Using b as my identifier, I can (most times) come up with unique
>> >> numbers, but the real challenge is this number can be no longer than 8
>> >> bytes. Using this code, my b(s) are 12 bytes. Using SUBSTR of b for
>> >> a length of 8, I get major dups. Can anyone help?
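
The "unique random numbers, assigned without replacement" idea from the thread could be sketched roughly as follows. This is untested and is not any poster's actual code. OriginalFile and IDVAR are taken from Mike's example; Candidates, ShuffleKey, the 100,000 oversample, and the 8-digit range are assumptions made for illustration.

```sas
/* Step 1: draw more 8-digit candidates than needed, so that enough */
/* remain after duplicates are dropped (100,000 is an assumption).  */
DATA Candidates;
  DO i = 1 TO 100000;
    IDVAR = 10000000 + FLOOR(90000000 * RANUNI(12345)); /* 10000000-99999999 */
    ShuffleKey = RANUNI(12345);                         /* used to re-shuffle */
    OUTPUT;
  END;
  DROP i;
RUN;

/* Step 2: keep one row per candidate ID, so every remaining ID is unique. */
PROC SORT DATA=Candidates NODUPKEY;
  BY IDVAR;
RUN;

/* Step 3: put the unique IDs back into random order. */
PROC SORT DATA=Candidates;
  BY ShuffleKey;
RUN;

/* Step 4: pair records and IDs by position; the step stops when */
/* either data set runs out of observations.                     */
DATA Final;
  SET OriginalFile;
  SET Candidates(KEEP=IDVAR);
RUN;
```

If fewer unique candidates than records survive Step 2, Final would come out short, so the row counts are worth comparing before relying on the result.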

