Date: Thu, 24 Apr 2003 11:07:12 -0400
From: "Gerstle, John"
To: Mike Rhoads
Subject: Re: Non-sequential unique numbers
List: SAS(r) Discussion

Mike,

I understand the method you were suggesting and agree that it would work. I was just thinking that, given a large enough sample, even an event with a very small probability of occurring has a decent chance of being observed. I'm extrapolating from the way large samples behave in significance testing: with enough data, you'll almost certainly find 'significant' results, regardless of whether they are truly meaningful findings.
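[Editorial note: the intuition above is usually framed as the birthday problem: for n independent draws from N equally likely values, the chance of at least one duplicate is approximately 1 - exp(-n(n-1)/(2N)). A quick back-of-the-envelope check, in Python rather than SAS, and not part of the original thread; the N = 2^31 - 1 resolution is an assumption standing in for a 31-bit congruential generator like the one behind RANUNI:]

```python
import math

def collision_prob(n, num_values):
    """Birthday-problem approximation: probability that n independent
    draws from num_values equally likely values contain at least one
    duplicate, via P ~= 1 - exp(-n*(n-1) / (2*num_values))."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * num_values))

# 90,000 draws from ~2**31 - 1 distinct outputs (assumed resolution)
p = collision_prob(90_000, 2**31 - 1)
print(f"P(at least one duplicate) ~= {p:.3f}")
```

[Under that assumption the collision chance is roughly 85%, so for truly independent draws the concern is well founded. As I understand it, a single full-period congruential stream such as RANUNI's does not actually repeat a value until its period is exhausted, but that is an implementation detail rather than something to rely on.]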

John Gerstle
CDC Information Technological Support Contract (CITS)
Biostatistician

>> -----Original Message-----
>> From: Mike Rhoads [mailto:RHOADSM1@WESTAT.com]
>> Sent: Thursday, April 24, 2003 10:59 AM
>> To: 'Gerstle, John'; SAS-L@LISTSERV.UGA.EDU
>> Subject: RE: Non-sequential unique numbers
>>
>> John,
>>
>> Given that the random numbers generated are floating point, I'm not sure
>> what the probability of duplication is. Note that I was not using the
>> random numbers themselves as the ID, but was just sorting by the random
>> number and then assigning the record number of the re-sorted file as the ID.
>> For that approach, it doesn't matter whether there are duplicates
>> (although it turned out that I had misunderstood what Ralph was really
>> asking for).
>>
>> Mike Rhoads
>> Westat
>> RhoadsM1@Westat.com
>>
>> -----Original Message-----
>> From: Gerstle, John [mailto:yzg9@cdc.gov]
>> Sent: Thursday, April 24, 2003 9:33 AM
>> To: Mike Rhoads; SAS-L@LISTSERV.UGA.EDU
>> Subject: RE: Non-sequential unique numbers
>>
>> Mike,
>>
>> Wouldn't you agree, though, that even if you've created 90,000 random
>> values, each with equal probability, you have a good chance of creating
>> at least one pair of duplicate ID numbers? It seems you'd want to create
>> a list of 90,000 unique random numbers and then assign each, without
>> replacement, to each of the records in the dataset.
>>
>> Just a thought...
>> John Gerstle
>> CDC Information Technological Support Contract (CITS)
>> Biostatistician
>>
>> >> -----Original Message-----
>> >> From: Mike Rhoads [mailto:RHOADSM1@WESTAT.COM]
>> >> Sent: Wednesday, April 23, 2003 6:14 PM
>> >> To: SAS-L@LISTSERV.UGA.EDU
>> >> Subject: Re: Non-sequential unique numbers
>> >>
>> >> Ralph,
>> >>
>> >> If by "non-sequential" you mean that it "loses" the original order of
>> >> the records, I would just assign a random number to each record in a
>> >> DATA step, sort the output by the random number, then read the sorted
>> >> file back in and assign the record number as the ID. Something like
>> >> (untested):
>> >>
>> >> DATA Temp;
>> >>   SET OriginalFile;
>> >>   RandomNumber = RANUNI(12345);
>> >> RUN;
>> >>
>> >> PROC SORT DATA=Temp;
>> >>   BY RandomNumber;
>> >> RUN;
>> >>
>> >> DATA Final;
>> >>   SET Temp;
>> >>   IDVAR = _N_;
>> >>   DROP RandomNumber; * Or don't ...;
>> >> RUN;
>> >>
>> >> Mike Rhoads
>> >> Westat
>> >> RhoadsM1@Westat.com
>> >>
>> >> -----Original Message-----
>> >> From: Ralph [mailto:rpk0524@YAHOO.COM]
>> >> Sent: Wednesday, April 23, 2003 5:15 PM
>> >> To: SAS-L@LISTSERV.UGA.EDU
>> >> Subject: Non-sequential unique numbers
>> >>
>> >> I need to create a unique identifier for 90,000 records that is
>> >> non-sequential. So far, the best solution I have come up with is:
>> >>
>> >> a = ranuni(345) + (ranuni(123) + int(time()));
>> >> b = int(reverse(ar_seqnum)) * a;
>> >>
>> >> Using b as my identifier, I can (most times) come up with unique
>> >> numbers, but the real challenge is that this number can be no longer
>> >> than 8 bytes. Using this code, my b(s) are 12 bytes. Using SUBSTR of
>> >> b for a length of 8, I get major dups. Can anyone help?
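[Editorial note: John's sample-without-replacement suggestion is easy to sketch outside SAS. Below is a minimal illustration in Python, not from the thread; the 8-digit ID range is an assumption standing in for Ralph's 8-byte limit, and the seed 345 is borrowed from Ralph's code for reproducibility. `random.sample` draws without replacement, so uniqueness is guaranteed by construction rather than checked after the fact:]

```python
import random

def unique_nonsequential_ids(n_records, width=8, seed=345):
    """Draw n_records distinct integers from the width-digit range.
    random.Random.sample selects without replacement, so the returned
    IDs are unique by construction and in no particular order."""
    rng = random.Random(seed)
    low, high = 10 ** (width - 1), 10 ** width  # 8 digits: 10000000..99999999
    return rng.sample(range(low, high), n_records)

ids = unique_nonsequential_ids(90_000)
print(len(ids), len(set(ids)))  # both 90000: no duplicates possible
```

[The same effect in SAS is essentially Mike's program above: number the records after sorting by a random key, which is a random permutation and therefore duplicate-free by construction.]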
