Date:   Fri, 8 May 1998 02:10:28 -0400
Reply-To:   Richard A DeVenezia <radevenz@IX.NETCOM.COM>
Sender:   "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:   Richard A DeVenezia <radevenz@IX.NETCOM.COM>
Organization:   Netcom
Subject:   Re: Simulation Problem

Garry:

The sort/univariate/append scheme, as you well know, is doable but very I/O-intensive (and hence time-consuming).

I submit for you two approaches, both of which let you process all 1,000 'faux' committees in one fell swoop. Each approach builds a dataset containing 1,000 committees of 35 random legislators in a single data step. Proc univariate can then be run with a by statement to get the medians of each committee.

The first creates 1,000 'faux' committees of 35 members, but has to retry within a committee whenever it picks a member already chosen. On a Pentium 100->160 OverDrive it takes about 10 seconds and averages around 600 retries (don't ask me to prove what the expected mean retry count is :)
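
For what it's worth, a rough sketch of that expected retry count, assuming every pick is uniform and independent over a pool of N = 1000 legislators, with committees of n = 35 and G = 1000 committees (these symbols are introduced here for illustration only). With k members already seated, a pick collides with probability k/N, so the number of failed picks before the next seat is filled is geometric with mean k/(N-k):

```latex
\[
E[\text{retries per committee}]
  = \sum_{k=0}^{n-1} \frac{k/N}{1 - k/N}
  = \sum_{k=0}^{n-1} \frac{k}{N-k}
  = \sum_{k=0}^{34} \frac{k}{1000-k}
  \approx 0.609
\]
\[
E[\text{total retries}] = G \cdot E[\text{retries per committee}]
  \approx 1000 \times 0.609 \approx 609
\]
```

Under those assumptions the figure agrees with the observed average of around 600 retries.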

The second is like your original approach of assigning each legislator a random value and then choosing those with the lowest random values for the 'faux' committee. This method uses a hashing scheme to allow rapid selection of the members with the lowest random values (i.e. in sort order). On a Pentium 100->160 OverDrive it takes about 50 seconds.

---- snip ----
* create a dummy pool of 1,000 legislators named BOB;
* and their derived numerical position on some policy;
* gender warning: legislators are referred to as 'him';

data slators;
  do i=1 to 1000;
    who='BOB'||trim(left(put(i,4.)));
    policy1 = round (500 * ranuni (0));
    policy2 = round (500 * ranuni (0));
    output;
  end;
  label who = 'Legislator';
  label policy1 = 'Position on policy 1';
  label policy2 = 'Position on policy 2';
  drop i;
run;

* create 1,000 groups of 35 random legislators;
* as the ratio of group size to pool size approaches unity
* this technique takes longer and longer due to number of
* retries necessary to obtain unique group members;

%let groups=1000;
%let groupsiz=35;

proc sql;
  drop table monte;

data monte;
  array slot [&groupsiz] _temporary_;
  retries = 0;
  do group = 1 to &groups;
    do i = 1 to hbound(slot);
      slot [i] = 0;
    end;
    sample = 1;
    do while (sample <= hbound(slot));
      pickhim = 1 + floor (nobs * ranuni (0));
      slot [sample] = pickhim;
      * make sure a person is in this group only once;
      tryagain = 0;
      do j = 1 to sample-1 while (tryagain=0);
        if slot [j] = pickhim then tryagain = -1;
      end;
      sample = sample + 1 + tryagain;
      retries = retries - tryagain;
    end;
    do sample = 1 to hbound(slot);
      pickhim = slot [sample];
      set slators point=pickhim nobs=nobs;
      output;
    end;
  end;
  put retries=;
  _error_=0;
  stop;
  drop i j tryagain retries;
run;

proc univariate noprint data=monte;
  by group;
  var policy1 policy2;
  output out=medians median=policy1 policy2;
run;

proc print data=medians;
run;

* create 1,000 groups of 35 random legislators;
* this technique is identical in concept to the original post
* but handles the sorting of random ordering internal to a data step
* using a temporary array and a form of hashing;
* this should be much faster than disk intensive dataset sorting and appending;

%let groups=1000;
%let groupsiz=35;

proc sql;
  drop table monte2;
  reset noprint;
  select count(*) into :NOBS from slators;

data monte2;
  array order [&NOBS,3] _temporary_;
  do group = 1 to &groups;
    do i = 1 to &NOBS;
      order [i,1] = 0;
      order [i,2] = 0;
      order [i,3] = 0;
    end;
    * assign random value to each legislator;
    do i = 1 to &NOBS;
      random = 1 + floor (&NOBS * ranuni (0));
      * store random number in sorted traversable structure;
      lastslot = 0;
      slot = order [random, 3];
      if slot = 0 then slot = random;
      * find last slot used for this random number ;
      do while (order [slot, 2] ne 0);
        slot = order [slot, 2];
      end;
      lastslot = slot;
      * find next open slot;
      do while (order [slot, 1] ne 0);
        slot + 1;
        if slot > &NOBS then slot = 1;
      end;
      * store random number and associated observation number;
      order [slot, 1] = i;
      * store starting slot of random number or how to reach next slot;
      if order [random, 3] = 0 then order [random, 3] = slot;
      else order [lastslot, 2] = slot;
    end;

    * extract groupsize legislators having lowest random values;
    sample = 0;
    do i = 1 to &NOBS while (sample < &groupsiz);
      random = i;
      slot = order [i, 3];
      if slot ne 0 then do;
        sample + 1;
        pickhim = order [slot, 1];
        set slators point=pickhim ;
        output;
        do while (order [slot, 2] ne 0 and sample < &groupsiz);
          slot = order [slot, 2];
          sample + 1;
          pickhim = order [slot, 1];
          set slators point=pickhim ;
          output;
        end;
      end;
    end;
  end;
  _error_=0;
  stop;
  drop i random lastslot slot ;
run;

proc univariate noprint data=monte2;
  by group;
  var policy1 policy2;
  output out=medians2 median=policy1 policy2;
run;

proc print data=medians2;
run;
---- snip ----

Garry Young wrote in message <355135DE.61A@showme.missouri.edu>...
>I'm working on a difference between medians problem where I want to
>compare the median of an actual legislative committee with a series of
>simulated committees drawn randomly from the larger legislature. So
>the basic problem is to create a series of randomly drawn committees.
>As shown below I did this by assigning each individual in the
>legislature a random number, sort, then take obs=35 (the number of
>members on the committee). Unfortunately I've come across two problems:
>
>(1) The program is slow;
>
>(2) The program always uses up all memory -- thus locking up the machine
>-- by about the 700th iteration.
>
>Regarding (1) each iteration takes about 2 seconds. I need to do 5000
>or so iterations and about twenty different simulations so this is a
>problem. Of course, until it can be solved, problem (2) makes problem
>(1) a moot point.
>
>I'm doing this on a Pentium 166 with 32 meg. I tried a Proc Dataset
>Delete;. This helped some but not alot. I also tried running it in
>batch mode with nolog. Problem (2) struck at about the same time.
>
>Any suggestions would be appreciated. Thanks.
>
>The basics of the code I'm using follows:
>
>data one;
>infile statement
>input statement
>
>%Macro Monte;
> %Let I = 1;
> %Do %While (&I<1000);
>
>data M_two;
>set one;
>x = ranuni(0);
>proc sort; by x;
>run;
>
>data M_three;
>set M_two (obs = 35);
>proc univariate data = M_three noprint; var var1;
> output out = D_med median=D_Med;
>run;
>
>
>data M_four;
>proc append base = sasdata.Ag90a data = D_Med;
>run;
>
>%Let I = (&I + 1);
> %End ;
>
>
>%Mend Monte;
>
>
>data two;
>set one;
>x = ranuni(0);
>proc sort; by x;
>run;
>
>data three;
>set two (obs = 35);
>proc univariate data = three noprint; var var1;
> output out = D_med1 median=D_Med;
>run;
>
>libname statement;
>
>DATA sasdata.Ag90a;
>set D_Med1 ;
>run;
>
>data run;
>%Monte;
>
>data finish;
>set sasdata.Ag90a;
>/* proc sort; by median; */
>proc print;
>run;
>
>
>
>--
>------------------------------------------------------------------
>Garry Young                   Phone: (573) 882-0056
>Assistant Professor           FAX: (573) 884-5131
>Dept. of Political Science    Email: polsgy@showme.missouri.edu
>113 Professional Bldg.        Web: http://www.missouri.edu/~polsgy
>University of Missouri
>Columbia, MO 65211
>------------------------------------------------------------------

