Date:         Fri, 8 May 1998 02:10:28 -0400
Reply-To:     Richard A DeVenezia
Sender:       "SAS(r) Discussion"
From:         Richard A DeVenezia
Organization: Netcom
Subject:      Re: Simulation Problem

Garry:

The sort/univariate/append scheme, as you well know, is doable but very i/o intensive (and hence time consuming).

I submit for you two approaches, both of which let you process all 1,000 'faux' committees in one fell swoop. Each approach builds a dataset containing 1,000 committees of 35 random legislators in a single data step. Proc univariate can then be run with a by statement to get the medians of each committee.

The first creates 1,000 'faux' committees of 35 members, but has to redraw within a committee whenever it picks a member who has already been chosen.
  * Pentium 100->160 overdrive,
  * takes about 10 seconds and averages around 600 retries
  * (don't ask me to prove what the expected mean retry count is :)
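A rough sanity check on that retry figure (a back-of-envelope sketch, not a proof): when k members are already seated, a fresh draw collides with probability k/N, so the expected number of wasted draws before a new face turns up is k/(N-k). Summing over k = 0 to 34 with N = 1000 gives about 0.61 retries per committee, or roughly 600 over 1,000 committees, in line with what the run reports. The variable names below are made up for illustration, and the arithmetic assumes uniform, independent draws.

```sas
* expected retries for the retry-on-duplicate scheme (illustrative names);
data _null_;
  poolsize = 1000;   * legislators in the pool;
  groupsiz = 35;     * members per committee;
  groups   = 1000;   * number of committees;
  pergroup = 0;
  do k = 0 to groupsiz - 1;
    * with k members seated, expected wasted draws are k over (poolsize - k);
    pergroup = pergroup + k / (poolsize - k);
  end;
  total = groups * pergroup;
  put pergroup= total=;
run;
```

Raising groupsiz toward poolsize makes the k/(poolsize-k) terms blow up, which is why this scheme slows down as the group size approaches the pool size.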

The second is like your original approach of assigning each legislator a random value and then choosing those with the lowest random values for the 'faux' committee. This method uses a hashing scheme inside the data step to pick out the members with the lowest random values quickly (i.e. in sort order) without an actual sort.
  * Pentium 100->160 overdrive, takes about 50 seconds

---- snip ----
* create a dummy pool of 1,000 legislators named BOB;
* and their derived numerical position on some policy;
* gender warning: legislators are referred to as 'him';

data slators;
  do i=1 to 1000;
    who='BOB'||trim(left(put(i,4.)));
    policy1 = round (500 * ranuni (0));
    policy2 = round (500 * ranuni (0));
    output;
  end;
  label who = 'Legislator';
  label policy1 = 'Position on policy 1';
  label policy2 = 'Position on policy 2';
  drop i;
run;

* create 1,000 groups of 35 random legislators;
* as the ratio of group size to pool size approaches unity
* this technique takes longer and longer due to number of
* retries necessary to obtain unique group members;

%let groups=1000;
%let groupsiz=35;

proc sql;
  drop table monte;

data monte;
  array slot [&groupsiz] _temporary_;
  retries = 0;
  do group = 1 to &groups;
    do i = 1 to hbound(slot);
      slot [i] = 0;
    end;
    sample = 1;
    do while (sample <= hbound(slot));
      pickhim = 1 + floor (nobs * ranuni (0));
      slot [sample] = pickhim;
      * make sure a person is in this group only once;
      tryagain = 0;
      do j = 1 to sample-1 while (tryagain=0);
        if slot [j] = pickhim then tryagain = -1;
      end;
      sample = sample + 1 + tryagain;
      retries = retries - tryagain;
    end;
    do sample = 1 to hbound(slot);
      pickhim = slot [sample];
      set slators point=pickhim nobs=nobs;
      output;
    end;
  end;
  put retries=;
  _error_=0;
  stop;
  drop i j tryagain retries;
run;

proc univariate noprint data=monte;
  by group;
  var policy1 policy2;
  output out=medians median=policy1 policy2;
run;

proc print data=medians;
run;

* create 1,000 groups of 35 random legislators;
* this technique is identical in concept to the original post
* but handles the sorting of random ordering internal to a data step
* using a temporary array and a form of hashing;
* this should be much faster than disk intensive dataset sorting and appending;

%let groups=1000;
%let groupsiz=35;

proc sql;
  drop table monte2;
  reset noprint;
  select count(*) into :NOBS from slators;

data monte2;
  array order [&NOBS,3] _temporary_;
  do group = 1 to &groups;
    do i = 1 to &NOBS;
      order [i,1] = 0;
      order [i,2] = 0;
      order [i,3] = 0;
    end;
    * assign random value to each legislator;
    do i = 1 to &NOBS;
      random = 1 + floor (&NOBS * ranuni (0));
      * store random number in sorted traversable structure;
      lastslot = 0;
      slot = order [random, 3];
      if slot = 0 then slot = random;
      * find last slot used for this random number ;
      do while (order [slot, 2] ne 0);
        slot = order [slot, 2];
      end;
      lastslot = slot;
      * find next open slot;
      do while (order [slot, 1] ne 0);
        slot + 1;
        if slot > &NOBS then slot = 1;
      end;
      * store random number and associated observation number;
      order [slot, 1] = i;
      * store starting slot of random number or how to reach next slot;
      if order [random, 3] = 0 then order [random, 3] = slot;
      else order [lastslot, 2] = slot;
    end;

    * extract groupsize legislators having lowest random values;
    sample = 0;
    do i = 1 to &NOBS while (sample < &groupsiz);
      random = i;
      slot = order [i, 3];
      if slot ne 0 then do;
        sample + 1;
        pickhim = order [slot, 1];
        set slators point=pickhim ;
        output;
        do while (order [slot, 2] ne 0 and sample < &groupsiz);
          slot = order [slot, 2];
          sample + 1;
          pickhim = order [slot, 1];
          set slators point=pickhim ;
          output;
        end;
      end;
    end;
  end;
  _error_=0;
  stop;
  drop i random lastslot slot ;
run;

proc univariate noprint data=monte2;
  by group;
  var policy1 policy2;
  output out=medians2 median=policy1 policy2;
run;

proc print data=medians2;
run;
---- snip ----
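With the medians dataset in hand, the comparison Garry describes could be sketched along these lines. The value given to &actual is a placeholder, not a real figure; substitute the observed median of the actual committee. The macro variable name and the share_le column name are also just illustrative.

```sas
* hypothetical value: median of policy1 for the real committee;
%let actual = 250;

* share of simulated committees whose median is at or below the actual one;
proc sql;
  select mean(policy1 <= &actual) as share_le
    from medians;
quit;
```

A share near 0 or 1 would suggest the real committee sits in a tail of the simulated distribution; the same query against medians2 should tell much the same story.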

Garry Young wrote in message <355135DE.61A@showme.missouri.edu>...
>I'm working on a difference between medians problem where I want to
>compare the median of an actual legislative committee with a series of
>simulated committees drawn randomly from the larger legislature. So
>the basic problem is to create a series of randomly drawn committees.
>As shown below I did this by assigning each individual in the
>legislature a random number, sort, then take obs=35 (the number of
>members on the committee). Unfortunately I've come across two problems:
>
>(1) The program is slow;
>
>(2) The program always uses up all memory -- thus locking up the machine
>-- by about the 700th iteration.
>
>Regarding (1) each iteration takes about 2 seconds. I need to do 5000
>or so iterations and about twenty different simulations so this is a
>problem. Of course, until it can be solved, problem (2) makes problem
>(1) a moot point.
>
>I'm doing this on a Pentium 166 with 32 meg. I tried a Proc Dataset
>Delete;. This helped some but not alot. I also tried running it in
>batch mode with nolog. Problem (2) struck at about the same time.
>
>Any suggestions would be appreciated. Thanks.
>
>The basics of the code I'm using follows:
>
>data one;
>infile statement
>input statement
>
>%Macro Monte;
> %Let I = 1;
> %Do %While (&I<1000);
>
>data M_two;
>set one;
>x = ranuni(0);
>proc sort; by x;
>run;
>
>data M_three;
>set M_two (obs = 35);
>proc univariate data = M_three noprint; var var1;
> output out = D_med median=D_Med;
>run;
>
>
>data M_four;
>proc append base = sasdata.Ag90a data = D_Med;
>run;
>
>%Let I = (&I + 1);
> %End ;
>
>
>%Mend Monte;
>
>
>data two;
>set one;
>x = ranuni(0);
>proc sort; by x;
>run;
>
>data three;
>set two (obs = 35);
>proc univariate data = three noprint; var var1;
> output out = D_med1 median=D_Med;
>run;
>
>libname statement;
>
>DATA sasdata.Ag90a;
>set D_Med1 ;
>run;
>
>data run;
>%Monte;
>
>data finish;
>set sasdata.Ag90a;
>/* proc sort; by median; */
>proc print;
>run;
>
>
>
>--
>------------------------------------------------------------------
>Garry Young Phone: (573) 882-0056
>Assistant Professor FAX: (573) 884-5131
>Dept. of Political Science Email: polsgy@showme.missouri.edu
>113 Professional Bldg. Web: http://www.missouri.edu/~polsgy
>University of Missouri
>Columbia, MO 65211
>------------------------------------------------------------------
