Garry:
The sort/univariate/append scheme, as you well know, is
doable but very I/O intensive (and hence time consuming).
I offer two approaches, both of which let you
process all 1,000 'faux' committees in one fell swoop.
Each approach builds a dataset containing 1,000 committees
of 35 random legislators in a single data step. Proc univariate
can then be run with a by statement to get the medians of
each committee.
The first creates 1,000 'faux' committees of 35 members,
but has to redraw whenever it picks a member who is
already on the committee.
* Pentium 100->160 overdrive,
* takes about 10 seconds and averages around 600 retries
* (don't ask me to prove what the expected mean retry count is :)
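For what it's worth, the expected retry count can be worked out without a formal proof, assuming every pick is uniform and independent: with j members already seated, a pick collides with probability j/N, so filling that slot costs j/(N-j) redraws on average. A small sketch that adds this up for the settings used here (the variable names are mine, not part of the programs in this post):

```sas
/* Sketch: expected retries for the redraw loop,            */
/* assuming uniform, independent picks from the pool.       */
data _null_;
  poolsize = 1000;   /* legislators in the pool             */
  groupsiz = 35;     /* members per committee               */
  groups   = 1000;   /* number of simulated committees      */
  pergroup = 0;
  do j = 0 to groupsiz - 1;
    /* j members already chosen: collision chance is        */
    /* j/poolsize, so mean redraws for this slot is         */
    /* j/(poolsize - j)                                     */
    pergroup = pergroup + j / (poolsize - j);
  end;
  total = groups * pergroup;
  put pergroup= 8.4 total= 8.1;
run;
```

That comes to roughly 0.61 retries per committee, or a little over 600 across 1,000 committees, which is in line with the count observed above. The same sum grows quickly as the committee size approaches the pool size.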
The second is like your original approach of assigning each
legislator a random value and then choosing those with the
lowest random values for the 'faux' committee.
This method uses a hashing scheme to allow rapid selection
of the members with the lowest random values (i.e. in sort order).
* Pentium 100->160 overdrive, takes about 50 seconds
---- snip ----
* create a dummy pool of 1,000 legislators named BOB;
* and their derived numerical position on some policy;
* gender warning: legislators are referred to as 'him';
data slators;
  do i=1 to 1000;
    who='BOB'||trim(left(put(i,4.)));
    policy1 = round (500 * ranuni (0));
    policy2 = round (500 * ranuni (0));
    output;
  end;
  label who = 'Legislator';
  label policy1 = 'Position on policy 1';
  label policy2 = 'Position on policy 2';
  drop i;
run;
* create 1,000 groups of 35 random legislators;
* as the ratio of group size to pool size approaches unity
* this technique takes longer and longer due to number of
* retries necessary to obtain unique group members;
%let groups=1000;
%let groupsiz=35;
proc sql;
  drop table monte;
quit;
data monte;
  array slot [&groupsiz] _temporary_;
  retries = 0;
  do group = 1 to &groups;
    do i = 1 to hbound(slot);
      slot [i] = 0;
    end;
    sample = 1;
    do while (sample <= hbound(slot));
      * nobs is set at compile time by the set statement below;
      pickhim = 1 + floor (nobs * ranuni (0));
      slot [sample] = pickhim;
      * make sure a person is in this group only once;
      tryagain = 0;
      do j = 1 to sample-1 while (tryagain=0);
        if slot [j] = pickhim then tryagain = -1;
      end;
      * advance only if the pick was unique, otherwise redraw;
      sample = sample + 1 + tryagain;
      retries = retries - tryagain;
    end;
    do sample = 1 to hbound(slot);
      pickhim = slot [sample];
      set slators point=pickhim nobs=nobs;
      output;
    end;
  end;
  put retries=;
  _error_=0;
  stop;
  drop i j tryagain retries;
run;
proc univariate noprint data=monte;
by group;
var policy1 policy2;
output out=medians median=policy1 policy2;
run;
proc print data=medians;
run;
* create 1,000 groups of 35 random legislators;
* this technique is identical in concept to the original post
* but handles the sorting of random ordering internal to a data step
* using a temporary array and a form of hashing;
* this should be much faster than disk intensive dataset sorting and appending;
%let groups=1000;
%let groupsiz=35;
proc sql;
  drop table monte2;
  reset noprint;
  select count(*) into :NOBS from slators;
quit;
data monte2;
  * order[s,1] = observation number stored in slot s;
  * order[s,2] = next slot holding the same random value (0 = none);
  * order[r,3] = first slot used for random value r (0 = none);
  array order [&NOBS,3] _temporary_;
  do group = 1 to &groups;
    do i = 1 to &NOBS;
      order [i,1] = 0;
      order [i,2] = 0;
      order [i,3] = 0;
    end;
    * assign random value to each legislator;
    do i = 1 to &NOBS;
      random = 1 + floor (&NOBS * ranuni (0));
      * store random number in sorted traversable structure;
      lastslot = 0;
      slot = order [random, 3];
      if slot = 0 then slot = random;
      * find last slot used for this random number ;
      do while (order [slot, 2] ne 0);
        slot = order [slot, 2];
      end;
      lastslot = slot;
      * find next open slot;
      do while (order [slot, 1] ne 0);
        slot + 1;
        if slot > &NOBS then slot = 1;
      end;
      * store random number and associated observation number;
      order [slot, 1] = i;
      * store starting slot of random number or how to reach next slot;
      if order [random, 3] = 0 then
        order [random, 3] = slot;
      else
        order [lastslot, 2] = slot;
    end;
    * extract groupsize legislators having lowest random values;
    sample = 0;
    do i = 1 to &NOBS while (sample < &groupsiz);
      random = i;
      slot = order [i, 3];
      if slot ne 0 then do;
        sample + 1;
        pickhim = order [slot, 1];
        set slators point=pickhim ;
        output;
        do while (order [slot, 2] ne 0 and sample < &groupsiz);
          slot = order [slot, 2];
          sample + 1;
          pickhim = order [slot, 1];
          set slators point=pickhim ;
          output;
        end;
      end;
    end;
  end;
  _error_=0;
  stop;
  drop i random lastslot slot ;
run;
proc univariate noprint data=monte2;
by group;
var policy1 policy2;
output out=medians2 median=policy1 policy2;
run;
proc print data=medians2;
run;
---- snip ----
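Once the medians dataset exists, comparing it with the real committee is one more short step. The sketch below assumes the first program above has been run and uses a made-up value of 260 for the actual committee's median on policy1; the macro variable actual and the variables nbelow and share are my own names for illustration.

```sas
/* Hypothetical value: replace with the real committee median. */
%let actual = 260;

data _null_;
  set medians end=last;
  /* count simulated committees whose median is at or below   */
  /* the actual committee median                              */
  nbelow + (policy1 <= &actual);
  if last then do;
    share = nbelow / _n_;
    put 'Share of simulated medians at or below actual: ' share 6.4;
  end;
run;
```

The share written to the log shows where the actual median falls within the simulated distribution; the same step works on medians2 or on policy2.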
Garry Young wrote in message <355135DE.61A@showme.missouri.edu>...
>I'm working on a difference between medians problem where I want to
>compare the median of an actual legislative committee with a series of
>simulated committees drawn randomly from the larger legislature. So
>the basic problem is to create a series of randomly drawn committees.
>As shown below I did this by assigning each individual in the
>legislature a random number, sort, then take obs=35 (the number of
>members on the committee). Unfortunately I've come across two problems:
>
>(1) The program is slow;
>
>(2) The program always uses up all memory -- thus locking up the machine
>-- by about the 700th iteration.
>
>Regarding (1) each iteration takes about 2 seconds. I need to do 5000
>or so iterations and about twenty different simulations so this is a
>problem. Of course, until it can be solved, problem (2) makes problem
>(1) a moot point.
>
>I'm doing this on a Pentium 166 with 32 meg. I tried a Proc Dataset
>Delete;. This helped some but not alot. I also tried running it in
>batch mode with nolog. Problem (2) struck at about the same time.
>
>Any suggestions would be appreciated. Thanks.
>
>The basics of the code I'm using follows:
>
>data one;
>infile statement
>input statement
>
>%Macro Monte;
> %Let I = 1;
> %Do %While (&I<1000);
>
>data M_two;
>set one;
>x = ranuni(0);
>proc sort; by x;
>run;
>
>data M_three;
>set M_two (obs = 35);
>proc univariate data = M_three noprint; var var1;
> output out = D_med median=D_Med;
>run;
>
>
>data M_four;
>proc append base = sasdata.Ag90a data = D_Med;
>run;
>
>%Let I = (&I + 1);
> %End ;
>
>
>%Mend Monte;
>
>
>data two;
>set one;
>x = ranuni(0);
>proc sort; by x;
>run;
>
>data three;
>set two (obs = 35);
>proc univariate data = three noprint; var var1;
> output out = D_med1 median=D_Med;
>run;
>
>libname statement;
>
>DATA sasdata.Ag90a;
>set D_Med1 ;
>run;
>
>data run;
>%Monte;
>
>data finish;
>set sasdata.Ag90a;
>/* proc sort; by median; */
>proc print;
>run;
>
>
>
>--
>------------------------------------------------------------------
>Garry Young Phone: (573) 882-0056
>Assistant Professor FAX: (573) 884-5131
>Dept. of Political Science Email: polsgy@showme.missouri.edu
>113 Professional Bldg. Web: http://www.missouri.edu/~polsgy
>University of Missouri
>Columbia, MO 65211
>------------------------------------------------------------------