Date: Tue, 16 Dec 2008 16:29:36 -0500
Reply-To: "Simon, Lorna" <Lorna.Simon@UMASSMED.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Simon, Lorna" <Lorna.Simon@UMASSMED.EDU>
Subject: Setting up data to do mixed procedure
Content-Type: text/plain; charset=us-ascii
I am trying to create a dataset to use with the MIXED procedure (PROC
MIXED). As far as I understand it, the data need one line per person per
measurement, so a person can have several lines, like this:
Person  gender  age  totalcosts
     1  F        21        2000
     1  F        21         300
     2  M        40           0
     3  M        35         100
     4  F        25         200
     4  F        25           0
     4  F        25        2000
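For illustration, the layout above can be read in and passed to PROC MIXED roughly as follows. This is only a sketch: the dataset name long_example is hypothetical, and the model (fixed effects for gender and age, a random intercept per person) is an assumption, not the model intended for the real data. Seven rows are far too few to give meaningful estimates.

```sas
/* Illustrative sketch only: toy rows from the table above.         */
/* long_example and the model specification are assumptions.        */
data long_example;
   input person gender $ age totalcosts;
   datalines;
1 F 21 2000
1 F 21 300
2 M 40 0
3 M 35 100
4 F 25 200
4 F 25 0
4 F 25 2000
;
run;

proc mixed data=long_example;
   class person gender;
   model totalcosts = gender age / solution;
   /* one random intercept per person ties that person's rows together */
   random intercept / subject=person;
run;
```

The point of the layout is that SUBJECT=person identifies which rows belong to the same individual, so the person-level variables (gender, age) simply repeat on each of that person's lines.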
I have one dataset that contains one line of data per person for the
initial interview, and a second dataset that contains one line of data
for each follow-up interview, so several lines per person.
So, to get the data into the form I need, I first merge the initial
dataset with the follow-up dataset to get all of the independent
variables onto each line of data in the follow-up dataset. Here is my
log:
276 data followup_full;
277 merge hhg.initial_interview2(keep=clientid gender
no_health_insurance substance_abuse racew racebl raceas raceai
277! racenh housing_type time_homeless age)
278 hhg.follow_up_interview2(keep=clientid interview__
er_visits hospitalization_days ambulance_times
278! McInnis_House_days
279 detox_days shelter_nights
incaceration_days);
280 by clientid;
281 run;
NOTE: There were 180 observations read from the data set
HHG.INITIAL_INTERVIEW2.
NOTE: There were 1631 observations read from the data set
HHG.FOLLOW_UP_INTERVIEW2.
NOTE: The data set WORK.FOLLOWUP_FULL has 1646 observations and 20
variables.
NOTE: DATA statement used (Total process time):
real time 0.26 seconds
cpu time 0.03 seconds
Next I use a SET statement to stack the initial interview data with the
merged follow-up data created in my first DATA step, and perform some
calculations needed for the MIXED procedure. The log follows:
284 data hhg.init_fu_mixed_costs;
285 set followup_full
286 hhg.initial_interview2 (rename=(er_visits1=er_visits
hospitalization_days1=hospitalization_days
287 ambulance_times1=ambulance_times
McInnis_House_days1=McInnis_House_days
288 detox_center_days1=detox_days
er_shelter_nights1=shelter_days
289
incaceration_nights1=incaceration_days));
290 ercost=640;
291 hospcost=1895;
292 ambulancecost=230;
293 respitecost=400;
294 detoxcost=198;
295 sheltercost=32;
296 jailcost=118;
297
298 array numbers {*} er_visits hospitalization_days ambulance_times
shelter_days incaceration_days detox_days
298! McInnis_house_days;
299 array costs {*} ercost hospcost ambulancecost sheltercost jailcost
detoxcost respitecost;
300 array totcosts {*} tercost thospcost tambulancecost tsheltercost
tjailcost tdetoxcost trespitecost;
301 do i=1 to dim(numbers);
302 totcosts{i}=numbers{i}*costs{i};
303 end;
304
305 totalcost=sum(tercost, thospcost, tambulancecost, tsheltercost,
tjailcost, tdetoxcost, trespitecost);
306
307 if raceai=1 or raceas=1 or racebl=1 or racenh=1 then white=0;
308 else if raceai=0 and raceas=0 and racebl=0 and racenh=0 and racew=1
then white=1;
309 else white=.;
310
311 if gender="Male" then male=1;
312 else if gender="Female" then male=0;
313 else gender=" ";
314
315 if housing_type=1 then scattered_housing=1;
316 else if housing_type=. then scattered_housing=.;
317 else scattered_housing=0;
318
NOTE: Missing values were generated as a result of performing an
operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1832 at 302:27 29 at 305:11
NOTE: There were 1646 observations read from the data set
WORK.FOLLOWUP_FULL.
NOTE: There were 180 observations read from the data set
HHG.INITIAL_INTERVIEW2.
NOTE: The data set HHG.INIT_FU_MIXED_COSTS has 1826 observations and 97
variables.
NOTE: DATA statement used (Total process time):
real time 0.53 seconds
cpu time 0.03 seconds
I get the resulting dataset:
 Obs  clientid  Interview__  totalcost  male  insurance  abuse  white  housing  homeless  age
1765  B006              296      30804     1          0      0      1        1       312   48
1766  B006             2036      18990     1          0      0      1        1       312   48
1767  B006             2402      15615     1          0      0      1        1       312   48
1768  B007              297      17500     1          0      0      1        0        60   59
1769  B007             2426       1600     1          0      0      1        0        60   59
1770  B008                .          .     0          0      1      1        0        72   41
1771  B008              298       5274     0          0      1      1        0        72   41
1772  B009                .          .     0          0      1      1        0       180   47
1773  B009              304      22227     0          0      1      1        0       180   47
1774  B010              311      12740     1          0      1      1        0        84   43
1775  B010             2227      12000     1          0      1      1        0        84   43
1776  B012              318      14000     1          0      0      1        1       108   55
1777  B012             2440       7596     1          0      0      1        1       108   55
1778  B013              319      15988     1          0      1      1        1       228   50
1779  B013             2403          0     1          0      1      1        1       228   50
1780  B014                .          .     1          0      0      1        1       360   59
1781  B014              339      74050     1          0      0      1        1       360   59
1782  B016                .          .     1          0      1      1        1       144   59
1783  B016              340      39712     1          0      1      1        1       144   59
1784  B017                .          .     1          0      1      1        1       168   53
1785  B017              341      69660     1          0      1      1        1       168   53
1786  BH00              257       1088     1          0      0      1        1        24   23
1787  BH00             1689          0     1          0      0      1        1        24   23
1788  BH00             1807          0     1          0      0      1        1        24   23
1789  BH00             2023          0     1          0      0      1        1        24   23
1790  BH00             2104          0     1          0      0      1        1        24   23
1791  BH00             2154          0     1          0      0      1        1        24   23
1792  BH00             2384          0     1          0      0      1        1        24   23
1793  BH00             2482          0     1          0      0      1        1        24   23
1794  BH03              259          0     1          0      1      1        1       312   46
1795  BH03             1809          0     1          0      1      1        1       312   46
1796  BH03             1810          0     1          0      1      1        1       312   46
1797  BH03             1811          0     1          0      1      1        1       312   46
1798  BH03             2025          0     1          0      1      1        1       312   46
1799  BH03             2026          0     1          0      1      1        1       312   46
1800  BH03             2107       2026     1          0      1      1        1       312   46
The variable interview__ (interview number) is not in the dataset for
the initial interview, so the interview number for the first observation
for each person should be missing. As you can see, for some people it is
missing, for others it is not.
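One way to check where each row actually comes from is to tag it with the IN= dataset option when the two datasets are stacked. The following is only a diagnostic sketch: the names check_source, source, and has_interview_no are hypothetical, and it assumes the datasets and variables are exactly as they appear in the logs above.

```sas
/* Diagnostic sketch: names check_source, source, has_interview_no  */
/* are hypothetical and not part of the original program.           */
data check_source;
   set followup_full          (keep=clientid interview__)
       hhg.initial_interview2 (in=in_init keep=clientid);
   length source $9;
   if in_init then source = 'initial';     /* row read from the initial interview */
   else source = 'follow-up';              /* row read from the merged follow-up  */
   has_interview_no = not missing(interview__);   /* 1 = interview number present */
run;

/* Cross-tabulate row source against presence of an interview number */
proc freq data=check_source;
   tables source * has_interview_no / missing norow nocol nopercent;
run;
```

If every row tagged as initial shows has_interview_no=0, then any first observation with a non-missing interview number is coming from the follow-up side rather than from the initial interview data.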
I hope this is clear. Can anyone figure out what I'm doing wrong? Any
help would be appreciated.