James Young
03/16/98 03:28 PM
I've encountered a very strange and hard-to-reproduce data step problem.
I create a file that I use to initialize a future PDV and then overlay
results on it (to set the variable order and create placeholders for
missing variables). Afterwards, the resulting values of the variables that
were initialized are altered - sometimes.
I have a categorical variable from which I create a series of dummy
variables - one for each value of the categorical variable. This is
accomplished through the use of a SAS-L macro (%gendummy) and appears to do
exactly what I need. This is my 'Initialization File'. When the combined
betas from three different regressions are overlaid on this initialization
file, the values of the betas change between the execution of the data step
and the writing of the data set. The initialization file does not contain
the variable INTERCEP, and this variable is not affected. Strangely, if I
manually create the initialization file, and do not make any other changes
to the program, the beta values are not altered.
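The overlay technique described above can be sketched in a few lines. This
is a minimal illustration, not the production program: the data set names
INIT, ESTIMATES and COMBINED and the variables A, B and C are hypothetical.
Because the conditional SET is the first statement that mentions the
variables, the initialization file fixes their order and attributes in the
PDV, and any variable absent from the estimates keeps its placeholder value.

```sas
/* Hypothetical names throughout: init, estimates, combined, a, b, c. */

/* Initialization file: one observation of placeholders in the wanted order. */
data init;
   a = 0;
   b = 0;
   c = 0;
run;

/* Results to overlay: different variable order, and no variable b. */
data estimates;
   c = 3.5; a = 1.5; output;
   c = 4.5; a = 2.5; output;
run;

/* Read the placeholders once, then overlay each observation of results. */
data combined;
   if _n_ = 1 then set init;
   set estimates;
run;

proc print data=combined;
run;
```

In COMBINED the variables should appear in the order A, B, C, with B
holding 0 on both observations, because variables read with SET are
retained across iterations.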
Here is a little teaser of what I mean:
Printout of Parameter Estimates During Execution (BETA.SSD01)
CCI0002=-1469.274747 CCI0003=-1320.497763 CCI0004=670.48080808
INTERCEP=1630.3191919 EPTYPE=AST _ERROR_=0 _N_=1
CCI0002=-150.6156476 CCI0003=-88.44336357 CCI0004=-93.00689434
INTERCEP=310.72294372 EPTYPE=BKM _ERROR_=0 _N_=2
CCI0002=-2713.470426 CCI0003=-2886.145981 CCI0004=-965.4654255
INTERCEP=3330.5904255 EPTYPE=DM_ _ERROR_=0 _N_=3
Printout of Combined Dataset After Execution(BETA.SSD01) - V1
(NOTE: ALTERED VALUES PAST THE SECOND DECIMAL POINT!!)
AST|-1469.274414|-1320.49707|670.48046875|1630.3191919|
BKM|-150.6156006|-88.44335938|-93.00683594|310.72294372|
DM_|-2713.46875|-2886.144531|-965.465332|3330.5904255|
Printout of Combined Dataset After Execution(BETA.SSD01) - V2
(NOTE: VALUES MATCH EXECUTION PHASE)
AST|-1469.274747|-1320.497763|670.48080808|1630.3191919|
BKM|-150.6156476|-88.44336357|-93.00689434|310.72294372|
DM_|-2713.470426|-2886.145981|-965.4654255|3330.5904255|
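One arithmetic check that may help narrow this down: each altered value in
V1 is what results from keeping only the first 4 bytes of the 8-byte
floating-point value, assuming an IEEE host. For example, -1469.274747 lies
between 1024 and 2048 in magnitude, so 4 bytes (20 mantissa bits) resolve it
only to steps of 1/1024, and 1469 + 281/1024 = 1469.2744140625. A sketch
using the TRUNC function, with hypothetical variable names FULL and CUT:

```sas
/* Hypothetical check: truncate three of the reported betas to 4 bytes. */
/* FULL and CUT are illustrative names, not from the original program.  */
data _null_;
   input full;
   cut = trunc(full, 4);      /* keep only the leading 4 bytes of the value */
   put full= best12. cut= best12.;
   datalines;
-1469.274747
-1320.497763
670.48080808
;
run;
```

If CUT matches the first row of V1 for these three inputs, the alteration
is a loss of stored precision rather than a change made by the data step
logic.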
Please hit delete now if this is uninteresting. If, however, you're
inclined to give me a hand with figuring this out, the LOG and LST files
from the run that alters the values are included below. You'll see in the
LOG file a commented-out alternate data step which produces the expected
results - values are not altered. I apologize for not being able to
reproduce the alteration in a simpler example.
----------------------------------------------------------------
LOG FILE:
----------------------------------------------------------------
1 libname sasdata '../../sasdata/risk_adjust';
NOTE: Libref SASDATA was successfully assigned as follows:
Engine: V612
Physical Name: /stats/st7/champ_eoc/sasdata/risk_adjust
2 filename lbl '../../data/deerfield/1231run/catcci.lbl';
3
4
5 data sasdata.ccilbl(drop=catccit);
6 infile lbl dlm=',' dsd;
7 length catcci $ 4 inever agecatl agecath cattot sevtotl sevtoth
8 prcom dxcom prsev maxacci maxccci maxmcci maxpcci maxcci $ 1;
9 input catccit inever agecatl agecath cattot sevtotl sevtoth
10 prcom dxcom prsev maxacci maxccci maxmcci maxpcci maxcci;
11 catcci=put(catccit,z4.);
12 run;
NOTE: The infile LBL is:
File Name=/stats/st7/champ_eoc/data/deerfield/1231run/catcci.lbl,
Owner Name=jyoun,Group Name=hcia,
Access Permission=rwxrwxr-x,
File Size (bytes)=76094
NOTE: 1784 records were read from the infile LBL.
The minimum record length was 29.
The maximum record length was 50.
NOTE: The data set SASDATA.CCILBL has 1784 observations and 15 variables.
NOTE: DATA statement used:
real time 0.580 seconds
cpu time 0.395 seconds
13 proc sort data=sasdata.ccilbl;
14 by catcci;
15 run;
NOTE: The data set SASDATA.CCILBL has 1784 observations and 15 variables.
NOTE: PROCEDURE SORT used:
real time 0.230 seconds
cpu time 0.095 seconds
16
17 options mprint;
18
19 *create CCI dummy variables as beta placeholders;
20 data cci;
21 merge sasdata.ccilbl(in=a)
22 sasdata.ccimap(in=b keep=catcci cci);
23 by catcci;
24 %gendummy(dsn=cci, var=cci, prefix=cci, paste=yes);
NOTE: The data set WORK.CCI has 1784 observations and 16 variables.
NOTE: DATA statement used:
real time 0.170 seconds
cpu time 0.168 seconds
MPRINT(GENDUMMY): PROC SUMMARY DATA = CCI NWAY ;
MPRINT(GENDUMMY): CLASS CCI ;
MPRINT(GENDUMMY): OUTPUT OUT = __CNTS ( KEEP = CCI ) ;
NOTE: The data set WORK.__CNTS has 70 observations and 1 variables.
NOTE: PROCEDURE SUMMARY used:
real time 0.050 seconds
cpu time 0.045 seconds
MPRINT(GENDUMMY): DATA _NULL_ ;
MPRINT(GENDUMMY): SET __CNTS NOBS = NUMVALS ;
MPRINT(GENDUMMY): IF _N_ = 1 THEN CALL SYMPUT ( 'num', TRIM ( LEFT ( PUT ( NUMVALS, BEST. ) ) ) ) ;
MPRINT(GENDUMMY): CALL SYMPUT ( 'c' || TRIM ( LEFT ( PUT ( _N_, BEST. ) ) ), TRIM ( LEFT ( CCI ) ) ) ;
MPRINT(GENDUMMY): RUN ;
NOTE: DATA statement used:
real time 0.070 seconds
cpu time 0.067 seconds
MPRINT(GENDUMMY): DATA CCI ( DROP = J ) ;
MPRINT(GENDUMMY): SET CCI ;
MPRINT(GENDUMMY): LENGTH DEFAULT=4;
MPRINT(GENDUMMY): ARRAY __D ( 70 ) 4 CCI0002 CCI0003 CCI0004 CCI0007
CCI0023 CCI0035 CCI0036 CCI0041 CCI0044 CCI0047 CCI0065 CCI0068 CCI0071
CCI0074 CCI0083 CCI0086 CCI0095 CCI0096 CCI0097 CCI0101 CCI0104 CCI0113
CCI0125 CCI0135 CCI0297 CCI0312 CCI0412 CCI0413 CCI0421 CCI0459 CCI0462
CCI0483 CCI0514 CCI0521 CCI0524 CCI0545 CCI0548 CCI0901 CCI0904 CCI0944
CCI1018 CCI1039 CCI1042 CCI1044 CCI1045 CCI1046 CCI1047 CCI1048 CCI1051
CCI1052 CCI1053 CCI1054 CCI1055 CCI1073 CCI1074 CCI1075 CCI1079 CCI1080
CCI1081 CCI1083 CCI1102 CCI1109 CCI1270 CCI1277 CCI1490 CCI1494 CCI1501
CCI1522 CCI1529 CCI9999 ;
MPRINT(GENDUMMY): DO J = 1 TO 70 ;
MPRINT(GENDUMMY): __D(J) = 0 ;
MPRINT(GENDUMMY): END ;
MPRINT(GENDUMMY): IF CCI = "0002" THEN __D ( 1 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0003" THEN __D ( 2 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0004" THEN __D ( 3 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0007" THEN __D ( 4 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0023" THEN __D ( 5 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0035" THEN __D ( 6 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0036" THEN __D ( 7 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0041" THEN __D ( 8 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0044" THEN __D ( 9 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0047" THEN __D ( 10 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0065" THEN __D ( 11 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0068" THEN __D ( 12 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0071" THEN __D ( 13 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0074" THEN __D ( 14 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0083" THEN __D ( 15 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0086" THEN __D ( 16 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0095" THEN __D ( 17 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0096" THEN __D ( 18 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0097" THEN __D ( 19 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0101" THEN __D ( 20 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0104" THEN __D ( 21 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0113" THEN __D ( 22 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0125" THEN __D ( 23 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0135" THEN __D ( 24 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0297" THEN __D ( 25 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0312" THEN __D ( 26 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0412" THEN __D ( 27 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0413" THEN __D ( 28 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0421" THEN __D ( 29 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0459" THEN __D ( 30 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0462" THEN __D ( 31 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0483" THEN __D ( 32 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0514" THEN __D ( 33 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0521" THEN __D ( 34 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0524" THEN __D ( 35 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0545" THEN __D ( 36 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0548" THEN __D ( 37 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0901" THEN __D ( 38 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0904" THEN __D ( 39 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="0944" THEN __D ( 40 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1018" THEN __D ( 41 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1039" THEN __D ( 42 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1042" THEN __D ( 43 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1044" THEN __D ( 44 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1045" THEN __D ( 45 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1046" THEN __D ( 46 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1047" THEN __D ( 47 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1048" THEN __D ( 48 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1051" THEN __D ( 49 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1052" THEN __D ( 50 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1053" THEN __D ( 51 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1054" THEN __D ( 52 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1055" THEN __D ( 53 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1073" THEN __D ( 54 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1074" THEN __D ( 55 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1075" THEN __D ( 56 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1079" THEN __D ( 57 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1080" THEN __D ( 58 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1081" THEN __D ( 59 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1083" THEN __D ( 60 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1102" THEN __D ( 61 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1109" THEN __D ( 62 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1270" THEN __D ( 63 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1277" THEN __D ( 64 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1490" THEN __D ( 65 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1494" THEN __D ( 66 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1501" THEN __D ( 67 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1522" THEN __D ( 68 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="1529" THEN __D ( 69 ) = 1 ;
MPRINT(GENDUMMY): ELSE IF CCI="9999" THEN __D ( 70 ) = 1 ;
MPRINT(GENDUMMY): RUN ;
NOTE: The data set WORK.CCI has 1784 observations and 86 variables.
NOTE: DATA statement used:
real time 1.040 seconds
cpu time 1.041 seconds
25 run;
26
27 options nomprint;
28
29 *data set option 1;
30
31 data cci(keep=cci0002-cci0004);
32 set cci(obs=1);
33 run;
NOTE: The data set WORK.CCI has 1 observations and 3 variables.
NOTE: DATA statement used:
real time 0.070 seconds
cpu time 0.067 seconds
34
35 *data set option 2;
36 /*
37 data cci;
38 cci0002=0;
39 cci0003=0;
40 cci0004=0;
41 run;
42 */
43
44 proc print;
45 title 'Printout of Initialization File (CCI.SSD01)';
46 run;
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used:
real time 0.040 seconds
cpu time 0.020 seconds
47
48 title 'Unload Parameter Estimates for Outcome=TPD';
49
50 data tempbeta;
51 set sasdata.asttpdb(in=a keep=intercep cci0002--cci0004)
52 sasdata.bkmtpdb(in=b keep=intercep cci0002--cci0004)
53 sasdata.dm_tpdb(in=c keep=intercep cci0002--cci0004);
54 if a then eptype='AST';
55 else if b then eptype='BKM';
56 else if c then eptype='DM_';
57 run;
NOTE: The data set WORK.TEMPBETA has 3 observations and 5 variables.
NOTE: DATA statement used:
real time 0.080 seconds
cpu time 0.087 seconds
58 title 'Printout of Parameter Estimates (TEMPBETA.SSD01)';
59 proc print data=tempbeta; run;
NOTE: The PROCEDURE PRINT printed page 2.
NOTE: PROCEDURE PRINT used:
real time 0.010 seconds
cpu time 0.010 seconds
60
61 title 'Printout of Parameter Estimates During Execution (BETA.SSD01)';
62 data beta;
63 if _N_=1 then set cci;
64 set tempbeta;
65 file print;
66 put _all_;
67 run;
NOTE: 6 lines were written to file PRINT.
NOTE: The data set WORK.BETA has 3 observations and 5 variables.
NOTE: The DATASTEP printed page 3.
NOTE: DATA statement used:
real time 0.060 seconds
cpu time 0.056 seconds
68
69 title 'Printout of Combined Dataset After Execution(BETA.SSD01)';
70 data report;
71 set beta;
72 file print;
73 put eptype +(-1) '|' (_numeric_) ($ +(-1) '|') +(-1) '|';
74 run;
NOTE: 3 lines were written to file PRINT.
NOTE: The data set WORK.REPORT has 3 observations and 5 variables.
NOTE: The DATASTEP printed page 4.
NOTE: DATA statement used:
real time 0.060 seconds
cpu time 0.053 seconds
NOTE: The SAS System used:
real time 2.980 seconds
cpu time 2.632 seconds
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
-------------------------------------------------------------------
LST FILE:
-------------------------------------------------------------------
Printout of Initialization File (CCI.SSD01)
1
14:52 Monday, March 16, 1998
OBS CCI0002 CCI0003 CCI0004
1 0 0 0
Printout of Parameter Estimates (TEMPBETA.SSD01)
2
14:52 Monday, March 16, 1998
OBS INTERCEP CCI0002 CCI0003 CCI0004 EPTYPE
1 1630.32 -1469.27 -1320.50 670.481 AST
2 310.72 -150.62 -88.44 -93.007 BKM
3 3330.59 -2713.47 -2886.15 -965.465 DM_
Printout of Parameter Estimates During Execution (BETA.SSD01)
3
14:52 Monday, March 16, 1998
CCI0002=-1469.274747 CCI0003=-1320.497763 CCI0004=670.48080808
INTERCEP=1630.3191919 EPTYPE=AST _ERROR_=0 _N_=1
CCI0002=-150.6156476 CCI0003=-88.44336357 CCI0004=-93.00689434
INTERCEP=310.72294372 EPTYPE=BKM _ERROR_=0 _N_=2
CCI0002=-2713.470426 CCI0003=-2886.145981 CCI0004=-965.4654255
INTERCEP=3330.5904255 EPTYPE=DM_ _ERROR_=0 _N_=3
Printout of Combined Dataset After Execution(BETA.SSD01)
4
14:52 Monday, March 16, 1998
AST|-1469.274414|-1320.49707|670.48046875|1630.3191919|
BKM|-150.6156006|-88.44335938|-93.00683594|310.72294372|
DM_|-2713.46875|-2886.144531|-965.465332|3330.5904255|
--------------------------------------------------------------------
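A smaller test that might isolate the difference between the two
initialization options is sketched here. It assumes the relevant difference
is the LENGTH DEFAULT=4 statement that appears in the MPRINT output of
%gendummy, which is a guess rather than a confirmed diagnosis, and it has
not been run on the V612 system above. All data set names (INIT4, INIT8,
EST, OUT4, OUT8) are hypothetical.

```sas
/* Hypothetical names throughout: init4, init8, est, out4, out8. */

/* Placeholder created under LENGTH DEFAULT=4, as in the macro's step. */
data init4;
   length default=4;
   cci0002 = 0;
run;

/* Placeholder created by plain assignment (default numeric length). */
data init8;
   cci0002 = 0;
run;

/* One estimate with more digits than a short numeric can store. */
data est;
   cci0002 = -1469.274747;
run;

/* Same overlay step, once against each placeholder file. */
data out4;
   if _n_ = 1 then set init4;
   set est;
run;

data out8;
   if _n_ = 1 then set init8;
   set est;
run;

/* Compare the stored length of CCI0002 in the two results. */
proc contents data=out4;
run;
proc contents data=out8;
run;

/* Print both stored values side by side in the log. */
data _null_;
   set out4(rename=(cci0002=v4));
   set out8(rename=(cci0002=v8));
   put v4= best12. v8= best12.;
run;
```

If the stored length is the cause, PROC CONTENTS should report a length of
4 for CCI0002 in OUT4 and 8 in OUT8, and only V4 should differ from the
value in EST.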
Thanks in advance for any insight you can provide.
Jim Young / jyoun@hcia.com / Ann Arbor, MI