I agree that PROC FORMAT is a simple and good choice if we only need
to make Visa look right. In this case, though, I need an extra variable,
OK, anyway.
When I was writing the code, my mind was stuck on a piece of pseudo
logic the analyst gave me in plain English, such as
if -6.04 <= Visa <= -4.25 then OK=1;
I quickly replaced the first <= with GT, without realizing that I should
check how SAS reads the condition from left to right. So "if -6.04 GT
Visa LE -4.25 then OK=1;" is actually the same as "if Visa LT -6.04 then
OK=1;", because -6.04 GT Visa and Visa LE -4.25 both have to hold. That
explains output1.lst.
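A small data step along these lines could confirm that reading. It is only a sketch: the Visa values are made up for the test (they are not taken from m3.final), and the flag names chained, spelled, simple and intended are just for illustration.

```sas
  /* Hypothetical test values; not taken from m3.final. */
data _null_;
  input Visa;
  /* the condition as I wrote it in code1.sas */
  chained  = (-6.04 GT Visa LE -4.25);
  /* the same condition written as two comparisons joined by AND */
  spelled  = (-6.04 GT Visa) and (Visa LE -4.25);
  /* what the pair of comparisons reduces to */
  simple   = (Visa LT -6.04);
  /* what the analyst's pseudo logic meant (lower bound excluded here) */
  intended = (-6.04 LT Visa LE -4.25);
  put Visa= chained= spelled= simple= intended=;
  datalines;
-7.00
-6.04
-5.00
-4.25
-3.00
;
run;
```

If my reading is right, chained, spelled and simple should agree on every row in the log, and only intended should flag the values that actually fall between the two cutoffs.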
Here is why I tried to put the cutoffs back on the code line. Very
often I randomly pick a recently finished credit scoring model and ask
the analyst to send over a copy of the pseudo logic so I can use SAS to
duplicate, if not emulate, their results. Call it a sanity check, if you
will. This specific model uses proc logistic. I saw in the SAS data set
that contains the estimated coefficients for the predictors (using the
score= option) that the estimates have long decimals, such as
-2.02988399 instead of -2.03. So the predicted score on each
account/observation is something like -5.03092982, or 'worse'.
How the brackets are cut is not the issue here. The cutoff values are
produced by proc means with maxdec=2, which leads to values like -6.04.
If a cutoff value happens to be -5.03, rounding a score such as
-5.03092982 to two decimals may put the account into a different OK
group. On the other hand, I don't want to carry the cutoff values to 6
or more decimals, as if I really believed that differences at those
decimal places matter. Therefore, I decided to do some testing to see
whether the membership shift is quantitatively worth more investigation,
on the scale of 5 to 12 million records per project. If the difference
is not significant, I would let the analyst continue to run the project
the way he has been. Otherwise, proposing a non-mutually exclusive
bracketing algorithm may be a worthy course for this project.
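A rough sketch of such a test follows. It assumes the cutoffs from code3.sas and the data set m3.scoremodelA_final; the names work.shift_test, Visa_rnd, OK_raw, OK_rnd and shifted are made up for illustration. Treat it as an outline rather than finished code.

```sas
data work.shift_test;
  set m3.scoremodelA_final (keep=Visa);

  /* upper cutoffs of brackets 1 to 10, in ascending order */
  array cut{10} _temporary_
    (-4.25, -3.86, -3.56, -2.03, -1.60, -1.43, -1.25, -1.06, -0.83, 0.48);

  /* score rounded to 2 decimals, like the cutoffs */
  Visa_rnd = round(Visa, 0.01);

  /* anything above the last cutoff falls in bracket 11 */
  OK_raw = 11;
  OK_rnd = 11;

  /* walk the cutoffs downward, so each variable ends up holding */
  /* the lowest bracket whose upper cutoff the score does not exceed */
  do i = 10 to 1 by -1;
    if Visa     LE cut{i} then OK_raw = i;
    if Visa_rnd LE cut{i} then OK_rnd = i;
  end;

  /* 1 when rounding moved the account to another bracket */
  shifted = (OK_raw NE OK_rnd);
  drop i;
run;

proc freq data=work.shift_test;
  tables shifted / missing;
  tables OK_raw*OK_rnd / missing norow nocol nopercent;
run;
```

The one-way table on shifted gives the number of accounts that change group, and the off-diagonal cells of the cross-tabulation show which brackets they move between.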
When I went back to writing the if/else-if chain in a way I normally
don't, explicitly spelling out both boundaries of each bracket instead
of just running code3.sas, I wasted no time hitting the GT snag. I still
don't understand why OK=2 is missing in output2.lst.
Thank you all for your attention.
Paula D
greg.woolridge@TAP.COM (Greg Woolridge) wrote in message news:<OFE9E4B05E.CF84D413-ON86256B9D.006E39BE@tap.abbott.com>...
> Paula,
>
> I believe the problem lies in the way you are setting up your conditions
> with respect to the negative numbers. If you keep in mind that as the
> absolute value of a negative number increases, the actual value decreases,
> it appears that you should set up the conditions like this:
>
> code1.sas
> if -6.04 LT Visa LE -4.25 then OK=1;
> else if -4.25 LT Visa LE -3.86 then OK=2;
> else if -3.86 Lt Visa LE -3.56 then OK=3;
> else if -3.56 Lt Visa LE -2.03 then OK=4;
> else if -2.03 lt Visa LE -1.60 then OK=5;
> else if -1.60 lt Visa LE -1.43 then OK=6;
> else if -1.43 lt Visa LE -1.25 then OK=7;
> else if -1.25 lt Visa LE -1.06 then OK=8;
> else if -1.06 lt Visa LE -0.83 then OK=9;
> else if -0.83 lt Visa LE 0.48 then OK=10;
> else OK=11;
>
> I believe this will give the correct counts.
>
> Greg M. Woolridge
> Manager, Study Programming
> TAP Pharmaceutical Products Inc.
> e-mail: greg.woolridge@tap.com
> phone: 847-582-2332
> fax: 847-582-2403
>
>
>
> From: paula D <sophe@USA.NET>
> Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
> To: SAS-L@LISTSERV.UGA.EDU
> Date: 04/16/02 02:29 PM
> Subject: Counting irregularity?
> Please respond to paula D
>
> I ran code like this (code1.sas)
> "
> data m3.Intel;
> set m3.final;
> if -6.04 GT Visa LE -4.25 then OK=1;
> else if -4.25 GT Visa LE -3.86 then OK=2;
> else if -3.86 gt Visa LE -3.56 then OK=3;
> else if -3.56 gt Visa LE -2.03 then OK=4;
> else if -2.03 gt Visa LE -1.60 then OK=5;
> else if -1.60 gt Visa LE -1.43 then OK=6;
> else if -1.43 gt Visa LE -1.25 then OK=7;
> else if -1.25 gt Visa LE -1.06 then OK=8;
> else if -1.06 gt Visa LE -0.83 then OK=9;
> else if -0.83 gt Visa LE 0.48 then OK=10;
> else OK=11;
> proc freq;
> table OK/missing;
> run; "
>
> I got freq table like this (output1.lst)
> "
> The FREQ Procedure
>
>                            Cumulative  Cumulative
> OK   Frequency   Percent    Frequency     Percent
> -------------------------------------------------
>  1           1      0.00            1        0.00
>  2       78288      7.65        78289        7.65
>  3      124198     12.14       202487       19.79
>  4      121118     11.83       323605       31.62
>  5      115008     11.24       438613       42.86
>  6       65751      6.42       504364       49.28
>  7       76623      7.49       580987       56.77
>  8      123066     12.03       704053       68.79
>  9      114739     11.21       818792       80.01
> 10      100261      9.80       919053       89.80
> 11      104360     10.20      1023413      100.00
>
> "
>
> Then I ran code like this (code2.sas)
> "
>
> data m3.scoremodelA_OK;
> set m3.scoremodelA_final;
> if Visa LE -4.25 then OK=1;
> else if -4.25 GT Visa LE -3.86 then OK=2;
> else if -3.86 gt Visa LE -3.56 then OK=3;
> else if -3.56 gt Visa LE -2.03 then OK=4;
> else if -2.03 gt Visa LE -1.60 then OK=5;
> else if -1.60 gt Visa LE -1.43 then OK=6;
> else if -1.43 gt Visa LE -1.25 then OK=7;
> else if -1.25 gt Visa LE -1.06 then OK=8;
> else if -1.06 gt Visa LE -0.83 then OK=9;
> else if -0.83 gt Visa LE 0.48 then OK=10;
> else OK=11;
> proc freq;
> table OK/missing;
> run; "
>
> Then I got freq table like this (output2.lst)
>
> "
>
>                            Cumulative  Cumulative
> Ok   Frequency   Percent    Frequency     Percent
> -------------------------------------------------
>  1       78289      7.65        78289        7.65
>  3      124198     12.14       202487       19.79
>  4      121118     11.83       323605       31.62
>  5      115008     11.24       438613       42.86
>  6       65751      6.42       504364       49.28
>  7       76623      7.49       580987       56.77
>  8      123066     12.03       704053       68.79
>  9      114739     11.21       818792       80.01
> 10      100261      9.80       919053       89.80
> 11      104360     10.20      1023413      100.00
> "
>
> Visa is a numeric field with format BEST12., informat 12. and
> length=8. Visa does not have missing values. A typical value of Visa
> looks like this:
> -0.298322223. Yes, I did not bother to round Visa to 2 decimal places
> because I did not think this would be a problem.
>
> This is happening on V8.2 on Windows 98 SE. I have not tested it on
> other Windows versions yet. The logs in both cases do not turn up any
> warnings, error messages, flags, etc. They are perfectly clean.
>
> I have not found anything in the SAS documentation or anywhere else
> that says GT should be paired with LE, or LT with GE. I doubt this is
> the cause here, since from OK=3 to OK=11 the counts are consistent
> (valid or not, I don't know).
>
> I ran yet another variation (code3.sas)
> "
> data m3.scoremodelA_OK;
> set m3.scoremodelA_final;
> if Visa LE -4.25 then OK=1;
> else if Visa LE -3.86 then OK=2;
> else if Visa LE -3.56 then OK=3;
> else if Visa LE -2.03 then OK=4;
> else if Visa LE -1.60 then OK=5;
> else if Visa LE -1.43 then OK=6;
> else if Visa LE -1.25 then OK=7;
> else if Visa LE -1.06 then OK=8;
> else if Visa LE -0.83 then OK=9;
> else if Visa LE 0.48 then OK=10;
> else OK=11;
> proc freq;
> table OK/missing;
> run; "
> It gives me output like this (output3.lst)
> "
>                            Cumulative  Cumulative
> Ok   Frequency   Percent    Frequency     Percent
> -------------------------------------------------
>  1       78289      7.65        78289        7.65
>  2      124198     12.14       202487       19.79
>  3      121118     11.83       323605       31.62
>  4      115008     11.24       438613       42.86
>  5       65751      6.42       504364       49.28
>  6       76623      7.49       580987       56.77
>  7      123066     12.03       704053       68.79
>  8      114739     11.21       818792       80.01
>  9      100261      9.80       919053       89.80
> 10      104355     10.20      1023408      100.00
> 11           5      0.00      1023413      100.00
>
> "
>
> I am scared. I have not got the right results from other ways yet.
> Frankly, I don't know how to get the right freq without using SAS.
>
> Paula D