```
Date: Tue, 4 Aug 2009 17:19:15 -0400
Reply-To: Michael Bryce Herrington
Sender: "SAS(r) Discussion"
From: Michael Bryce Herrington
Subject: Re: knotted regression
Comments: To: Robin R High
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1

WOW, thanks, that does exactly what I want. Now I just have to figure out
HOW it is doing it. :)

Thank you!

On Tue, Aug 4, 2009 at 4:44 PM, Robin R High wrote:

> Bryce,
>
> Ignoring the variable 'freq', here is a method to connect line segments at
> a specified number of knots (see Chapter 3 of "Semiparametric Regression" by
> Ruppert, Wand, and Carroll) for two variables of interest. The process builds
> a special x (design) matrix; regressing y on it gives predicted values whose
> line segments join at the knots (on the plot, which joins the predicted values
> at the observed x values, the bend shows up at or near each knot).
>
> %LET y=relerror;            * vertical axis variable;
> %LET x=actual;              * horizontal axis variable;
> %LET knts = .07 .15 .26 .4; * knot locations;
> %LET nnts = 4;              * number of knots;
>
> DATA indat; SET bias1st; DROP jj;
>   ARRAY rng{%EVAL(&nnts.+1)} x1-x%EVAL(&nnts.+1);
>   ARRAY knt{&nnts.} _temporary_ (&knts.);
>   x1=&x.;
>   * x(jj) = (x - knot(jj-1)) when x >= knot(jj-1), else 0;
>   DO jj = 2 to %EVAL(&nnts.+1);
>     rng{jj} = (&x. - knt{jj-1})*(&x. ge knt{jj-1});
>   END;
> RUN;
>
> PROC PRINT DATA=indat; options ps=199 ls=132; run;
>
> In essence, you want to produce the x matrix below, where each successive
> column x2 .. x5 equals (x1 - knot_k) when x1 >= knot_k and 0 otherwise:
>
> < many rows deleted >
>
> Obs  actual       error       prd  x1     x2     x3     x4     x5
>
>   1  0.00189   0.001587   0.63280  0.002  0      0      0      0
>  13  0.05972   0.002729   0.04837  0.060  0      0      0      0
>  18  0.08729      0 204  -0.04366  0.087  0.017  0      0      0      knot_1 = .07
>  23  0.11586  -0.003377  -0.02399  0.116  0.046  0      0      0
>  26  0.12186   0.005627  -0.01986  0.122  0.052  0      0      0
>  35  0.17712  -0.004613  -0.00462  0.177  0.107  0.027  0      0      knot_2 = .15
>  38  0.20391  -0.016420  -0.00872  0.204  0.134  0.054  0      0
>  52  0.24615   0.011380  -0.01519  0.246  0.176  0.096  0      0
>  60  0.28883   0.008576  -0.01995  0.289  0.219  0.139  0.029  0      knot_3 = .26
>  76  0.38284  -0.005295  -0.02856  0.383  0.313  0.233  0.123  0
>  79  0.39113   0.001288  -0.02932  0.391  0.321  0.241  0.131  0
>  80  0.42041  -0.023001  -0.03671  0.420  0.350  0.270  0.160  0.020  knot_4 = .4
>  82  0.45109  -0.014046  -0.04660  0.451  0.381  0.301  0.191  0.051
>  87  0.73196  -0.090499  -0.13712  0.732  0.662  0.582  0.472  0.332
>
> Then fit the model
>
>                         K+1
>   f(x) = b0 + b1*x1 +   SUM  b_k * (x1 - knot_(k-1))+
>                         k=2
>
> where K = number of knots and (u)+ = max(u, 0), so (x1 - knot_(k-1))+ is
> exactly column x_k of the matrix above.
>
> If you enter 5 knots, then the DATA step produces 6 x columns.
>
> * compute predicted values;
>
> PROC REG data=indat;
>   MODEL &y. = x1-x%EVAL(&nnts.+1);
>   OUTPUT out=rmns pred=prd;
> run; quit;
>
> PROC PRINT DATA=rmns(where=(ranuni(929) > .5));
>   VAR &x. &y. prd x1-x%eval(&nnts.+1);
>   format x: prd 5.3;
> run;
>
> goptions reset=all;
>
> symbol1 v=dot i=none color=blue h=1;
> symbol2 v=none i=join color=black line=1 w=2;
>
> proc gplot data=rmns;
>   plot &y.*&x.=1 prd*&x.=2 / noframe overlay haxis = 0 to 1 by .1
>        hm=1 href=(&knts.) lhref=33;
> run; quit;
>
> Robin High
> UNMC
>
>
> Michael Bryce Herrington
> Sent by: "SAS(r) Discussion"
> 08/04/2009 02:25 PM
> Please respond to Michael Bryce Herrington
>
> To: SAS-L@LISTSERV.UGA.EDU
> cc:
> Subject: knotted regression
>
>
> Hey,
>
> I need some help correcting bias in a model I am working on.
> The data set
> below contains my estimates and the actuals. The "estimate" value is found
> by averaging the estimated percentages of all observations in a small
> interval; the "actual" value is the percentage of those observations for
> which the event actually occurs. You can see from the quick plots I have
> included that there is some bias for very small and very large percentages.
> I would like to try a knotted (piecewise) linear regression to correct this,
> but I do not know how to do this in SAS. I would play around with the knot
> locations, but would like them around .07, .17, .26, and .4. My only
> requirements are that the regression equation be monotonic and continuous.
>
> "freq" is the number of observations in each interval.
> "error" is the difference estimate - actual.
> "relerror" is the relative error, (estimate - actual) / actual.
>
> Thanks for any help you can provide.
>
> data bias1st;
>   input estimate actual freq error relerror;
>   datalines;
> 0.003473099 0.001885903  2121  0.001587196  0.841610668
> 0.007699692 0.004836028  6617  0.002863664  0.592151961
> 0.012613618 0.007609715  9593  0.005003903  0.657567667
> 0.017529609 0.011300992 10884  0.006228617  0.551156645
> 0.022516198 0.014978602 11216  0.007537596  0.503224253
> ....
> 0.486851702 0.519018405   815 -0.032166703 -0.061976036
> 0.521874933 0.579868709   914 -0.057993776 -0.100011908
> 0.571919876 0.650574713   435 -0.078654837 -0.120900545
> 0.64145977  0.731958763   194 -0.090498993 -0.12363947
> ;
> run;
>
> symbol1 v=plus i=none c=blue;
> symbol2 v=none i=j c=r;
>
> proc gplot data=bias1st;
>   plot estimate*actual actual*actual / overlay;
> run;
>   plot error*actual;
> run;
>   plot relerror*actual;
> run; quit;
>
>
> --
> Bryce Herrington
> Clemson University
> mherrin@g.clemson.edu
> (863) 258-4758

--
Bryce Herrington
Clemson University
mherrin@g.clemson.edu
(863) 258-4758
```
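Written out as a derivation, the model in Robin's reply is a linear spline in the truncated-power basis. In this sketch, $\kappa_1 < \dots < \kappa_K$ stand for the knots in `knts`, $x$ stands for `actual` (the column `x1`), and $s_j$ is a symbol introduced here only for illustration, denoting the slope of segment $j$. It suggests why the fit is continuous by construction, and what would have to hold on the PROC REG estimates for it to also be monotonic, which ordinary least squares does not enforce on its own.

$$
f(x) = b_0 + b_1 x + \sum_{k=2}^{K+1} b_k\,(x - \kappa_{k-1})_+,
\qquad (u)_+ = \max(u, 0)
$$

Each term $(x - \kappa)_+$ is $0$ for $x \le \kappa$ and $x - \kappa$ for $x \ge \kappa$; both pieces equal $0$ at $x = \kappa$, so every term is continuous and so is their linear combination $f$. Setting $\kappa_0 = -\infty$ and $\kappa_{K+1} = +\infty$, the slope on the segment $\kappa_j < x < \kappa_{j+1}$, for $j = 0, \dots, K$, is

$$
s_j = f'(x) = b_1 + \sum_{k=2}^{j+1} b_k,
\qquad s_j - s_{j-1} = b_{j+1} \quad (j \ge 1),
$$

so $b_1$ is the slope left of the first knot and each $b_{j+1}$ is the change in slope at knot $\kappa_j$. Because $f$ is continuous and piecewise linear, it is nondecreasing exactly when $s_j \ge 0$ for all $j$, and nonincreasing exactly when $s_j \le 0$ for all $j$; these partial sums of the fitted coefficients can be checked after the fit.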

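As a hand check of the basis construction, take row 80 of the printed table, where $x_1 = 0.42041$ lies past all four knots (.07, .15, .26, .4):

$$
\begin{aligned}
x_2 &= (0.42041 - 0.07)_+ = 0.35041 \\
x_3 &= (0.42041 - 0.15)_+ = 0.27041 \\
x_4 &= (0.42041 - 0.26)_+ = 0.16041 \\
x_5 &= (0.42041 - 0.40)_+ = 0.02041
\end{aligned}
$$

With the `5.3` format these print as 0.350, 0.270, 0.160 and 0.020, as in the table. For row 35 ($x_1 = 0.17712$) only the first two knots are passed, giving $x_2 = 0.10712$, $x_3 = 0.02712$ and $x_4 = x_5 = 0$, which print as 0.107, 0.027, 0 and 0.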