Date: Tue, 4 Aug 2009 17:19:15 -0400
Reply-To: Michael Bryce Herrington <mherrin@G.CLEMSON.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Michael Bryce Herrington <mherrin@G.CLEMSON.EDU>
Subject: Re: knotted regression
In-Reply-To: <OF0CD55A00.5733F0A5-ON86257608.0071B656-86257608.0071F594@unmc.edu>
Content-Type: text/plain; charset=ISO-8859-1
WOW, thanks, that does exactly what I want. Now I just have to figure out
HOW it is doing it. :) Thank you!
On Tue, Aug 4, 2009 at 4:44 PM, Robin R High <rhigh@unmc.edu> wrote:
>
> Bryce,
>
> Ignoring the variable 'freq', here is a method to connect line segments at
> a specified set of knots (see Chapter 3 of "Semiparametric Regression" by
> Ruppert, Wand, and Carroll) for two variables of interest. The DATA step
> below builds a special x matrix; regressing y on its columns gives
> predicted values that form line segments joined at the knots.
>
>
> %LET y=relerror; * vertical axis variable;
> %LET x=actual; * horizontal axis;
> %LET knts = .07 .15 .26 .4; * knot locations;
> %LET nnts = 4; * number of knots;
>
> DATA indat; SET bias1st; DROP ii jj ;
>   ARRAY rng{%EVAL(&nnts.+1)} x1-x%EVAL(&nnts.+1) ;
>   ARRAY knt{&nnts.} _temporary_ (&knts.);
>   x1=&x.;
>   DO jj = 2 to %EVAL(&nnts.+1);
>     rng{jj} =(&x. -knt{jj-1})*(&x. ge knt{jj-1});
>   END;
> RUN;
>
> PROC PRINT DATA=indat; options ps=199 ls=132; run;
>
>
> In essence, you want to produce the x matrix below, where x1 is the
> original x and each successive column x2 .. x5 equals (x1 - knot_k) when
> x1 >= knot_k and 0 otherwise.
>
> < many rows deleted >
>
> Obs   actual      error       prd     x1     x2     x3     x4     x5
>
>   1  0.00189   0.001587   0.63280  0.002      0      0      0      0
>  13  0.05972   0.002729   0.04837  0.060      0      0      0      0
>  18  0.08729      0 204  -0.04366  0.087  0.017      0      0      0   knot_1 = .07
>  23  0.11586  -0.003377  -0.02399  0.116  0.046      0      0      0
>  26  0.12186   0.005627  -0.01986  0.122  0.052      0      0      0
>  35  0.17712  -0.004613  -0.00462  0.177  0.107  0.027      0      0   knot_2 = .15
>  38  0.20391  -0.016420  -0.00872  0.204  0.134  0.054      0      0
>  52  0.24615   0.011380  -0.01519  0.246  0.176  0.096      0      0
>  60  0.28883   0.008576  -0.01995  0.289  0.219  0.139  0.029      0   knot_3 = .26
>  76  0.38284  -0.005295  -0.02856  0.383  0.313  0.233  0.123      0
>  79  0.39113   0.001288  -0.02932  0.391  0.321  0.241  0.131      0
>  80  0.42041  -0.023001  -0.03671  0.420  0.350  0.270  0.160  0.020   knot_4 = .4
>  82  0.45109  -0.014046  -0.04660  0.451  0.381  0.301  0.191  0.051
>  87  0.73196  -0.090499  -0.13712  0.732  0.662  0.582  0.472  0.332
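
To see how one row of that listing is built, here is a minimal sketch (not part of Robin's code) that applies the same expression to a single value. The variable names k, knot, and hinge are only illustrative; the x value is the 'actual' shown for Obs 60, and the knots are the ones from %LET knts.

```sas
data _null_;
  array knt{4} _temporary_ (.07 .15 .26 .4);   /* knots from %LET knts */
  x1 = 0.28883;                                /* 'actual' for Obs 60 */
  do k = 1 to 4;
    knot  = knt{k};
    /* (x1 ge knot) evaluates to 1 or 0, so the product is 0 below the knot */
    hinge = (x1 - knot) * (x1 ge knot);
    put knot= hinge= 5.3;                      /* write one line per knot to the log */
  end;
run;
```

The hinge values written to the log should round to 0.219, 0.139, 0.029, and 0.000, which are the x2 to x5 entries for Obs 60.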
>
>
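> then fit a model:
>
>                       K+1
> f(x) = b0 + b1*x1 +   SUM  b_k * (x1 - knot_{k-1})+
>                       k=2
>
> where K = number of knots and (u)+ means u when u >= 0, else 0, so
> (x1 - knot_{k-1})+ is the column x_k built above.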
>
> If you enter 5 knots, then the DATA step produces 6 x columns.
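
On the monotonic requirement from the original question: in this parameterization b1 is the slope of the first segment and each later b_k is the change in slope at its knot, so the slope on segment j is b1 + ... + b_j. The fit is monotone only if all of those running sums have the same sign. PROC REG does not enforce that, so it has to be checked after fitting. A small sketch on made-up data (the data set names demo, est, and slopes, the variable y, and the single knot at .3 are all assumptions for illustration, not taken from the post):

```sas
/* Made-up data: slope 1 up to the knot at .3, slope 0.5 after it */
data demo;
  do i = 0 to 20;
    x1 = i / 20;
    x2 = (x1 - .3) * (x1 ge .3);   /* hinge term, same form as in the post */
    y  = 2 + x1 - .5 * x2;
    output;
  end;
  drop i;
run;

/* OUTEST= stores each coefficient in a variable named after its regressor */
proc reg data=demo outest=est noprint;
  model y = x1 x2;
run; quit;

/* Segment slopes are running sums of the coefficients: b1, then b1+b2 */
data slopes;
  set est;
  array b{2} x1 x2;
  slope = 0;
  do segment = 1 to 2;
    slope = slope + b{segment};
    output;
  end;
  keep segment slope;
run;

proc print data=slopes noobs; run;
```

For this made-up data the two slopes should come out at about 1 and 0.5, both positive, so the fitted line is increasing. With four knots the same loop would run over x1-x5.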
>
> * compute predicted values;
>
> PROC REG data=indat;
>   MODEL &y. = x1-x%EVAL(&nnts.+1) ;
>   OUTPUT out=rmns pred=prd;
> run; quit;
>
> PROC PRINT DATA=rmns(where=(ranuni(929)> .5));
>   VAR &x. &y. prd x1-x%eval(&nnts.+1) ;
>   format x: prd 5.3;
> run;
>
> goptions reset=all;
>
> symbol1 v=dot i=none color=blue h=1;
> symbol2 v=none i=join color=black line=1 w=2;
>
> proc gplot data=rmns ;
>   plot &y.*&x.=1 prd*&x.=2 / noframe overlay haxis = 0 to 1 by .1
>        hm=1 href=(&knts.) lhref=33;
> run ; quit;
>
>
> Robin High
> UNMC
>
> From: Michael Bryce Herrington <mherrin@G.CLEMSON.EDU>
> Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
> Date: 08/04/2009 02:25 PM
> Please respond to: Michael Bryce Herrington <mherrin@G.CLEMSON.EDU>
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: knotted regression
>
> Hey,
>
> I need some help correcting bias in a model I am working on. The data set
> below represents my estimates and the actuals. The "estimate" value is
> found by averaging the estimated percentages for all observations in a
> small interval; the "actual" is the percentage of these observations for
> which the event actually occurs. You can see from some of the quick plots
> I have included that we have some bias for the very small percentages and
> the large percentages. I would like to try a knotted linear regression to
> correct this, but I do not know how to do this in SAS. I would play around
> with knot locations but would like them around: .07, .17, .26, .4. My only
> requirements are that the regression equation be monotonic and
> continuous.
>
> "freq" is the number of observations in each interval.
> "error" is the difference between "estimate" and "actual."
> "relerror" is the relative error between estimate and actual.
>
> Thanks for any help you can provide.
>
> data bias1st;
> input estimate actual freq error relerror;
> datalines;
> 0.003473099 0.001885903 2121 0.001587196 0.841610668
> 0.007699692 0.004836028 6617 0.002863664 0.592151961
> 0.012613618 0.007609715 9593 0.005003903 0.657567667
> 0.017529609 0.011300992 10884 0.006228617 0.551156645
> 0.022516198 0.014978602 11216 0.007537596 0.503224253
>
> ....
>
> 0.486851702 0.519018405 815 -0.032166703 -0.061976036
> 0.521874933 0.579868709 914 -0.057993776 -0.100011908
> 0.571919876 0.650574713 435 -0.078654837 -0.120900545
> 0.64145977 0.731958763 194 -0.090498993 -0.12363947
> ;
> run;
>
> symbol1 v=plus i=none c=blue;
> symbol2 v=none i=j c=r;
>
> proc gplot data=bias1st;
> plot estimate*actual actual*actual/overlay;
> run;
> plot error*actual;
> run;
> plot relerror*actual;
> run;
>
>
> --
> Bryce Herrington
> Clemson University
> mherrin@g.clemson.edu
> (863) 258-4758
>
>