Date: Wed, 27 Dec 2006 14:15:57 -0800
Reply-To: ssegall@GMAIL.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: ssegall@GMAIL.COM
Organization: http://groups.google.com
Subject: Re: Problem involving Lag Function/ Do Loops
In-Reply-To: <16FD64291482A34F995D2AF14A5C932C015A6F33@MAIL002.prod.ds.russell.com>
Content-Type: text/plain; charset="us-ascii"
Thanks. Yeah, it's actually closer to 4 million rows at the end of the
day (8 GB), but I am lucky enough not to have to worry too much
about efficiency in my SAS coding. Writing efficiently might save me a
minute or two, but that's not a big deal.
Anyway, thanks again for your help.
"Terjeson, Mark" wrote:
> Hi,
>
> - where I used to work, that would be considered a small file.... :o)
>
> - :o) should have gone with my first instinct.....PD=0
>
> - if you want to add -2 to the mix, here are a couple options:
>
> - if any negative, then
> if PD eq -1 then
> could become
> if PD lt 0 then
>
> - if only -1 and -2, then
> if PD eq -1 then
> could become
> if PD in(-1,-2) then
>
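> One caveat, offered as a sketch rather than tested code: with either
> change above, a PD of -2 still raises the counter by only 1, so just one
> earlier assessment would be cancelled. Assuming a -2 is meant to cancel
> the two assessments before it, and reusing temp1, flag and delme from the
> flagging step quoted further down, the counter could grow by the size
> of the reversal instead:
>
> ```sas
> data temp2;
>   set temp1;                    * already sorted by Acct descending T ;
>   by Acct descending T;
>   retain flag 0;
>   if first.Acct then flag = 0;  * reset for each account ;
>   if PD lt 0 then
>     do;
>       flag = flag + abs(PD);    * a -2 leaves two assessments to cancel ;
>       delme = 1;
>     end;
>   else if PD eq 1 and flag gt 0 then
>     do;
>       flag = flag - 1;
>       delme = 1;
>     end;
> run;
> ```
>
> Rows with delme=1 can then be zeroed or dropped exactly as before.
>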
>
>
>
> Hope this is helpful.
>
>
> Mark Terjeson
> Senior Programmer Analyst, IM&R
> Russell Investment Group
>
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> ssegall@GMAIL.COM
> Sent: Wednesday, December 27, 2006 9:08 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Problem involving Lag Function/ Do Loops
>
> Mark,
> The final step wasn't exactly what I wanted, but I can get to it from the
> intermediate table temp2 with the following code.
> Data final;
> set temp2;
> if delme=1 then PD=0;
> run;
>
> Basically, I need to conserve all the rows in my final data set,
> including the ones with no PD assessed or reversals, so that I can build
> a logistic model off of it. I want to conserve all the rows and just
> change the PD to 0 where there has been a reversal. Either way, thanks
> and I'm good. You saved me a lot of time.
> As far as sorting goes, it's a big data set (400k rows), but it's not like
> I am fixing this problem in our whole database. Sorting it twice should
> take some time, but nothing that my machine can't handle. I can also get
> the data pre-sorted from Teradata if I am really inclined.
>
> I would say the only other kink that might occur is that in some cases
> the reversal might be -2. In other words, the previous two PD fees need
> to be reversed. In all honesty, I know that this is exceedingly rare, so
> I am not that concerned about it. If it's an easy change to the code,
> let me know, but I am not sure it is, so don't bother.
> Thanks again,
> Steve
>
> "Terjeson, Mark" wrote:
> > Hi Steve,
> >
> > A couple of quick questions.
> >
> > Is it okay to sort this file first or must it be done without sorting
> > first (or after)?
> >
> > You mention in notes 3 and 4 below about pulling out the reversal. Does
> > that mean just the reversal row or the corresponding assessment row
> > too?
> >
> > When you say pulled out, do you mean delete the row?
> >
> > If this is going to be working on very large datasets, is sorting or a
> > couple of passes going to be a performance or real estate issue (which
> > would likely promote taking a different, more sophisticated approach)?
> >
> > Within each account group, do you want to associate the first reversal
> > with the first assessment and the second reversal with the second
> > assessment (in order found)? Or, as mentioned in note 4, do you not
> > care which assessment a reversal cancels, as long as the assessment is
> > prior to the reversal?
> >
> > I added another sample of data to exercise the latter case mentioned in
> > the last paragraph, and code below to see if this works for you.
> >
> >
> >
> >
> >
> >
> >
> > data sample;
> > input Acct T PD;
> > cards;
> > 1 1 1
> > 1 2 -1
> > 1 3 0
> > 1 4 0
> > 1 5 0
> > 1 6 1
> > 2 1 1
> > 2 2 1
> > 2 3 0
> > 2 4 -1
> > 2 5 0
> > 3 1 0
> > 3 2 0
> > 3 3 0
> > 3 4 0
> > 3 5 0
> > 3 6 1
> > 3 7 0
> > 3 8 0
> > 3 9 -1
> > 3 10 0
> > 3 11 1
> > 4 1 1
> > 4 2 0
> > 4 3 1
> > 4 4 0
> > 4 5 0
> > 4 6 1
> > 4 7 -1
> > 4 8 0
> > 4 9 -1
> > 4 10 0
> > 4 11 1
> > ;
> > run;
> >
> > ******************************************************;
> > * Approach: as mentioned in note 4 you do not care *
> > * which assessment a reversal cancels as long as *
> > * the assessment is prior to the reversal. *
> > ******************************************************;
> >
> > * prep order for flagging ;
> > proc sort data=sample out=temp1;
> > by Acct descending T;
> > run;
> >
> >
> > * flag reversals and assessments ;
> > data temp2;
> > set temp1;
> > by Acct descending T;
> > retain flag 0;
> > if first.Acct then flag=0; * reset ;
> > if PD eq -1 then
> > do;
> > flag = flag + 1;
> > delme = 1;
> > end;
> > if PD eq 1 and flag gt 0 then
> > do;
> > flag = flag - 1;
> > delme = 1;
> > end;
> > run;
> > * temp2 above is for you to see ;
> > * how flagging is working... ;
> >
> >
> > * here we have the same as above ;
> > * except we add the separating ;
> > * of the results... ;
> > data remaining
> > reversals;
> > set temp1;
> > by Acct descending T;
> > retain flag 0;
> > if first.Acct then flag=0; * reset ;
> > if PD eq -1 then
> > do;
> > flag = flag + 1;
> > output reversals;
> > end;
> > else if PD eq 1 and flag gt 0 then
> > do;
> > flag = flag - 1;
> > output reversals;
> > end;
> > else
> > do;
> > output remaining;
> > end;
> > drop flag;
> > run;
> > * now of course you may wish ;
> > * to sort either of these ;
> > * back to ascending order... ;
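> >
> > A minimal sketch of that re-sort, assuming the dataset names remaining
> > and reversals from the step above and sorting each in place:
> >
> > ```sas
> > * restore ascending time order within each account ;
> > proc sort data=remaining;
> >   by Acct T;
> > run;
> >
> > proc sort data=reversals;
> >   by Acct T;
> > run;
> > ```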
> >
> >
> >
> >
> >
> > Did I interpret correctly, or are any of
> > the questions above in need of further
> > assessment?
> >
> >
> >
> >
> >
> > Hope this is helpful.
> >
> >
> > Mark Terjeson
> > Senior Programmer Analyst, IM&R
> > Russell Investment Group
> >
> >
> > Russell
> > Global Leaders in Multi-Manager Investing
> >
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > ssegall@GMAIL.COM
> > Sent: Wednesday, December 27, 2006 7:12 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Problem involving Lag Function/ Do Loops
> >
> > All,
> > I have been thinking about the following problem for a while. I have
> > gathered some useful information, but mostly I have learned what won't
> > work as opposed to what will. If anyone can help me out it will be
> > much appreciated.
> > I work for a major credit card company (I suppose this will cause half
> > of you not to answer) and we are building statistical models for Past
> > Due Fee (noted as a 1 in our database). Occasionally, however, Past
> > Due fees are reversed. When they are reversed, the original entry is
> > not changed to 0; instead, the PD field in a later month's row is set
> > to -1. The table looks like what we see below:
> >
> > Acct # T PD
> > 1 1 1
> > 1 2 -1
> > 1 3 0
> > 1 4 0
> > 1 5 0
> > 1 6 1
> > 2 1 1
> > 2 2 1
> > 2 3 0
> > 2 4 -1
> > 2 5 0
> > 3 1 0
> > 3 2 0
> > 3 3 0
> > 3 4 0
> > 3 5 0
> > 3 6 1
> > 3 7 0
> > 3 8 0
> > 3 9 -1
> > 3 10 0
> > 3 11 1
> >
> > There are a few things that I would like to point out; they may or may
> > not complicate things:
> > 1. Different accounts are open for different numbers of months and
> > thus have different numbers of rows of data.
> > 2. Reversals do not always immediately follow assessment.
> > 3. Accounts may have one Assessed, one Reversal, and then another
> > Assessed (I would only want to pull out the first reversal).
> > 4. Accounts may have 2 Assesseds followed by one Reversal. I would
> > only want to pull out one of the assessments, but I wouldn't much care
> > which one.
> > If anyone wants to help me out it would be much appreciated. I wish I
> > could figure it out, but it seems to be a little complex for my
> > current understanding.
> > Thanks
> > -Steve