Date: Tue, 4 Sep 2007 14:43:42 -0000
Reply-To: sassql@GMAIL.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: sassql@GMAIL.COM
Organization: http://groups.google.com
Subject: Re: LOCF question still not resolved
In-Reply-To: <2fc7f3340709030959w7778c0ccu41e2dabd2e89c5d1@mail.gmail.com>
Content-Type: text/plain; charset="us-ascii"
On Sep 3, 12:59 pm, muthia.kachira...@GMAIL.COM (Muthia Kachirayan)
wrote:
> On 9/1/07, sas...@gmail.com <sas...@gmail.com> wrote:
>
> > Dear all,
>
> > Sorry to bother you again, but my LOCF issue is still
> > unresolved. Let me describe the situation again. I really appreciate
> > your help and time. Thanks again.
>
> > Hi,
>
> > I have a data set with more than 100 variables, and I need to
> > make sure that there are at least 6 visits per patient. If a
> > patient is missing a visit, I need to create a record for the
> > missing visit and carry the data forward from the previous visit
> > for all variables except a few. The data set has two variables
> > named flag and ontreatment. When a patient is missing a
> > visit, the data carried forward into the new record must come
> > from a previous visit where flag = 1 and ontreatment = 1, and
> > never from a visit where that condition does not hold.
> > Example:
>
> > Patient visit FLAG ontreatment ecogscore OC_LOCF tumormeasurements investi
> > 101 -1 3 OC 20 ABC
> > 101 0 2 OC 20 ABC
> > 101 1 1 1 4 OC 30 ABC
> > 101 1 1 3 OC 30 ABC
> > 101 2 1 1 4 LOCF . ABC
> > 101 3 1 1 2 OC 34 ABC
> > 101 4 1 1 2 LOCF . ABC
> > 101 5 1 1 2 LOCF . ABC
> > 101 6 1 1 2 LOCF . ABC
>
> > The data set contains both character and numeric variables whose
> > values need to be carried forward. In the above example, patient 101
> > is missing visits 2, 4, 5 and 6. So I have to carry the data from
> > visit 1, where flag = 1 and ontreatment = 1, forward to visit 2 for all
> > variables except tumormeasurements. For visits 4, 5 and 6, I carry
> > forward the data from visit 3, where flag = 1 and ontreatment = 1, for
> > all variables except tumormeasurements. That is why the
> > tumormeasurements values are missing on the LOCF records.
>
> > I would really appreciate it if you could let me know how I can implement
> > the above LOCF. As a reminder, there are more than 100
> > variables in the data set, both character and numeric.
>
> > Thanks in advance.
>
> This is a special LOCF problem, a CONDITIONAL LOCF. Unlike regular LOCF, it
> cannot be solved with the LAG() function. We can use arrays to store and
> retrieve rows when a given CONDITION is met, but with 100+ variables this
> requires more care.
>
> Combining an ARRAY with the POINT= option is an elegant and straightforward
> way to handle this problem.
>
> The code is given below. It has not been tested on a large data set.
>
> For each PATIENT (pt), the array k[] saves the record number within the
> patient's group when a VISIT (vis) is present and a missing value when it is
> not. Adding the running record count nRecs to k[vis] gives the row number in
> the data set, and the POINT= option is used to access that ROW directly. To
> meet your last requirement, recPtr holds the row number of the most recent
> ROW that meets the condition FLAG = 1 and ONTREATMENT (ot) = 1.
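>
> As a minimal sketch of the POINT= mechanism on its own (it assumes the
> sample data set GIVEN shown further down already exists and has at least
> three rows), the step below fetches rows 3 and 1 directly by row number:
>
> ```sas
> data _null_;
>   /* row numbers to fetch, in any order */
>   do p = 3, 1;
>     /* direct access: read row number p of GIVEN */
>     set given point = p;
>     put p= pt= vis=;
>   end;
>   /* with POINT= alone there is no end of file, so stop explicitly */
>   stop;
> run;
> ```
>
> The program below does the same kind of read, except that p is computed
> from k[] and nRecs instead of being listed by hand.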
>
> The mixture of numeric and character variables is handled by passing
> numVars and strVars as separate macro variables. The number of visits can
> be varied through the macro variable lastVisit.
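>
> With 100+ variables, typing the two lists by hand is error-prone. One
> possible way to build them, sketched here on the assumption that the input
> is WORK.GIVEN and that PT is the only variable to leave out of the lists,
> is to query DICTIONARY.COLUMNS:
>
> ```sas
> proc sql noprint;
>   /* numeric variables, excluding the BY variable PT */
>   select name into :numVars separated by ' '
>     from dictionary.columns
>     where libname = 'WORK' and memname = 'GIVEN'
>       and type = 'num' and upcase(name) ne 'PT'
>     order by varnum;
>   /* character variables */
>   select name into :strVars separated by ' '
>     from dictionary.columns
>     where libname = 'WORK' and memname = 'GIVEN'
>       and type = 'char'
>     order by varnum;
> quit;
> %put numVars = &numVars;
> %put strVars = &strVars;
> ```
>
> If the data set has no character variables, the second query returns no
> rows and strVars is left unset, so that case needs separate handling.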
>
> The LENGTH statement fixes the order of the variables in the PDV, and hence
> their order in the OUTPUT. The 100-plus variables can be listed in this
> statement or passed as a macro variable. The variables that do not take part
> in LOCF (the unused ones) can be kept in a separate list and passed to the
> program. The miss2Record code needs to be adapted to handle these unused
> variables.
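>
> One simple way to handle the unused variables, given here only as a
> sketch, is a second pass over the data set A created by the program below
> that blanks them on the carried-forward rows. The macro variable unusedVars
> and the data set name a_final are made up for the illustration, and the
> step assumes that no original record already has oc_locf = 'LOCF':
>
> ```sas
> /* hypothetical list of variables excluded from LOCF */
> %let unusedVars = tum;
> data a_final;
>   set a;
>   /* rows created by LOCF: set the excluded variables to missing */
>   if oc_locf = 'LOCF' then call missing(of &unusedVars);
> run;
> ```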
>
> %let numVars = vis flag ot tum eco;
> %let strVars = oc_locf inv;
> %let lastVisit = 6;
> data a;
>   ** impact the PDV: fix the variable order and lengths;
>   length pt vis flag ot 8 oc_locf $8 tum 8 inv $8 eco 8;
>   array numv[*] &numVars;
>   array strv[*] $ &strVars;
>   array k[-1:&lastVisit] _temporary_;
>   ** Fill k[] with Record ID when VIS is present;
>   ** _n_ counts the records within the current PT group;
>   do _n_ = 1 by 1 until(last.pt);
>     set given end = eof;
>     by pt;
>     k[vis] = _n_;
>   end;
>
>   do m = -1 to &lastVisit;
>     if k[m] then do;
>       ** visit present: re-read its row and output it unchanged;
>       p = nRecs + k[m];
>       link ptr2Record;
>       output;
>       ** remember the latest row with flag > 0 and ot > 0;
>       if (flag > 0) * (ot > 0) then recPtr = p;
>     end;
>     else if k[m] = . then do;
>       if recPtr = . then do;
>         ** no qualifying earlier row: output an empty record;
>         link miss2Record;
>         vis = m;
>         oc_locf = ' ';
>         output;
>       end;
>       else do;
>         ** carry the last qualifying row forward;
>         p = recPtr;
>         link ptr2Record;
>         vis = m;
>         oc_locf = 'LOCF';
>         output;
>       end;
>     end;
>   end;
>   ** nRecs is the running count of records read so far;
>   nRecs ++ _n_;
>   link initArray;
>   ** moved here from after the LINK routines, where it was never reached;
>   if eof then stop;
>   return;
>
>   ** fetch row number p;
>   ptr2Record:
>     set given point = p;
>   return;
>
>   ** set all listed variables to missing;
>   miss2Record:
>     do ii = 1 to dim(numv);
>       numv[ii] = .;
>     end;
>     do ii = 1 to dim(strv);
>       strv[ii] = ' ';
>     end;
>   return;
>
>   ** reset the variables and k[] for the next PT;
>   initArray:
>     do ii = 1 to dim(numv);
>       numv[ii] = .;
>     end;
>     do ii = 1 to dim(strv);
>       strv[ii] = ' ';
>     end;
>     do ii = -1 to &lastVisit;
>       k[ii] = .;
>     end;
>   return;
>
>   drop m nRecs ii recPtr;
> run;
>
> This was tested on this sample data:
>
> data given;
> input pt vis flag ot oc_locf $ tum inv $ eco;
> cards;
> 101 -1 . . oc 20 abc 1
> 101 0 . . oc 20 abc 1
> 101 1 1 1 oc 20 abc 1
> 101 3 . 1 oc 20 abc 1
> 102 -1 . . oc 20 xyz 2
> 102 0 1 . oc 20 xyz 2
> 102 2 1 1 oc 20 xyz 2
> 102 5 1 1 oc 20 xyz 2
> ;
> run;
>
> The data set must be sorted by PT and VIS before running the program.
>
> proc sort data = given;
> by pt vis;
> run;
>
> The output is:
>
> Obs   pt  vis  flag  ot  oc_locf  tum  inv  eco
>
>   1  101   -1     .   .  oc        20  abc    1
>   2  101    0     .   .  oc        20  abc    1
>   3  101    1     1   1  oc        20  abc    1
>   4  101    2     1   1  LOCF      20  abc    1
>   5  101    3     .   1  oc        20  abc    1
>   6  101    4     1   1  LOCF      20  abc    1
>   7  101    5     1   1  LOCF      20  abc    1
>   8  101    6     1   1  LOCF      20  abc    1
>   9  102   -1     .   .  oc        20  xyz    2
>  10  102    0     1   .  oc        20  xyz    2
>  11  102    1     .   .             .         .
>  12  102    2     1   1  oc        20  xyz    2
>  13  102    3     1   1  LOCF      20  xyz    2
>  14  102    4     1   1  LOCF      20  xyz    2
>  15  102    5     1   1  oc        20  xyz    2
>  16  102    6     1   1  LOCF      20  xyz    2
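>
> A quick check that every patient ended up with the full set of visits,
> again only a sketch: with visits numbered -1 to &lastVisit, each patient
> should have &lastVisit + 2 rows in A, so the query below should list no
> patients. The column name nRows is made up for the example.
>
> ```sas
> proc sql;
>   /* list patients whose row count differs from the expected number */
>   select pt, count(*) as nRows
>     from a
>     group by pt
>     having count(*) ne %eval(&lastVisit + 2);
> quit;
> ```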
>
> The LINK routines ptr2Record, miss2Record and initArray keep the
> program easy to follow and easy to re-use. ptr2Record
> fetches a ROW for a given POINTER (p). miss2Record fills a ROW with missing
> values. initArray re-initializes the arrays at the end of BY-group processing
> to prepare for the next PATIENT. Because initArray is called only once, it is
> not an ideal candidate for a LINK routine, but it is kept separate for
> clarity.
>
> I would appreciate your feedback when you use this with a very large data
> set.
>
> Regards,
>
> Muthia Kachirayan

Dear Muthia,
I really appreciate all your time and effort in doing this. I will try
this and let you know if I have any questions.
Regards,