Date: Sun, 24 Aug 2008 10:32:42 -0700
Reply-To: David <david@OFFLODE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David <david@OFFLODE.COM>
Organization: http://groups.google.com
Subject: Re: how to copy a row
Content-Type: text/plain; charset=ISO-8859-1
Here is a slightly different way of doing it that is a little bit
simpler and avoids the transpose. This assumes that there is an id
variable and that the visit variable is next in the dataset (as in the
example provided), and that all the measurement variables are
numeric. If this isn't the case then one of the more general
solutions already provided may be preferred.
I've used the number of measurement variables (3) throughout the
code... this could be replaced with a macro variable to make it more
general.
data wide;
input id:$3. visit adm dose wt;
datalines;
001 1 52 32 78
001 2 25 54 78
001 3 . . .
002 1 56 54 90
002 2 . . .
003 1 52 32 78
003 2 . 54 78
003 3 25 . 79
;
run;
proc sort data=wide;
by id visit;
run;
data test;
set wide;
by id;
*Create an array that refers to;
*all the measurement variables of;
*interest;
array v{0:3} _numeric_;
*Create an array of temporary variables;
*which are retained;
array tmp{3};
retain tmp1-tmp3;
*If it is the first row per patient then;
*reset the temporary variables;
if first.id then do i=1 to 3;
tmp{i} = v{i};
end;
*Otherwise load the temporary variable with;
*the current rows value. This will be;
*retained until the next row;
else do i=1 to 3;
if v{i} ne . then tmp{i} = v{i};
end;
*Replace missing values with the value;
*from the previous row;
do i=1 to 3;
if v{i} = . then v{i} = tmp{i};
end;
drop i tmp1-tmp3;
run;
David
On Aug 24, 3:57 pm, iebup...@GMAIL.COM ("./ ADD NAME=Data _null_,")
wrote:
> On 8/22/08, Mary <mlhow...@avalon.net> wrote:
>
>
>
> > Thanks much. I just tried it on some of my data, and it did work. A couple
> > of suggestions (just to make it better than the already GREAT it is now):
>
> > 1. Have it be able to work in the case where there the data is either all
> > character or all numeric (I tried it with all character variables, and it
> > balked because it couldn't find an ID on the numeric table with 0 rows).
>
> It was not my intention to write an general solution, that would work
> as you describe. One could call upon the macro language to package
> the code, checking for existence of char/num in the list of variables
> but as I said that was not my intention. I did think it interesting
> to see if I could alter the existing code to function given a scenario
> where ether the char or num list was null.
>
> The code below accomplishes that task to some degree, and it is
> moderately interesting to me.
>
> > 2. Have some way to sort the variables back in the original order of the
> > data set. I ran a proc sql; create table ... select var1, var2, etc., to
> > get it back in original order but it seems there ought to be a better way to
> > do this, as the best part of using your solution is not having to name the
> > variable names!
>
> This is already accomplshed using the non-executed set. There
> variables are redefined in the same order as the original input data
> because the original input data set is use to define the variables. I
> suppose one could think of variable order as another kind of meta
> data.
>
> data wide;
> infile cards missover;
> informat adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
> format adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
> input id:$3. visit adm dose wt drug_name/*OP mentioned many more*/;
> *drop drug_name; *switch these DROPs on and off to test;
> drop adm dose wt;
> cards;
> 001 1 52 32 78.2 Prozac
> 001 2 25 54 78.5
> 001 3 . . .
> 002 1 56 54 90.0 Zoloft
> 002 2 . . .
> 003 1 52 32 78.5 Wellbutrin
> 003 2 . 54 78.2 Prozac
> 003 3 25 . 79.4
> ;;;;
> run;
>
> %let varlist = drug_name--drug_name;
> /*%let varlist = adm--wt;*/
> /*%let varlist = adm--drug_name;*/
>
> proc transpose data=wide out=nums;
> by id visit;
> var %sysfunc(tranwrd(&varlist,--,-numeric-));
> run;
>
> proc sort; by id _name_ visit; run;
> proc transpose data=wide out=char(rename=(col1=char1));
> by id visit;
> var %sysfunc(tranwrd(&varlist,--,-character-));
> run;
> proc sort; by id _name_ visit; run;
>
> data locf;
> attrib _type_ length=$1;
> attrib _name_ length=$32;
> if 0 then set char(keep=char1 rename=(char1=locfC));
> call missing(locfC);
> do until(last._name_);
> set nums(in=in1) char(in=in2);
> by id _name_;
> _type_ = substr('NC',sum(in1,in2*2),1);
> if in1 and not missing(col1) then locf = col1;
> if in2 and not missing(char1) then locfC = char1;
> output;
> end;
> run;
> proc print;
> run;
>
> /* back to wide if you like */
> proc sort data=locf;
> by id visit;
> run;
> proc transpose data=locf(where=(_type_='N')) out=locfN(drop=_:);
> by id visit;
> var locf;
> id _name_;
> run;
> data locfNV / view=locfNV;
> if _n_ eq 1 and eof then stop;
> set locfN;
> return;
> set locf(where=(_type_='N') keep=_type_) end=eof;
> drop _type_;
> run;
> proc transpose data=locf(where=(_type_='C')) out=locfC(drop=_:);
> by id visit;
> var locfC;
> id _name_;
> run;
> data locfCV / view=locfCV;
> if _n_ eq 1 and eof then stop;
> set locfC;
> return;
> set locf(where=(_type_='C') keep=_type_) end=eof;
> drop _type_;
> run;
> data work.locfWide;
> if 0 then set wide; /* recover attributes */
> merge locfNV locfCV;
> by id visit;
> run;
> proc contents varnum;
> proc print;
> run;
>
>
>
> > But otherwise, it worked great. Thanks much.
>
> > -Mary
> > ----- Original Message -----
> > From: ./ ADD NAME=Data _null_,
> > To: SA...@LISTSERV.UGA.EDU
> > Sent: Friday, August 22, 2008 2:54 PM
> > Subject: Re: how to copy a row
>
> > On 8/22/08, Mary <mlhow...@avalon.net> wrote:
>
> > > Fascinating, code and does indeed work; it is nice because it is more
> > > generic in terms of not having to name specific variables. I have some
> > > questions on it:
>
> > > 1. How is the transpose working to pick up only the character or
> > variables?
> > > Is it that you just put character in the middle of the range of variable?
>
> > Yes I suppose you could call character/numeric Name Range qualifiers.
> > Seehttp://tinyurl.com/6zyodgfor details of SAS Variable Lists. This
> > is a very powerful SAS feature.
>
> > > proctranspose data=wide out=char(rename=(col1=char1));
> > > by id visit;
> > > var adm-character-drug_name;
> > > run;
>
> > > 2. What is this statement for? I see that you have got a set nums char
> > > with an in clause to get which type you are doing, so why would you need
> > > this?
>
> > > if 0 then set char(keep=char1 rename=(char1=locfC));
>
> > I need to define locfC character variable. I could have used ATTRIB
> > or LENGTH but I knew that I wanted the same attributes as CHAR1 so I
> > used the non-executed set statement to "copy" the variable attributes
> > from data work.char. Note also the explicit "initialize to missing"
> > this is used because I need LOCFC to be initialized to missing before
> > each new ID. Variables defined with SET/MERGE/UPDATE are not
> > "initialized to missing".
>
> > > 3. How does this statement work to retrieve the formats?
>
> > > if0 then set wide; /* recover attributes */
>
> > Recall that variable attributes are defined when the data step is
> > compiled. Here I refer to all variables in WIDE which I also know are
> > the same variables that have come from the flip-flop using the
> > transpose out and back.
>
> > This action defines all meta data attributes TYPE LENGTH IN/FORMAT and
> > LABEL.
>
> > > -Mary
>
> > > ----- Original Message -----
> > > From: ./ ADD NAME=Data _null_,
> > > To: SA...@LISTSERV.UGA.EDU
> > > Sent: Friday, August 22, 2008 12:12 PM
> > > Subject: Re: how to copy a row
>
> > > I believe I mentioned that I did not address Character variables.
> > > This is easy enough however, see example below, and retrieving the
> > > attributes is easy with a non-executed SET. For a long time I to
> > > worried about "losing attributes" until I realized they can be easily
> > > re-attached.
>
> > > I could not figure out what you were referring to with regards to the
> > > many IF statements you included. Also you don't want to use LAG
> > > because the last non missing value of interest may not be returned by
> > > LAG. You want to "retain" non-missing until it is replace by new
> > > non-missing value. In general this is the kind of coding I am trying
> > > to avoid by transposing the data, I like my code to refer directly (by
> > > name) to as few variables as possible.
>
> > > data wide;
> > > infile cards missover;
> > > informat adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
> > > format adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
> > > input id:$3. visit adm dose wt drug_name/*OP metioned many more*/;
> > > cards;
> > > 001 1 52 32 78.2 Prozac
> > > 001 2 25 54 78.5
> > > 001 3 . . .
> > > 002 1 56 54 90.0 Zoloft
> > > 002 2 . . .
> > > 003 1 52 32 78.5 Wellbutrin
> > > 003 2 . 54 78.2 Prozac
> > > 003 3 25 . 79.4
> > > ;;;;
> > > run;
> > > proc transpose data=wide out=nums;
> > > by id visit;
> > > var adm-numeric-drug_name;
> > > run;
> > > proc sort; by id _name_ visit; run;
> > > proc transpose data=wide out=char(rename=(col1=char1));
> > > by id visit;
> > > var adm-character-drug_name;
> > > run;
> > > proc sort; by id _name_ visit; run;
> > > data locf;
> > > attrib type length=$1;
> > > attrib _name_ length=$32;
> > > if 0 then set char(keep=char1 rename=(char1=locfC));
> > > call missing(locfC);
> > > do until(last._name_);
> > > set nums(in=in1) char(in=in2);
> > > by id _name_;
> > > type = substr('NC',sum(in1,in2*2),1);
> > > if in1 then locf = ifn(missing(col1) ,locf ,col1);
> > > if in2 then locfC =
> > ifc(missing(char1),locfC,char1);
> > > output;
> > > end;
> > > run;
> > > proc print;
> > > run;
>
> > > /* back to wide if you like */
> > > proc sort data=locf;
> > > by id visit;
> > > run;
> > > proc transpose data=locf(where=(type='N')) out=locfN(drop=_:);
> > > by id visit;
> > > var locf;
> > > id _name_;
> > > run;
> > > proc transpose data=locf(where=(type='C')) out=locfC(drop=_:);
> > > by id visit;
> > > var locfC;
> > > id _name_;
> > > run;
> > > data work.locfWide;
> > > if 0 then set wide; /* recover attributes */
> > > merge locfN locfC;
> > > by id visit;
> > > run;
> > > proc contents varnum;
> > > proc print;
> > > run;
>
> > > On 8/22/08, Mary <mlhow...@avalon.net> wrote:
>
> > > > The problem is that your approach loses all formats and informats on the
> > > > original variables, and with mixed character/numeric data, it drops all
> > > the
> > > > character variables. When I try an original data set like this:
>
> > > > datawide;
>
> > > > informat adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
>
> > > > format adm 4.0 dose 4.0 wt 4.1 drug_name $40.;
>
> > > > input id:$3. visit adm dose wt drug_name/*OP metioned many more*/;
>
> > > > cards;
>
> > > > 001 1 52 32 78.2 Prozac
>
> > > > 001 2 25 54 78.5
>
> > > > 001 3 . . .
>
> > > > 002 1 56 54 90.0 Zoloft
>
> > > > 002 2 . . .
>
> > > > 003 1 52 32 78.5 Wellbutrin
>
> > > > 003 2 . 54 78.2 Prozac
>
> > > > 003 3 25 . 79.4
>
> > > > ;
>
> > > > run;
>
> > > > It drops my drug name, and the resulting data set has no informats or
> > > > formats. Also, you've got no control on individual variables, as you
> > > might
> > > > want to update some and not update others, or ask more complicate IF
> > > > statements, like the length of the time of the lag. Here's a longer
> > > > approach, but in my mind clearer, and also it allows for mixed variable
> > > > types such as integer, decimal, character, and date, and retains all
> > > > informats and formats of the original variables:
>
> > > > datawide;
>
> > > > infile cards missover;
>
> > > > informat adm
>
> ...
>
> read more »
|