LISTSERV at the University of Georgia
Date:   Tue, 13 Mar 2012 07:04:43 -0700
Reply-To:   Jim Groeneveld <jim.1stat@yahoo.com>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Jim Groeneveld <jim.1stat@YAHOO.COM>
Subject:   Re: question about macros, SHOULD BE: question about imputing values
Comments:   To: Regina Dopfel <rdopfel@gmail.com>
Content-Type:   text/plain; charset=iso-8859-1

Dear Regina,

Yes, learning and relearning SAS, especially after having done something quite different, may be quite difficult and sometimes seem illogical. There is a reasonably large SAS community here on earth. You may notice its members through the many internet sites about SAS, both from SAS Institute (www.sas.com) and from users. Users founded a large forum on SAS, called SAS-L (http://listserv.uga.edu/cgi-bin/wa?A0=sas-l&D=1&H=0&O=D&T=1). You can read it and contribute to it via your internet browser, but you may also read and write to it via email (I prefer the browser option); note that SAS-L does not allow attachments. Another important internet site is www.sascommunity.org, founded by both SAS and users. Both sites are intended to provide additional, individual information on SAS to other SAS users, on a voluntary basis and without obligation. Posing your question(s) there is much more efficient than sending them to some selected individual(s). Besides, anyone may freely decide whether or not to pick up your case, and nobody needs to feel obliged to respond. (Another such site is the Usenet newsgroup comp.soft-sys.sas, also available via Google Groups as http://groups.google.com/group/comp.soft-sys.sas/topics; it used to be a mirror of SAS-L many years ago, but no longer is.) So I'll also send my response to SAS-L, enabling other SAS users to read it and reply to it if they want.

Now to your questions. I also have some questions for you.

1. Why would you want to use arrays and macros? It may be fun, but that is not what they are designed for. Only use arrays or macros if there is a clear need to do so.

2. From SAS 8 (or even 7) on, variable names are no longer limited to 8 characters; the limit now is 32 characters.

3. To summarise your problem (making it clear to myself): you have (100 observations with) several weekly variables (in dataset Fitness1). Those weekly variables (e.g. hr_exer1 - hr_exer52) may have missing values that you want to change into the mean (exer_mean) of the non-missing ones if the number of missing values (num_wks_missing) is less than or equal to 5. Otherwise you would leave them missing _and_ transfer the observations to another dataset (Alt_Fitness1). Why would you impute at all? Can't you leave the values missing and live with the mean as it is?

4. Imputing missing values with a mean is not the best solution. You artificially decrease the standard deviation of the whole set of weekly values. There are many other techniques to impute missing values, one of which is to replace missing values by varying values such that both the mean and the standard deviation remain the same. Search via Google for the terms: sas impute missing.

5. Your pseudo code and the idea of arrays are not bad, but your idea of the macro language is. The macro language, its variables and their values never handle datasets or the data in them. The macro language manipulates text: SAS code that in its turn handles the data. So, a macro may contain a piece of often used SAS code that is executed every time the macro is called. (If you know C, SAS macro code is somewhat comparable to the C preprocessing language.)

6. Without going into your code in much detail I'll just give some example code that might do your job:

* First of all let's generate some example data;
DATA ExampleData (DROP=I J RandomNr:); * output dataset (dropped 54 variables);
  ARRAY Hr_Exer [52]; * define array of 52 variables Hr_Exer1 to Hr_Exer52;
  ARRAY RandomNr [52]; * likewise define array of 52 (other) random numbers;
  DO I = 1 TO 5; * repeat 5 times to create 5 observations;
    DO J = 1 TO 52; * for every week;
      Hr_Exer[J] = RANUNI(1); * Assign random number between 0 and 1, uniformly distributed;
      RandomNr[J] = RANUNI(-1); * Assign independent other random number likewise;
* Create some 10% missings;
      IF (RandomNr[J] LE .1) THEN Hr_Exer[J] = .; * in 10% of the cases Hr_Exer[J] gets missing;
    END;
    OUTPUT; * write observation I;
  END; * 5 observations written;
RUN; * ready creating example data;

* Count number of missing values and calculate mean per observation;
DATA Fitness1 (DROP=J) Alt_Fitness1 (DROP=J); * 2 output datasets, driven by code below;
  SET ExampleData; * read the input dataset;
* Create 2 additional variables, Num_Wks_Missing and Exer_Mean;
  Num_Wks_Missing = NMISS (OF Hr_Exer1 - Hr_Exer52); * number of missing values in weekly values;
  Exer_Mean = MEAN (OF Hr_Exer1 - Hr_Exer52); * Mean of non-missing weekly values;
  ARRAY Hr_Exer [52]; * define array of 52 variables Hr_Exer1 to Hr_Exer52;
  IF (Num_Wks_Missing LE 5) THEN DO; * if there are maximally 5 missing values;
    DO J = 1 TO 52; * for every week;
      IF (MISSING(Hr_Exer[J])) THEN Hr_Exer[J] = Exer_Mean; * substitute Mean for missing;
    END; * (end of every week);
    OUTPUT Fitness1; * Write observations with 0-5 imputed missings to Fitness1;
  END; * (end of 5 or less missings);
  ELSE OUTPUT Alt_Fitness1; * Write observations with over 5 missings to Alt_Fitness1;
RUN;

That is about it! You see, no macro code; it is not necessary. The code has been tested.
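
For comparison only, and to illustrate point 5 (the macro language generates SAS code as text and never touches the data itself), here is a minimal sketch of what a macro version of the imputation could look like. The macro name ImputeMean, its parameters Prefix, N and MeanVar, and the output dataset Imputed are hypothetical names chosen for this illustration; the sketch is not part of the tested code above and has not been run.

```sas
/* Hypothetical macro: it only writes out text, namely N IF statements */
%MACRO ImputeMean (Prefix=, N=, MeanVar=);
  %LOCAL W;
  %DO W = 1 %TO &N;
    /* each iteration emits one line of DATA step code, e.g. for W = 1:   */
    /* IF MISSING(Hr_Exer1) THEN Hr_Exer1 = Exer_Mean;                    */
    IF MISSING(&Prefix.&W) THEN &Prefix.&W = &MeanVar;
  %END;
%MEND ImputeMean;

DATA Imputed; /* hypothetical output dataset name */
  SET ExampleData;
  Exer_Mean = MEAN (OF Hr_Exer1 - Hr_Exer52);
  /* the macro call is replaced by 52 generated IF statements */
  /* before the DATA step is compiled                          */
  %ImputeMean (Prefix=Hr_Exer, N=52, MeanVar=Exer_Mean)
RUN;
```

The %DO loop is resolved before the DATA step is compiled and merely writes 52 IF statements into the step; the array loop in the tested code does the same work at run time with less code, which is why no macro is needed here.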
Good luck understanding this code and (re)learning SAS.

Regards - Jim.
--
Jim Groeneveld, Netherlands
Statistician/SAS consultant
http://jim.groeneveld.eu.tf

<quote who="r dopfel">
> Dear Jim,
> Over the past couple of weeks I've been 'relearning SAS'.   After working
> in technology, I decided to return to school because I learned that here
> in the United States, the drop out rate in urban schools was 60+%. My
> objective was to teach high school science to this demographic. Alas,
> after a couple of years teaching chemistry, biology and environmental
> science, I was laid off with 160 other junior teachers.
>
> While working on my masters, I was required to do a Master's thesis (to
> teach). I ended up back in the tech world coding in SAS and within the
> field of epidemiology.
>
> Over the past two weeks, I have been online relearning SAS (reading,
> listening and going thru tutorials - good stuff.  Somehow, I came to a
> thread to which you responded and it is obvious that you code succinctly
> and logically.
>
> So I am a bit bogged down.  I would like to send you a section of my
> code. It is just 1 page.   I am missing some crucial steps (in my mind and
> in the code). If you can help, I would greatly appreciate this.  I've
> noticed from your thread that you are very kind and willing to help out.
>
> This is what I am trying to do.
> It is actually very simple and basic.
> I am writing sample code using very generic variables.
> My example study follows 100 individuals who have values recorded during 52
> weeks: (variables: hours of weekly exercise, resting heart rate.... etc.
> etc.) Initially, I am coding with the stipulation - 'no missing values' for
> hours of weekly exercise.
>
> BUT I want to use arrays and macros in this code too.
> (I like them. They are fun. As a matter of fact this entire project for me
>  is fun. ) note:
> I know that names in SAS are limited to 8 characters but I am using longer
>  names so that this is easier to understand.
>
> Here is my strategy to handle observations/individuals with missing
> values; code these variables: hr_exer1 - hr_exer52 1. use an array. find
> the 'mean' of the number of hours of exercise for each individual over 52
> weeks. code this variable exer_mean.
>
> 2. use an array.   'count' the number of 'missing values' for that
> individual; that is how many instances of '.' are found for variables
> hr_exer1 - hr_exer52 ? code this variable num_wks_missing
>
> 3. IF the number of missing values is <= 5 THEN I want to replace those
> missing values with the mean hours of exercise for that individual. code
> this variable use_mean
>
> 4. CALL a macro (named assign_mean) WHEN the condition is met that the
> value of missing values is <= 5. I want the macro to replace the 5 or fewer
> missing values with the mean hours of exercise for that individual.
>
> I know that you are busy. Any form of communication works for me. There
> are so many choices.
>
> I've included the code below and attached it as well because the
> formatting will be lost. Thanks in any case.
> Regina
>
> /* part1:  the variables hr_exer1 - hr_exer52 hold the values for 52
> weeks; each has the number of weekly hours of exercise */
>
> data fitness1;
>   exer_mean = 0; /*initialize the variable exer_mean to zero */
>   array a-exer_mean(52)  hr_exer1-hr_exer52;
> /*arrayname is a-exer_mean and there are 52 elements*/
>   do [ i] 1 to 52; /*iterate through all 52 weeks */
>     mean(of hr_exer1 to hr_exer52) = exer_mean;
>   end;
>   drop i;
> run;
>
> proc print data=fitness1;
> run;
> /* fitness1: included variables:  hr_exer1 - hr_exer52   exer_mean  */
>
> /* part2:   count the number of weeks that have missing values for hr_exer */
>
> data fitness1;
>   num_wks_missing = 0;
> /*initialize the NEW variable num_wks_missing to zero */
>   array a-hr_ex_miss(52)  hr_exer1-hr_exer52;
>   do [ i ]  1 to 52;
>     if  a-hr_exmiss[i] =  '.'  then num_wks_missing +1;
>   end;
> drop i;
> run;
>
> num_wks_missing = wk_exer_miss(i);    /*create new variable wk_exer_miss
> where value is
> proc print data=fitness1;
> stored for the below macro which i want to call when value >= 5.  */
> run;
>
> /*QUESTION.
> why do I think that I need that 'i'  here: hr_exer_miss(i)*/
> /* fitness1: included variables: hr_exer1  hr_exer52  wk_exer_miss */
> /*exer_mean    num_wks-missing    */
>
> /* part3:  IF  number of weeks with missing values is <=5' THEN replace
> the 'missing value' with the mean(hr_exer1-hr_exer52) of that individual;
> IF  > 5 missing values within the variable group hr_exer1-hr_exer52 THEN
> delete from dataset fitness1 and/or output to a different dataset called
> data=alt_fitness1 */
>
> data fitness1;
>   array a-assign_mean(100)  num_wks_missing;
>   do [ i ]  1 to 100;
>     if num_wks_missing =< 5  &use_mean   /*CALL MACRO;
>     else out=alt_fitness1;
>   end;
>   drop i;
> run;
>
> proc print  data=fitness1;
> run;
>
> /* part4: define the MACRO. */
> %macro use_mean;
>   %LET use_mean = exer_mean;
> /*use_mean is name of macro */
>   %DO  i  = hr_exer1  %TO  hr_exer52;
>     %IF hr_exer(i) = '.   %THEN   %DO  hr_exer(i ) = exer_mean;
>   %end;
>   drop i;
> %mend   use_mean;
>
> %put %    use_mean;
> --
> balance  harmony  green  coexist

