|
On Thu, 4 Sep 2008 16:18:42 -0700, approximatelynormal
<jefhuntington@GMAIL.COM> wrote:
>On Sep 4, 11:03 am, keith_w_lar...@YAHOO.COM ("Keith W. Larson")
>wrote:
>> Dear List,
>>
>> I have a data set where there are multiple records (observations) that I
would like summarized into a single record (observation). If there is more
than one encounter with a species on a given day, I would like the
observations merged so that there is only one record for each species each
day. The field "status" should allow for more than one status code versus a
single one the way the record is currently stored. I have attached a sample
data set below and an example of what I would like after it is summarized.
Thank you for your assistance!
>>
>> Cheers,
>> Keith
>>
>> data test;
>> input date species $ status $;
>> cards;
>> 20080801 AUWA B
>> 20080801 AMRO E
>> 20080801 AMRO S
>> 20080802 AUWA E
>> 20080803 OCWA E
>> 20080803 AUWA S
>> 20080803 YWAR S
>> 20080804 AUWA S
>> 20080804 AUWA B
>> 20080804 AUWA C
>> 20080804 FOSP E
>> 20080805 GCSP S
>> 20080805 GCKI E
>> 20080805 TRES B
>> 20080805 TRES E
>> run;
>>
>> What I would like is the data summarized as:
>>
>> 20080801 AUWA B
>> 20080801 AMRO ES
>> 20080802 AUWA E
>> 20080803 OCWA E
>> 20080803 AUWA S
>> 20080803 YWAR S
>> 20080804 AUWA SBC
>> 20080804 FOSP E
>> 20080805 GCSP S
>> 20080805 GCKI E
>> 20080805 TRES BE
>
>You could also transpose the data:
>
>proc sort data=test; by date species; run;
>
>proc transpose data = test out=t_test;
> by date species;
> var status;
>run;
>
>data t_test2;
> set t_test;
> length new_status $ 8;
> new_status = compress(col1,' ') || compress(col2,' ') ||
>compress(col3,' ');
>run;
In the sample data there are at most three status codes per group, but it's
probably not safe to assume that limit. A more general formula is
new_status = catt(of col:);
Of course this has its own implicit assumption: that the data set contains no
unrelated variables whose names start with COL.
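
For reference, here is roughly how that formula would slot into the second
data step quoted above. This is only a sketch: it assumes T_TEST is the PROC
TRANSPOSE output from the earlier post (so the transposed columns are named
COL1, COL2, ...), and the length of 8 is simply carried over from the quoted
code and may need widening if a group can have more than eight codes.

```sas
data t_test2;
   set t_test;
   /* $ 8 is carried over from the quoted step; widen it if needed */
   length new_status $ 8;
   /* COL: is a name prefix list, so it picks up however many COLn  */
   /* variables PROC TRANSPOSE created; CATT trims trailing blanks  */
   /* from each one before concatenating                            */
   new_status = catt(of col:);
run;
```

With single-character codes CATT is enough. If the codes could carry leading
blanks, CATS (which strips both ends) would be the closer match to the
COMPRESS calls it replaces.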
|