Date: Mon, 27 Aug 2007 17:13:15 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Matching Variables
On Sun, 26 Aug 2007 18:55:58 -0400, Randy <randistan69@HOTMAIL.COM> wrote:
>All:
>
>Here is my Basic data Set.
>
>VarA VarB VarC Time ONumber VarD VarE
>1 A B 10:55:01 100
>1 A S 10:55:01 101
>1 A S 10:55:01 102
>1 A B 10:55:01 103
> B 10:55:01 102 103
>2 A S 10:55:02 101
> B 10:55:02 101 103
>
>I want to create two new variables, VarF and VarG.
>When VarB = B, VarD should be matched with the ONumber and VarF should be
>populated with the Time of the ONumber under VarD. Similarly, when VarB=
>B, VarE should be matched with the ONumber and VarG should be populated
>with the Time of the ONumber under VarE. My manipulated data set should
>therefore look like this:
>
>
>VarA VarB VarC Time ONumber VarD VarE VarF VarG
>1 A B 10:55:01 100
>1 A S 10:55:01 101
>1 A S 10:55:01 102
>1 A B 10:55:01 103
> B 10:55:01 102 103 10:55:01 10:55:01
>2 A S 10:55:02 101
> B 10:55:02 101 103 10:55:02 10:55:01
>Note: ONumber 101 has been modified at 10:55:02 (indicated by VarA=2) so
>the time for 0Number 101 = 10:55:02 and not 10:55:01.
> Please Help.
> Randy
I agree with Sig about the desirability of splitting this table into two,
based on the VarB values (A,B). I'll go a step beyond and suggest that the
VarB=B part should then be transposed so that each Vard/VarE pair becomes
two rows x 1 column.
Nevertheless, the computations can be made from the given structure.
First load the data:
data basic;
infile cards missover;
informat VarA 1.VarB VarC $1. Time time8.;
format Time time8.;
input
VarA VarB VarC Time ONumber VarD VarE; cards;
1 A B 10:55:01 100
1 A S 10:55:01 101
1 A S 10:55:01 102
1 A B 10:55:01 103
. B . 10:55:01 . 102 103
2 A S 10:55:02 101
. B . 10:55:02 . 101 103
;
Now derive the new columns using SQL CASE structures and correlated subqueries:
proc sql;
create table want as
select *
, case when VarB='B'
then (select Time
from basic
where ONumber = outer.VarD
having VarA = max(VarA)
)
else .
end format=time8. as VarF
, case when VarB='B'
then (select Time
from basic
where ONumber = outer.VarE
having VarA = max(VarA)
)
else .
end format=time8. as VarG
from basic as outer;
quit;
The code is repetitious because of the denormalized structure.
Output:
VarA VarB VarC Time ONumber VarD VarE VarF VarG
1 A B 10:55:01 100 . . . .
1 A S 10:55:01 101 . . . .
1 A S 10:55:01 102 . . . .
1 A B 10:55:01 103 . . . .
. B 10:55:01 . 102 103 10:55:01 10:55:01
2 A S 10:55:02 101 . . . .
. B 10:55:02 . 101 103 10:55:02 10:55:01