Date: Thu, 26 Oct 2000 01:53:16 -0700
Reply-To: kmself@IX.NETCOM.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: kmself@IX.NETCOM.COM
Subject: Re: data step merge --> sql join
In-Reply-To: <39F7BA2A.29011A81@ix.netcom.com>; from julierog@IX.NETCOM.COM on
Thu, Oct 26, 2000 at 05:00:35AM +0000
Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature";
on Thu, Oct 26, 2000 at 05:00:35AM +0000, Roger Lustig (julierog@IX.NETCOM.COM) wrote:
> Hi, Karsten!
>
> Not quite. The DATA step is doing a left join, not an inner join.
> (See annotation in DATA step below.) Also, the match-type counters
> require a full join, so a full join with a WHERE= in the
> data set options for set_c might come a little closer. Even then,
> doing a PUT of record counts to the log would probably require a
> separate query, one with the INTO :<macro_var> feature.
Thanks. I've not been using SAS for about a year, still playing with
SQL elsewhere. Getting rusty. Thought I'd post a non-rant for a change
(but don't get used to it <g>). Many people aren't aware of the
CASE/WHEN/ELSE/END construct in PROC SQL (and surprisingly many don't
use SQL...). Yet another reason not to take SAS-L responses quite at
face value. Surprised someone (named Sig?) didn't step in on this one.
Good catch on the left v. full join, I miss that frequently until I go
about reconciling records -- data step is rather better at accounting
than SQL.
My rule of thumb is that first. and last. processing are often kludges
around something else. Frequently they're proxies for GROUP BY coupled
with a SELECT DISTINCT, to summarize data. Looks as if someone's trying
to do some funkiness with the "var2_a" bit. I might mop up that and a
few other details with a data step view before or after the SQL, if they
were truly necessary.
Combined effects (X1, X2) can be used by computing an intermediate
value, though I'd have to check to see whether CASE/WHEN/ELSE supports
the CALCULATED keyword.
> This does a little more of the job:
>
> proc sql;
> CREATE TABLE set_c AS
> SELECT
> *
> , CASE var1_a
> WHEN 0 THEN var1_b
> WHEN 1 THEN var2_b
> WHEN 2 THEN var3_b
> ELSE .
> END /* <--------------------- gotta have one of these!
> */
> AS newvar
> /* etc. */
> FROM
> set_a
> LEFT JOIN
> set_b
> ON set_a.byvar1 eq set_b.byvar1
> and
> set_a.byvar2 eq set_b.byvar2
>
> ;
> quit;
>
> But that /* etc. */ is the tough part!
>
> There's no explicit way to do FIRST.<var> and LAST.<var> processing
> in a SQL query, because tables are (conceptually) unordered. The only
> solution is to make an identifier variable for the record that will be
> "first.".
>
> Also, unless there's a variable in each input table that's never
> missing,
> it's hard to test for the presence of one or the other input table
> in a particular row of the joined table (the full-join bit that
> increments
> the counters).
>
> This sort of join-plus-other-stuff is a fine example of the flexibility
> of the DATA step with the MERGE statement, if you ask me...when you
> learn
> SQL, you get those cool examples of one join doing the same thing as
> five DATA steps, four PROC SORTs, and two PROC SUMMARYs. To do all the
> things Michael accomplishes in his DATA step, you need two DATA steps
> to set up the identifiers for FIRSTness, then a left join to make the
> data
> set, then a full join to get the counts, possibly some macro stuff to
> get
> the counts into and out of macro variables and into the log, etc.
>
> Unless I'm missing something really obvious...
>
> Roger
>
> kmself@IX.NETCOM.COM wrote:
> >
> > on Mon, Oct 23, 2000 at 01:21:15PM -0400, Schulingkamp, Michael E (michael.e.schulingkamp@LMCO.COM) wrote:
> > > I'm looking for some help with translating a data step merge into an sql
> > > join. I'm not too familiar with proc sql but I've taken a stab at putting a
> > > shell together. I just need help filling in the pieces. Here is a
> > > simplified version of the code that I'm working with...
> > >
> > > set_a set_b
> > > -------- --------
> > > byvar1 byvar1
> > > byvar2 byvar2
> > > var1_a var1_b
> > > var2_a var2_b
> > > var3_b
> > >
> > > data set_c ;
> > > merge set_a (in=x1) set_b (in=x2) end=last ;
> > > by byvar1 byvar2 ;
> > >
> > > if (var1_a=0) then newvar = var1_b ;
> > > else if (var1_a=1) then newvar = var2_b ;
> > > else if (var1_a=2) then newvar = var3_b ;
> > >
> > > if (x1=1 & x2=1) then match+1 ; /* < --- This part is full-joinish! */
> > > else if (x1=1 & x2=0) then in_a+1 ;
> > > else if (x1=0 & x2=1) then in_b+1 ;
> > > else if (x1=0 & x2=0) then nomatch+1 ; /* hopefully this never happens! */
> > >
> > > if (first.byvar2=1) then var2_a = var2_a ;
> > > else var2_a = . ;
> > >
> > > if (x1=1) then output set_c ; /* <----------------Left join! */
> > >
> > > if (last=1) then put match= nomatch= in_a= in_b= ;
> > > run;
> >
> > proc sql;
> > create table c as
> > select
> > *
> > , case var1_a
> > when 0 then var1_b
> > when 1 then var2_b
> > when 3 then var3_b
> > else .
> > as newvar
> > /* etc. */
> > from set_a, set_b
> > where set_a.byvar eq set_b.byvar
> > ;
> > quit;
> >
> > --
> > Karsten M. Self <kmself@ix.netcom.com> http://www.netcom.com/~kmself
> > Evangelist, Opensales, Inc. http://www.opensales.org
> > What part of "Gestalt" don't you understand? There is no K5 cabal
> > http://gestalt-system.sourceforge.net/ http://www.kuro5hin.org
> > GPG fingerprint: F932 8B25 5FDD 2528 D595 DC61 3847 889F 55F2 B9B0
> >
> > ------------------------------------------------------------------------
> > Part 1.2Type: application/pgp-signature
--
Karsten M. Self <kmself@ix.netcom.com> http://www.netcom.com/~kmself
Evangelist, Opensales, Inc. http://www.opensales.org
What part of "Gestalt" don't you understand? There is no K5 cabal
http://gestalt-system.sourceforge.net/ http://www.kuro5hin.org
GPG fingerprint: F932 8B25 5FDD 2528 D595 DC61 3847 889F 55F2 B9B0
[application/pgp-signature]
|