| Date: | Wed, 31 Jul 2002 11:43:44 -0700 |
| Reply-To: | Pete Lund <pete.lund@NWCSR.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Pete Lund <pete.lund@NWCSR.COM> |
| Subject: | Re: Match Merging on Index function |
|
| In-Reply-To: | <2AE36192DC1E24479B195A804DE811B53D6B3D@at0exc00.autoconnect> |
| Content-Type: | text/plain; charset="iso-8859-1" |
Hi Brad-
There are blank spaces at the end of URL_STRING that are getting in the
way. Just add a TRIM() to your INDEX function and all is well:
proc sql;
create table done as
select y.*, d.*
from y.decoder d left join y.june y
on index(y.request,trim(d.url_string))>0
;quit;
----------------------------------------------------------------------------
---
Pete Lund
Northwest Crime and Social Research, Inc.
A SAS Alliance Partner
215 Legion Way SW
Olympia, WA 98501
(360) 528-8970 - voice
(360) 280-4892 - cell
(360) 570-7533 - fax
pete.lund@nwcsr.com
www.nwcsr.com
----------------------------------------------------------------------------
---
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
Goldman, Brad (AT-Atlanta)
Sent: Wednesday, July 31, 2002 11:29 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Match Merging on Index function
Dear friends:
I have what I think is an interesting problem in match-merging.
Dataset A (y.june, the log data, ~50,000 rows):
request count
------- -----
/mandash/dealer_services/report0205 15
/mandash/eggsaregreen/15.html 1
/mandash/dealer_reports/selectcas.html 3609
Dataset B (y.decoder, the lookup table, ~300 rows):
url_string Friendly name
--------- ---------------
/dealer_services/report0205 May 2002 Report
/mandash/eggsaregreen/15.html Green Eggs Report
selectcas.html Selection Screen
The idea is that if the request variable from A contains the url_string in
B, it's a match. So the final dataset should look like:
Friendly name: count:
-------------- ------
May 2002 Report 15
Green Eggs Report 1
Selection Screen 3609
Here is my code:
proc sql;
create table done as
select y.*, d.*
from y.decoder d left join y.june y
on index(y.request,d.url_string)>0
;quit;
There were no errors. But the only matches were when url_string actually
equalled request. In the example above, only the Green Eggs report would
have been a hit, the others got missing values. The index function didn't
work as expected.
So, I muttered to myself, maybe index is used differently within Proc SQL or
something... who knows. Why not do a dataset merge. It'll be good
practice, I muttered. (I do a lot of muttering.)
Except -- what "by varaible" to merge on. The variable aren't ever equal,
how can I merge on the index of another variable?
Any ideas? Am I missing some SQL trick?
-Brad
P.S. It is safe to assume that anything from B will match on either zero or
one rows of A.
|