Date: Wed, 8 Feb 2012 21:41:20 +0000
Reply-To: "Zdeb, Michael S" <mzdeb@ALBANY.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Zdeb, Michael S" <mzdeb@ALBANY.EDU>
Subject: Re: Compare One Variable to All Records of Another
In-Reply-To: <81F8139F381BE844AE05CA6525FF2AAE04C3797F@tpwd-mx9.tpwd.state.tx.us>
Content-Type: text/plain; charset="us-ascii"
hi ... that will take longer than the direct addressing in the code I suggested
if you use Google to search for 'dorfman sas key indexing' you'll come up with lots of hits for papers where Paul Dorfman discusses using arrays as I did in the array code, plus comparisons to using hash objects, plus a whole lot more
ps if you read Paul's papers, you'll inevitably run into the only SAS-related papers that I've read that include the word "propaedeutics"
Mike Zdeb
U@Albany School of Public Health
One University Place (Room 119)
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
________________________________________
From: Warren Schlechte [Warren.Schlechte@tpwd.state.tx.us]
Sent: Wednesday, February 08, 2012 1:06 PM
To: Zdeb, Michael S
Cc: SAS-L@LISTSERV.UGA.EDU
Subject: RE: Compare One Variable to All Records of Another
Mike,
This is an interesting approach.
I was thinking of something like this, but instead of placing a 1 in the
array vector, I was thinking of simply placing the numbers themselves
within an array in order of appearance, then looping through and
checking.
This looks like it works. I combined your idea with Joe's to get this.
data want_array2 (keep=curr);
array x(100) _temporary_ (100*0);
retain i 0;
/* Load the array */
do i=1 to nobsi;
set test nobs=nobsi;
x(i) = input(ref_doc,10.);
put x(i);
end;
/* For each observation load cur_doc */
do _r=1 to nobsr;
set test nobs=nobsr point = _r;
curr=cur_doc;
put curr;
/* Now compare to each element of the loaded array */
do _t=1 to nobst;
set test nobs=nobst point= _t;
if x(_t) eq input(curr,10.) then do;
put x(_t) curr;
output;
end;
end;
end;
run;
Warren Schlechte
Learn how you can help Texas State Parks
-----Original Message-----
From: Zdeb, Michael S [mailto:mzdeb@albany.edu]
Sent: Tuesday, February 07, 2012 9:47 PM
To: Warren Schlechte
Subject: RE: Compare One Variable to All Records of Another
hi ... avoiding the hash solution, but similar ...
make the look up table an array
(little simpler if REF_DOC and CUR_DOC are numeric variables)
data want (keep=cur_doc);
array x(100) _temporary_ (100*0);
do until (done1);
set test (where=(^missing(ref_doc))) end=done1;
x(input(ref_doc,10.)) = 1;
end;
do until (done2);
set test (where=(^missing(cur_doc))) end=done2;
if x(input(cur_doc,10.)) eq 1 then output;
end;
run;
you could also make the look up table a delimited character string ...
data want (keep=cur_doc);
length ref $100;
do until (done1);
set test (where=(^missing(ref_doc))) end=done1;
ref = catx('|',ref,ref_doc);
end;
do until (done2);
set test (where=(^missing(cur_doc))) end=done2;
if find(ref,trim(cur_doc)) then output;
end;
run;
Mike Zdeb
U@Albany School of Public Health
One University Place (Room 119)
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
________________________________________
From: SAS(r) Discussion [SAS-L@LISTSERV.UGA.EDU] on behalf of Warren
Schlechte [Warren.Schlechte@TPWD.STATE.TX.US]
Sent: Tuesday, February 07, 2012 5:47 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Compare One Variable to All Records of Another
Amazing! Thanks Joe!
I was trying to avoid the hash solution, because I have yet to
understand hash code.
I had created the merge code previously, so what I was truly looking for
was the one datastep solution.
Warren Schlechte
Learn how you can help Texas State Parks <http://bit.ly/sVdilb>
From: Joe Matise [mailto:snoopy369@gmail.com]
Sent: Tuesday, February 07, 2012 4:44 PM
To: Warren Schlechte
Cc: SAS-L@listserv.uga.edu
Subject: Re: Compare One Variable to All Records of Another
What are the rules for 'data step' solution?
A hash (data step) solution:
data want;
if _n_ = 1 then do;
format ref_doc $8.;
declare hash h(dataset:'test');
h.defineKey('ref_doc');
h.defineDone();
call missing(ref_doc);
end;
set test;
rc = h.find(key:cur_doc);
if rc=0 and not missing(cur_doc) then output;
keep cur_doc;
run;
Presumably you'd want to PROC SORT NODUPKEY it if you wanted only one
from your presumably larger real dataset per value, or else use a
hiter() to actually find the records and remove them from matching.
Merge solution of course is trivial:
proc sort data=test;
by cur_doc;
run;
proc sort data=test out=test1;
by ref_doc;
run;
data want;
merge test(in=a) test1(in=b drop=cur_doc rename=ref_doc=cur_doc);
by cur_doc;
if a and b;
if not missing(ref_doc);
run;
If you want a pure, one data step solution, which roughly mirrors what
you suggested, you can use
data want;
set test;
if not missing(ref_doc);
do _t = 1 to nobst;
set test(keep=ref_doc rename=ref_doc=test_doc) point=_t nobs=nobst;
if cur_doc=test_doc then keepvar=1;
end;
if keepvar;
drop keepvar;
run;
but that would be presumably the slowest of the solutions (Depending on
how long it takes to sort the table, the merge might be slower I
suppose; but either way the INTERSECT or the Hash solution should be
faster).
-Joe
On Tue, Feb 7, 2012 at 4:30 PM, Warren Schlechte
<Warren.Schlechte@tpwd.state.tx.us> wrote:
Here's an easy datastep question that has me stumped.
I have the following dataset.
data test;
input cur_doc $ ref_doc $;
datalines;
11 11
. .
12 22
21 21
22 .
. 23
;
run;
I want to output a record of which Cur_doc have a matching Ref_doc, but
the match doesn't have to be in the same line.
So, for this test data, I would want the following output:
Cur_Doc
11
21
22
.
I was able to do this using Proc SQL code and the Intersect keyword.
* Found in both columns;
PROC SQL;
CREATE TABLE intersection AS
SELECT t1.'CUR_DOC'n
FROM test as t1
INTERSECT
SELECT t2.'REF_DOC'n
FROM test as t2
;
QUIT;
However, I feel I should also be able to do this in the data step. I
thought I could come up with a Do loop solution of the type:
Data out;
Set test; /* Read in Each Record Through Implicit Looping*/
Do I = 1 to last.record;
Set test; /* For each implicit record, read in all records through
explicit looping*/
/* Compare each record from the outside/implicit loop with every
record in the inside/explicit loop */
/* If we find a match, output it */
End; /* All records explicitly*/
Run; /* each record implicitly */
But I have failed.
Thanks for your help.
Warren Schlechte
HOH Fisheries Science Center
5103 Junction Hwy
Mt. Home, TX 78058
Phone 830.866.3356 x214 <tel:830.866.3356%20x214>
Fax 830.866.3549
Learn how you can help Texas State Parks <http://bit.ly/sVdilb>