Date: Thu, 9 Feb 2012 13:53:14 +0000
Reply-To: "Zdeb, Michael S" <mzdeb@ALBANY.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Zdeb, Michael S" <mzdeb@ALBANY.EDU>
Subject: Re: Compare One Variable to All Records of Another
In-Reply-To: <81F8139F381BE844AE05CA6525FF2AAE129CCE@tpwd-mx9.tpwd.state.tx.us>
Content-Type: text/plain; charset="us-ascii"
hi ... you can get rid of the explicit loop if you tell SAS to only execute the first loop once
otherwise, it populates the array on each pass through the data step and since you have already hit the end of the data set on the first pass, the data step thinks it is done when you try to read TEST a second time
so ... (I also got rid of the RETAIN, not sure what is was doing since the first loop gives the variable i a starting point of 1)
data want (keep=curr);
array x(100) _temporary_ (100*0);
if _n_ eq 1 then do i=1 to nobsi;
set test nobs=nobsi;
x(i) = input(ref_doc,10.);
put x(i);
end;
set test (rename=(cur_doc=curr));
put curr;
do _t=1 to nobst;
set test nobs=nobst point= _t;
if x(_t) eq input(curr,10.) then do;
put x(_t) curr;
output;
end;
end;
run;
Mike Zdeb
U@Albany School of Public Health
One University Place (Room 119)
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
________________________________________
From: SAS(r) Discussion [SAS-L@LISTSERV.UGA.EDU] on behalf of Warren Schlechte [Warren.Schlechte@TPWD.STATE.TX.US]
Sent: Thursday, February 09, 2012 8:04 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Compare One Variable to All Records of Another
Mike,
I don't doubt it since my version takes 3 passes through the data.
I was curious. Is there some way to have the second loop (the _r loop) implement implicitly? Right now all three loops are explicit. but it seems I should be able to use SAS' implicit looping. However, ever piece of code I have tried stops the implicit loop before all the elements are read. After reading several of the SUGI papers, and attending one of Toby's DOW talks, I thought I should be able to have an implicit second loop.
Warren
-----Original Message-----
From: Zdeb, Michael S [mailto:mzdeb@albany.edu]
Sent: Wed 2/8/2012 3:41 PM
To: Warren Schlechte
Cc: SAS-L@LISTSERV.UGA.EDU
Subject: RE: Compare One Variable to All Records of Another
hi ... that will take longer than the direct addressing in the code I suggested
if you use Google to search for 'dorfman sas key indexing' you'll come up with lots of hits for papers where Paul Dorfman discusses using arrays as I did in the array code, plus comparisons to using hash objects, plus a whole lot more
ps if you read Paul's papers, you'll inevitably run into the only SAS-related papers that I've read that include the word "propaedeutics"
Mike Zdeb
U@Albany School of Public Health
One University Place (Room 119)
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
________________________________________
From: Warren Schlechte [Warren.Schlechte@tpwd.state.tx.us]
Sent: Wednesday, February 08, 2012 1:06 PM
To: Zdeb, Michael S
Cc: SAS-L@LISTSERV.UGA.EDU
Subject: RE: Compare One Variable to All Records of Another
Mike,
This is an interesting approach.
I was thinking of something like this, but instead of placing a 1 in the
array vector, I was thinking of simply placing the numbers themselves
within an array in order of appearance, then looping through and
checking.
This looks like it works. I combined your idea with Joe's to get this.
data want_array2 (keep=curr);
array x(100) _temporary_ (100*0);
retain i 0;
/* Load the array */
do i=1 to nobsi;
set test nobs=nobsi;
x(i) = input(ref_doc,10.);
put x(i);
end;
/* For each observation load cur_doc */
do _r=1 to nobsr;
set test nobs=nobsr point = _r;
curr=cur_doc;
put curr;
/* Now compare to each element of the loaded array */
do _t=1 to nobst;
set test nobs=nobst point= _t;
if x(_t) eq input(curr,10.) then do;
put x(_t) curr;
output;
end;
end;
end;
run;
Warren Schlechte
Learn how you can help Texas State Parks
-----Original Message-----
From: Zdeb, Michael S [mailto:mzdeb@albany.edu]
Sent: Tuesday, February 07, 2012 9:47 PM
To: Warren Schlechte
Subject: RE: Compare One Variable to All Records of Another
hi ... avoiding the hash solution, but similar ...
make the look up table an array
(little simpler if REF_DOC and CUR_DOC are numeric variables)
data want (keep=cur_doc);
array x(100) _temporary_ (100*0);
do until (done1);
set test (where=(^missing(ref_doc))) end=done1;
x(input(ref_doc,10.)) = 1;
end;
do until (done2);
set test (where=(^missing(cur_doc))) end=done2;
if x(input(cur_doc,10.)) eq 1 then output;
end;
run;
you could also make the look up table a delimited character string ...
data want (keep=cur_doc);
length ref $100;
do until (done1);
set test (where=(^missing(ref_doc))) end=done1;
ref = catx('|',ref,ref_doc);
end;
do until (done2);
set test (where=(^missing(cur_doc))) end=done2;
if find(ref,trim(cur_doc)) then output;
end;
run;
Mike Zdeb
U@Albany School of Public Health
One University Place (Room 119)
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
________________________________________
From: SAS(r) Discussion [SAS-L@LISTSERV.UGA.EDU] on behalf of Warren
Schlechte [Warren.Schlechte@TPWD.STATE.TX.US]
Sent: Tuesday, February 07, 2012 5:47 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Compare One Variable to All Records of Another
Amazing! Thanks Joe!
I was trying to avoid the hash solution, because I have yet to
understand hash code.
I had created the merge code previously, so what I was truly looking for
was the one datastep solution.
Warren Schlechte
Learn how you can help Texas State Parks <http://bit.ly/sVdilb>
From: Joe Matise [mailto:snoopy369@gmail.com]
Sent: Tuesday, February 07, 2012 4:44 PM
To: Warren Schlechte
Cc: SAS-L@listserv.uga.edu
Subject: Re: Compare One Variable to All Records of Another
What are the rules for 'data step' solution?
A hash (data step) solution:
data want;
if _n_ = 1 then do;
format ref_doc $8.;
declare hash h(dataset:'test');
h.defineKey('ref_doc');
h.defineDone();
call missing(ref_doc);
end;
set test;
rc = h.find(key:cur_doc);
if rc=0 and not missing(cur_doc) then output;
keep cur_doc;
run;
Presumably you'd want to PROC SORT NODUPKEY it if you wanted only one
from your presumably larger real dataset per value, or else use a
hiter() to actually find the records and remove them from matching.
Merge solution of course is trivial:
proc sort data=test;
by cur_doc;
run;
proc sort data=test out=test1;
by ref_doc;
run;
data want;
merge test(in=a) test1(in=b drop=cur_doc rename=ref_doc=cur_doc);
by cur_doc;
if a and b;
if not missing(ref_doc);
run;
If you want a pure, one data step solution, which roughly mirrors what
you suggested, you can use
data want;
set test;
if not missing(ref_doc);
do _t = 1 to nobst;
set test(keep=ref_doc rename=ref_doc=test_doc) point=_t nobs=nobst;
if cur_doc=test_doc then keepvar=1;
end;
if keepvar;
drop keepvar;
run;
but that would be presumably the slowest of the solutions (Depending on
how long it takes to sort the table, the merge might be slower I
suppose; but either way the INTERSECT or the Hash solution should be
faster).
-Joe
On Tue, Feb 7, 2012 at 4:30 PM, Warren Schlechte
<Warren.Schlechte@tpwd.state.tx.us> wrote:
Here's an easy datastep question that has me stumped.
I have the following dataset.
data test;
input cur_doc $ ref_doc $;
datalines;
11 11
. .
12 22
21 21
22 .
. 23
;
run;
I want to output a record of which Cur_doc have a matching Ref_doc, but
the match doesn't have to be in the same line.
So, for this test data, I would want the following output:
Cur_Doc
11
21
22
.
I was able to do this using Proc SQL code and the Intersect keyword.
* Found in both columns;
PROC SQL;
CREATE TABLE intersection AS
SELECT t1.'CUR_DOC'n
FROM test as t1
INTERSECT
SELECT t2.'REF_DOC'n
FROM test as t2
;
QUIT;
However, I feel I should also be able to do this in the data step. I
thought I could come up with a Do loop solution of the type:
Data out;
Set test; /* Read in Each Record Through Implicit Looping*/
Do I = 1 to last.record;
Set test; /* For each implicit record, read in all records through
explicit looping*/
/* Compare each record from the outside/implicit loop with every
record in the inside/explicit loop */
/* If we find a match, output it */
End; /* All records explicitly*/
Run; /* each record implicitly */
But I have failed.
Thanks for your help.
Warren Schlechte
HOH Fisheries Science Center
5103 Junction Hwy
Mt. Home, TX 78058
Phone 830.866.3356 x214 <tel:830.866.3356%20x214>
Fax 830.866.3549
Learn how you can help Texas State Parks <http://bit.ly/sVdilb>