LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (February 2012, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 9 Feb 2012 13:53:14 +0000
Reply-To:     "Zdeb, Michael S" <mzdeb@ALBANY.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Zdeb, Michael S" <mzdeb@ALBANY.EDU>
Subject:      Re: Compare One Variable to All Records of Another
Comments: To: Warren Schlechte <Warren.Schlechte@TPWD.STATE.TX.US>
In-Reply-To:  <81F8139F381BE844AE05CA6525FF2AAE129CCE@tpwd-mx9.tpwd.state.tx.us>
Content-Type: text/plain; charset="us-ascii"

hi ... you can get rid of the explicit loop if you tell SAS to only execute the first loop once

otherwise, it populates the array on each pass through the data step and since you have already hit the end of the data set on the first pass, the data step thinks it is done when you try to read TEST a second time

so ... (I also got rid of the RETAIN, not sure what is was doing since the first loop gives the variable i a starting point of 1)

data want (keep=curr); array x(100) _temporary_ (100*0);

if _n_ eq 1 then do i=1 to nobsi; set test nobs=nobsi; x(i) = input(ref_doc,10.); put x(i); end;

set test (rename=(cur_doc=curr));

put curr; do _t=1 to nobst; set test nobs=nobst point= _t; if x(_t) eq input(curr,10.) then do; put x(_t) curr; output; end; end;

run;

Mike Zdeb U@Albany School of Public Health One University Place (Room 119) Rensselaer, New York 12144-3456 P/518-402-6479 F/630-604-1475

________________________________________ From: SAS(r) Discussion [SAS-L@LISTSERV.UGA.EDU] on behalf of Warren Schlechte [Warren.Schlechte@TPWD.STATE.TX.US] Sent: Thursday, February 09, 2012 8:04 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Compare One Variable to All Records of Another

Mike,

I don't doubt it since my version takes 3 passes through the data.

I was curious. Is there some way to have the second loop (the _r loop) implement implicitly? Right now all three loops are explicit. but it seems I should be able to use SAS' implicit looping. However, ever piece of code I have tried stops the implicit loop before all the elements are read. After reading several of the SUGI papers, and attending one of Toby's DOW talks, I thought I should be able to have an implicit second loop.

Warren

-----Original Message----- From: Zdeb, Michael S [mailto:mzdeb@albany.edu] Sent: Wed 2/8/2012 3:41 PM To: Warren Schlechte Cc: SAS-L@LISTSERV.UGA.EDU Subject: RE: Compare One Variable to All Records of Another

hi ... that will take longer than the direct addressing in the code I suggested

if you use Google to search for 'dorfman sas key indexing' you'll come up with lots of hits for papers where Paul Dorfman discusses using arrays as I did in the array code, plus comparisons to using hash objects, plus a whole lot more

ps if you read Paul's papers, you'll inevitably run into the only SAS-related papers that I've read that include the word "propaedeutics"

Mike Zdeb U@Albany School of Public Health One University Place (Room 119) Rensselaer, New York 12144-3456 P/518-402-6479 F/630-604-1475

________________________________________ From: Warren Schlechte [Warren.Schlechte@tpwd.state.tx.us] Sent: Wednesday, February 08, 2012 1:06 PM To: Zdeb, Michael S Cc: SAS-L@LISTSERV.UGA.EDU Subject: RE: Compare One Variable to All Records of Another

Mike,

This is an interesting approach.

I was thinking of something like this, but instead of placing a 1 in the array vector, I was thinking of simply placing the numbers themselves within an array in order of appearance, then looping through and checking.

This looks like it works. I combined your idea with Joe's to get this.

data want_array2 (keep=curr); array x(100) _temporary_ (100*0); retain i 0; /* Load the array */ do i=1 to nobsi; set test nobs=nobsi; x(i) = input(ref_doc,10.); put x(i); end; /* For each observation load cur_doc */ do _r=1 to nobsr; set test nobs=nobsr point = _r; curr=cur_doc; put curr; /* Now compare to each element of the loaded array */ do _t=1 to nobst; set test nobs=nobst point= _t; if x(_t) eq input(curr,10.) then do; put x(_t) curr; output; end; end; end; run;

Warren Schlechte Learn how you can help Texas State Parks

-----Original Message----- From: Zdeb, Michael S [mailto:mzdeb@albany.edu] Sent: Tuesday, February 07, 2012 9:47 PM To: Warren Schlechte Subject: RE: Compare One Variable to All Records of Another

hi ... avoiding the hash solution, but similar ...

make the look up table an array (little simpler if REF_DOC and CUR_DOC are numeric variables)

data want (keep=cur_doc); array x(100) _temporary_ (100*0); do until (done1); set test (where=(^missing(ref_doc))) end=done1; x(input(ref_doc,10.)) = 1; end; do until (done2); set test (where=(^missing(cur_doc))) end=done2; if x(input(cur_doc,10.)) eq 1 then output; end; run;

you could also make the look up table a delimited character string ...

data want (keep=cur_doc); length ref $100; do until (done1); set test (where=(^missing(ref_doc))) end=done1; ref = catx('|',ref,ref_doc); end; do until (done2); set test (where=(^missing(cur_doc))) end=done2; if find(ref,trim(cur_doc)) then output; end; run;

Mike Zdeb U@Albany School of Public Health One University Place (Room 119) Rensselaer, New York 12144-3456 P/518-402-6479 F/630-604-1475

________________________________________ From: SAS(r) Discussion [SAS-L@LISTSERV.UGA.EDU] on behalf of Warren Schlechte [Warren.Schlechte@TPWD.STATE.TX.US] Sent: Tuesday, February 07, 2012 5:47 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Compare One Variable to All Records of Another

Amazing! Thanks Joe!

I was trying to avoid the hash solution, because I have yet to understand hash code.

I had created the merge code previously, so what I was truly looking for was the one datastep solution.

Warren Schlechte

Learn how you can help Texas State Parks <http://bit.ly/sVdilb>

From: Joe Matise [mailto:snoopy369@gmail.com] Sent: Tuesday, February 07, 2012 4:44 PM To: Warren Schlechte Cc: SAS-L@listserv.uga.edu Subject: Re: Compare One Variable to All Records of Another

What are the rules for 'data step' solution?

A hash (data step) solution:

data want;

if _n_ = 1 then do;

format ref_doc $8.;

declare hash h(dataset:'test');

h.defineKey('ref_doc');

h.defineDone();

call missing(ref_doc);

end;

set test;

rc = h.find(key:cur_doc);

if rc=0 and not missing(cur_doc) then output;

keep cur_doc;

run;

Presumably you'd want to PROC SORT NODUPKEY it if you wanted only one from your presumably larger real dataset per value, or else use a hiter() to actually find the records and remove them from matching.

Merge solution of course is trivial:

proc sort data=test;

by cur_doc;

run;

proc sort data=test out=test1;

by ref_doc;

run;

data want;

merge test(in=a) test1(in=b drop=cur_doc rename=ref_doc=cur_doc);

by cur_doc;

if a and b;

if not missing(ref_doc);

run;

If you want a pure, one data step solution, which roughly mirrors what you suggested, you can use

data want;

set test;

if not missing(ref_doc);

do _t = 1 to nobst;

set test(keep=ref_doc rename=ref_doc=test_doc) point=_t nobs=nobst;

if cur_doc=test_doc then keepvar=1;

end;

if keepvar;

drop keepvar;

run;

but that would be presumably the slowest of the solutions (Depending on how long it takes to sort the table, the merge might be slower I suppose; but either way the INTERSECT or the Hash solution should be faster).

-Joe

On Tue, Feb 7, 2012 at 4:30 PM, Warren Schlechte <Warren.Schlechte@tpwd.state.tx.us> wrote:

Here's an easy datastep question that has me stumped.

I have the following dataset.

data test; input cur_doc $ ref_doc $; datalines; 11 11 . . 12 22 21 21 22 . . 23 ; run;

I want to output a record of which Cur_doc have a matching Ref_doc, but the match doesn't have to be in the same line.

So, for this test data, I would want the following output:

Cur_Doc 11 21 22 .

I was able to do this using Proc SQL code and the Intersect keyword. * Found in both columns; PROC SQL; CREATE TABLE intersection AS SELECT t1.'CUR_DOC'n FROM test as t1 INTERSECT SELECT t2.'REF_DOC'n FROM test as t2 ; QUIT;

However, I feel I should also be able to do this in the data step. I thought I could come up with a Do loop solution of the type:

Data out; Set test; /* Read in Each Record Through Implicit Looping*/ Do I = 1 to last.record; Set test; /* For each implicit record, read in all records through explicit looping*/ /* Compare each record from the outside/implicit loop with every record in the inside/explicit loop */ /* If we find a match, output it */ End; /* All records explicitly*/ Run; /* each record implicitly */

But I have failed.

Thanks for your help.

Warren Schlechte HOH Fisheries Science Center 5103 Junction Hwy Mt. Home, TX 78058 Phone 830.866.3356 x214 <tel:830.866.3356%20x214> Fax 830.866.3549

Learn how you can help Texas State Parks <http://bit.ly/sVdilb>


Back to: Top of message | Previous page | Main SAS-L page