LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 1998, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 4 May 1998 16:59:14 -0400
Reply-To:     "Dorfman, Paul" <pdorfma@UCS.ATT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         "Dorfman, Paul" <pdorfma@UCS.ATT.COM>
Subject:      Compare Variables from 2 data sets
Content-Type: text/plain; charset="iso-8859-1"

<abdu.elnagheeb@NATIONSBANK.COM>, in particular, wrote: ... I hope someone will respond to this question which is best asked using an example. I have 2 data sets (A and B) (sorted for convenience) as below:

Data A : has one variable (X): 5, 9, 11, 23, 29, 35 Data B : has one variable (Y): 3, 10, 15, 19, 27, 31, 37, 43

I want to do the following. For EVERY observation in data A I want to compare X to Y (in data B) and create a variable that will be the # of times that that OBSERVATION (of X) was less or equal to (<=) Y (all observations). In our example, comparing the value X=5 to all the Y values, I will get that X=5 <= Y 7 times (observations), 9 <= Y 7 times, 11 <= Y 6 times, 23 <= Y 4 times ...etc. Hence, I want to get a data set (C) that will look like this:

data C:

X NewVar 5 7 9 7 11 6 23 4 29 3 35 2

I have used arrays. The problem is, the number of observations in data A and B are in thousands (> 20,000) and I ran into problems because of that. What would the most efficient way of doing this? ...

Abed,

For your task, I still don't see any viable alternative to using an array. Resorting to ANY other approach would be equivalent to having to read at least more than 20000*20000=400 million records file.

If your problem has been running out of memory, its cure appears hazy without having seen the code and/or knowing what your platform is. I've never experienced a difficulty to allocate such small an array (since your variables are numeric, it occupies only 164K of task memory) under Windows, to say nothing of MVS. However, let me suggest a couple of things:

1. Does the fact that you have used the word "arrays", in plural, indicate that in your code, you have allocated more that one array? If so, it is unnecessary, since only one is needed.

2. Have you used non-temporary array(s)? If so, don't. A quick look at FULLSTIMER diagnostics reveals

Task memory - 1737K (949K data, 788K program) Task memory - 952K (164K data, 788K program)

for a non-temporary and temporary array of 20000 elements, respectively.

So, my code would look like

DATA _NULL_; IF 0 THEN SET Y NOBS=YNOBS; CALL SYMPUT('YNOBS',LEFT(PUT(YNOBS,BEST.))); STOP; RUN;

DATA COMPARE(KEEP=X COUNT); ARRAY AY(&YNOBS) _TEMPORARY_; IF _N_ = 1 THEN DO I=1 TO YNOBS; SET Y NOBS=YNOBS; AY(I) = Y; END; SET X; COUNT = 0; DO I=1 TO DIM(AY); IF X <= AY(I) THEN COUNT + 1; END; RUN;

I've tested it with your test data, and it results into

OBS X COUNT

1 5 7 2 9 7 3 11 6 4 23 4 5 29 3 6 35 2 .

To demonstrate that the approach is viable for "large" files, let's simulate them using a random generator:

DATA X; DROP I; DO I=1 TO 20000; X = CEIL(RANUNI(0)*100); OUTPUT; END; DATA Y; DROP I; DO I=1 TO 20000; Y = CEIL(RANUNI(0)*100); OUTPUT; END;

Running the 'compare' program against the datasets created in the above manner under OS/390 6.09e results into the following log message:

NOTE: The data set WORK.COMPARE has 20000 observations and 2 variables. NOTE: The DATA statement used 289.59 CPU seconds and 3944K.

Under W95 (166 MHz, 32M) it's not as speedy but still implementable:

NOTE: The data set WORK.COMPARE has 20000 observations and 2 variables. NOTE: The DATA statement used 17 minutes 56.03 seconds.

I hope it may help you.

Regards,

Paul.

++++++++++++++++++++++++++++++ Paul M. Dorfman Citibank UCS Decision Support Systems Jacksonville, FL ++++++++++++++++++++++++++++++


Back to: Top of message | Previous page | Main SAS-L page