Date: Fri, 9 Mar 2001 18:44:29 -0600
Reply-To: aldi@wubios.wustl.edu
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Aldi Kraja <aldi@WUBIOS.WUSTL.EDU>
Organization: Washington University
Subject: How big must be the sample not to have missing data?
Content-Type: text/plain; charset=us-ascii
Hi,
1. The following is a group of data produced with the use of ranuni
function.
Although it is tested to increase the max of i into 10000 in the second
data set still
the output is with missing matching values from the large data set.
2. In the output present there are no replications of the same
observation twice
or more for the age variable. But if they were how can one eliminate
them from
the same step of joining. It is used distinct: is it the only way in
SQL?
3. I was expecting the printing to be direct. What is the rule in SAS
SQL to get the printing
of the results without writing the proc print?
Any suggestions?
TIA,
Aldi
Final joined data
Obs IDA IDB AGEA AGEB WEIGHTA WEIGHTB
minDIFF
1 1 7220 34.6465 34.6466 95.8172 84.4527
.000091582
2 2 . 25.9712 . 89.8975 . .
3 3 7872 31.0139 31.0137 89.1686 81.6122
.000223857
4 4 4486 31.6921 31.6915 90.5763 72.0314
.000644265
5 5 4207 33.1614 33.1624 86.0535 75.7580
.000971482
6 6 2949 28.6512 28.6563 92.5809 83.8870
.005163798
7 7 . 25.0493 . 91.7295 . .
8 8 4674 31.3580 31.3578 96.5045 89.9858
.000137151
9 9 5381 34.9812 34.9812 85.5097 73.2802
.000007269
10 10 4664 31.0011 31.0028 86.6800 74.8549
.001720163
11 11 9047 32.9660 32.9660 83.0533 70.7210
.000071195
12 12 5370 31.5261 31.5277 99.6123 80.2764
.001586024
13 13 . 26.7580 . 83.4328 . .
14 14 4660 29.1892 29.1891 96.9098 83.8352
.000092419
15 15 3966 28.2025 28.2026 92.9584 83.2824
.000083941
16 16 9935 34.7091 34.7088 94.1438 76.0392
.000324500
17 17 . 26.0184 . 80.0373 . .
18 18 802 34.0614 34.0614 92.9665 74.9618
.000032164
19 19 . 26.9170 . 95.5842 . .
20 20 3821 28.0548 28.0544 81.8329 75.2944
.000439016
data small;
do id=1 to 20;
age=ranuni(298298)*10+25;
height=ranuni(329821)*5+55;
weight=ranuni(9084093)*20+80;
output;
end;
run;
data large;
do id=1 to 10000;
age=ranuni(898298)*10+28;
height=ranuni(529821)*5+50;
weight=ranuni(5084093)*20+70;
output;
end;
run;
PROC SQL;
CREATE TABLE joinAB AS
SELECT DISTINCT small.ID as IDA,large.ID as IDB,small.AGE as AGEA,
large.AGE as AGEB,small.WEIGHT as WEIGHTA,large.WEIGHT AS WEIGHTB,
ABS(small.AGE-large.AGE) AS minDIFF
from small left join large
on round((small.age),0.1)=round((large.age),0.1)
GROUP BY small.ID
HAVING minDIFF=MIN(minDIFF);
QUIT;
proc print data=joinAB;
title "Final joined data";
run;
--