LISTSERV at the University of Georgia
Date:   Tue, 13 Jan 2004 11:59:29 -0500
Reply-To:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:   Re: Select distinct / variable length bug? (proc sql)
Comments:   To: "ben.powell@CLA.CO.UK" <ben.powell@CLA.CO.UK>
Content-Type:   text/plain

Ben:

I suspect that something more drastic is happening in your program. The longer values are being truncated.

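The old 'first occurrence sets the variable length' rule applies when a SET a b c statement concatenates datasets. For example, txt takes length 2 from dataset a below, so 'bbbb' from dataset b is stored as 'bb' in u:

data a; txt='aa'; run;
data b; txt='bbbb'; run;
data u; set a b; run;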

With the longer values truncated in the UNION of the datasets, the formerly distinct values would match. That's the simplest explanation and one that you might want to check first.
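
A minimal sketch of that check, using hypothetical datasets a and b and a variable x that are not from the original program. It assumes x is created by assignment, so its length comes from the first value assigned, and that a is listed first on the SET statement:

```sas
/* Hypothetical datasets, for illustration only. */
data a;
  x = 'AB';        /* x is created with length 2 */
run;

data b;
  x = 'ABCD';      /* x is created with length 4 */
run;

/* x takes its length (2) from a, the first dataset listed on SET. */
data u;
  set a b;
run;

proc sql;
  /* 'ABCD' is stored as 'AB' in u, so the two rows no longer differ. */
  select count(distinct x) as n_distinct from u;
quit;
```

Listing b first on the SET statement should give x length 4 and keep the two values apart, which is one quick way to confirm that truncation, rather than DISTINCT itself, explains the count.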

Sig

-----Original Message-----
From: ben.powell@CLA.CO.UK [mailto:ben.powell@CLA.CO.UK]
Sent: Tuesday, January 13, 2004 11:37 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Select distinct / variable length bug? (proc sql)

Dear SAS-L / SQL-Heads,

I wonder if this is standard SQL or a bug, or whether SAS handles it the same way in the data step and documents it somewhere. It's just taken me a while to debug :(

Setting four datasets, each of which I have previously established contains only distinct values of x, I recheck that there is no overlap and find an impossibly large overlap where there should be none. I narrow it down to one dataset and eventually determine that the variable in question has a different length from the one in the other three datasets (a glitch). So what I guess is happening is that the short value is being searched for and found within the longer value. It was my understanding that this should not happen with a standard *distinct* expression; it sounds more like *like*. Was I wrong in this assumption?

Queries below:

/*ESTABLISH UNIQUE X COUNT.*/
proc sql noprint;
  select count(distinct x) into :count from Oct03j;
  %put NOTE: Distinct count = &count;
quit;

... from Oct03b; ... from Nov03j; .... from Nov03b... etc

/*SET DATA.*/
data _cpd_master (keep = x);
  set Oct03j Oct03b Nov03j Nov03b;
run;

/*CHECK FOR OVERLAP.*/
proc sql noprint;
  select count(distinct x) into :count from _cpd_master;
  %put NOTE: Distinct count = &count;
quit;

/*END SAS CODE.*/

By applying this to the dataset in question:

data Oct03j;
  length x $10.; /*INSTEAD OF $8.*/
  set Oct03j;
run;

... it worked and returned the expected count of uniques. This strikes me as a very strange functionality?!
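
A possible way to spot and guard against the mismatch, sketched under assumptions not stated in the post: the four datasets are taken to live in the WORK library, and $10 is taken to be the longest length of x among them. The first step lists the stored length of x in each input; the second declares the length in the combining step itself, so the result no longer depends on which dataset is listed first:

```sas
/* List the stored length of x in each input dataset.            */
/* Assumes the datasets are in WORK; dictionary.columns stores   */
/* libname and memname in upper case.                            */
proc sql;
  select memname, name, type, length
    from dictionary.columns
    where libname = 'WORK'
      and memname in ('OCT03J', 'OCT03B', 'NOV03J', 'NOV03B')
      and upcase(name) = 'X';
quit;

/* Declare the length before SET so that x is not sized by       */
/* whichever dataset happens to come first.                      */
data _cpd_master (keep = x);
  length x $10;   /* assumed longest length among the inputs */
  set Oct03j Oct03b Nov03j Nov03b;
run;
```

With the length declared up front, the original input datasets do not need to be rewritten.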

Any comments much appreciated.

