LISTSERV at the University of Georgia
Date:   Mon, 16 Feb 2004 16:05:42 -0500
Reply-To:   "Droogendyk, Harry" <Harry.Droogendyk@CIBC.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Droogendyk, Harry" <Harry.Droogendyk@CIBC.COM>
Subject:   Re: sorting data on mainframe
Content-Type:   text/plain; charset="iso-8859-1"

Lou:

Your point is well taken.

However, the poster was reading with $char20., therefore the leading spaces would be preserved. Yes, if he'd read with $20. and inadvertently dropped the leading spaces, that could have been the cause of the excessive dups being dropped. I also understood the data to be of binary / packed format, from the data "look like a mess" statement.
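One way to check what is actually in those 20 bytes is to read them with $char20. and dump the first few values in hex, then compare against the ISPF HEX display. This is only a sketch: the fileref RAWIN and the starting column of 1 are assumptions, since the original post gives neither.

```sas
/* Sketch only: fileref RAWIN and the column position are assumed, */
/* not taken from the original post.                                */
data check;
   infile rawin;
   input @1 the_variable $char20.;   /* $CHARw. keeps leading blanks */
   /* Print the first 10 values as 40 hex digits (2 per byte) */
   if _n_ <= 10 then put the_variable= $hex40.;
run;
```

If the hex values in the log match what ISPF shows with HEX on, the informat is reading the bytes unchanged.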

-----Original Message-----
From: Lou [mailto:lpogodajr292185@COMCAST.NET]
Sent: Monday, February 16, 2004 3:54 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: sorting data on mainframe

"Droogendyk, Harry" <Harry.Droogendyk@CIBC.COM> wrote in message

news:F0161D3F7AC5D411A5BE009027E774D60E61D02E@gemmrd-scc013eu.gem.cibc.com...
> If you want to verify the duplicates, i.e. to go back to the users and prove
> that it ain't 1%, use something like the following to keep the dups. It may
> be that they meant that 1% of the sort keys were duplicated. However, each
> duplicate key has many duplicates.
>
> data dedupped
>      dups;
>    set a;
>    by i;
>    if first.i then
>       output dedupped;
>    else
>       output dups;
> run;
>
> I wouldn't think the informat matters and based on the test below, it
> doesn't.

The INFORMAT most definitely could matter. When the original poster is reading in a flat file, reading a 20 byte character variable with a $20. informat will drop any leading spaces in the value, while reading the variable with a $char20. informat will preserve any leading spaces as part of the value.

You need to view this with a monospace font to be sure of seeing it correctly, but if we have five bytes with the values of space/space/a/b/c, reading those bytes with a $5. informat will result in a value of "abc  " while reading them in with a $char5. informat will result in a value of "  abc".

And of course, these two values sort differently.
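A minimal sketch of the $5. versus $char5. difference, using the INPUT function on a literal rather than a real file. The data set and variable names here are made up for illustration.

```sas
/* Sketch: the same five bytes read with $5. and with $char5. */
data _null_;
   raw     = '  abc';               /* space/space/a/b/c                 */
   trimmed = input(raw, $5.);       /* $w. drops leading blanks          */
   kept    = input(raw, $char5.);   /* $CHARw. preserves leading blanks  */
   same    = (trimmed = kept);      /* 1 if the two values compare equal */
   put trimmed= $hex10. kept= $hex10. same=;
run;
```

The hex digits in the log depend on whether the host is ASCII or EBCDIC, but the two values should differ in where the blanks fall, and SAME should come out as 0.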

> 1
> 2    data a;
> 3       informat fld $20.;
> 4       do fld = '01'x, '3F'x, '9E'x, 'ff'x;
> 5          output;
> 6          output;
> 7       end;
> 8    run;
>
> NOTE: The data set WORK.A has 8 observations and 1 variables.
> NOTE: The DATA statement used 0.01 CPU seconds.
>
> 9
> 10   proc sort data=a nodupkey;
> 11      by fld;
> 12   run;
>
> NOTE: 4 observations with duplicate key values were deleted.
> NOTE: There were 8 observations read from the data set WORK.A.
> NOTE: The data set WORK.A has 4 observations and 1 variables.
> NOTE: The PROCEDURE SORT used 0.00 CPU seconds.
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of PD
> Sent: Monday, February 16, 2004 1:16 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: sorting data on mainframe
>
> I have a data set that has a 20 byte long 'character' variable, on
> mainframe. The data set is a flat text file.
>
> When browsing the data on the mainframe, without turning on the HEX
> command at ISPF, the 20 bytes look like a mess. With HEX command
> turned on, the 20 bytes look like Ok, clean 20 bytes with numbers and
> characters, the way it is supposed to be.
>
> Now I need to read it into SAS for some processing. I need to sort it
> first.
>
> Question 1 is the informat: which informat should I use, $char20. or
> else? I tried $char20. and did a sorting,
>
> proc sort nodupkeys; by the_variable; run;
>
> Then I lost 2 thirds of the values / observations. Our business people
> told us there should be only about 1% duplicates.
>
> Question 2 is about proc sort: I read SAS documents about hosts using
> ASCII sort order vs. EBCDIC sorting order. I am not sure if this is
> relevant to my case here. Should I have added any options when sorting
> on the mainframe?
>
> Thanks.
>
> PD

