LISTSERV at the University of Georgia
Date:   Wed, 18 Feb 2004 13:44:47 -0800
Reply-To:   "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject:   Re: sorting data on mainframe
Comments:   To: PD <sophe88@YAHOO.COM>
Content-Type:   text/plain; charset="iso-8859-1"

PD -

SAS is 99% environment independent: INFORMAT $char20. and PROC SORT NODUPKEYS should do the trick on MVS just as they do on a PC. $CHAR20. will read any byte except perhaps special end-of-file markers and CR/LFs, and even that can be circumvented by reading the data as a stream.
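Here is a minimal sketch of that read. It assumes the 20-byte field starts in column 1, and it uses a placeholder data set name ('mvs.input.file') and a hypothetical fileref RAW. The step echoes the first five records to the log in hex, so you can check the bytes SAS actually read against ISPF's HEX ON display.

```sas
/* Sketch only: 'mvs.input.file' and the fileref RAW are placeholders. */
filename raw 'mvs.input.file';

data datax;
  infile raw truncover;        /* a short record is padded, not continued from the next one */
  input @1 key $char20.;       /* assumes columns 1-20; $CHAR20. keeps every byte, blanks included */
  if _n_ <= 5 then
    put _n_= key $hex40.;      /* 20 bytes = 40 hex digits in the log */
run;
```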

Why don't you keep the offending duplicate records and look at them? (untested)

data datax;
  input .....;     /* your INFILE and INPUT statements here */
run;

proc sort data=datax;
  by key;
run;

data _null_;
  set datax;
  by key;
  if first.key and last.key then do;   /* key occurs exactly once */
    file 'mvs.file.singles';
    put key;
  end;
  else do;                             /* key occurs more than once */
    file 'mvs.file.dups';
    put key;
  end;
run;

And then look at them with ISPF?

When you first read the data in you could create a line counter and output it also, allowing you to compare back to the source.
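Here is a sketch of the line-counter idea. The data set name 'mvs.input.file', the fileref RAW and the variable RECNO are hypothetical. DATAX and KEY are the names from the example above. The step executes one INPUT per iteration, so the automatic variable _N_ equals the record's position in the source file.

```sas
/* Sketch: 'mvs.input.file', RAW and RECNO are placeholders. */
filename raw 'mvs.input.file';

data datax;
  infile raw truncover;
  input @1 key $char20.;
  recno = _n_;                    /* one INPUT per iteration, so _N_ = source record number */
run;

proc sort data=datax;
  by key recno;
run;

data _null_;
  set datax;
  by key;
  if not (first.key and last.key);    /* keep only keys that occur more than once */
  put recno= key $hex40.;             /* no FILE statement, so this goes to the SAS log */
run;
```

Each RECNO can then be found in the source data set with ISPF's LOCATE command (for example, L 1234) in Browse or Edit.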

Or simply read the file in and write it back out to a new file, then use ISPF's SuperC compare utility (=3.12) to see whether SAS reads it correctly?
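Here is a sketch of that round trip. It assumes the field occupies columns 1-20 and that the output data set ('mvs.input.copy', a placeholder) has already been allocated with the input's RECFM and LRECL, for example through ISPF 3.2. RAWIN and RAWOUT are hypothetical filerefs. If the LRECL is larger than 20, compare only columns 1-20 in SuperC.

```sas
/* Sketch: both DSNs are placeholders; the output must already exist */
/* with the same attributes as the input.                            */
filename rawin  'mvs.input.file';
filename rawout 'mvs.input.copy';

data _null_;
  infile rawin truncover;
  file rawout;
  input @1 key $char20.;     /* read through the informat in question ... */
  put @1 key $char20.;       /* ... and write the same 20 bytes back out */
run;
```

If SuperC reports no differences in those columns, $CHAR20. read every byte of the field unchanged.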

Your log will tell you how many records were read and whether SAS went to a new line, etc. You may want to post it.

Good luck

Paul Choate DDS Data Extraction (916) 654-2160

-----Original Message-----
From: PD [mailto:sophe88@YAHOO.COM]
Sent: Wednesday, February 18, 2004 1:06 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: sorting data on mainframe

Thanks for all your replies.

1. I did use $char20. to read it in.
2. It is a flat FB text file generated on the mainframe and intended for use on the mainframe.
3. The var in question is NOT packed decimal of any kind.

Below are the 'messy' records, without HEX turned on:

00601.-.¦-¤á.²[!Ah§k
00601.-.¦-¤á.²¥;5r_Ø
00601.-.[...:ÐÆ<"þÁè
00602.-. -ú©À [d¼i.+
00602.-.¥.z\|K[qd¤%ù
00602.-.ó-úE'+¯þ¥½>*
00602.-.--H .-·¬Ì¥uª
00602.-.÷..º2És;òþË(
00602.-.9-­.©¾s;òº[¤

The first 5 bytes are the zip code; no problem there. With HEX turned on, the first two records look like this:

00601.-.¦-¤á.²[!Ah§k
FFFFF06166941EB5C8B9
0060100FA0F5FAAA1852
-------------------
00601.-.¦-¤á.²¥;5r_Ø
FFFFF06166941EB5F968
0060100FA0F5FA2E59D0
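Reading the HEX ON display column by column gives the raw bytes. The upper hex row holds the high-order half of each byte and the lower row the low-order half.

- First record: X'F0F0F6F0F100601F6A609F451FEABA5AC188B592'.
- Second record: X'F0F0F6F0F100601F6A609F451FEAB25EF5996D80'.

The zip code is EBCDIC '00601' (X'F0F0F6F0F1'). The remaining 15 bytes are not text; several, such as X'00' and X'1F', are non-printable. The two records share their first 14 bytes but differ in bytes 15-20, so as full 20-byte keys they are distinct. The sketch below checks this with SAS hex character constants typed in from the display above.

```sas
/* The two keys as read off the ISPF HEX ON display above. */
data _null_;
  length k1 k2 $20;
  k1 = 'F0F0F6F0F100601F6A609F451FEABA5AC188B592'x;
  k2 = 'F0F0F6F0F100601F6A609F451FEAB25EF5996D80'x;
  same_first14 = (substr(k1, 1, 14) = substr(k2, 1, 14));
  same_all20   = (k1 = k2);
  put same_first14= same_all20=;   /* log shows: same_first14=1 same_all20=0 */
run;
```

If only part of the field ended up in the BY variable, records like these two would collapse into one under NODUPKEY. That is one possible way to lose far more than 1%.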

My concern is this, and only this so far:

If I use INFORMAT $char20. to read it in and that is not correct, then it may have contributed to the fact that I lose 2/3 of the records when using "proc sort nodupkeys". That is why I am NOT ready to tell my business people that their notion of 1% dups is wrong. In other words, I don't yet have data evidence to support my allegation that they are wrong.
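One way to get that evidence without relying on PROC SORT is to count records and distinct keys directly. The sketch below assumes the data were read into DATAX with the 20-byte field in KEY, as in the earlier example. The column names are hypothetical.

```sas
proc sql;
  /* How many records, how many distinct 20-byte keys, and the difference. */
  /* The difference equals what NODUPKEY drops if no key is all blanks.     */
  select count(*)            as n_records,
         count(distinct key) as n_distinct_keys,
         calculated n_records - calculated n_distinct_keys as n_extra_copies
    from datax;

  /* Keys that occur more than once, shown in hex for comparison with ISPF. */
  select key format=$hex40., count(*) as n_copies
    from datax
    group by key
    having count(*) > 1
    order by n_copies desc;
quit;
```

Comparing N_EXTRA_COPIES / N_RECORDS with the expected 1% gives a figure to bring to the business side. The second query shows exactly which byte strings repeat.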

Maybe I should have used another INFORMAT instead of $char20., OR I should have added something to PROC SORT (especially if $char20. is the right informat) to accommodate the fact that my host is OS/390, an EBCDIC system rather than an ASCII one like Windows. The sort order PROC SORT uses there may differ from the one it uses when the same SAS program runs on Windows.
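On the sort-order question: a collating sequence decides the order of the keys, not whether two keys are equal. NODUPKEY should therefore keep the same number of records under EBCDIC and ASCII order. The sketch below confirms this on the mainframe with PROC SORT's SORTSEQ= option. DATAX and KEY come from the earlier example; the output data set names are hypothetical.

```sas
/* Default on z/OS: keys ordered by the EBCDIC sequence. */
proc sort data=datax out=nodup_ebcdic nodupkey;
  by key;
run;

/* Same de-duplication, keys ordered by the ASCII sequence. */
proc sort data=datax out=nodup_ascii nodupkey sortseq=ascii;
  by key;
run;
```

The log NOTEs for the two steps report the observation counts. These should match, with only the order of the records differing. Later BY-group steps on the mainframe expect EBCDIC order, so NODUP_ASCII is best kept only for this comparison.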

Thanks again for your input on this.

PD

ghellrieg@T-ONLINE.DE (Gerhard Hellriegel) wrote in message news:<200402180853.i1I8rn919598@listserv.cc.uga.edu>...
> On Tue, 17 Feb 2004 09:24:27 -0800, Choate, Paul@DDS <pchoate@DDS.CA.GOV> wrote:
>
> >Hi PD -
> >
> >Why not post a little of the data (with hex=on) for us to look at?
> >
> >What is supposed to be in the file? You said character data, but is it one
> >long string like a comment or address, or are there separate variables such
> >as dates, dollar amounts, id's etc?
> >
> >What is the source of the data, a PC file? If the file's source was other
> >than MVS, then how did it get on the mainframe (FTP, ind$file, proc upload)?
> >
> >What are the Data Set Information values of the dataset (ISPF 3.2)?
> >
> >I'd guess that maybe it's an ASCII file that wasn't properly uploaded, but
> >I'd have to see it first. ASCII and EBCDIC are related by a translation
> >table, depending on the file type you would move the file with a binary or
> >text transfer, if the wrong transfer was used then the data might look
> >garbled as you describe.
> >
> >hth
> >
> >Paul Choate
> >DDS Data Extraction
> >(916) 654-2160
> >
> >-----Original Message-----
> >From: PD [mailto:sophe88@YAHOO.COM]
> >Sent: Monday, February 16, 2004 10:16 AM
> >To: SAS-L@LISTSERV.UGA.EDU
> >Subject: sorting data on mainframe
> >
> >I have a data set that has a 20 byte long 'character' variable, on
> >mainframe. The data set is a flat text file.
> >
> >When browsing the data on the mainframe, without turning on the HEX
> >command at ISPF, the 20 bytes look like a mess. With HEX command
> >turned on, the 20 bytes look like Ok, clean 20 bytes with numbers and
> >characters, the way it is supposed to be.
> >
> >Now I need to read it into SAS for some processing. I need to sort it
> >first.
> >
> >Question 1 is the informat: which informat should I use, $char20. or
> >else? I tried $char20. and did a sorting,
> >
> >proc sort nodupkeys; by the_variable; run;
> >
> >Then I lost 2 thirds of the values / observations. Our business people
> >told us there should be only about 1% duplicates.
> >
> >Question 2 is about proc sort: I read SAS documents about hosts using
> >ASCII sort order vs. EBCDIC sorting order. I am not sure if this is
> >relevant to my case here. Should I have added any options when sorting
> >on the mainframe?
> >
> >Thanks.
> >
> >PD
> >
> How do you read it in SAS? EBCDIC and ASCII should not do anything, except
> for changing the order.
>
> Try to read it like:
>
> data in;
> length c $50; /* what is your LRECL?? */
> infile xxx;
> input;
> c=_infile_;
> run;
>
> proc sort nodupkey;
> by c;
> run;
>
> What are the results now?

