| Date: | Thu, 15 Dec 2011 07:52:39 -0800 |
| Reply-To: | "Schwarz, Barry A" <barry.a.schwarz@BOEING.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "Schwarz, Barry A" <barry.a.schwarz@BOEING.COM> |
| Subject: | Re: Ebcidic to Ascii with PDw.d conversion fix |
| In-Reply-To: | <CAPejfBVcaLfz0Hvctd_7hDyFcmxVggjVNTW6kGB5ExNqQrxqbw@mail.gmail.com> |
| Content-Type: | text/plain; charset="us-ascii" |
When the mainframe stores data in PD (*packed* decimal) format, there is always an odd number of digits plus a sign, each occupying a nibble (half a byte). So 1234 would be stored in three bytes as 01x 23x 4cx. (A sign nibble of f can also be used for positive numbers, and d is the normal sign for negative, but the other non-digit values can also be used.) The problem you are running into is that ftp translated this data as if it were character data.
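To make the nibble layout concrete, here is a minimal Python sketch of packing and unpacking. The function names are made up for illustration, and the choice of c as the default positive sign is an assumption taken from the 1234 example above.

```python
def pack_decimal(value: int, positive_sign: int = 0xC) -> bytes:
    """Encode an integer as packed decimal: digit nibbles plus a trailing sign nibble."""
    sign = positive_sign if value >= 0 else 0xD
    digits = str(abs(value))
    # Digits plus the sign nibble must fill whole bytes, so pad to an odd digit count.
    if len(digits) % 2 == 0:
        digits = "0" + digits
    nibbles = [int(d) for d in digits] + [sign]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_decimal(data: bytes) -> int:
    """Decode packed decimal bytes back into an integer."""
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError(f"not valid packed decimal: {data.hex()}")
    magnitude = int("".join(map(str, digits)))
    # d is the usual negative sign; b is also treated as negative.
    return -magnitude if sign in (0xB, 0xD) else magnitude
```

For example, pack_decimal(1234).hex() gives "01234c", and unpack_decimal accepts the same digits with an f sign.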
Since many of these bytes do not represent valid EBCDIC characters (4cx happens to be '<', but that is one of the exceptions), you really don't know what they were translated to. If you can determine this (by inspection or by querying the originator), you have a shot at constructing a translation table to reverse it.
You also need to confirm that each of the 110+ possible bytes (00x-09x, 0cx, 10x-19x, 1cx, ...) is translated to a unique value. If not (such as 01x, 02x, and 03x all being translated to ffx), you cannot reconstruct the numeric fields with any expectation of accuracy.
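The uniqueness check can be sketched in Python as follows. This assumes you have already worked out the forward translation as a dict from original byte to translated byte; that table, the function names, and the default set of sign nibbles (c, d, f) are all assumptions for illustration.

```python
from itertools import product

DIGITS = range(10)


def packed_byte_values(sign_nibbles=(0xC, 0xD, 0xF)) -> set:
    """Every byte value that can occur in a packed field using these sign nibbles."""
    digit_pairs = {(hi << 4) | lo for hi, lo in product(DIGITS, DIGITS)}
    sign_bytes = {(digit << 4) | sign for digit, sign in product(DIGITS, sign_nibbles)}
    return digit_pairs | sign_bytes


def build_reverse_table(forward: dict, needed: set) -> dict:
    """Invert forward (original byte -> translated byte) for the needed bytes.

    Raises ValueError if two needed bytes were translated to the same value,
    because the original data cannot then be recovered unambiguously.
    """
    reverse = {}
    for original in sorted(needed):
        translated = forward[original]
        if translated in reverse:
            raise ValueError(
                f"{reverse[translated]:02x}x and {original:02x}x "
                f"both translate to {translated:02x}x"
            )
        reverse[translated] = original
    return reverse
```

With only c as the sign nibble the set has 110 members; allowing c, d, and f gives 130.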
Mainframe data is stored in records. There is no character in the data to indicate the end of a record, but applications, including ftp, can determine the length of a record. When ftp transfers the data in non-binary mode, the receiving system adds the appropriate end-of-record character(s) to each record (0ax for Unix and 0d0ax for Windows). If you transfer the data back to the mainframe (also in non-binary mode), the records on the mainframe should be rebuilt properly except for the translation problem noted above.
Just to complicate matters, -10 is stored as 01x 0dx, and that last byte (0dx is a carriage return) can have special meaning.
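A small Python sketch of how one might flag packed fields whose raw bytes coincide with line-control characters. The helper name is hypothetical, and it only inspects the untranslated bytes; after a character-mode transfer, other byte values may also have been mapped onto LF or CR.

```python
# Byte values that are line-control characters (LF and CR).
CONTROL_BYTES = {0x0A: "LF", 0x0D: "CR"}


def control_bytes_in(packed: bytes) -> list:
    """Return (offset, name) for each byte of a packed field that is also LF or CR."""
    return [(i, CONTROL_BYTES[b]) for i, b in enumerate(packed) if b in CONTROL_BYTES]
```

Calling control_bytes_in(bytes.fromhex("010d")) flags the second byte of -10 as CR, while the bytes for 1234 (01x 23x 4cx) produce an empty list.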
The bottom line is that any transfer involving non-text data should be done in binary.
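For what it's worth, a binary retrieval can be sketched with Python's standard ftplib. The helper name is made up, and the host and dataset names in the usage note are placeholders, not real values.

```python
from ftplib import FTP


def fetch_binary(ftp: FTP, remote_name: str) -> bytes:
    """Retrieve remote_name in binary (image) mode and return the raw bytes."""
    chunks = []
    # retrbinary issues TYPE I before the RETR, so no character translation
    # or line-ending conversion is applied to the data.
    ftp.retrbinary(f"RETR {remote_name}", chunks.append)
    return b"".join(chunks)
```

Usage would look something like `with FTP("mainframe.example.com") as ftp: ftp.login(user, password); data = fetch_binary(ftp, "HYPOTHETICAL.DATASET")`. Keep in mind that with variable-length records a binary transfer carries no record delimiters, so both ends still need to agree on how record lengths are conveyed.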
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Byron
> Kirby
> Sent: Wednesday, December 14, 2011 4:33 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Ebcidic to Ascii with PDw.d conversion fix
>
> G'evening!
> I've been struggling with this for a while now and have hit a wall.
>
> I received a file via FTP that was generated on a mainframe, but
> unfortunately it wasn't transferred in binary, and it was variable
> block with Packed Decimal numeric data fields (we are a recipient and
> can't regenerate the file, though it's been requested). After
> searching SAS-L archives I had a couple ideas on how to be able to
> process the file and have most of it worked out, but still a problem
> remains. Part of the problem is related to embedded Line feeds, but
> there may be other issues.
>
> The current approach:
> determined all characters found in the file, used one of the characters
> not found {byte(254)} as a place holder and rejoined multi-line
> segments; then read the file as ASCII, reading in all PDw.d as
> $CHARw.d { later to be converted to numeric by: _MyNumVar_=
> input(put(translate(_myCharVar_,byte(254),byte(10)),$ebcidicW),
> s370fpdW.d); }
>
> And this approach to reading the data appears to be working, BUT about
> 3% of the lines seem to be missing a character or two (in seemingly
> random positions) and a few have too many!
>
>
> From a pattern perspective, it's pretty easy to know where the end of a
> line should be, since all records end with "EOC" or "EOR" and the new
> line begins with a string of numbers.
>
> Using that, another failed attempt was to insert a carriage return
> between the EOC and the '0A'x, then try to run it on a Windows
> machine, but SAS still broke the data at each linefeed, which I thought
> was odd.
>
> I started to FTP it to our mainframe to see if I could reblock it,
> but since there are embedded LFs I don't think that will work either, but
> I could be wrong.