LISTSERV at the University of Georgia
Date:         Mon, 5 Apr 2010 10:57:27 -0700
Reply-To:     "Terjeson, Mark" <Mterjeson@RUSSELL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Terjeson, Mark" <Mterjeson@RUSSELL.COM>
Subject:      Re: Fixed Width file
Comments: To: tw2@MAIL.COM
In-Reply-To:  A<8CCA2F707904A07-100C-11C5@web-mmc-m07.sysops.aol.com>
Content-Type: text/plain; charset="us-ascii"

Hi Tom,

Basically, in a fixed-width file the record length (line length) and the fields we humans lay out within each record are all of fixed length. For example, if we take the infamous sashelp.class sample dataset and export it as CSV, we get a comma-separated text file that looks like this:

Name,Sex,Age,Height,Weight
Alfred,M,14,69,112.5
Alice,F,13,56.5,84
Barbara,F,13,65.3,98
Carol,F,14,62.8,102.5
Henry,M,14,63.5,102.5
James,M,12,57.3,83
Jane,F,12,59.8,84.5
Janet,F,15,62.5,112.5
Jeffrey,M,13,62.5,84
John,M,12,59,99.5
Joyce,F,11,51.3,50.5
Judy,F,14,64.3,90
Louise,F,12,56.3,77
Mary,F,15,66.5,112
Philip,M,16,72,150
Robert,M,12,64.8,128
Ronald,M,15,67,133
Thomas,M,11,57.5,85
William,M,15,66.5,112
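Reading the delimited version means scanning every record for the commas to find where each field starts and stops. A minimal sketch using Python's standard `csv` module on the first few records (Python is used here only for illustration; in SAS this is typically an INFILE statement with the DSD and DLM= options):

```python
import csv
import io

# The first few records of the comma-separated export (abbreviated).
CSV_TEXT = """Name,Sex,Age,Height,Weight
Alfred,M,14,69,112.5
Alice,F,13,56.5,84
Barbara,F,13,65.3,98
"""

def read_delimited(text):
    """Parse delimited text; the reader must scan each line for commas."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "Name": row["Name"],
            "Sex": row["Sex"],
            "Age": int(row["Age"]),
            "Height": float(row["Height"]),
            "Weight": float(row["Weight"]),
        }
        for row in reader
    ]

rows = read_delimited(CSV_TEXT)
for row in rows:
    print(row)
```

The field boundaries are only discovered at read time, which is the parsing cost the rest of this note weighs against fixed positions.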

Obviously, if we remove the comma delimiters, the fields within the records become a bit tough to locate and parse out:

NameSexAgeHeightWeight
AlfredM1469112.5
AliceF1356.584
BarbaraF1365.398
CarolF1462.8102.5
HenryM1463.5102.5
JamesM1257.383
JaneF1259.884.5
JanetF1562.5112.5
JeffreyM1362.584
JohnM125999.5
JoyceF1151.350.5
JudyF1464.390
LouiseF1256.377
MaryF1566.5112
PhilipM1672150
RobertM1264.8128
RonaldM1567133
ThomasM1157.585
WilliamM1566.5112

Back in the day, when speed and storage were both very costly, you had to weigh the size of the file against the cost of reading it and pick whichever served you better. Locating and parsing out the delimiters took a lot of horsepower, whereas fixed-length fields added virtually no extra overhead: you merely grab each field from its known position and length.

e.g.

Name    Sex Age Height Weight
Alfred  M   14  69     112.5
Alice   F   13  56.5   84
Barbara F   13  65.3   98
Carol   F   14  62.8   102.5
Henry   M   14  63.5   102.5
James   M   12  57.3   83
Jane    F   12  59.8   84.5
Janet   F   15  62.5   112.5
Jeffrey M   13  62.5   84
John    M   12  59     99.5
Joyce   F   11  51.3   50.5
Judy    F   14  64.3   90
Louise  F   12  56.3   77
Mary    F   15  66.5   112
Philip  M   16  72     150
Robert  M   12  64.8   128
Ronald  M   15  67     133
Thomas  M   11  57.5   85
William M   15  66.5   112
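Reading a fixed-width file needs no delimiter scanning: each field is simply sliced out of its known position. A minimal Python sketch, assuming a column layout of Name in columns 1-8, Sex 9-12, Age 13-16, Height 17-23, and Weight 24-29, so that every record (including the padded last field) is 29 characters long. Those positions are an assumption for illustration, not a standard; in SAS the same idea is usually written with column input on the INPUT statement.

```python
# Assumed layout: (field name, start, end) as 0-based, end-exclusive offsets.
LAYOUT = [
    ("Name", 0, 8),
    ("Sex", 8, 12),
    ("Age", 12, 16),
    ("Height", 16, 23),
    ("Weight", 23, 29),
]

# Sample records, each padded with spaces out to the full record length.
RECORDS = [
    "Alfred  M   14  69     112.5 ",
    "Alice   F   13  56.5   84    ",
    "Barbara F   13  65.3   98    ",
]

def parse_fixed(line):
    """Grab each field from its known position, then strip the padding."""
    fields = {name: line[start:end].strip() for name, start, end in LAYOUT}
    return {
        "Name": fields["Name"],
        "Sex": fields["Sex"],
        "Age": int(fields["Age"]),
        "Height": float(fields["Height"]),
        "Weight": float(fields["Weight"]),
    }

for line in RECORDS:
    assert len(line) == 29  # every record has the same total length
    print(parse_fixed(line))
```

Because the positions are known in advance, a reader can also jump straight to record N at byte offset N times the record length (plus line terminators), which a delimited file does not allow.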

You may recall that the Text Import Wizard in Excel offers two main choices for parsing an incoming text file: fixed width or delimited. You can see that each field has a fixed width and is padded out with spaces, including the last field, which means each line (record) also has a fixed total length.

Nowadays, with storage so cheap and CPU speed so cheap and readily available, the choice is more about what is easiest, or what is compatible with the different languages and application products we use.

The comments above mostly concern the characteristics of incoming data files and how to read them. On the output side, display and reporting needs will often require us to turn a record into some variation of a fixed-length text string, so that the displayed fields land in vertically uniform columns regardless of the lengths of the individual values. In effect we have created a fixed-length record for each text string, even though it is never written to a file.
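The same padding idea applies to output. A minimal Python sketch: each field is padded to a chosen width with format specifications (text left-aligned, numbers right-aligned), so every line comes out the same length and the columns line up. The widths, the helper name `to_fixed`, and the sample rows are illustrative assumptions; in SAS, a PUT statement with formats plays a similar role.

```python
# Sample rows: (name, sex, age, height, weight).
ROWS = [
    ("Alfred", "M", 14, 69.0, 112.5),
    ("Alice", "F", 13, 56.5, 84.0),
    ("William", "M", 15, 66.5, 112.0),
]

def to_fixed(name, sex, age, height, weight):
    """Pad every field to a fixed width so columns line up for any value."""
    # Widths: name 8, sex 4, age 3, space, height 6, space, weight 6 = 29 chars.
    return f"{name:<8}{sex:<4}{age:>3} {height:>6.1f} {weight:>6.1f}"

lines = [to_fixed(*row) for row in ROWS]
for line in lines:
    print(line)
```

Values wider than their field are not truncated by these format specifications, so a real report would need widths chosen to fit the data.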

Thus FIXED WIDTH and FIXED LENGTH mean the same thing. As for ASCII FILE, people commonly say "ASCII file" to mean a text file. PC, DOS, and Windows computers predominantly use the ASCII character set. http://www.arachnoid.com/javascript/ascii.html http://www.lookuptables.com/ At times you may also work with files going to or from mainframes, which are based on the EBCDIC character set. http://www.natural-innovations.com/computing/asciiebcdic.html

Of course, any file is just made up of bytes, and a byte-is-a-byte-is-a-byte-is-a-byte-is-a-byte. It is only the meanings and purposes we humans put upon these bytes, or groups of bytes, that lead us to give files different names; we are merely applying a common set of rules for interpreting the file's content.
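To make the byte-is-a-byte point concrete, here is a minimal Python sketch that decodes the same single byte, hex C1, under two different sets of rules: ISO-8859-1 (Latin-1) and EBCDIC code page 037. Both codecs ship with Python; picking these two code pages is just an illustration.

```python
# One byte, value 0xC1 (decimal 193), with no meaning of its own.
raw = bytes([0xC1])

as_latin1 = raw.decode("latin-1")  # Latin-1 maps byte 0xC1 to U+00C1 (accented A)
as_ebcdic = raw.decode("cp037")    # EBCDIC code page 037 maps 0xC1 to "A"

print(repr(as_latin1), repr(as_ebcdic))
```

Nothing in the byte itself says which reading is right; only the agreed-upon character set does.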

So for ASCII text files or EBCDIC text files, this just means that, for ASCII, whenever you see a byte value of 65, we humans want to consistently represent that value as an uppercase A. Since a byte is 8 bits, in base-2 it can hold values 0-255. http://www.pnwsug.org/sites/test.pnwsug.org/files/proceedings/PN20MarkTerjesonBinary.pdf In ASCII, the characters 0-9 are byte values 48-57, uppercase A-Z are 65-90, lowercase a-z are 97-122, null is 0, the control characters ^A-^Z are 1-26 (the control-character range as a whole is 0-31), space is 32, and various punctuation characters fill in 33-47, 58-64, 91-96, and 123-126, with the DELete/rubout key at 127. These make up the lower half of 0-255, the 0-127 or non-high-bit portion of the character set: in base-2 (binary), when the uppermost bit (bit 7) is 0 you get the decimal values 0-127, otherwise known as 7-bit ASCII. http://www.neurophys.wisc.edu/comp/docs/ascii/ The meanings of the byte values in this lower half have rarely changed in several decades.

When bit 7 is 1 you get the decimal values 128-255. These high-bit values have been put to several purposes over the years with different meanings applied, but much of the time they have been used for graphics-type characters. The meanings assigned to these values vary among the different 8-bit "extended ASCII" character sets.
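The ranges above can be checked with Python's `ord()`, and testing for 7-bit ASCII is just a check that bit 7 is clear in every byte. `is_7bit` is a hypothetical helper name used only for this sketch.

```python
# Spot-check the 7-bit ASCII ranges described above.
assert [ord(c) for c in "09"] == [48, 57]
assert [ord(c) for c in "AZ"] == [65, 90]
assert [ord(c) for c in "az"] == [97, 122]
assert ord(" ") == 32 and ord("\x7f") == 127
# Ctrl-A..Ctrl-Z are the letter's value with the upper bits masked off: 1..26.
assert [ord(c) & 0x1F for c in "AZ"] == [1, 26]
# 127 is the largest value with bit 7 clear; 128 is the smallest with it set.
assert format(127, "08b") == "01111111" and format(128, "08b") == "10000000"

def is_7bit(data: bytes) -> bool:
    """True if every byte has bit 7 clear, i.e. all values are 0-127."""
    return all((b & 0x80) == 0 for b in data)

print(is_7bit(b"Alfred,M,14"))               # plain 7-bit ASCII
print(is_7bit("Caf\u00e9".encode("latin-1")))  # the accented e is byte 0xE9
```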

EBCDIC, found mostly with mainframe operating systems, also has roots back in the earlier days of teletype and other 4- and 8-bit processor devices. For example, you will find the alphabet chopped up into segments that occupy the 1-9 portion of the hex byte values: A-I is C1-C9, then a break, J-R is D1-D9, and S-Z is E2-E9. Opposite to ASCII, the lowercase letters in EBCDIC have lower values than the uppercase letters, and the number characters 0-9 are F0-F9. http://www.natural-innovations.com/computing/asciiebcdic.html
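Python ships a codec for EBCDIC code page 037 (`cp037`), which makes the segmented layout easy to see. Using code page 037 as a stand-in for EBCDIC in general is an assumption; mainframes use several EBCDIC code pages, and others may differ in places. `ebcdic_hex` is a hypothetical helper for this sketch.

```python
def ebcdic_hex(text: str) -> str:
    """Return the EBCDIC (code page 037) byte values of text as hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode("cp037"))

print(ebcdic_hex("ABCDEFGHI"))   # A-I fall in C1-C9
print(ebcdic_hex("JKLMNOPQR"))   # J-R fall in D1-D9
print(ebcdic_hex("STUVWXYZ"))    # S-Z fall in E2-E9
print(ebcdic_hex("0123456789"))  # digits are F0-F9
print(ebcdic_hex("a"), ebcdic_hex("A"))  # lowercase sorts below uppercase

# The same short record dumped in ASCII and in EBCDIC for comparison.
record = "JOHN 12"
print("ASCII :", " ".join(f"{b:02X}" for b in record.encode("ascii")))
print("EBCDIC:", ebcdic_hex(record))
```

In the EBCDIC dump, the high hex digit tells you which segment a letter is in and the low digit its position within that segment, which is part of why such dumps are readable by eye.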

This did make letters and numbers easier to pick out when dumping the guts of EBCDIC files as raw byte values. But part of the reason was also ease of circuit design for the discrete wiring of more primitive devices that predated integrated-circuit chips.

There is a myriad of reasons and uses that different folks have come up with; I have mentioned just a few.

I would guess it is safe to say that the 7-bit ASCII character set is by far the one most commonly seen in computers, handheld devices, and present-day radio data transmissions.

Hope this is helpful.

Mark Terjeson
Investment Business Intelligence
Investment Management & Research
Russell Investments
253-439-2367

Russell Global Leaders in Multi-Manager Investing

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Tom White
Sent: Monday, April 05, 2010 9:49 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Fixed Width file

What is usually meant by the wording FIXED WIDTH ASCII file?
Thanks.
Tom

